Guardrails
Guardrails define the rules and boundaries for your agent's behavior. They ensure your agent stays on-topic, safe, and aligned with your policies.
Built-in Guardrails
Every agent comes with default protections:
- No harmful content: agents won't generate toxic, violent, or illegal content
- No personal data leakage: agents don't share one user's data with another
- Rate limiting: prevents abuse and controls costs
Custom Rules
Define rules in your config:
guardrails: {
  // Topics the agent should never discuss
  blockedTopics: ['competitors', 'internal-pricing', 'legal-advice'],
  // Trigger escalation to a human
  escalateOn: ['refund-request', 'complaint', 'urgent'],
  // Response constraints
  maxResponseLength: 2000,
  requireCitation: true,
  tone: 'professional',
  // Content filters
  piiDetection: true,
  profanityFilter: true,
}
Escalation Rules
When an agent encounters a topic it shouldn't handle, it can escalate:
escalation: {
  rules: [
    { trigger: 'refund', target: 'billing-team', channel: '#billing' },
    { trigger: 'complaint', target: 'manager', channel: '#escalations' },
  ],
  fallback: { target: 'human-queue', message: 'Connecting you with a team member...' },
}
Testing Guardrails
Use the testing CLI to verify your guardrails work correctly:
claw test guardrails --agent my-agent
This runs a suite of adversarial prompts to ensure your rules hold.
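To get a feel for what such a run might exercise, the sketch below applies two of the custom rules and the escalation routing to plain strings. It is an illustration under stated assumptions, not the platform's implementation: the types, `mentions`, `checkResponse`, and `routeEscalation` are hypothetical names, topics are matched by naive substring search, `maxResponseLength` is assumed to count characters, and escalation rules are assumed to be tried in order with the first match winning.

```typescript
// Hypothetical types mirroring a subset of the config fields shown on this page.
interface Guardrails {
  blockedTopics: string[];
  maxResponseLength: number; // assumed to be a character count
}

interface EscalationTarget {
  target: string;
  channel?: string;
  message?: string;
}

interface EscalationRule extends EscalationTarget {
  trigger: string;
}

interface Escalation {
  rules: EscalationRule[];
  fallback: EscalationTarget;
}

// Assumption: a keyword is "mentioned" when its name, with hyphens read as
// spaces, appears in the text. A real agent likely classifies topics differently.
function mentions(text: string, keyword: string): boolean {
  return text.toLowerCase().includes(keyword.toLowerCase().replace(/-/g, ' '));
}

// Returns a list of violations; an empty list means the response passed.
function checkResponse(response: string, g: Guardrails): string[] {
  const violations: string[] = [];
  if (response.length > g.maxResponseLength) {
    violations.push(`response exceeds ${g.maxResponseLength} characters`);
  }
  for (const topic of g.blockedTopics) {
    if (mentions(response, topic)) violations.push(`blocked topic: ${topic}`);
  }
  return violations;
}

// Assumption: rules are tried in order; the fallback applies when none match.
function routeEscalation(userMessage: string, e: Escalation): EscalationTarget {
  return e.rules.find((r) => mentions(userMessage, r.trigger)) ?? e.fallback;
}

// Example values copied from the config snippets above.
const guardrails: Guardrails = {
  blockedTopics: ['competitors', 'internal-pricing', 'legal-advice'],
  maxResponseLength: 2000,
};

const escalation: Escalation = {
  rules: [
    { trigger: 'refund', target: 'billing-team', channel: '#billing' },
    { trigger: 'complaint', target: 'manager', channel: '#escalations' },
  ],
  fallback: { target: 'human-queue', message: 'Connecting you with a team member...' },
};

console.log(checkResponse('Our internal pricing is secret.', guardrails));
console.log(routeEscalation('I want a refund', escalation).target);
```

Substring matching is easy to evade with paraphrases, which is why a suite of adversarial prompts is more informative than a handful of hand-written checks like these.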