
Guardrails

Guardrails define the rules and boundaries for your agent's behavior. They ensure your agent stays on-topic, safe, and aligned with your policies.

Built-in Guardrails

Every agent comes with default protections:

  • No harmful content — agents won't generate toxic, violent, or illegal content
  • No personal data leaking — agents don't share one user's data with another
  • Rate limiting — prevents abuse and controls costs

Custom Rules

Define custom rules in the guardrails block of your agent config:

guardrails: {
  // Topics the agent should never discuss
  blockedTopics: ['competitors', 'internal-pricing', 'legal-advice'],

  // Trigger escalation to a human
  escalateOn: ['refund-request', 'complaint', 'urgent'],

  // Response constraints
  maxResponseLength: 2000,
  requireCitation: true,
  tone: 'professional',

  // Content filters
  piiDetection: true,
  profanityFilter: true,
}
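
To make the shape of these options concrete, here is a minimal TypeScript sketch of the config above, plus a small helper that applies two of its constraints. The type name `GuardrailsConfig` and the helper `checkResponse` are hypothetical and not part of the product API. The sketch also assumes that `maxResponseLength` counts characters and that topic labels come from some upstream classifier; the platform's actual enforcement may work differently.

```typescript
// Hypothetical type mirroring the fields in the config above.
interface GuardrailsConfig {
  blockedTopics: string[];
  escalateOn: string[];
  maxResponseLength: number;
  requireCitation: boolean;
  tone: string;
  piiDetection: boolean;
  profanityFilter: boolean;
}

const guardrails: GuardrailsConfig = {
  blockedTopics: ['competitors', 'internal-pricing', 'legal-advice'],
  escalateOn: ['refund-request', 'complaint', 'urgent'],
  maxResponseLength: 2000,
  requireCitation: true,
  tone: 'professional',
  piiDetection: true,
  profanityFilter: true,
};

// Illustrative helper (not a platform API): lists which constraints a draft
// response would violate. `topics` is assumed to be a set of labels produced
// by an upstream classifier; length is assumed to be measured in characters.
function checkResponse(text: string, topics: string[], config: GuardrailsConfig): string[] {
  const violations: string[] = [];
  if (text.length > config.maxResponseLength) {
    violations.push('maxResponseLength');
  }
  for (const topic of topics) {
    if (config.blockedTopics.indexOf(topic) !== -1) {
      violations.push('blockedTopic:' + topic);
    }
  }
  return violations;
}
```

Typing the config this way lets the compiler catch a misspelled key or a wrong value type before the agent is deployed.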

Escalation Rules

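When an agent encounters a topic it shouldn't handle, it can escalate to a human. Each rule maps a trigger to a target and a channel, and the fallback applies when no rule matches: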

escalation: {
  rules: [
    { trigger: 'refund', target: 'billing-team', channel: '#billing' },
    { trigger: 'complaint', target: 'manager', channel: '#escalations' },
  ],
  fallback: { target: 'human-queue', message: 'Connecting you with a team member...' },
}
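
One plausible reading of this config is first-match routing with a fallback. The sketch below assumes that triggers are matched by exact label and that rules are checked in order. The `route` function and the type names are hypothetical, and the platform's real matching logic may differ.

```typescript
// Hypothetical types mirroring the escalation config above.
interface EscalationRule {
  trigger: string;
  target: string;
  channel: string;
}

interface EscalationConfig {
  rules: EscalationRule[];
  fallback: { target: string; message: string };
}

// Result of routing: a matched rule yields a channel, the fallback yields a message.
interface Route {
  target: string;
  channel?: string;
  message?: string;
}

const escalation: EscalationConfig = {
  rules: [
    { trigger: 'refund', target: 'billing-team', channel: '#billing' },
    { trigger: 'complaint', target: 'manager', channel: '#escalations' },
  ],
  fallback: { target: 'human-queue', message: 'Connecting you with a team member...' },
};

// Illustrative only: return the first rule whose trigger equals the detected
// label (assumed exact match), otherwise fall back to the default target.
function route(trigger: string, config: EscalationConfig): Route {
  for (const rule of config.rules) {
    if (rule.trigger === trigger) {
      return { target: rule.target, channel: rule.channel };
    }
  }
  return { target: config.fallback.target, message: config.fallback.message };
}
```

Under these assumptions, `route('refund', escalation)` resolves to the billing team, while an unlisted trigger such as `'urgent'` falls through to the human queue.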

Testing Guardrails

Use the testing CLI to verify your guardrails work correctly:

claw test guardrails --agent my-agent

This runs a suite of adversarial prompts against the agent to check that your rules hold.