Audit-grade compliance review ships under multi-layer guardrails
Defence-in-depth controls for regulated document review.
Regulated financial-services intermediary · India · 95 employees
What the team was actually solving
Manual compliance review of vendor and onboarding documents was the bottleneck for new-customer activation. Every traffic spike threatened SLA breach. Reviewer fatigue led to inconsistent flagging — some weeks too strict, some weeks too loose, with no defensible pattern.
Where the old process broke
- 1Customer activation average of 36+ hours, mostly waiting on compliance
- 2Reviewer fatigue producing inconsistent flag decisions
- 3No machine-readable audit trail — every decision lived in a reviewer's head
- 4Quarterly internal audits surfaced inconsistencies the team could not defend
The AI / technical solution we shipped
A single-agent system wrapped in four guardrail layers: an input filter that detects and redacts PII / strips prompt-injection patterns; a versioned policy registry the agent must cite by clause ID for every conclusion; output validators (schema + LLM-as-judge cross-check); and a human-in-the-loop gate on anything scored above a defined risk threshold. Every decision is appended to an immutable audit log.
Technology stack
Integration approach
Documents flow in from the customer's onboarding system. The agent's draft decision (with clause citations) appears in the existing compliance dashboard. Reviewers see the agent's reasoning and either approve, override, or escalate — all logged.
- Onboarding system webhook → agent service
- Compliance dashboard: existing UI, agent draft inline
- Policy registry: in-repo, versioned, clause-IDed
- Audit log: PostgreSQL append-only ledger, queryable by auditors
Security & scalability
PII minimisation
Sensitive identifiers are redacted before reaching the reasoning model wherever the workflow allows. The audit log stores redacted versions; full versions remain only in the customer's source system.
Prompt-injection defence
Inputs containing instruction-like patterns are detected and quarantined. The agent prompt explicitly tells the model not to act on instructions embedded in user content.
Versioned policy registry
Every conclusion cites a clause ID. Policy versions are explicit; historic decisions can be re-run against the current registry to detect drift.
Append-only audit log
Decisions are written to an immutable ledger. Replay tooling lets auditors reproduce any decision from logged inputs.
Multi-layer controls
- L1Audit logImmutable trace of every decision; replayable; cites policy clauses by ID.
- L2Human gateRisk-scored escalation above threshold; approval queue with sign-off.
- L3Output validatorsPydantic + regex + LLM-judge cross-check before any external action.
- L4Policy registryVersioned policy clauses; agent must cite a clause for any conclusion.
- L5Input filterPII redaction, prompt-injection blocking, jailbreak heuristics.
Delivery process
Threat model (1 wk)
Failure modes enumerated: PII leakage, prompt injection, policy mis-citation, output-schema bypass. Each mapped to the layer that catches it.
Policy registry (2 wks)
Existing policy library externalised as versioned, citable clauses. Format reviewed with compliance team.
Layered implementation (3 wks)
Input filter → policy-grounded prompt → output validator → human gate. Each layer tested adversarially before the next is wired in.
Adversarial eval (2 wks)
Jailbreak attempts, PII smuggling, policy-conflict cases. Eval scoring at every layer.
Audit sign-off (1 wk)
Threat model, policy registry, eval results, decision-trace samples reviewed with internal audit. Sign-off achieved.
Quarterly revalidation
Adversarial eval re-run quarterly; policy registry refreshed; model version log updated.
Observability
Every layer of the guardrail stack emits a structured event. Auditors and compliance reviewers see the full chain — input transformations, policy clauses cited, validator passes, human approver (if any), final ruling — in a single dashboard.
- Per-decision trace with all layer events
- Override + escalation rate as leading drift indicators
- Quarterly adversarial eval results published to the audit team
- Replay tooling: any historic decision can be re-run end-to-end
Before vs after
- Customer activation time36+ hrs
- Documents per reviewer per dayBaseline (1×)
- Audit findings on decisionsMultiple per quarter
- Decision traceabilityIn reviewer's head
- Customer activation time< 6 hrs
- Documents per reviewer per day3.2×
- Audit findings on decisions0 across 4 quarters
- Decision traceability100%, clause-cited
Automation impact
Reviewers no longer do first-pass classification — they review the agent's draft ruling with the cited clauses, confirm or override, and move on. The work that remains is the high-judgement work that genuinely needs human attention.
Business outcomes
Reviewer throughput tripled without adding headcount. Audit findings on AI-assisted decisions: zero across four quarterly reviews. The activation-time reduction also lifted top-of-funnel conversion measurably.
What we'd tell another team building this
- 01Layered guardrails work. A jailbreak that bypassed the input filter still had to pass the policy-citation requirement, the output validator, and the human gate above threshold. No single failure compromised the system.
- 02The policy registry was the most valuable artefact produced. Even outside the agent context, having policy clauses externalised and versioned changed how the team reviewed decisions.
- 03Adversarial eval cases produced the largest accuracy jumps. Happy-path evals gave false confidence; the cases that mattered were the malformed, ambiguous, and conflicting ones.
Future scalability
The guardrail pattern carries across regulated workflows. The same layered architecture now backs the customer's claims-triage pipeline and is being scoped for transaction-review.
- Claims-triage agent reusing input filter + policy registry
- Transaction-review agent in scoping
- Policy registry promoted to a shared org-wide compliance artefact
- Quarterly eval refresh as a permanent governance ritual
Have a regulated workflow you want to safely automate?
Most regulated AI projects fail at audit, not at build. A scoping session walks through the threat model, policy-registry shape, and approval workflow your auditors will ask for.
Read what we publish on this
Eval datasets: stop testing your agents on the happy path
If your eval set is the demos you showed the client, you are testing the wrong thing. How we build evals from production failures and the minimum viable suite to ship.
Read the post ProductionThe agent observability stack we ship to every client
Traces, spans, evals, cost-per-completed-task, and the one dashboard panel that catches 80% of regressions. Vendor-agnostic — covers Langfuse, Honeycomb, and rolling your own.
Read the post Tool DesignTool descriptions are prompts. Fix the registry, not the agent.
When an agent picks the wrong tool, the registry is broken — not the agent. Three rules I now apply before debugging anything in a multi-tool system: precise names, "when to use" triggers, and a curated load list. Anthropic's new tool-selection telemetry finally puts numbers on what changes accuracy.
Read the postSolutions & topics worth reading next
Agentic AI Consulting
Designed, built, and handed off — production agentic systems for enterprise teams.
AI Guardrails
Multi-layer safety, policy, and audit controls for agents in regulated environments.
Enterprise AI Architecture
Reference architectures for organisations standing up an AI platform — not one agent, but the foundation for many.
AI Observability
Tracing, eval, cache-hit telemetry, and cost attribution for production agents.
Agentic AI
Designing, building, and shipping production agents.
AI Observability
Tracing, eval, and telemetry for production agents.
AI Engineering
The discipline of shipping AI systems, not demos.
Enterprise AI Automation
Operational agents for IT services and enterprise teams.