Challenge
Risk analysts were reviewing every flagged transaction from scratch, cross-referencing scattered policy PDFs, sanctions notes, and prior decisions by hand. Backlogs grew faster than headcount, and no one could later prove why a case was approved or declined.
Approach
We built an agentic review workflow that drafts a recommendation for each flagged case: it gathers the signals, retrieves the governing policy, and proposes approve, decline, or escalate with its reasoning, while a human analyst stays the decision-maker. A separate copilot answers policy questions grounded only in the firm's own documents. We worked evaluation-first: a graded eval set and guardrails came before the agent, not after.
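As a rough illustration of the split between the agent's draft and the analyst's decision, here is a minimal TypeScript sketch. The type names, field names, and the `finalize` helper are all hypothetical and are not taken from the production schema; the only point is that a decision record exists once a named analyst confirms or changes the draft, and that the draft is kept alongside it for audit.

```typescript
// Hypothetical shapes for illustration only; not the production schema.
type Action = "approve" | "decline" | "escalate";

interface DraftRecommendation {
  caseId: string;
  action: Action;
  reasoning: string;
  policyClauses: string[]; // identifiers of the clauses the draft relies on
}

interface FinalDecision {
  caseId: string;
  action: Action;
  decidedBy: string;          // always a human analyst's identifier
  overrodeDraft: boolean;     // true when the analyst's action differs from the draft's
  draft: DraftRecommendation; // retained so the rationale can be reconstructed later
}

// The agent only drafts. A decision is produced when an analyst either
// accepts the draft (no analystAction given) or supplies a different action.
function finalize(
  draft: DraftRecommendation,
  analystId: string,
  analystAction?: Action
): FinalDecision {
  const action = analystAction ?? draft.action;
  return {
    caseId: draft.caseId,
    action,
    decidedBy: analystId,
    overrodeDraft: action !== draft.action,
    draft,
  };
}
```

In this sketch an override is just a decision whose action differs from the draft's, which is what would let overrides be collected later as labelled examples.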
What we built
- An agentic case-review pipeline (Node/TypeScript) that gathers signals, retrieves the governing policy, and drafts an approve/decline/escalate recommendation with cited reasoning, with every action logged for audit
- A RAG policy copilot over the firm's policy, sanctions, and onboarding documents (pgvector, hybrid retrieval) with a cite-or-abstain contract — it names the binding clause or says it doesn't know
- A human-in-the-loop desk where analysts accept, edit, or override each draft; overrides feed back as labelled examples, and the agent never auto-decides high-value or sanctions-adjacent cases
- A guardrail layer — prompt-injection screening on retrieved text, PII redaction, confidence thresholds that force escalation, and a rule that an uncited claim is treated as an abstention
- A graded eval harness (golden decision set + an LLM judge on citation faithfulness) wired into CI, so model, prompt, and retrieval changes are scored before they ship
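The escalation rules named in the list above (uncited claims treated as abstentions, confidence thresholds, and no auto-decision on high-value or sanctions-adjacent cases) could be combined along the following lines. This is a sketch under stated assumptions: the `Draft` fields, the `Thresholds` shape, the reason codes, and the numeric values in the usage note are invented for illustration, and the source does not say how the real system weighs or orders these checks.

```typescript
// Hypothetical input shape; the real pipeline's fields are not given in the source.
interface Draft {
  action: "approve" | "decline" | "escalate";
  confidence: number;         // assumed to be a 0..1 score attached to the draft
  citations: string[];        // policy clause identifiers backing the reasoning
  amount: number;             // transaction amount
  sanctionsAdjacent: boolean; // set upstream by a sanctions screen (assumed)
}

// Hypothetical configuration; actual threshold values are not stated in the source.
interface Thresholds {
  minConfidence: number;
  highValueAmount: number;
}

type EscalationReason =
  | "uncited"
  | "low_confidence"
  | "high_value"
  | "sanctions_adjacent";

// Returns every guardrail that forces escalation. An empty array means the
// draft can go to the analyst desk as an ordinary accept/edit/override item.
function escalationReasons(draft: Draft, t: Thresholds): EscalationReason[] {
  const reasons: EscalationReason[] = [];
  // A draft with no citation is treated as an abstention, not a recommendation.
  if (draft.citations.length === 0) reasons.push("uncited");
  if (draft.confidence < t.minConfidence) reasons.push("low_confidence");
  if (draft.amount >= t.highValueAmount) reasons.push("high_value");
  if (draft.sanctionsAdjacent) reasons.push("sanctions_adjacent");
  return reasons;
}
```

For example, with made-up thresholds of `{ minConfidence: 0.8, highValueAmount: 10000 }`, a well-cited, high-confidence draft on a small transaction returns no reasons, while the same draft with an empty `citations` array returns `["uncited"]`. Returning all reasons rather than the first one keeps the audit log complete.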
Outcome
Analysts moved from reviewing every case cold to confirming or correcting a cited draft. That cleared the standing backlog and left each decision with a logged, source-linked rationale a reviewer or auditor can reconstruct. Policy answers now arrive with the binding clause attached, or with an explicit abstention instead of a confident guess.
TypeScript / Node · Python (evals) · Postgres + pgvector · Claude (agent + RAG) · LLM-as-judge CI · Payments integration