AI Architecture Memo
Date: 2026-05-14
Scope: The AI stack behind Rōvn: executor model, advisor model, orchestration, document extraction, source-verification orchestration, and the AI Trust Layer.
Posture: Pre-launch · live executor + advisor rails · AI chain: AWS Bedrock under BAA → Anthropic Claude (Haiku 4.5) under BAA → Rōvn backend on ECS · ZDR-eligible · ai_runs ledger PARTIAL.
1. Stack summary
| Role | Model | Path | BAA | ZDR | Notes |
|---|---|---|---|---|---|
| Executor | Claude Haiku 4.5 | AWS Bedrock (HIPAA-eligible) | YES (AWS Bedrock BAA + Anthropic BAA) | YES | OCR + extraction + summary + crosswalk |
| Advisor | Claude Opus 4.7 | Anthropic beta advisor-tool-2026-03-01 | YES (Anthropic BAA) | YES | Hard-case escalation, no UX round-trip |
| Higher-tier executor | Higher-tier Claude model via Bedrock | AWS Bedrock (HIPAA-eligible) | YES | YES | Used for nuanced summarization and crosswalk |
AI chain: AWS Bedrock under BAA → Anthropic Claude (Haiku 4.5) under BAA → Rōvn backend on ECS. Both BAAs are signed and on file:
- AWS BAA (covers Bedrock and every other AWS service)
- Anthropic BAA (covers the Claude model provider relationship plus the advisor beta tool surface)
We do not use OpenAI, Google Vertex Gemini, or any other model vendor in any PHI path. The reason is in ADR_INDEX.md ADR-007.
2. Executor: the day-to-day worker
Anthropic Claude Haiku 4.5 (primary; chosen for cost, latency, and the BAA chain) and a higher-tier Claude model (nuanced summarization and crosswalk), both invoked via AWS Bedrock under BAA.
- AWS Bedrock invocation under HIPAA-eligible posture; streaming responses.
- Function-call / tool-use enabled for structured outputs.
- Token cost, latency, model name, prompt hash, and output hash are written to the ai_runs ledger on every call (migration 029_ai_trust_layer.sql). PARTIAL coverage across all surfaces as of 2026-05-14; the gap is being closed.
Routing rule (executor side):
Haiku 4.5 → extraction, classification, simple crosswalk (cheap path)
Higher-tier Claude → multi-source reconciliation, structured drafts (deeper path)
Opus 4.7 → escalation only (via advisor pattern, not direct executor calls)
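A minimal sketch of the routing rule above, assuming purposes are plain strings. The purpose labels, the `ExecutorModel` enum, and `route_executor` are hypothetical names for illustration, not the contents of ai_gateway.py; the model values are placeholder labels, not real Bedrock model IDs.

```python
from enum import Enum


class ExecutorModel(str, Enum):
    HAIKU = "claude-haiku-4.5"          # placeholder label, not a real model ID
    HIGHER_TIER = "higher-tier-claude"  # placeholder label, not a real model ID


# Hypothetical purpose labels grouped by the two executor paths in the rule above.
CHEAP_PATH = frozenset({"extraction", "classification", "simple_crosswalk"})
DEEPER_PATH = frozenset({"multi_source_reconciliation", "structured_draft"})


def route_executor(purpose: str) -> ExecutorModel:
    """Pick an executor model for a purpose; Opus is deliberately unreachable here."""
    if purpose in CHEAP_PATH:
        return ExecutorModel.HAIKU
    if purpose in DEEPER_PATH:
        return ExecutorModel.HIGHER_TIER
    # Unknown purposes fail loudly instead of silently defaulting to a model.
    raise ValueError(f"no executor route for purpose {purpose!r}")
```

Because the function has no branch that returns Opus, escalation can only happen through the advisor pattern, which matches the third line of the rule.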
Cost controls:
- Per-customer rate limits enforced at the AI gateway (app/services/ai_gateway.py).
- Per-route token caps. A single document-extraction call caps at 8K context.
- Daily cost budget per customer with alerting at 70% / 90% / 100%.
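The daily-budget alerting can be sketched as a threshold-crossing check. This is an illustration under assumptions: spend is tracked in integer cents, and `newly_crossed` and `budget_exhausted` are hypothetical helpers, not the gateway's actual functions.

```python
ALERT_PERCENTS = (70, 90, 100)  # alert levels from the cost-control list above


def newly_crossed(before_cents: int, after_cents: int, budget_cents: int) -> list[int]:
    """Return the alert levels crossed by moving from before_cents to after_cents.

    Integer arithmetic avoids float rounding at the exact threshold.
    """
    return [
        pct
        for pct in ALERT_PERCENTS
        if before_cents * 100 < pct * budget_cents <= after_cents * 100
    ]


def budget_exhausted(spent_cents: int, budget_cents: int) -> bool:
    """True once the tenant has used its full daily budget."""
    return spent_cents >= budget_cents
```

For example, with a 10,000-cent budget, a call that moves spend from 6,000 to 9,500 cents would fire the 70% and 90% alerts in one step, so each alert fires once even when a single large call skips past a level.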
3. Advisor: Opus 4.7 via beta tool
The advisor is a separate, opt-in capability surfaced by Anthropic's beta advisor-tool-2026-03-01 header. It is enabled on the current production ECS task definition (rovn-passport-api:288).
How it differs from a normal "use Opus" call:
- Executor (Haiku/Sonnet) decides during a tool-use loop whether to escalate.
- Escalation is in-conversation: no UX round-trip, no separate API call from the user's session.
- Every escalation writes an advisor_call row to ai_runs with token cost broken out from the executor call.
- ZDR-eligible: Anthropic retains no data per the BAA + ZDR posture.
When advisor fires:
- Source-document extraction confidence below threshold for any high-stakes field (license number, NPI, DEA).
- Cross-source disagreement between two source receipts (e.g., state BON says active, OIG says excluded).
- Privileging packet reasoning where the executor cannot resolve a conflict in OPPE/FPPE data.
Advisor never writes a decision row. Advisor recommendations are advisory only; the human committee member is the actor on every credentialing, privileging, hiring, and clinical decision (see AI_GOVERNANCE_ENGINE.md).
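The three escalation triggers can be expressed as a pure check that returns reasons rather than decisions. This is a sketch under assumptions: `ExecutorState`, the field keys, the normalized outcome strings, and the threshold value are all hypothetical, since the memo does not specify them.

```python
from dataclasses import dataclass

# Hypothetical keys for the high-stakes fields named above.
HIGH_STAKES_FIELDS = frozenset({"license_number", "npi", "dea"})


@dataclass(frozen=True)
class ExecutorState:
    field_confidence: dict[str, float]   # extraction confidence per field
    source_outcomes: dict[str, str]      # assumed normalized outcome per source
    unresolved_oppe_fppe_conflict: bool  # set by the executor during reasoning


def escalation_reasons(state: ExecutorState, threshold: float) -> list[str]:
    """List why the advisor should fire; an empty list means no escalation."""
    reasons = [
        f"low_confidence:{name}"
        for name, confidence in sorted(state.field_confidence.items())
        if name in HIGH_STAKES_FIELDS and confidence < threshold
    ]
    # More than one distinct outcome means two sources disagree.
    if len(set(state.source_outcomes.values())) > 1:
        reasons.append("cross_source_disagreement")
    if state.unresolved_oppe_fppe_conflict:
        reasons.append("unresolved_oppe_fppe_conflict")
    return reasons
```

The function only reports reasons; it has no path that records a decision, which mirrors the advisory-only rule.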
4. Orchestration: in-process, not a framework
We deliberately do not use a heavyweight agent framework. The orchestration layer is a small Python module in the FastAPI app:
                ┌─────────────────────────────┐
                │    FastAPI route handler    │
                └──────────────┬──────────────┘
                               │
                ┌──────────────▼──────────────┐
                │ ai_gateway.py               │
                │ - model routing             │
                │ - rate limiting             │
                │ - idempotency key           │
                │ - PHI scrubber gate         │
                │ - ai_runs write             │
                └──────────────┬──────────────┘
                               │
       ┌───────────────────────┼────────────────────────┐
       │                       │                        │
┌──────▼──────┐         ┌──────▼──────┐         ┌───────▼──────┐
│ AWS Bedrock │         │ Anthropic   │         │ AWS Bedrock  │
│ Claude exec │         │ Claude      │         │ higher-tier  │
│ (Haiku 4.5) │         │ Opus 4.7    │         │ Claude       │
│ under BAA   │         │ advisor     │         │ under BAA    │
└─────────────┘         └─────────────┘         └──────────────┘
- Idempotency keys are required on every AI run. Replay is deterministic given (model, prompt hash, input hash).
- PHI scrubber gate runs before the outbound call. Sensitive identifiers that don't need to leave our perimeter are tokenized; the model sees a stable token, the app rehydrates on return.
- Structured outputs only: we never eval model output. All outputs are validated against a Pydantic schema before any database write.
Reason for in-process orchestration: every AI run must be source-receipted under BAA. A heavyweight framework adds an extra trust boundary we don't need, and our model count is two, not twenty.
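The PHI scrubber gate's tokenize-then-rehydrate behaviour might look like the sketch below. It assumes an HMAC-based token and an in-memory mapping; `PhiTokenizer`, the `[[PHI:...]]` token format, and the caller-supplied list of sensitive values are all illustrative choices, not the production scrubber.

```python
import hashlib
import hmac
import re

TOKEN_PATTERN = re.compile(r"\[\[PHI:[0-9a-f]{16}\]\]")


class PhiTokenizer:
    """Swap sensitive values for stable tokens and restore them on return."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._vault: dict[str, str] = {}  # token -> original value, kept in-process

    def tokenize(self, value: str) -> str:
        # HMAC makes the token stable for a given key without exposing the value.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"[[PHI:{digest[:16]}]]"
        self._vault[token] = value
        return token

    def scrub(self, text: str, sensitive_values: list[str]) -> str:
        for value in sensitive_values:
            if value:  # an empty string would match everywhere
                text = text.replace(value, self.tokenize(value))
        return text

    def rehydrate(self, text: str) -> str:
        # Unknown tokens are left untouched rather than guessed at.
        return TOKEN_PATTERN.sub(lambda m: self._vault.get(m.group(0), m.group(0)), text)
```

Usage would be to call `scrub` on the prompt before the outbound model call and `rehydrate` on the validated output, so the model only ever sees the token.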
5. Document extraction pipeline
Worker uploads document (license, ID, certification)
│
▼
POST /documents/upload → S3 PHI bucket (KMS-encrypted)
│
▼
AWS Textract OCR (BAA-covered, in us-east-2)
│
▼
Claude executor, structured extraction
(Pydantic schema: license_number, name, state,
expiration_date, license_type, issuing_authority)
│
▼
Schema validation + confidence score
│
├──── confidence ≥ threshold ──► auto-queue verification
│
└──── confidence < threshold ──► human review queue
(reviewer corrects,
then verification fires)
Critical rule: AI extraction never marks a tile source-verified. Extraction sets tier = "processed" (Tier 3). Only a successful source-adapter call promotes to tier = "source-verified" (Tier 4). The promotion is enforced at the database layer via a CHECK constraint on credential_source_receipts (migration 032_source_receipts_and_authority_policies.sql, hardened in 069_verification_pipeline_safety_hardening.sql).
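The extraction step and its confidence routing can be sketched as follows. The memo validates with a Pydantic schema; this sketch uses a standard-library dataclass as a stand-in so it runs without dependencies. `LicenseExtraction`, `route_extraction`, the queue names, and the threshold value are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LicenseExtraction:
    # Field names follow the schema listed in the pipeline above.
    license_number: str
    name: str
    state: str
    expiration_date: date
    license_type: str
    issuing_authority: str
    confidence: float

    def __post_init__(self) -> None:
        # Minimal validation standing in for the real schema checks.
        if not self.license_number.strip():
            raise ValueError("license_number must not be empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def route_extraction(extraction: LicenseExtraction, threshold: float) -> dict[str, str]:
    """Extraction always lands at tier 'processed'; only the queue differs."""
    queue = "verification" if extraction.confidence >= threshold else "human_review"
    return {"tier": "processed", "queue": queue}
```

Neither branch returns "source-verified"; in this sketch, as in the rule above, that tier is out of reach for the extraction path.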
6. Source-verification orchestration
The source-authority rail (source authority rails plus the 43-role, 51-jurisdiction coverage map; see SOURCE_AUTHORITY_RAIL.md) is orchestrated, not AI-driven:
Verification request (worker_id, credential_id)
│
▼
Per-source adapter dispatcher
- resolves which authority is canonical for this credential
- checks cache window (per-source TTL)
- if cached + within TTL → cached-replay path
- else → fresh-fetch path
│
▼
Fresh fetch:
- adapter calls source (Nursys, NPDB, OIG, state BON, etc.)
- receipt artifact persisted to S3 source-receipts bucket
- source_receipt row written
- hash chained into audit log
│
▼
AI is invoked only for crosswalk reconciliation
(e.g., name normalization across multiple sources,
or summarization of a state BON HTML page for the receipt UI)
│
▼
Tile promoted to Tier 4 (source-verified) with source URL + timestamp + hash
AI does not decide whether a verification "passes." That decision is a deterministic function of the source-receipt's status field and the source-authority policy's pass_criteria. AI assists in turning vendor responses into structured, human-readable receipts; it does not adjudicate.
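Both the dispatcher's cache check and the pass decision are deterministic, which a short sketch makes concrete. It assumes pass_criteria can be modelled as a set of accepted status strings; the memo does not define its structure, so `fetch_path` and `verification_passes` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def fetch_path(last_fetched_at: Optional[datetime], ttl: timedelta, now: datetime) -> str:
    """Choose cached replay when a receipt exists and is still inside its TTL."""
    if last_fetched_at is not None and now - last_fetched_at <= ttl:
        return "cached-replay"
    return "fresh-fetch"


def verification_passes(receipt_status: str, pass_criteria: frozenset[str]) -> bool:
    """Pass or fail depends only on the receipt status and the policy; no model call."""
    return receipt_status in pass_criteria
```

Given the same receipt and policy, these functions always return the same answer, which is the property the paragraph above relies on.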
7. Truth ladder enforcement (code-level)
The five-tier truth ladder is enforced at the data model, not in policy docs:
| Tier | Label | Setter | Database constraint |
|---|---|---|---|
| 1 | imported | system | field ingested from upload/feed; no worker assertion, no source_receipt FK |
| 2 | attested | worker | worker-affirmed; no source_receipt FK |
| 3 | processed | AI | AI-extracted; extraction confidence stored; tier capped at 3 if no source_receipt FK |
| 4 | source-verified | system | tier 4 only if source_receipt status='match' AND within TTL |
| 5 | approved | system | committee decision row present; references the source-verified tier |
A source-receipt status='conflict' blocks promotion to tier 4 and holds the field at tier 3 pending human resolution. CHECK constraints on worker_trust_records and credential_source_receipts prevent any code path from writing tier 4 (source-verified) without a valid source receipt row. AI cannot set source-verified=true because the column is not writable from the application path; it is derived.
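To illustrate how a CHECK constraint can block tier 4 without a receipt, here is a small demonstration using SQLite from the Python standard library. The table and column names are hypothetical and much simpler than the real migrations, and a plain CHECK cannot inspect the receipt's status or TTL, so this shows only the "no receipt, no tier 4" part of the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE trust_fields (
        id INTEGER PRIMARY KEY,
        tier INTEGER NOT NULL CHECK (tier BETWEEN 1 AND 5),
        source_receipt_id INTEGER,
        -- Tier 4 and above require a receipt reference.
        CHECK (tier < 4 OR source_receipt_id IS NOT NULL)
    )
    """
)


def try_write(tier: int, source_receipt_id):
    """Attempt an insert; return False when the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO trust_fields (tier, source_receipt_id) VALUES (?, ?)",
                (tier, source_receipt_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The rejection happens inside the database, so it holds regardless of which application code path attempts the write.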
8. ai_runs ledger
Every AI call is logged. Schema highlights:
- run_id (UUID)
- model (claude-3-5-haiku, claude-3-5-sonnet, claude-opus-4-7-advisor, etc.)
- purpose (document_extraction, crosswalk, summary, advisor_escalation)
- prompt_hash (SHA-256 of canonicalized prompt)
- output_hash (SHA-256 of structured output)
- input_tokens, output_tokens, cost_usd
- latency_ms
- actor_user_id (the human in whose session the run occurred)
- tenant_id (customer scope)
- idempotency_key
- advisor_calls (count + token rollup, if any)
Status: ledger schema is LIVE (migration 029); call-site coverage is PARTIAL across all 80+ routers. Closing the gap is in the post-close engineering roadmap.
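The hash fields could be computed as in the sketch below. The memo does not specify the canonicalization, so this assumes JSON with sorted keys and compact separators; deriving the idempotency key from (model, prompt hash, input hash) is likewise an assumption suggested by the replay rule in section 4, and both function names are hypothetical.

```python
import hashlib
import json


def canonical_sha256(payload) -> str:
    """SHA-256 over a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def idempotency_key(model: str, prompt_hash: str, input_hash: str) -> str:
    """Derive one stable key from the three values that determine a replay."""
    joined = "\n".join((model, prompt_hash, input_hash))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

With this scheme, two requests that differ only in dictionary key order produce the same prompt_hash and therefore the same idempotency key.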
9. Prompt injection defense
- All free-text user input that enters a prompt is wrapped in <user_input> XML tags, and the model is instructed (in the system prompt) to treat the contents as untrusted strings.
- Tool-use surface area is narrow: each AI call defines a fixed list of allowed tool calls; the model cannot invent tool names.
- No eval on AI outputs. Structured outputs are parsed into Pydantic, and only validated fields are read.
- We do not let the model issue arbitrary SQL, shell commands, or HTTP requests. Every external action is a typed tool with a server-side allow-list.
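Two of these defenses, the tag wrapping and the tool allow-list, are easy to sketch. Escaping angle brackets inside the wrapped text is an addition assumed here so the input cannot close the tag early; the memo only states that input is wrapped. `wrap_user_input` and `check_tool_call` are hypothetical helpers.

```python
from html import escape


def wrap_user_input(text: str) -> str:
    """Wrap untrusted text in <user_input> tags, escaping &, < and > first."""
    return f"<user_input>{escape(text, quote=False)}</user_input>"


def check_tool_call(tool_name: str, allowed_tools: frozenset[str]) -> None:
    """Reject any tool name that is not on this call's server-side allow-list."""
    if tool_name not in allowed_tools:
        raise PermissionError(f"tool {tool_name!r} is not allowed for this call")
```

For instance, input containing a literal closing tag comes out with that tag escaped, so the prompt still contains exactly one closing </user_input>.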
10. Failure modes
| Failure | Detection | Response |
|---|---|---|
| Low extraction confidence | confidence < threshold | Queue to human review (clinician_screens migration 066) |
| Model API error (5xx, rate limit) | exception in ai_gateway.py | Exponential backoff, then queue with status pending_retry |
| Context-window overflow | token count check pre-call | Chunk + sliding window with overlap; reconcile in post-step |
| Cross-source disagreement | adapter status='conflict' | Auto-escalate to advisor; if unresolved, route to human |
| Bedrock invocation error (5xx, throttle) | exception in ai_gateway.py | Exponential backoff + retry on different model ID; if persistent, queue with status pending_retry |
| Cost budget breach | per-tenant daily counter exceeded | Hard stop AI workloads for that tenant; alert ops |
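The "exponential backoff, then pending_retry" response in the table can be sketched as below. The attempt count, base delay, jitter, and the `TransientModelError` class are assumptions for illustration; the real gateway's error types and limits are not given in this memo.

```python
import random
import time
from typing import Any, Callable


class TransientModelError(Exception):
    """Hypothetical stand-in for a 5xx or throttling error from the model API."""


def call_with_backoff(
    call: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry transient failures with doubling delays, then hand off to the queue."""
    for attempt in range(max_attempts):
        try:
            return {"status": "ok", "result": call()}
        except TransientModelError:
            if attempt + 1 < max_attempts:  # no point sleeping after the last try
                # The delay doubles each attempt; jitter spreads out concurrent retries.
                sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return {"status": "pending_retry", "result": None}
```

Passing `sleep` as a parameter keeps the function testable without real waiting; only transient errors are retried, so any other exception propagates to the caller.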
11. What this memo does not claim
- We do not claim that Rōvn decides who gets credentialed. It does not: Rōvn surfaces evidence, and humans are the actors on every credentialing, privileging, hiring, and clinical decision.
- We do not claim full ai_runs coverage across every router; coverage is PARTIAL, closing post-close.
- We do not claim a non-Bedrock fallback path; Bedrock under BAA is the production AI chain, and there is no direct-Anthropic-API path in production.
- We do not claim Opus as an executor; Opus runs only inside the advisor tool surface.
End of memo.