AI Architecture Memo

Date: 2026-05-14
Scope: The AI stack behind Rōvn, executor model, advisor model, orchestration, document extraction, source verification orchestration, AI Trust Layer.
Posture: Pre-launch · live executor + advisor rails · AI chain: AWS Bedrock under BAA → Anthropic Claude (Haiku 4.5) under BAA → Rōvn backend on ECS · ZDR-eligible · ai_runs ledger PARTIAL.

1. Stack summary

Role	Model	Path	BAA	ZDR	Notes
Executor	Claude Haiku 4.5	AWS Bedrock (HIPAA-eligible)	YES (AWS Bedrock BAA + Anthropic BAA)	YES	OCR + extraction + summary + crosswalk
Advisor	Claude Opus 4.7	Anthropic beta `advisor-tool-2026-03-01`	YES (Anthropic BAA)	YES	Hard-case escalation, no UX round-trip
Higher-tier executor	Higher-tier Claude model via Bedrock	AWS Bedrock (HIPAA-eligible)	YES	YES	Used for nuanced summarization and crosswalk

AI chain: AWS Bedrock under BAA → Anthropic Claude (Haiku 4.5) under BAA → Rōvn backend on ECS. Both BAAs are signed and on file:
- AWS BAA (covers Bedrock and other HIPAA-eligible AWS services)
- Anthropic BAA (covers the Claude model provider relationship plus the advisor beta tool surface)

We do not use OpenAI, Google Vertex Gemini, or any other model vendor in any PHI path. The reason is in ADR_INDEX.md ADR-007.

2. Executor: the day-to-day worker

The executor is Anthropic Claude Haiku 4.5 (primary), with a higher-tier Claude model for nuanced summarization and crosswalk; both are invoked via AWS Bedrock under BAA.

AWS Bedrock invocation under HIPAA-eligible posture; streaming responses.
Function-call / tool-use enabled for structured outputs.
Token cost, latency, model name, prompt hash, and output hash are written to the ai_runs ledger on every instrumented call (migration 029_ai_trust_layer.sql). Call-site coverage is PARTIAL as of 2026-05-14; the gap is being closed.

Routing rule (executor side):

Haiku 4.5          →  extraction, classification, simple crosswalk    (cheap path)
Higher-tier Claude →  multi-source reconciliation, structured drafts  (deeper path)
Opus 4.7           →  escalation only (via advisor pattern, not direct executor calls)
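
The routing rule above can be sketched as a small lookup. This is a minimal illustration, not the code in ai_gateway.py: the `Tier` enum, the purpose labels, and `route_executor` are hypothetical names chosen for the example.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku-4.5"              # cheap path
    HIGHER_TIER = "higher-tier"      # deeper path
    OPUS_ADVISOR = "opus-4.7"        # advisor pattern only, never routed here


# Hypothetical purpose labels, grouped per the routing rule.
CHEAP_PATH = frozenset({"extraction", "classification", "simple_crosswalk"})
DEEPER_PATH = frozenset({"multi_source_reconciliation", "structured_draft"})


def route_executor(purpose: str) -> Tier:
    """Pick the executor tier for a purpose; the advisor tier is never returned."""
    if purpose in CHEAP_PATH:
        return Tier.HAIKU
    if purpose in DEEPER_PATH:
        return Tier.HIGHER_TIER
    # Unknown purposes fail closed instead of defaulting to a model.
    raise ValueError(f"no executor route for purpose {purpose!r}")
```

Failing closed on an unknown purpose is a design choice made for this sketch; it keeps Opus reachable only through the advisor pattern.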

Cost controls:

Per-customer rate limits enforced at the AI gateway (app/services/ai_gateway.py).
Per-route token caps. A single document-extraction call is capped at 8K tokens of context.
Daily cost budget per customer with alerting at 70% / 90% / 100%.
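
A minimal sketch of the budget alerting described above, assuming spend is tracked as integer cents per customer per day. The function names are hypothetical; only the 70% / 90% / 100% thresholds come from this memo.

```python
ALERT_PERCENTAGES = (70, 90, 100)


def newly_crossed(before_cents: int, after_cents: int, budget_cents: int) -> list[int]:
    """Return the alert percentages crossed by moving from before to after."""
    if budget_cents <= 0:
        raise ValueError("budget must be positive")
    # Integer arithmetic avoids floating-point drift at the exact threshold.
    return [
        pct
        for pct in ALERT_PERCENTAGES
        if before_cents * 100 < pct * budget_cents <= after_cents * 100
    ]


def budget_exhausted(spent_cents: int, budget_cents: int) -> bool:
    """True once the daily budget is fully consumed."""
    return spent_cents >= budget_cents
```

Comparing the spend before and after a call means each alert fires once per day rather than on every call above the threshold.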

3. Advisor: Opus 4.7 via beta tool

The advisor is a separate, opt-in capability surfaced by Anthropic's beta advisor-tool-2026-03-01 header. It is enabled on the current production ECS task definition (rovn-passport-api:288).

How it differs from a normal "use Opus" call:

Executor (Haiku/Sonnet) decides during a tool-use loop whether to escalate.
Escalation is in-conversation: no UX round-trip, no separate API call from the user's session.
Every escalation writes an advisor_call row to ai_runs with token cost broken out from the executor call.
ZDR-eligible: Anthropic retains no data per the BAA + ZDR posture.

When advisor fires:
- Source-document extraction confidence below threshold for any high-stakes field (license number, NPI, DEA).
- Cross-source disagreement between two source receipts (e.g., state BON says active, OIG says excluded).
- Privileging packet reasoning where the executor cannot resolve a conflict in OPPE/FPPE data.

Advisor never writes a decision row. Advisor recommendations are advisory only; the human committee member is the actor on every credentialing, privileging, hiring, and clinical decision (see AI_GOVERNANCE_ENGINE.md).
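
The three escalation triggers can be sketched as a pure function over the executor's working state. Everything here is illustrative: `ExecutorState`, the reason strings, and the assumption that source statuses have already been normalized to one vocabulary are not taken from the codebase, and the threshold value is left as a parameter because this memo does not state it.

```python
from dataclasses import dataclass, field

HIGH_STAKES_FIELDS = frozenset({"license_number", "npi", "dea"})


@dataclass
class ExecutorState:
    # Field name -> extraction confidence in [0, 1].
    field_confidence: dict[str, float] = field(default_factory=dict)
    # Source name -> normalized status (assumed normalized upstream).
    source_statuses: dict[str, str] = field(default_factory=dict)
    # Set when the executor could not resolve an OPPE/FPPE conflict.
    unresolved_oppe_fppe_conflict: bool = False


def escalation_reasons(state: ExecutorState, threshold: float) -> list[str]:
    """List why the advisor should fire; an empty list means no escalation."""
    reasons = []
    low = sorted(
        name
        for name, confidence in state.field_confidence.items()
        if name in HIGH_STAKES_FIELDS and confidence < threshold
    )
    if low:
        reasons.append("low_confidence:" + ",".join(low))
    if len(set(state.source_statuses.values())) > 1:
        reasons.append("cross_source_disagreement")
    if state.unresolved_oppe_fppe_conflict:
        reasons.append("privileging_conflict")
    return reasons
```

The function only reports reasons; consistent with the rule above, nothing in it writes a decision.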

4. Orchestration: in-process, not a framework

We deliberately do not use a heavyweight agent framework. The orchestration layer is a small Python module in the FastAPI app:

                ┌─────────────────────────────┐
                │   FastAPI route handler     │
                └──────────────┬──────────────┘
                               │
                ┌──────────────▼──────────────┐
                │   ai_gateway.py             │
                │   - model routing           │
                │   - rate limiting           │
                │   - idempotency key         │
                │   - PHI scrubber gate       │
                │   - ai_runs write           │
                └──────────────┬──────────────┘
                               │
       ┌───────────────────────┼────────────────────────┐
       │                       │                        │
┌──────▼──────┐         ┌──────▼──────┐         ┌───────▼──────┐
│ AWS Bedrock │         │ Anthropic   │         │ AWS Bedrock  │
│ Claude exec │         │ Claude      │         │ higher-tier  │
│ (Haiku 4.5) │         │ Opus 4.7    │         │ Claude       │
│ under BAA   │         │ advisor     │         │ under BAA    │
└─────────────┘         └─────────────┘         └──────────────┘

Idempotency keys are required on every AI run. Replay is deterministic given (model, prompt hash, input hash).
PHI scrubber gate runs before the outbound call. Sensitive identifiers that don't need to leave our perimeter are tokenized; the model sees a stable token, the app rehydrates on return.
Structured outputs only: we never eval model output. All outputs are validated against a Pydantic schema before any database write.
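
A minimal sketch of the tokenize-then-rehydrate idea behind the PHI scrubber gate. The `PhiTokenizer` class, the token format, and the use of HMAC-SHA-256 for stable tokens are assumptions made for illustration; the production scrubber may detect and store identifiers differently.

```python
import hashlib
import hmac


class PhiTokenizer:
    """Swap sensitive values for stable tokens and restore them afterwards."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._vault: dict[str, str] = {}  # token -> original value, kept in-process

    def tokenize(self, value: str) -> str:
        # A keyed hash yields the same token for the same value every time.
        digest = hmac.new(self._secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"[[PHI_{digest[:16]}]]"
        self._vault[token] = value
        return token

    def scrub(self, text: str, sensitive_values: list[str]) -> str:
        # Replace longer values first so a substring never splits a longer match.
        for value in sorted(set(sensitive_values), key=len, reverse=True):
            if value:
                text = text.replace(value, self.tokenize(value))
        return text

    def rehydrate(self, text: str) -> str:
        for token, value in self._vault.items():
            text = text.replace(token, value)
        return text
```

In this sketch the caller supplies the list of sensitive values; how those values are detected is outside its scope.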

Reason for in-process orchestration: every AI run must be source-receipted under BAA. A heavyweight framework adds an extra trust boundary we don't need, and our model count is three, not twenty.

5. Document extraction pipeline

Worker uploads document (license, ID, certification)
            │
            ▼
   POST /documents/upload → S3 PHI bucket (KMS-encrypted)
            │
            ▼
   AWS Textract OCR (BAA-covered, in us-east-2)
            │
            ▼
   Claude executor, structured extraction
   (Pydantic schema: license_number, name, state,
    expiration_date, license_type, issuing_authority)
            │
            ▼
   Schema validation + confidence score
            │
            ├──── confidence ≥ threshold ──► auto-queue verification
            │
            └──── confidence < threshold ──► human review queue
                                              (reviewer corrects,
                                               then verification fires)

Critical rule: AI extraction never marks a tile source-verified. Extraction sets tier = "processed" (Tier 3). Only a successful source-adapter call promotes to tier = "source-verified" (Tier 4). The promotion is enforced at the database layer via a CHECK constraint on credential_source_receipts (migration 032_source_receipts_and_authority_policies.sql, hardened in 069_verification_pipeline_safety_hardening.sql).
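
The validation and routing steps can be sketched as follows. A standard-library dataclass stands in for the Pydantic schema so the example has no dependencies; `parse_extraction` and `route_extraction` are hypothetical names, and the threshold is a parameter because its value is not stated here.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class LicenseExtraction:
    license_number: str
    name: str
    state: str
    expiration_date: date
    license_type: str
    issuing_authority: str


def parse_extraction(raw: dict) -> LicenseExtraction:
    """Validate raw model output; reject missing, extra, or empty fields."""
    expected = {f.name for f in fields(LicenseExtraction)}
    if set(raw) != expected:
        raise ValueError(f"field mismatch: {sorted(set(raw) ^ expected)}")
    for name, value in raw.items():
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    values = dict(raw)
    # Raises ValueError if the date is not ISO formatted (YYYY-MM-DD).
    values["expiration_date"] = date.fromisoformat(raw["expiration_date"])
    return LicenseExtraction(**values)


def route_extraction(confidence: float, threshold: float) -> dict:
    """Choose the next queue; extraction alone never exceeds tier 'processed'."""
    queue = "verification" if confidence >= threshold else "human_review"
    return {"tier": "processed", "queue": queue}
```

Both branches return tier "processed", mirroring the critical rule: confidence changes the queue, never the tier.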

6. Source-verification orchestration

The source-authority rail (the individual source-authority rails plus the 43-role, 51-jurisdiction coverage map; see SOURCE_AUTHORITY_RAIL.md) is orchestrated, not AI-driven:

Verification request (worker_id, credential_id)
            │
            ▼
   Per-source adapter dispatcher
   - resolves which authority is canonical for this credential
   - checks cache window (per-source TTL)
   - if cached + within TTL → cached-replay path
   - else → fresh-fetch path
            │
            ▼
   Fresh fetch:
   - adapter calls source (Nursys, NPDB, OIG, state BON, etc.)
   - receipt artifact persisted to S3 source-receipts bucket
   - source_receipt row written
   - hash chained into audit log
            │
            ▼
   AI is invoked only for crosswalk reconciliation
   (e.g., name normalization across multiple sources,
    or summarization of a state BON HTML page for the receipt UI)
            │
            ▼
   Tile promoted to Tier 4 (source-verified) with source URL + timestamp + hash

AI does not decide whether a verification "passes." That decision is a deterministic function of the source-receipt's status field and the source-authority policy's pass_criteria. AI assists in turning vendor responses into structured, human-readable receipts; it does not adjudicate.
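
A sketch of the two deterministic decisions in this flow: cached replay versus fresh fetch, and pass versus fail. The `SourceReceipt` shape and the modelling of `pass_criteria` as a set of accepted status values are assumptions for illustration; the real policy structure may be richer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SourceReceipt:
    source: str          # e.g. "nursys", "oig"
    status: str          # status reported by the source adapter
    fetched_at: datetime


def choose_path(cached: Optional[SourceReceipt], ttl: timedelta, now: datetime) -> str:
    """Replay a cached receipt only while it is inside the per-source TTL."""
    if cached is not None and now - cached.fetched_at <= ttl:
        return "cached-replay"
    return "fresh-fetch"


def verification_passes(receipt: SourceReceipt, pass_criteria: frozenset) -> bool:
    """Pure function of the receipt status and the policy; no model call."""
    return receipt.status in pass_criteria
```

Because neither function calls a model, the same receipt and policy always produce the same outcome.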

7. Truth ladder enforcement (code-level)

The five-tier truth ladder is enforced at the data model, not in policy docs:

Tier	Label	Setter	Database constraint
1	imported	system	field ingested from upload/feed; no worker assertion, no source_receipt FK
2	attested	worker	worker-affirmed; no source_receipt FK
3	processed	AI	AI-extracted; extraction confidence stored; tier capped at 3 if no source_receipt FK
4	source-verified	system	tier 4 only if source_receipt status='match' AND within TTL
5	approved	system	committee decision row present; references the source-verified tier

A source-receipt status='conflict' blocks promotion to tier 4 and holds the field at tier 3 pending human resolution. CHECK constraints on worker_trust_records and credential_source_receipts prevent any code path from writing tier 4 (source-verified) without a valid source receipt row. AI cannot set source-verified=true because the column is not writable from the application path; it is derived.
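
The "derived, not writable" idea can be illustrated in application terms. This sketch mirrors the table above as a pure function; it is not the database constraint itself, and `FieldEvidence` and `derived_tier` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldEvidence:
    worker_attested: bool = False
    ai_extracted: bool = False
    receipt_status: Optional[str] = None   # None when no source receipt exists
    receipt_within_ttl: bool = False
    committee_decision: bool = False


def derived_tier(evidence: FieldEvidence) -> int:
    """Compute the tier from evidence; callers cannot assign it directly."""
    source_verified = evidence.receipt_status == "match" and evidence.receipt_within_ttl
    if source_verified and evidence.committee_decision:
        return 5  # approved: decision row references a source-verified field
    if source_verified:
        return 4  # source-verified
    if evidence.ai_extracted:
        return 3  # processed; a 'conflict' receipt leaves the field here
    if evidence.worker_attested:
        return 2  # attested
    return 1      # imported
```

No combination of inputs reaches tier 4 or 5 without a matching, in-TTL receipt, which is the property the CHECK constraints enforce at the database layer.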

8. `ai_runs` ledger

Every AI call made through an instrumented call site is logged. Schema highlights:

run_id (UUID)
model (claude-3-5-haiku, claude-3-5-sonnet, claude-opus-4-7-advisor, etc.)
purpose (document_extraction, crosswalk, summary, advisor_escalation)
prompt_hash (SHA-256 of canonicalized prompt)
output_hash (SHA-256 of structured output)
input_tokens, output_tokens, cost_usd
latency_ms
actor_user_id (the human in whose session the run occurred)
tenant_id (customer scope)
idempotency_key
advisor_calls (count + token rollup, if any)

Status: ledger schema is LIVE (migration 029); call-site coverage is PARTIAL across all 80+ routers. Closing the gap is in the post-close engineering roadmap.
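
A sketch of how the hash and key columns might be computed. The canonicalization (sorted-key compact JSON) and the derivation of the idempotency key from model, prompt hash, and input hash are assumptions for illustration; this memo does not specify either, and the model string used in any call is a placeholder.

```python
import hashlib
import json
import uuid


def canonical_sha256(payload) -> str:
    """SHA-256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def idempotency_key(model: str, prompt_hash: str, input_hash: str) -> str:
    """Same (model, prompt hash, input hash) always yields the same key."""
    return hashlib.sha256(f"{model}:{prompt_hash}:{input_hash}".encode("utf-8")).hexdigest()


def build_ai_run(model: str, purpose: str, prompt: dict, model_input: dict,
                 output: dict, tenant_id: str, actor_user_id: str) -> dict:
    """Assemble a subset of the ai_runs columns for one call."""
    prompt_hash = canonical_sha256(prompt)
    return {
        "run_id": str(uuid.uuid4()),
        "model": model,
        "purpose": purpose,
        "prompt_hash": prompt_hash,
        "output_hash": canonical_sha256(output),
        "tenant_id": tenant_id,
        "actor_user_id": actor_user_id,
        "idempotency_key": idempotency_key(model, prompt_hash, canonical_sha256(model_input)),
    }
```

Token counts, cost, latency, and the advisor rollup are omitted here because they come from the provider response rather than from hashing.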

9. Prompt injection defense

All free-text user input that enters a prompt is wrapped in <user_input> XML tags, and the system prompt instructs the model to treat the tagged content as an untrusted string.
Tool-use surface area is narrow: each AI call defines a fixed list of allowed tool calls; the model cannot invent tool names.
No eval on AI outputs. Structured outputs are parsed into Pydantic, and only validated fields are read.
We do not let the model issue arbitrary SQL, shell commands, or HTTP requests. Every external action is a typed tool with a server-side allow-list.
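
Two of the defenses above, sketched under stated assumptions. Escaping angle brackets inside the wrapper is an addition made for this example (the memo only states that input is wrapped), and the tool names in `ALLOWED_TOOLS` are hypothetical.

```python
from xml.sax.saxutils import escape

# Hypothetical per-call allow-list; real tool names are not given in this memo.
ALLOWED_TOOLS = frozenset({"record_extraction", "request_advisor"})


def wrap_user_input(text: str) -> str:
    """Wrap free text so it cannot close the tag and pose as instructions."""
    # escape() rewrites &, < and >, so an embedded </user_input> becomes inert.
    return f"<user_input>{escape(text)}</user_input>"


def check_tool_call(name: str, allowed: frozenset = ALLOWED_TOOLS) -> str:
    """Reject any tool name the server did not offer for this call."""
    if name not in allowed:
        raise PermissionError(f"tool {name!r} is not on the allow-list")
    return name
```

The allow-list check runs server-side, so a model that emits an unknown tool name produces an error rather than an action.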

10. Failure modes

Failure	Detection	Response
Low extraction confidence	confidence < threshold	Queue to human review (`clinician_screens` migration 066)
Model API error (5xx, rate limit)	exception in `ai_gateway.py`	Exponential backoff, then queue with status `pending_retry`
Context-window overflow	token count check pre-call	Chunk + sliding window with overlap; reconcile in post-step
Cross-source disagreement	adapter status='conflict'	Auto-escalate to advisor; if unresolved, route to human
Bedrock invocation error (5xx, throttle)	exception in `ai_gateway.py`	Exponential backoff + retry on different model ID; if persistent, queue with status `pending_retry`
Cost budget breach	per-tenant daily counter exceeded	Hard stop AI workloads for that tenant; alert ops
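
Two of the responses in the table, sketched generically: an exponential backoff schedule and sliding-window chunking with overlap. The base delay, cap, window size, and overlap are illustrative parameters, not production values, and jitter is left out to keep the example deterministic.

```python
def backoff_delays(attempts: int, base_seconds: float = 1.0, cap_seconds: float = 30.0) -> list[float]:
    """Delay before each retry: base * 2^i, capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(attempts)]


def sliding_windows(tokens: list, window: int, overlap: int) -> list[list]:
    """Split tokens into windows that share `overlap` items with the previous one."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if not tokens:
        return []
    step = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this chunk reached the end of the input
        start += step
    return chunks
```

The overlap gives the post-step reconciliation shared context between neighbouring chunks, so a field that straddles a boundary appears whole in at least one window.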

11. What this memo does not claim

We do not claim that Rōvn decides who gets credentialed. It does not: Rōvn surfaces evidence, and humans are the actors on every credentialing, privileging, hiring, and clinical decision.
We do not claim full ai_runs coverage across every router; coverage is PARTIAL, with the gap to be closed post-close.
We do not claim a non-Bedrock fallback path; Bedrock under BAA is the production AI chain, and there is no direct-Anthropic-API path in production.
We do not claim Opus as an executor; Opus runs only inside the advisor tool surface.

End of memo.
