Technical & Architecture

Source Authority Rail

AI Diligence Console

Source Authority Rail

Date: 2026-05-14 Scope: The 9-adapter verification rail that powers Rōvn's source-verified tier (tier 4) and the cached-replay pricing model, plus the coverage catalog it serves: 43 roles × 51 jurisdictions = 2,193 role/state coverage cells, 0 unsupported cells. Posture: Mixed, some adapters LIVE, most scaffolded/partial pending source-access credentials; manual primary source verification (PSV) is the fallback where automation is not live. Persistence schemas: 032_source_receipts_and_authority_policies.sql, 062_source_adapter_evidence.sql, 068_source_receipts_s3_artifact.sql. Nursys e-Notify LIVE under DBID 399700000147857 (memory log reference_npdb_registration.md).

1. Why a rail and not one-off integrations

A credentialing platform that wires "an API to NPDB" is one webhook away from being a Vivian / symplr feature. Rōvn instead built a uniform rail: every source is wrapped in an adapter that obeys one contract, query, persist receipt, hash-chain, respect TTL. The result is:

Every source-verified tile in every Passport carries the same receipt shape, regardless of source.
Cached-replay is meaningful, within an authority's published refresh window, a receipt is legally fresh.
Adding a 10th source is an adapter, not a re-architecture.

The rail is the moat. Inventory grows with the network; pricing curves bend with it.

2. The 9 source adapters and the 43-role, 51-jurisdiction coverage catalog

Two distinct things must not be conflated:

Source adapters (9). The integration layer, one adapter per source authority Rōvn queries. This is the engineering count.
Coverage catalog (43 × 51 = 2,193 cells). The role/jurisdiction matrix the rail serves: 43 healthcare roles across 51 jurisdictions (50 states + DC), 2,193 role/state coverage cells, 0 unsupported catalog cells. This is a coverage map, not an adapter count.

The 9 source adapters

#	Adapter	Authority	Status
1	NPDB	National Practitioner Data Bank	LIVE, continuous query, DBID `399700000147857` (migration 052)
2	Nursys	NCSBN Nursys license verification + e-Notify	LIVE, e-Notify subscription active (migration 007)
3	NPPES	NPPES NPI Registry	LIVE, public registry, no credential needed
4	DEA	DEA Controlled Substance Registration	PARTIAL, adapter scaffolded (migration 063); source-access credential pending
5	FSMB	Federation of State Medical Boards	TARGET, adapter scaffolded; source-access agreement pending
6	OIG	OIG List of Excluded Individuals / Entities (LEIE)	LIVE, monthly snapshot sync (migration 005)
7	SAM	SAM.gov federal exclusion check	LIVE, public dataset
8	State board of nursing	State BON license verification (non-Nursys states)	PARTIAL, `state_bon_dispatch_extension` (migration 065); per-state agreements rolling out
9	Verifiable	Verifiable.com state-board federation	PARTIAL, integration scaffolded (migration 067)

Most adapters require source-access credentials and are scaffolded or partial, they are not all live. Where an adapter is not live, manual primary source verification (PSV) is the fallback: a human runs the verification directly against the source and the receipt is captured the same way. An adapter is marked LIVE only where it is actually wired and routing production queries.

The 9-adapter list is encoded in source_authority_policies and is the authoritative source of truth at runtime. The 43-role × 51-jurisdiction coverage catalog (2,193 cells) is what those 9 adapters and the manual-PSV fallback together cover.

3. Adapter contract

Every adapter implements the same interface:

class SourceAdapter:
    name: str                         # e.g., "NPDB", "STATE_BON_CA"
    authority: str                    # e.g., "federal", "state", "specialty", "payer"
    ttl_seconds: int                  # source-published refresh window
    per_query_cost_usd: Decimal       # pass-through cost
    cached_replay_price_usd: Decimal  # what we charge for cache-hit

    def query(self, worker, credential, params) -> SourceReceipt
    def parse(self, raw_payload) -> StructuredReceipt
    def health_check(self) -> bool

SourceReceipt always carries: source, source_url, source_timestamp, payload_hash, status (match / mismatch / not_found / conflict / error), s3_artifact_key, ttl_seconds, actor.

Persistence: credential_source_receipts table + raw payload to S3 (Object Lock 7-year retention).

Hash-chain integration: every receipt write produces an audit_log event (actor_kind system if cadence-driven, human if user-initiated).

4. TTL policy (per-source freshness)

Every source publishes a refresh window. We respect it. Within the window, a cached receipt is legally fresh, we are not pretending a stale receipt is new; we are honoring the source's own statement of validity.

Source class	TTL	Reasoning
NPDB (continuous query)	30 days (default)	NPDB recommends within-30-day refresh for continuous queries; we honor the recommended cadence
NPDB (one-time query)	90 days	One-time queries have a longer published validity
OIG LEIE	Monthly	LEIE publishes monthly; mid-month query returns the latest published snapshot
Nursys e-Notify	Continuous	We hold an e-Notify subscription; any change in license status arrives as a push event within minutes
State board of nursing (non-Nursys)	7-30 days	Per-state; some states publish daily, some weekly
DEA	Per DEA renewal cycle	DEA registration is stable between renewals
FSMB	Per board cycle	State medical board status changes infrequently
Verifiable (state-board federation)	Per source-board cadence	Inherits the underlying state board's refresh window
NPPES NPI registry	Continuous (NPI is permanent)	NPI does not change; refresh only when the worker reports a change

These TTLs are stored in source_authority_policies and consulted on every cache check.

5. Cache-replay legality posture

The legal posture is: cached-replay is bounded by the source authority's own published refresh window.

We do not invent freshness. A cached receipt is shown with the original source_timestamp (when the source actually responded), not a fictional "now."
We do not market a cached result as "freshly verified." API responses include a cached: true flag; UI receipts label the timestamp the source actually responded.
We do not exceed the source's TTL. If the source says 30 days, day-31 forces a fresh fetch (or, at the customer's option, a cadence-driven background refresh that completes before day-31).
We honor source-side change events (Nursys e-Notify, OIG monthly drops) by invalidating cached receipts immediately on push.

This is the difference between Rōvn and a vendor that says "we verify" but actually just stores a stale value. Every cached response carries the receipt of the original source fetch, provenance is preserved, not paved over.

6. Cache-hit economics

The unit-economics math behind the cached-replay pricing:

Worker A's NPDB receipt persists in our system.
Fresh NPDB query: $7.50 pass-through + margin.
Day 14: a second Rōvn customer asks for Worker A's NPDB status.
We are within NPDB's 30-day continuous-query TTL.
We serve the cached receipt: $0.50 (margin only, no pass-through to NPDB).

Per-query margin step: $0.50 / ~$0.05 cost = ~10× on cached.
Versus fresh: ~15% margin on $7.50.

Multiplied by the Verified API's network shape:

Y1 cache-hit rate: ~0% (cold start, every query is fresh).
Y3 cache-hit rate: ~40% (network has accumulated receipts).
Y5 cache-hit rate: ~70% (steady-state).

Gross-margin curve (from 04_data_room/06_financial/3_CASE_MODEL_SUMMARY.md):

Year	Gross margin (base case)	Driver
Y1	40%	Cold-start, mostly pass-through pricing
Y2	58%	Network accumulates first wave of cached receipts
Y3	72%	Multi-tenant cache hits compound
Y4	80%	Most receipts amortized across multiple consumers
Y5	86%	Steady-state cache hit rate

Competitor day-0 cannot match this curve. A new entrant starts with zero inventory; they must charge pass-through-plus-margin on every query, and they have no second-customer-effect.

7. Anti-fraud and conflict signaling

The rail is also a fraud surface. anti_fraud_signals (migration 064_anti_fraud_signals.sql) captures cross-source disagreements:

Worker self-reports active license, state BON says lapsed → conflict signal.
Document OCR says license #123, state BON says #124 → conflict signal.
Nursys e-Notify reports disciplinary action → immediate tile-tier downgrade + audit alarm.
NPI registry name does not match document name → identity-anomaly signal.

Anti-fraud signals are written to audit_log and surfaced in the facility workflow layer cockpit (and to the worker in their Wallet view, with clear next-steps).

8. Adapter operational discipline

Health checks run every 5 minutes per adapter; failures alert ops.
Per-source rate limiting respects each authority's published API limits (and per-tenant fair-use).
Retry with exponential backoff on transient errors; max 3 retries then human-review queue.
Vendor outage runbook documented in RUNBOOK.md: when NPDB API is down, queries pend; when state BON site changes layout, the per-state adapter alerts and we roll a parser fix.
Adapter versioning. Every adapter has a semver. Receipt rows record the adapter version that produced them, replay is bound to the adapter version, not "whatever's current."

9. What this rail does not claim

We do not claim all 9 source adapters are live, NPDB, Nursys, NPPES, OIG, and SAM are LIVE; DEA, FSMB, state board of nursing, and Verifiable are scaffolded/partial pending source-access credentials. Manual PSV is the fallback wherever automation is not live.
We do not claim the 43-role × 51-jurisdiction (2,193-cell) coverage catalog is an adapter count, it is the coverage map the 9 adapters plus manual PSV serve.
We do not claim continuous-query coverage on every source, coverage is per-source, per TTL.
We do not claim a vendor's data is correct upstream of us. We claim we faithfully captured what the source said at that timestamp, and our hash-chain proves it.
We do not claim cached pricing across all customers regardless of consent, consent boundaries are enforced; cached-replay between customers requires worker consent to share that tile.

End of memo.

Ask the AI agent about this section, the raise, compliance posture, or any cross-document question. Grounded in Rōvn's deep context, with on-page source citations.

AI queries route through AWS Bedrock under BAA · Anthropic Claude (Haiku 4.5) under BAA · zero-data-retention posture · no PHI in prompts.