Rōvn · Investor Room

Deployment Overview

Diligence notice: Working state of Rōvn as of 2026-06-24 · Pre-launch by design · See 09 for receipts.

Date: 2026-05-14
Scope: How code becomes production at Rōvn: environments, CI/CD pipeline, migrations, marketing-site deploys, observability, incident response, rollback, secrets rotation.
Posture: LIVE for the production deploy pipeline (S3 → CodeBuild → ECR → ECS Fargate). PARTIAL for a fully isolated staging environment.


1. Environments

| Environment | Domains | Purpose | Status |
| --- | --- | --- | --- |
| Production | rovn.to, passport.rovn.to, app.rovn.to, passport.rovn.to/mcp | The live product surface for workers, facilities, and partners | LIVE |
| Investor portal | Separate Cloudflare Pages project (gated) | Diligence document distribution | LIVE |
| Staging | TBD (pre-design-partner) | Mirror of prod for pre-launch validation | PARTIAL |
| Local dev | localhost:8000 + docker-compose | Engineer workstations | LIVE |

The production deploy pipeline is the same one used for the investor portal and marketing site. Staging is intentionally PARTIAL today: because Rōvn is pre-launch, the design-partner pilots run on production, with feature flags gating who sees what. A fully isolated staging environment lights up when the Pilot tier flips from "design partner" to "GA pilot."
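A minimal sketch of feature-flag gating on a shared production stack, assuming each flag is either switched on globally or limited to an allow-list of tenant IDs. The FeatureFlag class, the flag name, and the tenant IDs are hypothetical illustrations, not Rōvn's flag system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureFlag:
    """Hypothetical flag: on for everyone, or only for allow-listed tenants."""

    name: str
    enabled_globally: bool = False
    allowed_tenants: frozenset = field(default_factory=frozenset)

    def is_enabled_for(self, tenant_id: str) -> bool:
        # The global switch wins; otherwise only allow-listed tenants see the feature.
        return self.enabled_globally or tenant_id in self.allowed_tenants


# Illustrative only: a pilot feature visible to two design-partner tenants.
pilot_flag = FeatureFlag(
    name="pilot_feature_example",
    allowed_tenants=frozenset({"tenant-partner-a", "tenant-partner-b"}),
)
```

Because every partner shares one production stack, the allow-list is the only isolation boundary for unreleased features, which is the gap a dedicated staging environment would close.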


2. Production deploy pipeline (S3 → CodeBuild → ECR → ECS)

This is the path documented in memory log reference_rovn_deploy_mechanic.md and verified end-to-end on each production deploy (current production revision: rovn-passport-api:288).

Engineer pushes commit to main branch
                │
                ▼
GitHub Actions: build, lint, unit tests
                │
                ▼
Source zip uploaded to S3 (CodeBuild source bucket)
                │
                ▼
CodeBuild: Docker image build
   - multi-stage Dockerfile
   - python:3.11-slim base
   - non-root user
   - dependency cache layer
                │
                ▼
ECR: push image tagged prod-YYYYMMDDhhmm[-suffix]
                │
                ▼
ECS task definition: register new revision
   - image ARN bumped to new ECR tag
   - secret refs unchanged
   - env vars unchanged
   - cpu / memory unchanged unless explicit
                │
                ▼
ECS service: update-service to new task def revision
                │
                ▼
Rolling deploy across AZs (multi-AZ)
   - new tasks come up
   - ALB health checks must pass
   - old tasks drained gracefully
                │
                ▼
Health check on /health passes → traffic shifts
                │
                ▼
Old task definition retained (rollback target)

Critical operational rule (per memory log): force-new-deployment on its own is a no-op for shipping code. It restarts tasks on the current task definition revision, which still references the previous prod-* image tag, so every prod deploy registers a new task definition revision. The deploy mechanic was verified end-to-end most recently on the 2026-05-27 production deploy: rovn-passport-api:288 / prod-202605270526-ai-competitive-fix (git tag prod-ai-competitive-fix-2026-05-27).
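A minimal sketch of that rule, assuming a boto3 ECS client. describe_task_definition, register_task_definition, and update_service are real boto3 ECS operations; the helper names, the container name, and the list of read-only fields (returned by the describe call but not accepted by the register call) are our own illustration, not Rōvn's deploy script.

```python
from typing import Any

# Fields that describe_task_definition returns but register_task_definition rejects.
READ_ONLY_FIELDS = frozenset({
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
})


def build_new_revision(task_def: dict, container: str, image: str) -> dict:
    """Copy the current task definition, changing only the named container's image."""
    new_def = {k: v for k, v in task_def.items() if k not in READ_ONLY_FIELDS}
    containers = []
    found = False
    for original in task_def["containerDefinitions"]:
        updated = dict(original)  # shallow copy: the input dict is left untouched
        if updated["name"] == container:
            updated["image"] = image  # secret refs, env vars, cpu/memory carry over as-is
            found = True
        containers.append(updated)
    if not found:
        raise ValueError(f"container {container!r} not found in task definition")
    new_def["containerDefinitions"] = containers
    return new_def


def deploy(ecs: Any, cluster: str, service: str, family: str, container: str, image: str) -> str:
    """Register a new revision pointing at `image`, then move the service onto it."""
    current = ecs.describe_task_definition(taskDefinition=family)["taskDefinition"]
    response = ecs.register_task_definition(**build_new_revision(current, container, image))
    new_arn = response["taskDefinition"]["taskDefinitionArn"]
    # Pointing the service at the new revision triggers the rolling deploy;
    # forcing a new deployment on the old revision would just restart the old image.
    ecs.update_service(cluster=cluster, service=service, taskDefinition=new_arn)
    return new_arn
```

The previous revision stays registered after deploy() returns, which is what keeps it available as the rollback target.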


3. Database migrations

  • Migration tooling: custom Alembic-style runner (apply_migrations.py in rovn-platform/migrations/).
  • Schema state today: 89+ migration files numbered sequentially (plus the 2026_04_14_audit_log_harden.sql hotfix). The full list is in the migrations folder.
  • Forward-only. No DOWN migrations in production. Reversal happens by writing a new forward migration.
  • Idempotent. Every migration is wrapped to be safely re-runnable (e.g., CREATE TABLE IF NOT EXISTS, ALTER TABLE ... ADD COLUMN IF NOT EXISTS).
  • Order rule: migrations run before the new ECS task definition takes traffic. The deploy script blocks until migrations complete and an integrity check passes.
  • Pre-deploy snapshot: every migration deploy is preceded by a manual RDS snapshot (named with the migration filename and timestamp). Snapshot retention is 90 days for these manual snapshots.
  • PHI columns. Migrations that touch PHI columns require two-engineer review per repo CODEOWNERS rule.
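A minimal sketch of a forward-only, re-runnable runner in the spirit of apply_migrations.py, assuming a hypothetical schema_migrations bookkeeping table and zero-padded filenames. SQLite stands in for the production RDS database so the sketch runs anywhere; it is not Rōvn's runner.

```python
import sqlite3


def apply_migrations(conn: sqlite3.Connection, migrations: dict) -> list:
    """Apply forward-only migrations in filename order; return the names applied this run.

    `migrations` maps filename -> SQL text. SQLite stands in for the production database.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        " filename TEXT PRIMARY KEY,"
        " applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    already = {row[0] for row in conn.execute("SELECT filename FROM schema_migrations")}
    applied = []
    for name in sorted(migrations):  # zero-padded numbers make lexical order the apply order
        if name in already:
            continue  # re-running the runner skips what is already recorded
        # executescript() commits any pending transaction first, so the script and
        # the bookkeeping row below are NOT atomic. Idempotent SQL (IF NOT EXISTS)
        # is what makes a crash between the two safe to re-run.
        conn.executescript(migrations[name])
        conn.execute("INSERT INTO schema_migrations (filename) VALUES (?)", (name,))
        conn.commit()
        applied.append(name)
    return applied
```

There is deliberately no "down" path: undoing a change means adding a new file with a higher number.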

4. Marketing site and investor portal

  • Marketing site (rovn.to): Cloudflare Pages, project rovn-design. Build on push to main branch. Zero PHI surface.
  • App route shell (rovn.to/login, /signup/*, /nurse, /hospital, /facility, etc.): also served from the rovn-design Cloudflare Pages project per the 2026-05-11 unified-domain note. Routes call the FastAPI backend at passport.rovn.to for data.
  • Investor portal: separate Cloudflare Pages project, separate domain, gated. Distributes diligence docs.
  • DNS: Cloudflare-managed for rovn.to and app.rovn.to. Marketing hostnames, including apex rovn.to, are orange-clouded (proxied through Cloudflare). passport.rovn.to is gray-clouded (DNS-only) and resolves directly to the AWS-side services, so PHI traffic does not route via the Cloudflare edge.
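Cloudflare's DNS record objects carry a boolean proxied field (orange-cloud when true). A minimal sketch of a drift check over such records, assuming they have already been fetched from the Cloudflare API into the hypothetical DnsRecord shape below; PHI_HOSTS is likewise illustrative. Because DNS resolves hostnames, not paths, the single passport.rovn.to record also governs passport.rovn.to/mcp.

```python
from dataclasses import dataclass

# Hypothetical set of PHI-carrying hostnames that must stay DNS-only (gray-cloud).
PHI_HOSTS = frozenset({"passport.rovn.to"})


@dataclass(frozen=True)
class DnsRecord:
    name: str
    record_type: str
    proxied: bool  # True = orange-cloud (via Cloudflare edge), False = gray-cloud (DNS-only)


def phi_proxy_violations(records: list) -> list:
    """Return PHI hostnames that are proxied through the Cloudflare edge."""
    return sorted({r.name for r in records if r.name in PHI_HOSTS and r.proxied})
```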

5. Monitoring and observability

| Layer | Tool | What we watch |
| --- | --- | --- |
| Application errors | Sentry | Unhandled exceptions, deploy regressions |
| Logs | CloudWatch Logs (structured JSON) | Per-request logs, PHI scrubbed before write |
| Metrics | CloudWatch Metrics | ECS service health, RDS CPU / connections, ALB 5xx, request RPS |
| Alarms | CloudWatch Alarms → Slack + PagerDuty | 5xx > 1% over 5 min, RDS CPU > 85% sustained, ECS running tasks < desired |
| Distributed trace | AWS X-Ray | 10% sample at steady state; 100% on /admin/* and /audit/* |
| Synthetic | CloudWatch Synthetics canary | Hits /health every 30s from us-east-2 |
| Compliance | Drata | SOC 2 evidence collection, control drift |
| Cost | AWS Budgets + Anthropic API dashboards | Per-tenant token + infra spend |
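In X-Ray itself, the tracing policy lives in sampling rules (a fixed rate matched against URL path patterns), not in application code. The functions below are only a minimal sketch of the decision those rules encode, with hypothetical names; note that fnmatch's * also matches "/", so /admin/* covers nested paths here.

```python
import random
from fnmatch import fnmatchcase

# Sampling policy from the table: 100% on admin/audit routes, 10% elsewhere.
FULL_SAMPLE_PATTERNS = ("/admin/*", "/audit/*")
STEADY_STATE_RATE = 0.10


def sample_rate(path: str) -> float:
    """Return the fraction of requests on this path that should be traced."""
    if any(fnmatchcase(path, pattern) for pattern in FULL_SAMPLE_PATTERNS):
        return 1.0
    return STEADY_STATE_RATE


def should_trace(path: str, rng: random.Random) -> bool:
    # rng.random() is in [0, 1), so a rate of 1.0 always traces.
    return rng.random() < sample_rate(path)
```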

P0 alarms page the on-call rotation at any hour. P1 alarms go to Slack only, during business hours.
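The 5xx alarm and the P0/P1 split reduce to two small decisions. A minimal sketch with hypothetical function names: in production the threshold is evaluated by CloudWatch Alarms over the 5-minute window, and the business-hours window below is an assumption the text does not specify.

```python
from datetime import datetime


def error_rate_breached(requests: int, errors_5xx: int, threshold: float = 0.01) -> bool:
    """True when 5xx responses exceed `threshold` of requests in the evaluation window."""
    return requests > 0 and errors_5xx / requests > threshold


def route_alarm(severity: str, now: datetime) -> list:
    """Hypothetical routing: P0 pages on-call at any hour; P1 posts to Slack in business hours."""
    if severity == "P0":
        return ["pagerduty", "slack"]
    if severity == "P1":
        # Assumed business hours: Monday to Friday, 09:00 to 17:00 local time.
        in_hours = now.weekday() < 5 and 9 <= now.hour < 17
        return ["slack"] if in_hours else []  # out of hours: not routed now, nobody paged
    raise ValueError(f"unknown severity {severity!r}")
```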


6. Incident response

  • On-call rotation: Giles (primary) · Christian (backup) · engineering on-call (extended-hours tertiary). PagerDuty schedules and overrides are documented in RUNBOOK.md.
  • P0 definition: customer-impacting outage, data-integrity event, suspected PHI exposure, suspected security incident. Escalate within 5 minutes.
  • P1 definition: degraded but not down. Ack within 15 minutes during business hours.
  • Post-mortem discipline. Every P0 gets a written post-mortem within 24 hours, distributed to the team and (when relevant) to affected design partners. Drafts use a blameless template.
  • Status page: TARGET. Pre-launch, communication runs through direct partner contact.

7. Rollback

Two rollback paths exist:

  1. ECS rollback (most common). Update ECS service to the previous task definition revision. Image ARN reverts to the prior prod-* tag. Traffic shifts back on health-check pass. RTO ~3 minutes from decision to traffic.

  2. Database migration reversal. Because migrations are forward-only, a "rollback" is a new forward migration that undoes the prior change. For non-PHI columns, this is the standard path. For PHI columns, additional review is required.

Operational rule: never roll back the ECS task without first confirming whether a migration ran in the prior deploy. If a migration changed the schema in a way that the prior image cannot read, the rollback is a forward migration plus ECS rollback. The deploy script logs each migration in audit_log, which is the source of truth.
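The operational rule above is a small decision procedure. A minimal sketch, assuming the deploy's migrations and a reviewer's judgment of schema compatibility have been pulled from audit_log into the hypothetical DeployRecord shape. Ordering the schema fix before the ECS switch is also an assumption: the right order depends on whether the currently running image tolerates the reverted schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployRecord:
    """Hypothetical summary of one deploy, reconstructed from audit_log."""

    migrations: tuple  # migration filenames this deploy applied
    prior_image_reads_schema: bool  # can the previous image run on the new schema?


def plan_rollback(current: DeployRecord, previous_revision: str) -> list:
    """List rollback steps; assumed order is schema fix first, then the ECS switch."""
    steps = []
    if current.migrations and not current.prior_image_reads_schema:
        # Forward-only: reverse the change with a new migration, never a DOWN script.
        steps.append("write + apply forward migration undoing " + ", ".join(current.migrations))
    steps.append(f"update-service to task definition {previous_revision}")
    return steps
```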


8. Secrets rotation

  • Store: AWS Secrets Manager only. No secrets in source. No secrets in env-var task-definition fields (refs to Secrets Manager ARNs only).
  • Rotation cadence:
      • Database master keys: 90 days
      • Anthropic API key: 90 days
      • Persona, Checkr, WorkOS, Stripe API keys: 90 days
      • JWT signing keys: 180 days, rolling (old key remains valid for in-flight tokens)
      • MCP server outbound + inbound tokens: 90 days
  • Drift monitoring: Drata watches for new IAM principals granted secretsmanager:GetSecretValue; deviations from the allow-list page ops.
  • Audit: every rotation event writes to audit_log and Slack-notifies the security channel.
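Secrets Manager can enforce these cadences natively through each secret's rotation rules (AutomaticallyAfterDays). As an independent audit check, a minimal sketch that flags overdue secrets from their last-rotated dates; the labels in ROTATION_DAYS are illustrative, not real secret names.

```python
from datetime import date, timedelta

# Rotation cadence in days, per the list above (keys are illustrative labels).
ROTATION_DAYS = {
    "database-master": 90,
    "anthropic-api": 90,
    "persona-api": 90,
    "checkr-api": 90,
    "workos-api": 90,
    "stripe-api": 90,
    "jwt-signing": 180,
    "mcp-tokens": 90,
}


def rotations_due(last_rotated: dict, today: date) -> list:
    """Return secrets whose last rotation is at least one full cadence ago."""
    return sorted(
        name
        for name, rotated_on in last_rotated.items()
        if today - rotated_on >= timedelta(days=ROTATION_DAYS[name])
    )
```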

9. CodeBuild + Docker hygiene

  • Image base: python:3.11-slim pinned by SHA digest.
  • Non-root user: all containers run as a non-root UID.
  • Read-only root FS in production task def (writable tmpfs mount only).
  • Image scanning: ECR image scan on push (basic) + scheduled Snyk scan in CI.
  • Dependency lockfile: uv.lock / requirements.lock committed; CI fails on lockfile drift.
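Pinning by digest means the FROM line names the image as python:3.11-slim@sha256:<digest>, so a re-pushed tag cannot change what gets built. A minimal sketch of a CI check for that rule, with hypothetical function names; it allows FROM lines that refer back to an earlier stage of a multi-stage build, since those need no digest.

```python
import re

# FROM [--platform=...] <image> [AS <stage>]
FROM_LINE = re.compile(
    r"^FROM\s+(?:--platform=\S+\s+)?(?P<image>\S+)(?:\s+AS\s+(?P<stage>\S+))?\s*$",
    re.IGNORECASE,
)
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile: str) -> list:
    """Return base images in FROM lines that are not pinned by sha256 digest."""
    stages = set()
    unpinned = []
    for line in dockerfile.splitlines():
        match = FROM_LINE.match(line.strip())
        if not match:
            continue
        image = match.group("image")
        # "FROM <earlier stage>" in a multi-stage build needs no digest.
        if image.lower() not in stages and not DIGEST.search(image):
            unpinned.append(image)
        if match.group("stage"):
            stages.add(match.group("stage").lower())
    return unpinned
```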

10. Deploy authorization

Per memory log reference_rovn_deploy_auth.md:

  • IAM user claude-deploy is the only programmatic identity (besides break-glass humans) authorized to push to ECR and update ECS services.
  • Scope: PowerUser + IAMFull + ArtifactSync. Trimmed to deploy-only at the SCP level.
  • Protocol: preview → confirm → execute → verify → log on every prod write.
  • Audit chain captures every deploy event.
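A minimal sketch of the preview → confirm → execute → verify → log protocol as a wrapper around one prod write. The callables are hypothetical hooks (for example, preview could render the task definition diff and log could append to the audit chain); the point is the ordering, and that every outcome after the preview, including an abort or a failed execute, produces a log entry.

```python
from typing import Callable


def guarded_prod_write(
    preview: Callable[[], str],
    confirm: Callable[[str], bool],
    execute: Callable[[], None],
    verify: Callable[[], bool],
    log: Callable[[str, str], None],
) -> bool:
    """Run one prod write through preview -> confirm -> execute -> verify -> log."""
    plan = preview()  # describe exactly what will change
    if not confirm(plan):  # approval is given for this exact plan
        log("aborted", plan)
        return False
    try:
        execute()
    except Exception:
        log("execute_failed", plan)
        raise
    ok = verify()  # e.g. new revision healthy behind the ALB
    log("verified" if ok else "verify_failed", plan)
    return ok
```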

11. What this overview does not claim

  • We do not claim a fully isolated staging environment today. Staging is PARTIAL (feature-flagged production while pre-launch).
  • We do not claim a public-facing status page is live. It is TARGET.
  • We do not claim automated cross-region failover. DR is multi-AZ active plus cross-region cold standby; failover is documented but manual.
  • We do not claim fully automated dependency-update PRs (Renovate / Dependabot). Coverage is partial today, with full coverage on the post-close roadmap.

End of overview.
