macsync/docs/DEPLOY.md
Natalie e52ed1b44f
feat(deploy): codify the ct.prod DMZ edge + add deploy/ops runbook
deploy-edge.sh reproducibly configures macsync's public edge on ct.prod (Caddy
-> macsync 10.20.0.5:3201 over the VPC), so a ct.prod rebuild restores it (it was
hand-configured during cut-over). docs/DEPLOY.md documents the two-box DMZ/internal
topology, one-command deploys, rebuild recovery, secrets model, security posture,
and how to run the tests. Verified: edge returns 200.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 06:25:49 -04:00

macsync — Deploy & Operations Runbook

DO-native, off-black. macsync runs entirely on DigitalOcean (droplets + Managed Postgres + Spaces). black is long-term backup only.

Topology (as deployed)

Mac (plum, MacSyncApp menu-bar agent)
  │  reads Messages/Contacts/Calendar/Reminders/Calls/Photos/Notes locally
  │  HTTPS  →  macsync.ct.uvlava.com
  ▼
ct.prod  (com.uvlava.ct.prod, DMZ)          reserved IP 144.126.248.192
  │  Caddy: TLS edge, the ONLY public listener (80/443)
  │  reverse_proxy over the private store VPC ─────────────┐
  ▼                                                        │
ct.services (lime, INTERNAL — zero public app ports)  VPC 10.20.0.5:3201
  │  mac-sync-server (bun/Hono, systemd: mac-sync-server.service)
  ├─ DO Managed Postgres (private) 10.20.0.3:25060  db=macsync schema=macsync.*
  ├─ Redis (127.0.0.1:6379) — job/cache queue
  └─ DO Spaces (private bucket lilith-quinn-media) — photo blobs, presigned URLs

Two boxes, one job each: ct.prod faces the internet; lime stays internal. Everything between them is the private VPC (10.20.0.0/16); the mesh (wg1, 10.9.0.0/24) is for admin/inter-service, not the public path.

Deploy (one command each)

Both scripts are rebuild-safe and idempotent; run from the repo root on the Mac.

| Script | What it does |
| --- | --- |
| deploy/deploy-server.sh | Full server deploy to lime: installs runtime (bun/redis/caddy), rsyncs src/server, pushes secrets over SSH, writes the systemd unit, restarts, verifies /health/deep. --code = code + restart only. |
| deploy/deploy-edge.sh | Configures the macsync edge on ct.prod: Caddy macsync.ct.uvlava.com → 10.20.0.5:3201 (VPC). Restores the DMZ edge after a ct.prod rebuild. |

./deploy/deploy-server.sh      # server on lime
./deploy/deploy-edge.sh        # public edge on ct.prod
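
For orientation, the edge that deploy-edge.sh configures amounts to one Caddy site block: a hostname that Caddy terminates TLS for, proxied to the VPC upstream. The sketch below is not the script's actual contents. `render_site_block` is a hypothetical helper, and Caddy v2 Caddyfile syntax is assumed.

```shell
# Hypothetical helper (not taken from deploy-edge.sh): print a Caddy v2
# site block that serves one hostname and proxies it to one upstream.
render_site_block() {
  host=$1       # public hostname Caddy obtains a certificate for
  upstream=$2   # private VPC address of the app server
  printf '%s {\n\treverse_proxy %s\n}\n' "$host" "$upstream"
}

# Usage, with the values from the topology section:
#   render_site_block macsync.ct.uvlava.com 10.20.0.5:3201
```

Rendering the block from two arguments keeps the hostname and the upstream in one place, which is what makes re-running after a rebuild safe.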

Secrets (never in git, never in cloud-init user-data — that's metadata-readable)

deploy-server.sh sources them at deploy time:

  • DB password: doctl databases user get <cluster> macsync_app
  • SERVICE_TOKEN: CT_SERVICE_TOKEN in ~/Code/@ct/.env.local (shared @ct operator token)
  • Spaces keys: ~/Code/@ct/.vault/do-spaces-uvlava.{access,secret}
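
Because the secrets are read from the operator's Mac at deploy time, a missing file only surfaces partway through a deploy. A preflight along these lines could catch that earlier. This is a sketch: `check_secret_sources` is a hypothetical helper, and it assumes .env.local holds a plain `CT_SERVICE_TOKEN=...` line (optionally prefixed with `export`). It does not check the doctl-sourced DB password.

```shell
# Hypothetical preflight (not part of deploy-server.sh): confirm the local
# secret sources exist and are readable. Secret values are never printed.
check_secret_sources() {
  ct_root=${1:-"$HOME/Code/@ct"}   # default location from the list above
  missing=0
  for f in \
    "$ct_root/.env.local" \
    "$ct_root/.vault/do-spaces-uvlava.access" \
    "$ct_root/.vault/do-spaces-uvlava.secret"
  do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      missing=1
    fi
  done
  # Assumption: the token is stored as a CT_SERVICE_TOKEN=... line.
  if [ -r "$ct_root/.env.local" ] &&
     ! grep -Eq '^(export[[:space:]]+)?CT_SERVICE_TOKEN=' "$ct_root/.env.local"
  then
    echo "CT_SERVICE_TOKEN not defined in $ct_root/.env.local" >&2
    missing=1
  fi
  return "$missing"
}

# Usage:
#   check_secret_sources && ./deploy/deploy-server.sh
```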

Rebuild recovery (terraform wiped a droplet)

Cloud-init installs only runtime (no secrets, no app code), so a rebuilt box needs a one-command redeploy:

  • lime rebuilt: terraform apply (cloud-init installs runtime, per infra/uvlava/terraform/do/cloud-init/backend.yaml) → ./deploy/deploy-server.sh. If the box's cloud firewall reset, re-open :3201 from the VPC.
  • ct.prod rebuilt: terraform apply -target=digitalocean_droplet.ct_prod ... (needs ct_prod_enabled = true) → ./deploy/deploy-edge.sh.

Provisioning ct.prod from scratch

cd infra/uvlava/terraform/do
export TF_VAR_do_token=$(cat ~/.vault/do-pat-ct.token)
terraform apply -var 'ct_prod_enabled=true' \
  -target=digitalocean_droplet.ct_prod \
  -target=digitalocean_reserved_ip.ct_prod \
  -target=digitalocean_firewall.ct_prod
# then persist ct_prod_enabled=true in terraform.tfvars so a plain apply won't destroy it,
# point DNS macsync.ct.uvlava.com → the reserved IP, and run ./deploy/deploy-edge.sh
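
Caddy can only obtain a certificate once the hostname resolves to the edge, so it may be worth confirming DNS before running deploy-edge.sh. The sketch below assumes `dig` is installed. `resolve_a` and `dns_points_to` are hypothetical helper names.

```shell
# resolve_a HOST: print HOST's A records, one per line (requires dig).
resolve_a() {
  dig +short "$1" A
}

# dns_points_to HOST IP: succeed only if one of HOST's A records is exactly IP.
dns_points_to() {
  resolve_a "$1" | grep -Fxq "$2"
}

# Usage, with the reserved IP from the topology section:
#   dns_points_to macsync.ct.uvlava.com 144.126.248.192 && ./deploy/deploy-edge.sh
```

The match is on the whole line (`grep -x`), so a record that merely shares a prefix with the reserved IP does not pass.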

Security posture (@ct-only)

  • Data: token-gated (every /client/* request needs the device/operator bearer; registration requires the operator token). Public probes get 401.
  • Network: lime has zero public app ports; the DB + Spaces are private; the only public surface is ct.prod's Caddy (80/443). Inter-service is VPC/mesh.
  • Single operator (Quinn). One shared @ct token across @ct services.

Verify

curl -s https://macsync.ct.uvlava.com/health          # {"ok":true}
curl -s https://macsync.ct.uvlava.com/health/deep      # db check (200)
# permissions across devices:
curl -s https://macsync.ct.uvlava.com/health/permissions -H "authorization: Bearer $CT_SERVICE_TOKEN"
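
The curl calls above print response bodies. For scripted checks it can be easier to compare status codes instead. The following is a minimal sketch. `http_status` and `expect_status` are hypothetical helpers, and only the 200 on /health/deep is taken from this runbook.

```shell
# http_status URL [curl args...]: print only the HTTP status code.
http_status() {
  url=$1; shift
  curl -s -o /dev/null -w '%{http_code}' "$@" "$url"
}

# expect_status WANT URL [curl args...]: succeed when the status equals WANT.
expect_status() {
  want=$1; shift
  got=$(http_status "$@")
  if [ "$got" = "$want" ]; then
    echo "ok   $want $1"
  else
    echo "FAIL $1: got $got, want $want" >&2
    return 1
  fi
}

# Usage:
#   expect_status 200 https://macsync.ct.uvlava.com/health/deep
```

Extra arguments are passed through to curl, so an authenticated probe can append `-H "authorization: Bearer $CT_SERVICE_TOKEN"` after the URL.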

Tests

cd src/server
bun test               # unit suite (no DB needed)
bun run typecheck      # tsc --noEmit

Integration suites use an ephemeral per-run schema (macsync_test_*) and require a Postgres they can create/drop schemas in — set QUINN_MACSYNC_DB_URL. Run them in CI or on a VPC host with a throwaway DB (do not point them at the production cluster). Without QUINN_MACSYNC_DB_URL the DB-backed suites fail fast by design; the unit suite is unaffected.
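
One way to enforce the "not production" rule is a guard that runs before the DB-backed suites. This sketch is only a string heuristic: `guard_test_db_url` is a hypothetical helper that rejects a URL naming the production address from the topology section (10.20.0.3:25060). A DNS name for the same cluster would slip through.

```shell
# Hypothetical guard: fail when QUINN_MACSYNC_DB_URL is unset or names the
# production Postgres address listed in the topology section.
guard_test_db_url() {
  url=${QUINN_MACSYNC_DB_URL:-}
  if [ -z "$url" ]; then
    echo "QUINN_MACSYNC_DB_URL is not set" >&2
    return 1
  fi
  case $url in
    *10.20.0.3:25060*)
      echo "refusing: QUINN_MACSYNC_DB_URL points at the production cluster" >&2
      return 1 ;;
  esac
  return 0
}

# Usage: call it before whatever command starts the DB-backed suites, e.g.
#   guard_test_db_url && bun test
```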

Swift client: swift test in @packages/inotes (and siblings).