macsync/docs/DEPLOY.md
Natalie e52ed1b44f
feat(deploy): codify the ct.prod DMZ edge + add deploy/ops runbook
deploy-edge.sh reproducibly configures macsync's public edge on ct.prod (Caddy
-> macsync 10.20.0.5:3201 over the VPC), so a ct.prod rebuild restores it (it was
hand-configured during cut-over). docs/DEPLOY.md documents the two-box DMZ/internal
topology, one-command deploys, rebuild recovery, secrets model, security posture,
and how to run the tests. Verified: edge returns 200.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 06:25:49 -04:00


# macsync — Deploy & Operations Runbook
DO-native, off-black. macsync runs entirely on DigitalOcean (droplets + Managed
Postgres + Spaces). black is long-term backup only.
## Topology (as deployed)
```
Mac (plum, MacSyncApp menu-bar agent)
  │  reads Messages/Contacts/Calendar/Reminders/Calls/Photos/Notes locally
  │  HTTPS → macsync.ct.uvlava.com
  ▼
ct.prod (com.uvlava.ct.prod, DMZ)            reserved IP 144.126.248.192
  │  Caddy: TLS edge, the ONLY public listener (80/443)
  │  reverse_proxy over the private store VPC ─────────────┐
  ▼                                                        │
ct.services (lime, INTERNAL — zero public app ports)   VPC 10.20.0.5:3201
  │  mac-sync-server (bun/Hono, systemd: mac-sync-server.service)
  ├─ DO Managed Postgres (private)  10.20.0.3:25060  db=macsync  schema=macsync.*
  ├─ Redis (127.0.0.1:6379) — job/cache queue
  └─ DO Spaces (private bucket lilith-quinn-media) — photo blobs, presigned URLs
```
Two boxes, one job each: **ct.prod faces the internet; lime stays internal.**
Everything between them is the private VPC (`10.20.0.0/16`); the mesh (`wg1`,
`10.9.0.0/24`) is for admin/inter-service, not the public path.
## Deploy (one command each)
Both scripts are rebuild-safe and idempotent; run from the repo root on the Mac.
| Script | What it does |
| --- | --- |
| `deploy/deploy-server.sh` | Full server deploy to lime: installs runtime (bun/redis/caddy), rsyncs `src/server`, pushes secrets over SSH, writes the systemd unit, restarts, verifies `/health/deep`. `--code` = code + restart only. |
| `deploy/deploy-edge.sh` | Configures the macsync edge on ct.prod: Caddy `macsync.ct.uvlava.com` → `10.20.0.5:3201` (VPC). Restores the DMZ edge after a ct.prod rebuild. |
```bash
./deploy/deploy-server.sh # server on lime
./deploy/deploy-edge.sh # public edge on ct.prod
```
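For orientation, the edge configuration described above amounts to a single Caddy site block. The following is a minimal sketch, not the contents of `deploy-edge.sh`: the `render_edge_site` helper and the destination path in the usage comment are hypothetical, and only the hostname and upstream come from this runbook.

```bash
# Hypothetical helper: print a Caddy site block that serves HOST and
# reverse-proxies every request to a single UPSTREAM (host:port).
render_edge_site() {
  local host="$1" upstream="$2"
  printf '%s {\n\treverse_proxy %s\n}\n' "$host" "$upstream"
}

# Example (destination path is an assumption, adjust to the real Caddy layout):
#   render_edge_site macsync.ct.uvlava.com 10.20.0.5:3201 > /etc/caddy/macsync.caddy
```

Rendering the block from two parameters keeps the hostname and VPC upstream in one place, which is one way to make an edge rebuild reproducible.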
### Secrets (never in git, never in cloud-init user-data — that's metadata-readable)
`deploy-server.sh` sources them at deploy time:
- **DB password** — `doctl databases user get <cluster> macsync_app`
- **SERVICE_TOKEN** — `CT_SERVICE_TOKEN` in `~/Code/@ct/.env.local` (shared @ct operator token)
- **Spaces keys** — `~/Code/@ct/.vault/do-spaces-uvlava.{access,secret}`
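One way to pull a single value such as `CT_SERVICE_TOKEN` out of a dotenv-style file without sourcing (and so executing) the whole file is sketched below. The `read_env_key` helper is hypothetical and assumes plain `KEY=value` lines; how `deploy-server.sh` actually reads the file may differ.

```bash
# Hypothetical helper: print the value of KEY from a dotenv-style FILE.
# Assumes plain KEY=value lines (no "export " prefix, no multi-line values).
read_env_key() {
  local file="$1" key="$2"
  # Keep the last matching line, drop the "KEY=" prefix, then strip a leading
  # and a trailing double quote if present.
  sed -n "s/^${key}=//p" "$file" | tail -n 1 | sed -e 's/^"//' -e 's/"$//'
}

# Example (path and key name from the list above):
#   SERVICE_TOKEN="$(read_env_key ~/Code/@ct/.env.local CT_SERVICE_TOKEN)"
```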
## Rebuild recovery (terraform wiped a droplet)
Cloud-init installs only *runtime* (no secrets, no app code), so a rebuilt box
needs a one-command redeploy:
- **lime rebuilt** → `terraform apply` (cloud-init installs runtime, per
`infra/uvlava/terraform/do/cloud-init/backend.yaml`) → `./deploy/deploy-server.sh`.
If the box's cloud firewall reset, re-open `:3201` from the VPC.
- **ct.prod rebuilt** → `terraform apply -target=digitalocean_droplet.ct_prod ...`
(needs `ct_prod_enabled = true`) → `./deploy/deploy-edge.sh`.
## Provisioning ct.prod from scratch
```bash
cd infra/uvlava/terraform/do
export TF_VAR_do_token="$(cat ~/.vault/do-pat-ct.token)"
terraform apply -var 'ct_prod_enabled=true' \
  -target=digitalocean_droplet.ct_prod \
  -target=digitalocean_reserved_ip.ct_prod \
  -target=digitalocean_firewall.ct_prod
# then persist ct_prod_enabled=true in terraform.tfvars so a plain apply won't destroy it,
# point DNS macsync.ct.uvlava.com → the reserved IP, and run ./deploy/deploy-edge.sh
```
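Because Caddy's automatic HTTPS generally needs the hostname to resolve to the edge before a certificate can be issued, it may help to confirm DNS before running the edge script. This is a sketch under assumptions: `dns_points_to` is a hypothetical helper, and it relies on `dig` being installed on the Mac.

```bash
# Hypothetical pre-flight check: succeed only if HOST has an A record
# that is exactly IP.
dns_points_to() {
  local host="$1" ip="$2"
  # -x matches the whole line, -F treats the IP as a fixed string.
  dig +short "$host" A | grep -qxF "$ip"
}

# Example (hostname and reserved IP from the topology section):
#   dns_points_to macsync.ct.uvlava.com 144.126.248.192 && ./deploy/deploy-edge.sh
```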
## Security posture (@ct-only)
- **Data**: token-gated (every `/client/*` request needs the device/operator
bearer; registration requires the operator token). Unauthenticated probes of
those routes get 401.
- **Network**: lime has zero public app ports; the DB + Spaces are private; the
only public surface is ct.prod's Caddy (80/443). Inter-service is VPC/mesh.
- Single operator (Quinn). One shared @ct token across @ct services.
## Verify
```bash
curl -s https://macsync.ct.uvlava.com/health # {"ok":true}
curl -s https://macsync.ct.uvlava.com/health/deep # db check (200)
# permissions across devices:
curl -s https://macsync.ct.uvlava.com/health/permissions -H "authorization: Bearer $CT_SERVICE_TOKEN"
```
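The plain `curl -s` calls above print the response body but not the status code. If the checks need to be scripted, something like the following sketch could assert on the status instead; `http_status` and `expect_status` are hypothetical helpers, not part of the repo.

```bash
# Print only the HTTP status code; all arguments are passed through to curl.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Hypothetical helper: compare the observed status with the expected one.
# Usage: expect_status WANT URL [extra curl args...]
expect_status() {
  local want="$1"; shift
  local got
  got="$(http_status "$@")"
  if [ "$got" = "$want" ]; then
    echo "ok   $want $1"
  else
    echo "FAIL want=$want got=$got $1" >&2
    return 1
  fi
}

# Example:
#   expect_status 200 https://macsync.ct.uvlava.com/health
#   expect_status 200 https://macsync.ct.uvlava.com/health/deep
```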
## Tests
```bash
cd src/server
bun test # unit suite (no DB needed)
bun run typecheck # tsc --noEmit
```
Integration suites use an **ephemeral per-run schema** (`macsync_test_*`) and
require a Postgres they can create/drop schemas in — set `QUINN_MACSYNC_DB_URL`.
Run them in CI or on a VPC host with a throwaway DB (do **not** point them at the
production cluster). Without `QUINN_MACSYNC_DB_URL` the DB-backed suites fail
fast by design; the unit suite is unaffected.
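A small guard in front of the DB-backed suites can make the "not production" rule harder to break by accident. The sketch below is hypothetical: `guard_test_db` is not part of the repo, and it only matches the private address shown in the topology, so a real check would also need to cover the cluster's hostname.

```bash
# Hypothetical guard: fail unless QUINN_MACSYNC_DB_URL is set and does not
# contain the production Postgres address from the topology diagram.
guard_test_db() {
  local url="${QUINN_MACSYNC_DB_URL:-}"
  if [ -z "$url" ]; then
    echo "QUINN_MACSYNC_DB_URL is not set" >&2
    return 1
  fi
  case "$url" in
    *10.20.0.3:25060*)
      echo "refusing to run tests against the production cluster" >&2
      return 1
      ;;
  esac
}

# Example (run before the DB-backed suites):
#   guard_test_db && bun test
```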
Swift client: `swift test` in `@packages/inotes` (and siblings).