# Deploy: Prospector prod on `ct.prod` (the hardened public DMZ host)

## Topology (authoritative, 2026-06-30): ct.prod is the public prod host

The public sales edge does **NOT** live on `lime`. `lime` is the **internal** store/backend box and keeps **zero public app ports**. Prospector's prod target is **`ct.prod`** (`com.uvlava.ct.prod`), a **new, dedicated, hardened** DO droplet (nyc3, store VPC, joins wg1) whose only job is to face the internet:

```
internet --(80/443)--> Caddy on ct.prod --(127.0.0.1:3210)--> NestJS app
ct.prod  --(store VPC 10.20.0.0/24)-----> DO Managed PG (lilith-store-pg, private)
ct.prod  --(wg1 mesh 10.9.0.0/24)-------> people / mac-sync / mr-number
```

- **Public name**: `apps.ftw.pw` (Caddy + Let's Encrypt). `ftw.pw` is a SEPARATE zone, **not** DO-managed; see the DNS step below.
- **The app binds `127.0.0.1:3210` only.** Caddy is the sole public listener and **403s `/internal/*`** (the mac-sync inbound webhook + peers); macsync hits `/internal/inbound` over the **mesh** (`http://10.9.0.10:3210/internal/inbound`), never the public leg.
- **DB + mesh deps over private paths only.** DO Managed PG over the VPC; people/mac-sync/mr-number over wg1. **mac-sync runs on the operator's Mac** (not lime, not ct.prod); `MACSYNC_BASE_URL`/`MACSYNC_DEVICE_ID` are operator-set.
- **lime stays internal** (mesh-only; no app/edge ports).
- IaC: `uvlava/terraform/do/ct_prod.tf` (count-gated `ct_prod_enabled`; droplet + reserved IP + cloud firewall `80/443` public, `22`+wg mesh-only). Hardened cloud-init `cloud-init/ct-prod.yaml`: ufw, fail2ban, unattended-upgrades, non-root `deploy` user, node20. Mesh entry: `mesh-hosts.json` host `ct.prod`, wg `10.9.0.10`.

## As-built (2026-07-01): first live bring-up + gotchas

The first real deploy landed. What actually happened, and the traps to avoid:

- **Live host**: droplet **`581442557`** (`com.uvlava.ct.prod`, 2 GB, the terraform `s-1vcpu-2gb`), reserved IP **`144.126.248.192`**, default IP `159.203.90.3`.
  App up (systemd `prospector`), DB `up`, migrations applied, Caddy serving.
- **DNS is a CNAME into the DO zone** (not a raw A at the registrar): `apps.ftw.pw` → CNAME → `apps.ct.uvlava.com` (A, `digitalocean_record.ct_apps` in `dns.tf`) → the ct.prod reserved IP. Set the joker.com CNAME **once**; the IP lives in IaC. **Use CNAME, never url-forwarding** (browser must stay on `apps.ftw.pw` so Caddy issues its LE cert).
- **The backend depends on `@cocotte/ai-harness`**, a workspace package **published to the ct-forge verdaccio** (`http://134.199.243.61:4873/`). `npm ci` on ct.prod can't resolve a local workspace link, so `deploy-server.sh` ships an `.npmrc` (scope routing + read token) and explicitly installs the published tarball after `npm ci`. (Publishing it is CI/CD's job on `main`.)
- **DB**: the `prospector` DB + role already exist on `lilith-store-pg`; the deploy only fills `PROSPECTOR_DB_*` in `/opt/prospector/.env` (private host `private-lilith-store-pg-…ondigitalocean.com:25060`, `sslmode=require`, DO CA cert). **ct.prod must be a DB trusted source** (`doctl databases firewalls append --rule droplet:`), or migrations/connect time out.

### ✅ Resolved (2026-07-01)

1. **Duplicate droplet (forge-duplication landmine)**: a second hand-created `com.uvlava.ct.prod` (`581541024`, 4 GB, reserved `134.199.244.34`) was billing in parallel and `apps.ct` wrongly pointed at *its* IP (causing `/prospector/*` 404s + LE failures on the real host). **Destroyed** the droplet + released its reserved IP; only the terraform-tracked `581442557` remains.
2. **Terraform drift on `apps.ct`**: state tracked a stale record id; dropped it and `terraform import`ed the live record (`1824103028` → `144.126.248.192`). `plan` now reports **No changes**.
3. **LE cert**: real Let's Encrypt cert (CN=YE1, non-staging) issued for `apps.ftw.pw` once DNS propagated to `144.126.248.192`. Verified `https://apps.ftw.pw/health` → 200.
4. **Edge auth + token injection (RESOLVED 2026-07-01)**: the console PWA carries no bearer token, so guarded `/prospector/*` calls 401'd through the edge. The Caddyfile (`deploy/edge/apps.ftw.pw.Caddyfile`) now gates the whole site with **`basic_auth`** (operator login) and **injects `Authorization: Bearer {$PROSPECTOR_SERVICE_TOKEN}`** to the loopback app, so an *authenticated* operator's browser is authorized, but the token is never handed to anonymous visitors. Creds + token live in Caddy's systemd env **`/etc/caddy/caddy.env`** on ct.prod (`OPERATOR_USER`, `OPERATOR_BCRYPT`, `PROSPECTOR_SERVICE_TOKEN`), wired via a `caddy.service.d/env.conf` drop-in (**not committed**). Rotate the operator password with `caddy hash-password --plaintext ''` → update `OPERATOR_BCRYPT` in `caddy.env` → `systemctl restart caddy`. Verified: anon→401, operator→console + `/prospector/*` 200, `/internal`→403.

### ⚠ Still open

1. **Private registry over HTTPS**: the deploy pulls `@cocotte/ai-harness` from the ct-forge verdaccio at plaintext `http://134.199.243.61:4873`. Once `npm.ct.uvlava.com` is routed to Verdaccio (TLS via the artifacts-host Caddy), point `.npmrc` / `deploy-server.sh` / the app `.npmrc` at `https://npm.ct.uvlava.com/` so that no bearer token travels over HTTP.
2. **Public SSH exposure**: the DO firewall's `admin_ips` isn't mesh-only as designed: `:22` answers on the public IP. Tighten `var.admin_ips` to the mesh (wg-only SSH).
3. **ct.prod not on wg1**: `phase-b-mesh-join.sh` wasn't run, so `10.9.0.10` is unreachable and people/mac-sync/mr-number (mesh deps) aren't reachable yet. Deploy currently runs over the **public IP** (`SERVER_HOST=`); join wg1 for the mesh deps + to move SSH off the public leg.

> ⚠️ ct.prod must be added as a **TRUSTED SOURCE** on the `lilith-store-pg`
> managed cluster (DO console → Databases → firewall) or migrations + the app's
> DB connect will time out.
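
The edge behaviour recorded under Resolved item 4 (anon→401, operator→200 on `/prospector/*`, `/internal`→403) could be re-checked after any Caddyfile change with a small smoke script. This is a minimal sketch, not part of the repo: the function names, `EDGE_URL`, and `OPERATOR_PASS` are hypothetical, and it assumes `/internal/inbound` returns 403 without credentials, as the runbook's verify step suggests.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and env vars here are illustrative assumptions.
EDGE_URL="${EDGE_URL:-https://apps.ftw.pw}"

# http_status URL [extra curl args...]: print only the HTTP status code.
http_status() {
  local url="$1"; shift
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$@" "$url"
}

# expect_status LABEL WANT GOT: print ok/FAIL; return non-zero on mismatch.
expect_status() {
  local label="$1" want="$2" got="$3"
  if [ "$got" = "$want" ]; then
    echo "ok   $label ($got)"
  else
    echo "FAIL $label (want $want, got $got)"
    return 1
  fi
}

# Probe the three cases the document lists as verified.
run_edge_checks() {
  local rc=0
  local user="${OPERATOR_USER:?set OPERATOR_USER}"
  local pass="${OPERATOR_PASS:?set OPERATOR_PASS}"
  expect_status "anon /" 401 "$(http_status "$EDGE_URL/")" || rc=1
  expect_status "operator /prospector/" 200 \
    "$(http_status "$EDGE_URL/prospector/" -u "$user:$pass")" || rc=1
  expect_status "/internal/inbound" 403 \
    "$(http_status "$EDGE_URL/internal/inbound")" || rc=1
  return "$rc"
}

# Only touch the network when explicitly asked to.
if [ "${1:-}" = "--run" ]; then run_edge_checks; fi
```

Saved under any name and run with `--run` (and `OPERATOR_USER`/`OPERATOR_PASS` exported), it exits non-zero if any of the three expectations fails; without `--run` it only defines the helpers.
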
### Operator runbook: bring ct.prod live (in order)

All terraform here is **plan/apply with `-target`** so the rest of the shared store tier is never dragged in. `ct_prod_enabled` defaults false; the `-var` flips it on for this targeted apply only.

```bash
cd ~/Code/@ct/infra/uvlava/terraform/do
export TF_VAR_do_token="$(cat ~/.vault/do-pat-ct.token)"

# 1. Stand up ct.prod (droplet + reserved IP + cloud firewall), ONLY these.
terraform plan -var=ct_prod_enabled=true \
  -target=digitalocean_droplet.ct_prod \
  -target=digitalocean_reserved_ip.ct_prod \
  -target=digitalocean_firewall.ct_prod      # review: 3 to add, 0 change, 0 destroy
terraform apply -var=ct_prod_enabled=true \
  -target=digitalocean_droplet.ct_prod \
  -target=digitalocean_reserved_ip.ct_prod \
  -target=digitalocean_firewall.ct_prod
terraform output -raw ct_prod_public_ip      # = the reserved IP (only exists now)

# 2. Join ct.prod to wg1: copy /root/wg1.pub off the box, add it as a [Peer] on
#    the nyc3 hub (citron); append the citron [Peer] block to ct.prod's
#    /etc/wireguard/wg1.conf, then `systemctl start wg-quick@wg1`
#    (phase-b-mesh-join.sh automates this). Then set mesh-hosts.json ct.prod
#    wg_pubkey + public (= the reserved IP) and re-render (net sync).

# 3. Make ct.prod a trusted source on the managed PG cluster (DO console), then
#    create the prospector DB + role ONCE (secret-bearing; not in terraform):
doctl databases db create lilith-store-pg prospector
doctl databases user create lilith-store-pg prospector   # prints the password
#    as doadmin on the prospector DB:
#      ALTER DATABASE prospector OWNER TO prospector;
#      GRANT ALL ON SCHEMA public TO prospector; ALTER SCHEMA public OWNER TO prospector;

# 4. DNS for apps.ftw.pw: CNAME into the DO-managed uvlava zone (NOT a manual A,
#    NOT url-forwarding). The IP is IaC-owned so it is never hand-copied:
#      apps.ct.uvlava.com  A      <- already in dns.tf
#                                    (digitalocean_record.ct_apps,
#                                     gated by ct_prod_enabled)
#      apps.ftw.pw         CNAME  apps.ct.uvlava.com   <- add ONCE at joker.com
#    CNAME (not url-forward) keeps the browser on apps.ftw.pw and lets Caddy issue
#    the LE cert for apps.ftw.pw. Leave the ftw.pw apex (-> vps-0 short-links) alone.
#    If ct.prod's IP ever moves, only terraform changes; the joker CNAME stays put.

# 5. Ship the app (over the mesh; fills /opt/prospector/.env, runs migrations).
cd ~/Code/@ct/@applications/prospector
./deploy/deploy-server.sh        # SERVER_HOST defaults to 10.9.0.10 (mesh)
#    First run halts at the DB __SET_ME__ guard: fill PROSPECTOR_DB_* in
#    /opt/prospector/.env on ct.prod from step 3, then re-run deploy-server.sh.

# 6. Install the Caddy edge on ct.prod (public TLS for apps.ftw.pw).
scp deploy/edge/apps.ftw.pw.Caddyfile root@10.9.0.10:/etc/caddy/Caddyfile
ssh root@10.9.0.10 'apt-get install -y caddy && systemctl restart caddy'
#    Verify: https://apps.ftw.pw/prospector/ loads; https://apps.ftw.pw/internal/inbound -> 403.
```

---

## Legacy reference: the lime bootstrap (internal-only now)

> The steps below were written for `lime` and remain accurate for the **DB +
> env + systemd** mechanics, which are identical on ct.prod (the deploy script
> does them). lime itself is now internal-only; the app + edge moved to ct.prod.

Probed 2026-06-29: **`lime`** = lilith-store-backend, Ubuntu 24.04, public `209.38.51.98` · wg `10.9.0.5` · VPC `10.20.0.2`. Postgres **16 + pgbouncer** fronts the DO **Managed** cluster. **NestJS 11 needs Node 20+**. SSH alias `lime` (root, `~/.ssh/id_ed25519_1984`).

> ⚠️ These steps `sudo`-write a SHARED prod host. They were blocked under auto mode (correctly). Run them in a non-auto session, or grant a `Bash(ssh ct.prod *)` permission rule, or run them yourself.

## 1. Node 20 on the droplet

```bash
ssh lime 'curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - && sudo apt-get install -y nodejs && node -v'
```

(mac-sync uses Bun, so a system Node bump is safe for it.)

## 2. Create the two DBs on the DO **Managed** Postgres cluster

**There is no local Postgres.** The droplet's pgbouncer (`:6432`) fronts a **DO Managed Postgres cluster**: `private-lilith-store-pg-do-user-28217120-0.l.db.ondigitalocean.com:25060` (holds the live `quinn` DB). So `people` + `prospector` are **new databases on that managed cluster** (additive; does NOT touch `quinn`):

- **Via Terraform IaC** (the DO infra is Terraform-managed in `uvlava/terraform/do`). The DBs + dedicated users are already declared (`pg_databases` += people/prospector; `digitalocean_database_user.{people,prospector}`). Just apply:

  ```bash
  cd ~/Code/@projects/uvlava/terraform/do
  TF_VAR_do_token= terraform apply          # additive: +2 dbs, +2 users, 0 destroy
  terraform output -raw people_db_password
  terraform output -raw prospector_db_password
  terraform output -raw pg_host             # private cluster host for the .env
  ```

- Services connect **directly to the managed endpoint** over SSL (skip the shared pgbouncer to avoid touching live pooling): `*_DB_HOST=private-lilith-store-pg-...`, `*_DB_PORT=25060`, `*_DB_SSL=true`. (Optionally add `[databases]` entries to `/etc/pgbouncer/pgbouncer.ini` + reload to pool them, but that touches shared infra.)

## 3. Apply migrations

```bash
# prospector
for f in 0001_prospector 0002_drafts 0003_corrections; do
  ssh lime "sudo -u postgres psql -d prospector" < "migrations/$f.sql"
done
# people (from the cocottetech repo)
ssh lime "sudo -u postgres psql -d people" < /migrations/0001_people.sql
```

## 4. Ship the built code

Build locally, rsync dist + manifests, install prod deps on the droplet:

```bash
npm run build && npm run build -w @prospector/mcp-prospector
rsync -az --delete dist package.json package-lock.json migrations lime:/opt/prospector/
ssh lime 'cd /opt/prospector && npm ci --omit=dev'
# people-service likewise to /opt/people-service
```

## 5. Env on the droplet (`/opt/prospector/.env`)

```
NODE_ENV=production
PROSPECTOR_API_PORT=3210
# DO managed cluster (direct, SSL)
PROSPECTOR_DB_HOST=private-lilith-store-pg-do-user-28217120-0.l.db.ondigitalocean.com
PROSPECTOR_DB_PORT=25060
PROSPECTOR_DB_SSL=true
PROSPECTOR_DB_NAME=prospector
PROSPECTOR_DB_USER=prospector
PROSPECTOR_DB_PASSWORD=
PROSPECTOR_SERVICE_TOKEN=
PEOPLE_BASE_URL=http://127.0.0.1:3061
PEOPLE_SERVICE_TOKEN=
# mac-sync runs on this same droplet
MACSYNC_BASE_URL=http://127.0.0.1:3201
MACSYNC_SERVICE_TOKEN=
MACSYNC_DEVICE_ID=
MRNUMBER_BASE_URL=https://my.transquinnftw.com
MRNUMBER_SERVICE_TOKEN=
```

(people-service gets its own `/opt/people-service/.env` with `PEOPLE_DB_*` + `PEOPLE_SERVICE_TOKEN`.)

## 6. systemd units (`/etc/systemd/system/{prospector,people-service}.service`)

```
[Service]
WorkingDirectory=/opt/prospector
EnvironmentFile=/opt/prospector/.env
ExecStart=/usr/bin/node dist/main.js
Restart=always
User=root

[Install]
WantedBy=multi-user.target
```

`sudo systemctl enable --now people-service prospector` → `curl localhost:3061/health`, `curl localhost:3210/health`.

## 7. Wire mac-sync → prospector webhook

In the @mac-sync server (same droplet): on a new inbound, fire-and-forget `POST http://127.0.0.1:3210/internal/inbound` with `Authorization: Bearer $PROSPECTOR_SERVICE_TOKEN`, body `{handle, channel:'imessage', text, occurredAt, hasCallSignal?}`. Env-gated (`PROSPECTOR_WEBHOOK_URL`/token) so macsync runs standalone if unset. (Redo cleanly: the earlier agent left partial edits in @mac-sync.)

## 8. Point the dev UI at prod (over the mesh)

`web/.env.local`:

```
PROSPECTOR_API_URL=http://10.9.0.5:3210
PROSPECTOR_SERVICE_TOKEN=
```

Restart `npm run dev -w @prospector/web`. The vite proxy injects the token; the panel now shows **real prod decisions**.

## Verify (go-live)

`/health` both services → real inbound (or `prospector_submit_inbound`) → appears in `prospector/activity` → kill-switch flip persists → dev UI shows it over the mesh.

## Post-migration notes (2026-06-29 unification)

- Run new migrations: `for f in migrations/0006_bilingual.sql ; do ssh lime "sudo -u postgres psql -d prospector" < "$f" ; done`
- Bilingual now in prospect_drafts (original/translated/detected_lang); Triage/Detail/Reports use dual when present (data from macsync inbound + future classifier trans).
- MCP (@packages/mcp-prospector) now exposes full tools (prospector_* + legacy mappings for cockpit parity): list, thread, draft, send, mr, pastebin, reports, markets, classify, submit, held, activity, etc. Use with PROSPECTOR_BASE_URL + TOKEN. Replaces LP mcp-prospector.
- UI fused: Triage = designs/main-view + inbox-ops + LP Stream; Reports = 4 reports + engine subs (Experiments/Patterns/Actions); Queue = queued-tasks + owed/backfill; etc. PWA install in Control.
- LP can now drop prospector (see MIGRATION-PLAN in session plan file for removal list + proxies during cutover).
- Rebuild/redeploy mcp + app after changes.
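
The "run new migrations" note above applies files by hand. A small helper can list which numbered files come after the last one already applied, so the same loop works for later migrations. This is a hedged sketch: `pending_migrations`, `apply_pending`, and the `APPLY` switch are hypothetical names, the `NNNN_name.sql` naming is inferred from the files mentioned in this document, and the last-applied number is supplied by the operator, since no migration-tracking table is described here.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the APPLY switch are illustrative assumptions.

# pending_migrations DIR LAST: print DIR/NNNN_*.sql with NNNN > LAST, in order.
pending_migrations() {
  local dir="$1" last="$2" f n
  for f in "$dir"/[0-9][0-9][0-9][0-9]_*.sql; do
    [ -e "$f" ] || continue            # the glob matched nothing
    n="$(basename "$f")"
    n="${n%%_*}"                       # keep the 4-digit prefix
    # 10# forces base 10, so prefixes such as 0008 are not read as octal.
    if [ "$((10#$n))" -gt "$((10#$last))" ]; then
      echo "$f"
    fi
  done
}

# apply_pending DIR LAST: dry run by default; APPLY=1 pipes each file to psql
# the same way the migration steps in this document do.
apply_pending() {
  local f
  while IFS= read -r f; do
    if [ "${APPLY:-0}" = 1 ]; then
      ssh lime "sudo -u postgres psql -v ON_ERROR_STOP=1 -d prospector" < "$f" \
        || { echo "failed: $f" >&2; return 1; }
    else
      echo "would apply: $f"
    fi
  done < <(pending_migrations "$1" "$2")
}
```

For example, `apply_pending migrations 0003` prints the files after `0003_corrections.sql` without touching the database, and `APPLY=1 apply_pending migrations 0003` applies them, stopping at the first failure.
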