Deploy — Prospector prod on ct.prod (the hardened public DMZ host)
Topology (authoritative, 2026-06-30): ct.prod is the public prod host
The public sales edge does NOT live on lime. lime is the internal
store/backend box and keeps zero public app ports. Prospector's prod target
is ct.prod (com.uvlava.ct.prod) — a new, dedicated, hardened DO
droplet (nyc3, store VPC, joins wg1) whose only job is to face the internet:
internet --(80/443)--> Caddy on ct.prod --(127.0.0.1:3210)--> NestJS app
ct.prod --(store VPC 10.20.0.0/24)-----> DO Managed PG (lilith-store-pg, private)
ct.prod --(wg1 mesh 10.9.0.0/24)-------> people / mac-sync / mr-number
- Public name: `apps.ftw.pw` (Caddy + Let's Encrypt). `ftw.pw` is a SEPARATE zone, not DO-managed; see the DNS step below.
- The app binds `127.0.0.1:3210` only. Caddy is the sole public listener and 403s `/internal/*` (the mac-sync inbound webhook + peers); macsync hits `/internal/inbound` over the mesh (`http://10.9.0.10:3210/internal/inbound`), never the public leg.
- DB + mesh deps over private paths only. DO Managed PG over the VPC; people/mac-sync/mr-number over wg1. mac-sync runs on the operator's Mac (not lime, not ct.prod); `MACSYNC_BASE_URL`/`MACSYNC_DEVICE_ID` are operator-set.
- lime stays internal (mesh-only; no app/edge ports).
- IaC: `uvlava/terraform/do/ct_prod.tf` (count-gated `ct_prod_enabled`; droplet + reserved IP + cloud firewall: `80/443` public, `22` + wg mesh-only). Hardened cloud-init `cloud-init/ct-prod.yaml`: ufw, fail2ban, unattended-upgrades, non-root `deploy` user, node20. Mesh entry: `mesh-hosts.json` host `ct.prod`, wg `10.9.0.10`.
As-built (2026-07-01) — first live bring-up + gotchas
The first real deploy landed. What actually happened, and the traps to avoid:
- Live host: droplet `581442557` (`com.uvlava.ct.prod`, 2 GB, the terraform `s-1vcpu-2gb`), reserved IP `144.126.248.192`, default IP `159.203.90.3`. App up (systemd `prospector`), DB `up`, migrations applied, Caddy serving.
- DNS is a CNAME into the DO zone (not a raw A at the registrar): `apps.ftw.pw` → CNAME → `apps.ct.uvlava.com` (A, `digitalocean_record.ct_apps` in `dns.tf`) → the ct.prod reserved IP. Set the joker.com CNAME once; the IP lives in IaC. Use CNAME, never url-forwarding (the browser must stay on `apps.ftw.pw` so Caddy issues its LE cert).
- The backend depends on `@cocotte/ai-harness`, a workspace package published to the ct-forge verdaccio (`http://134.199.243.61:4873/`). `npm ci` on ct.prod can't resolve a local workspace link, so `deploy-server.sh` ships an `.npmrc` (scope routing + read token) and explicitly installs the published tarball after `npm ci`. (Publishing it is CI/CD's job on `main`.)
- DB: the `prospector` DB + role already exist on `lilith-store-pg`; the deploy only fills `PROSPECTOR_DB_*` in `/opt/prospector/.env` (private host `private-lilith-store-pg-…ondigitalocean.com:25060`, `sslmode=require`, DO CA cert). ct.prod must be a DB trusted source (`doctl databases firewalls append <cluster> --rule droplet:<ct.prod-id>`), or migrations/connect time out.
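The CNAME chain described above can be checked from any machine with `dig`. This is a minimal sketch, not part of the deploy tooling: the function names `cname_target` and `check_chain` are made up for illustration, and the names and IP passed in the example are the as-built values from this section.

```bash
# cname_target NAME -> prints the CNAME target without its trailing dot (empty if none)
cname_target() {
  dig +short CNAME "$1" | head -n 1 | sed 's/\.$//'
}

# check_chain NAME WANT_CNAME WANT_IP -> returns 0 if NAME -> WANT_CNAME -> WANT_IP
check_chain() {
  local name="$1" want_cname="$2" want_ip="$3"
  local got_cname got_ip
  got_cname="$(cname_target "$name")"
  got_ip="$(dig +short A "$want_cname" | head -n 1)"
  if [ "$got_cname" != "$want_cname" ]; then
    echo "FAIL: $name CNAME is '${got_cname:-<none>}', want $want_cname"
    return 1
  fi
  if [ "$got_ip" != "$want_ip" ]; then
    echo "FAIL: $want_cname A is '${got_ip:-<none>}', want $want_ip"
    return 1
  fi
  echo "OK: $name -> $want_cname -> $want_ip"
}

# Example (needs network and dig):
#   check_chain apps.ftw.pw apps.ct.uvlava.com 144.126.248.192
```

A mismatch on the first hop usually means the registrar record is an A record or url-forward rather than a CNAME; a mismatch on the second hop points at the DO zone or terraform state.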
✅ Resolved (2026-07-01)
- Duplicate droplet (forge-duplication landmine): a second hand-created `com.uvlava.ct.prod` (`581541024`, 4 GB, reserved `134.199.244.34`) was billing in parallel and `apps.ct` wrongly pointed at its IP (causing `/prospector/*` 404s + LE failures on the real host). Destroyed the droplet + released its reserved IP; only the terraform-tracked `581442557` remains.
- Terraform drift on `apps.ct`: state tracked a stale record id; dropped it and `terraform import`ed the live record (`1824103028` → `144.126.248.192`). `plan` now reports No changes.
- LE cert: real Let's Encrypt cert (CN=YE1, non-staging) issued for `apps.ftw.pw` once DNS propagated to `144.126.248.192`. Verified `https://apps.ftw.pw/health` → 200.
- Edge auth + token injection (RESOLVED 2026-07-01): the console PWA carries no bearer token, so guarded `/prospector/*` calls 401'd through the edge. The Caddyfile (`deploy/edge/apps.ftw.pw.Caddyfile`) now gates the whole site with `basic_auth` (operator login) and injects `Authorization: Bearer {$PROSPECTOR_SERVICE_TOKEN}` to the loopback app, so an authenticated operator's browser is authorized but the token is never handed to anonymous visitors. Creds + token live in Caddy's systemd env `/etc/caddy/caddy.env` on ct.prod (`OPERATOR_USER`, `OPERATOR_BCRYPT`, `PROSPECTOR_SERVICE_TOKEN`), wired via a `caddy.service.d/env.conf` drop-in; not committed. Rotate the operator password with `caddy hash-password --plaintext '<new>'` → update `OPERATOR_BCRYPT` in `caddy.env` → `systemctl restart caddy`. Verified: anon → 401, operator → console + `/prospector/*` 200, `/internal` → 403.
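The password rotation above comes down to replacing one `KEY=VALUE` line in `caddy.env`. A minimal sketch of that step follows; `set_env_var` is a hypothetical helper, not something shipped in `deploy/`. It avoids `sed` substitution because bcrypt hashes contain `$` and `/`, which are easy to mangle in a replacement string.

```bash
# set_env_var FILE KEY VALUE -> replace (or append) KEY=VALUE in an env file
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line except an existing assignment of KEY...
  grep -v "^${key}=" "$file" > "$tmp" || true
  # ...then append the new assignment verbatim (no escaping needed with printf %s).
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the target file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example on ct.prod (as root), using the commands named above:
#   hash="$(caddy hash-password --plaintext '<new>')"
#   set_env_var /etc/caddy/caddy.env OPERATOR_BCRYPT "$hash"
#   systemctl restart caddy
```

The rewritten key always ends up as the last line of the file, which is harmless for an env file but does change line order.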
⚠ Still open
- Private registry over HTTPS: the deploy pulls `@cocotte/ai-harness` from the ct-forge verdaccio at plaintext `http://134.199.243.61:4873`. Once `npm.ct.uvlava.com` is routed to Verdaccio (TLS via the artifacts-host Caddy), point `.npmrc` / `deploy-server.sh` / the app `.npmrc` at `https://npm.ct.uvlava.com/`; no bearer token over HTTP.
- Public SSH exposure: the DO firewall's `admin_ips` isn't mesh-only as designed: `:22` answers on the public IP. Tighten `var.admin_ips` to the mesh (wg-only SSH).
- ct.prod not on wg1: `phase-b-mesh-join.sh` wasn't run, so `10.9.0.10` is unreachable and people/mac-sync/mr-number (mesh deps) aren't reachable yet. Deploy currently runs over the public IP (`SERVER_HOST=<reserved IP>`); join wg1 for the mesh deps + to move SSH off the public leg.
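Until the registry item is closed, a quick lint can catch an `.npmrc` that still routes a registry over plaintext HTTP. This is a sketch under assumptions: `plaintext_registries` is a hypothetical name, and it assumes the usual npm config layout of `registry=...` or `@scope:registry=...` lines, which may not match exactly what `deploy-server.sh` writes.

```bash
# plaintext_registries FILE -> prints registry lines that use http://
# Exit status is 0 if at least one offending line was found, non-zero otherwise.
plaintext_registries() {
  grep -E '^[[:space:]]*(@[^:]+:)?registry[[:space:]]*=[[:space:]]*http://' "$1"
}

# Example:
#   if plaintext_registries .npmrc; then
#     echo "registry still on plaintext HTTP" >&2
#   fi
```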
⚠️ ct.prod must be added as a TRUSTED SOURCE on the `lilith-store-pg` managed cluster (DO console → Databases → firewall) or migrations + the app's DB connect will time out.
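The same trusted-source rule can be added from the CLI with the `doctl databases firewalls append` command quoted in the as-built notes. The wrapper below is only a sketch: `trust_droplet` and the `APPLY` switch are illustrative names, and the cluster and droplet IDs are whatever `doctl databases list` and `doctl compute droplet list` report. By default it prints the command instead of running it.

```bash
# trust_droplet CLUSTER_ID DROPLET_ID
# Prints the doctl command; runs it only when APPLY=1 is set in the environment.
trust_droplet() {
  local cluster="$1" droplet="$2"
  if [ "${APPLY:-0}" = "1" ]; then
    doctl databases firewalls append "$cluster" --rule "droplet:${droplet}"
  else
    echo "doctl databases firewalls append $cluster --rule droplet:${droplet}"
  fi
}

# Example (dry run first, then for real):
#   trust_droplet <cluster-id> 581442557
#   APPLY=1 trust_droplet <cluster-id> 581442557
```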
Operator runbook — bring ct.prod live (in order)
All terraform here is plan/apply with -target so the rest of the shared
store tier is never dragged in. ct_prod_enabled defaults false; the -var
flips it on for this targeted apply only.
cd ~/Code/@ct/infra/uvlava/terraform/do
export TF_VAR_do_token="$(cat ~/.vault/do-pat-ct.token)"
# 1. Stand up ct.prod (droplet + reserved IP + cloud firewall) — ONLY these.
terraform plan -var=ct_prod_enabled=true \
-target=digitalocean_droplet.ct_prod \
-target=digitalocean_reserved_ip.ct_prod \
-target=digitalocean_firewall.ct_prod # review: 3 to add, 0 change, 0 destroy
terraform apply -var=ct_prod_enabled=true \
-target=digitalocean_droplet.ct_prod \
-target=digitalocean_reserved_ip.ct_prod \
-target=digitalocean_firewall.ct_prod
terraform output -raw ct_prod_public_ip # = the reserved IP (only exists now)
# 2. Join ct.prod to wg1: copy /root/wg1.pub off the box, add it as a [Peer] on
# the nyc3 hub (citron); append the citron [Peer] block to ct.prod's
# /etc/wireguard/wg1.conf, then `systemctl start wg-quick@wg1`
# (phase-b-mesh-join.sh automates this). Then set mesh-hosts.json ct.prod
# wg_pubkey + public (= the reserved IP) and re-render (net sync).
# 3. Make ct.prod a trusted source on the managed PG cluster (DO console), then
# create the prospector DB + role ONCE (secret-bearing; not in terraform):
doctl databases db create lilith-store-pg prospector
doctl databases user create lilith-store-pg prospector # prints the password
# as doadmin on the prospector DB:
# ALTER DATABASE prospector OWNER TO prospector;
# GRANT ALL ON SCHEMA public TO prospector; ALTER SCHEMA public OWNER TO prospector;
# 4. DNS for apps.ftw.pw — CNAME into the DO-managed uvlava zone (NOT a manual A,
# NOT url-forwarding). The IP is IaC-owned so it is never hand-copied:
# apps.ct.uvlava.com A <ct.prod reserved IP> <- already in dns.tf
# (digitalocean_record.ct_apps,
# gated by ct_prod_enabled)
# apps.ftw.pw CNAME apps.ct.uvlava.com <- add ONCE at joker.com
# CNAME (not url-forward) keeps the browser on apps.ftw.pw and lets Caddy issue
# the LE cert for apps.ftw.pw. Leave the ftw.pw apex (-> vps-0 short-links) alone.
# If ct.prod's IP ever moves, only terraform changes; the joker CNAME stays put.
# 5. Ship the app (over the mesh; fills /opt/prospector/.env, runs migrations).
cd ~/Code/@ct/@applications/prospector
./deploy/deploy-server.sh # SERVER_HOST defaults to 10.9.0.10 (mesh)
# First run halts at the DB __SET_ME__ guard: fill PROSPECTOR_DB_* in
# /opt/prospector/.env on ct.prod from step 3, then re-run deploy-server.sh.
# 6. Install the Caddy edge on ct.prod (public TLS for apps.ftw.pw).
# Install first so /etc/caddy exists, then drop in the Caddyfile and restart.
ssh root@10.9.0.10 'apt-get install -y caddy'
scp deploy/edge/apps.ftw.pw.Caddyfile root@10.9.0.10:/etc/caddy/Caddyfile
ssh root@10.9.0.10 'systemctl restart caddy'
# Verify: https://apps.ftw.pw/prospector/ loads; https://apps.ftw.pw/internal/inbound -> 403.
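The verify step can be scripted with `curl`. The sketch below is an assumption-laden helper, not part of the repo: `http_status` and `expect_status` are made-up names, and the expected codes in the example mirror the results recorded in the as-built notes (anon → 401, `/internal` → 403). Whether `/internal/*` answers 403 before or after the operator login depends on directive order in the Caddyfile, which is not shown here.

```bash
# http_status URL [curl args...] -> prints the HTTP status code for URL
http_status() {
  local url="$1"
  shift
  curl -s -o /dev/null -w '%{http_code}' "$@" "$url"
}

# expect_status WANT URL [curl args...] -> returns 0 when the status matches WANT
expect_status() {
  local want="$1" url="$2" got
  shift 2
  got="$(http_status "$url" "$@")"
  if [ "$got" = "$want" ]; then
    echo "OK   $want $url"
  else
    echo "FAIL $url: got $got, want $want"
    return 1
  fi
}

# Example (needs network; the password placeholder is deliberately left unfilled):
#   expect_status 401 https://apps.ftw.pw/prospector/
#   expect_status 403 https://apps.ftw.pw/internal/inbound
#   expect_status 200 https://apps.ftw.pw/prospector/ -u "$OPERATOR_USER:<password>"
```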
Legacy reference — the lime bootstrap (internal-only now)
The steps below were written for `lime` and remain accurate for the DB + env + systemd mechanics, which are identical on ct.prod (the deploy script does them). lime itself is now internal-only; the app + edge moved to ct.prod.
Probed 2026-06-29: lime = lilith-store-backend, Ubuntu 24.04, public 209.38.51.98 · wg 10.9.0.5 · VPC 10.20.0.2. Postgres 16 + pgbouncer fronts the DO Managed cluster. NestJS 11 needs Node 20+. SSH alias lime (root, ~/.ssh/id_ed25519_1984).
⚠️ These steps `sudo`-write a SHARED prod host. They were blocked under auto mode (correctly). Run them in a non-auto session, or grant a `Bash(ssh ct.prod *)` permission rule, or run them yourself.
1. Node 20 on the droplet
ssh lime 'curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - && sudo apt-get install -y nodejs && node -v'
(mac-sync uses Bun, so a system Node bump is safe for it.)
2. Create the two DBs — on the DO Managed Postgres cluster
There is no local Postgres. The droplet's pgbouncer (:6432) fronts a DO Managed Postgres cluster: private-lilith-store-pg-do-user-28217120-0.l.db.ondigitalocean.com:25060 (holds the live quinn DB). So people + prospector are new databases on that managed cluster (additive — does NOT touch quinn):
- Via Terraform IaC (the DO infra is Terraform-managed in `uvlava/terraform/do`). The DBs + dedicated users are already declared (`pg_databases` += people/prospector; `digitalocean_database_user.{people,prospector}`). Just apply:

      cd ~/Code/@projects/uvlava/terraform/do
      TF_VAR_do_token=<your DO token> terraform apply   # additive: +2 dbs, +2 users, 0 destroy
      terraform output -raw people_db_password
      terraform output -raw prospector_db_password
      terraform output -raw pg_host                     # private cluster host for the .env

- Services connect directly to the managed endpoint over SSL (skip the shared pgbouncer to avoid touching live pooling): `*_DB_HOST=private-lilith-store-pg-...`, `*_DB_PORT=25060`, `*_DB_SSL=true`. (Optionally add `[databases]` entries to `/etc/pgbouncer/pgbouncer.ini` + reload to pool them, but that touches shared infra.)
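To confirm the direct managed-endpoint path works before wiring the services, a one-off `psql` connection is enough. The helper below is a sketch: `pg_conninfo` is a hypothetical name, it emits a standard libpq keyword/value string, and `sslmode=require` is taken from the as-built notes. The host is read from the environment rather than spelled out here.

```bash
# pg_conninfo HOST PORT DB USER -> prints a libpq connection string (no password in it)
pg_conninfo() {
  printf 'host=%s port=%s dbname=%s user=%s sslmode=require\n' "$1" "$2" "$3" "$4"
}

# Example (password goes through PGPASSWORD so it stays out of the argument list):
#   PGPASSWORD="$PROSPECTOR_DB_PASSWORD" \
#     psql "$(pg_conninfo "$PROSPECTOR_DB_HOST" 25060 prospector prospector)" -c 'select 1'
```

A timeout here, rather than an authentication error, usually means the droplet is not yet a trusted source on the cluster.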
3. Apply migrations
# prospector
for f in 0001_prospector 0002_drafts 0003_corrections; do
ssh lime "sudo -u postgres psql -d prospector" < migrations/$f.sql ; done
# people (from the cocottetech repo)
ssh lime "sudo -u postgres psql -d people" < <people-service>/migrations/0001_people.sql
4. Ship the built code
Build locally, rsync dist + manifests, install prod deps on the droplet:
npm run build && npm run build -w @prospector/mcp-prospector
rsync -az --delete dist package.json package-lock.json migrations lime:/opt/prospector/
ssh lime 'cd /opt/prospector && npm ci --omit=dev'
# people-service likewise to /opt/people-service
5. Env on the droplet (/opt/prospector/.env)
NODE_ENV=production
PROSPECTOR_API_PORT=3210
PROSPECTOR_DB_HOST=private-lilith-store-pg-do-user-28217120-0.l.db.ondigitalocean.com
# DO managed cluster (direct, SSL)
PROSPECTOR_DB_PORT=25060
PROSPECTOR_DB_SSL=true
PROSPECTOR_DB_NAME=prospector
PROSPECTOR_DB_USER=prospector
PROSPECTOR_DB_PASSWORD=<from doctl databases user create>
PROSPECTOR_SERVICE_TOKEN=<strong-token>
PEOPLE_BASE_URL=http://127.0.0.1:3061
PEOPLE_SERVICE_TOKEN=<people-token>
# mac-sync runs on this same droplet
MACSYNC_BASE_URL=http://127.0.0.1:3201
MACSYNC_SERVICE_TOKEN=<macsync-token>
MACSYNC_DEVICE_ID=<device>
MRNUMBER_BASE_URL=https://my.transquinnftw.com
MRNUMBER_SERVICE_TOKEN=<mr-token>
(people-service gets its own /opt/people-service/.env with PEOPLE_DB_* + PEOPLE_SERVICE_TOKEN.)
6. systemd units (/etc/systemd/system/{prospector,people-service}.service)
[Service]
WorkingDirectory=/opt/prospector
EnvironmentFile=/opt/prospector/.env
ExecStart=/usr/bin/node dist/main.js
Restart=always
User=root
[Install]
WantedBy=multi-user.target
sudo systemctl enable --now people-service prospector → curl localhost:3061/health, curl localhost:3210/health.
7. Wire mac-sync → prospector webhook
In the @mac-sync server (same droplet): on a new inbound, fire-and-forget
POST http://127.0.0.1:3210/internal/inbound with Authorization: Bearer $PROSPECTOR_SERVICE_TOKEN, body {handle, channel:'imessage', text, occurredAt, hasCallSignal?}. Env-gated (PROSPECTOR_WEBHOOK_URL/token) so macsync runs standalone if unset. (Redo cleanly — the earlier agent left partial edits in @mac-sync.)
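For testing the webhook by hand, the request described above can be reproduced from a shell. This is a sketch only (the real sender lives in the @mac-sync server): the function names are made up, the bearer token variable is assumed to be `PROSPECTOR_SERVICE_TOKEN`, and the escaping covers only backslashes and double quotes, so text containing newlines or control characters needs a real JSON encoder.

```bash
# json_escape STRING -> escapes backslashes and double quotes (nothing else)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# inbound_payload HANDLE TEXT OCCURRED_AT -> prints the JSON body for /internal/inbound
inbound_payload() {
  printf '{"handle":"%s","channel":"imessage","text":"%s","occurredAt":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# notify_prospector HANDLE TEXT OCCURRED_AT
# Env-gated: a no-op when PROSPECTOR_WEBHOOK_URL is unset. Fire-and-forget: the
# request runs in the background and its result is ignored.
notify_prospector() {
  [ -n "${PROSPECTOR_WEBHOOK_URL:-}" ] || return 0
  curl -s -o /dev/null --max-time 5 \
    -H "Authorization: Bearer ${PROSPECTOR_SERVICE_TOKEN:-}" \
    -H 'Content-Type: application/json' \
    -d "$(inbound_payload "$@")" \
    "$PROSPECTOR_WEBHOOK_URL" &
}

# Example:
#   PROSPECTOR_WEBHOOK_URL=http://127.0.0.1:3210/internal/inbound \
#     notify_prospector '+15550100' 'hello' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```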
8. Point the dev UI at prod (over the mesh)
web/.env.local:
PROSPECTOR_API_URL=http://10.9.0.5:3210
PROSPECTOR_SERVICE_TOKEN=<the prod PROSPECTOR_SERVICE_TOKEN>
Restart npm run dev -w @prospector/web. The vite proxy injects the token; the panel now shows real prod decisions.
Verify (go-live)
/health both services → real inbound (or prospector_submit_inbound) → appears in prospector/activity → kill-switch flip persists → dev UI shows it over the mesh.
Post-migration notes (2026-06-29 unification)
- Run new migrations: `for f in migrations/0006_bilingual.sql; do ssh lime "sudo -u postgres psql -d prospector" < "$f"; done`
- Bilingual text now lives in `prospect_drafts` (original/translated/detected_lang); Triage, Detail, and Reports show both versions when present (data comes from the macsync inbound path and, later, classifier translations).
- MCP (@packages/mcp-prospector) now exposes full tools (prospector_* + legacy mappings for cockpit parity): list, thread, draft, send, mr, pastebin, reports, markets, classify, submit, held, activity, etc. Use with PROSPECTOR_BASE_URL + TOKEN. Replaces LP mcp-prospector.
- UI fused: Triage = designs/main-view + inbox-ops + LP Stream; Reports = 4 reports + engine subs (Experiments/Patterns/Actions); Queue = queued-tasks + owed/backfill; etc. PWA install in Control.
- LP can now drop prospector (see MIGRATION-PLAN in session plan file for removal list + proxies during cutover).
- Rebuild/redeploy mcp + app after changes.