Natalie 77244e5a7b	2026-07-01 06:53:44 -04:00
refactor(ai-harness): @prospector/ai-harness → @cocotte/ai-harness (publishable)
Rename the shared inference lib to the @cocotte scope + drop private so it publishes to Verdaccio; prospector now depends on it as a real registry package (resolves on a clean standalone deploy, e.g. ct.prod). engine stays a relative ../engine import (it was never an aliased package; the @prospector/engine ref was only a docstring). No tsc-alias needed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
src	refactor(ai-harness): @prospector/ai-harness → @cocotte/ai-harness (publishable)	2026-07-01 06:53:44 -04:00
package.json	refactor(ai-harness): @prospector/ai-harness → @cocotte/ai-harness (publishable)	2026-07-01 06:53:44 -04:00
README.md	feat(ai-harness): self-hosted vLLM inference layer package	2026-06-30 11:24:08 -04:00
tsconfig.json	feat(ai-harness): self-hosted vLLM inference layer package	2026-06-30 11:24:08 -04:00

README.md

@prospector/ai-harness — self-hosted inference layer

Prospector's inference layer: a direct vLLM client, a typed task registry, and the Quinn-voice prompt builders. It talks directly to vLLM on the on-demand DO GPU droplet (OpenAI-compatible /v1/chat/completions) over GPU_INFERENCE_URL; there is no model-boss coordinator (retired). Inference is self-hosted only: managed models such as Claude are never in the runtime loop.
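As a rough illustration of what "direct" means here, the sketch below shapes one OpenAI-compatible chat request against the vLLM base URL. `buildChatRequest`, `ChatMessage`, and `ChatRequest` are hypothetical names for this example, not the package's API, and the `response_format` field assumes the server honours the OpenAI-style JSON hint.

```typescript
// Hypothetical helper: shapes one OpenAI-compatible chat request for vLLM.
// Names and option shapes are illustrative, not taken from vllm-client.ts.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  baseUrl: string,
  model: string,
  systemPrompt: string,
  messages: ChatMessage[],
): ChatRequest {
  // Tolerate a trailing slash on the configured base URL.
  const url = `${baseUrl.replace(/\/+$/, '')}/v1/chat/completions`;
  return {
    url,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      // The system prompt travels as the first message.
      messages: [{ role: 'system', content: systemPrompt }, ...messages],
      // Assumption: the server accepts the OpenAI-style JSON-object hint.
      response_format: { type: 'json_object' },
    }),
  };
}
```

A caller would pass the result to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`; keeping the request builder pure makes it testable without a GPU droplet.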

Framework-agnostic: no NestJS, no process.env, no I/O beyond fetch. The backend constructs VllmClient from ConfigService and wires it as a Nest provider (src/gpu/gpu.module.ts); the GPU droplet lifecycle (provision / teardown / idle-shutdown) stays in the backend's src/gpu module.

Files (one concern each — SRP)

| File | Concern |
| --- | --- |
| `vllm-client.ts` | `VllmClient.chatJson` (strict-JSON completion) + `health`, instance circuit breaker. Co-located test. |
| `task-registry.ts` | `TASK_REGISTRY`: typed `prospect.classify` / `prospect.draft` / `prospect.judge` (placeholder for the alignment gate) + per-task priority / timeout / schema name. |
| `prompts.ts` | Pure prompt builders: `buildClassifyPrompt` (22-atom), `buildDraftPrompt` (Quinn voice verbatim) + `DRAFT_SCHEMA` / `parseDraft`. |
| `types.ts` | `ChatJsonOpts`, `ChatPriority`, `InferenceHealth`. |
| `index.ts` | Public API manifest. |
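To make the "typed task registry" idea concrete, here is a minimal sketch of one way such a registry could be typed. Only the three task keys and the field names `key`, `priority`, `timeoutMs`, and `schemaName` come from this README; the priority values, timeouts, and schema names below are placeholders, and `SKETCH_REGISTRY` is a hypothetical stand-in for `TASK_REGISTRY`.

```typescript
// Assumed priority levels; the real ChatPriority lives in types.ts.
type ChatPriority = 'high' | 'normal' | 'low';

type TaskKey = 'prospect.classify' | 'prospect.draft' | 'prospect.judge';

interface TaskSpec<K extends TaskKey = TaskKey> {
  key: K;
  priority: ChatPriority;
  timeoutMs: number;
  schemaName: string;
}

// The mapped type forces one entry per task key and makes each entry's
// `key` field match the property it is stored under.
const SKETCH_REGISTRY: { [K in TaskKey]: TaskSpec<K> } = {
  // All numbers and schema names here are placeholder values.
  'prospect.classify': { key: 'prospect.classify', priority: 'normal', timeoutMs: 10000, schemaName: 'classify' },
  'prospect.draft': { key: 'prospect.draft', priority: 'high', timeoutMs: 30000, schemaName: 'draft' },
  'prospect.judge': { key: 'prospect.judge', priority: 'low', timeoutMs: 10000, schemaName: 'judge' },
};

// Lookup stays fully typed: an unknown key is a compile-time error.
function taskFor<K extends TaskKey>(key: K): TaskSpec<K> {
  return SKETCH_REGISTRY[key];
}
```

With this shape, adding a new task key to `TaskKey` without adding its registry entry fails type checking, which is the main benefit of a typed registry over a loose string map.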

Contract

VllmClient is fail-soft by construction. When url is absent, isEnabled() returns false and callers skip straight to their fast/pastebin fallbacks. Any transport or parse error throws VllmError, which the enrich path catches and turns into null, so the classify/draft contract is unchanged from the model-boss era. An instance-level circuit breaker trips after 3 consecutive connection failures and then fails fast for 60s.
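The breaker behaviour described above (trip after 3 consecutive failures, fail fast for 60s) can be sketched as follows. This is an illustrative class, not the code in vllm-client.ts: the name `CircuitBreaker`, its methods, and the injectable clock are assumptions, it counts every thrown error rather than only connection failures, and it simply lets calls through again once the cooldown has elapsed.

```typescript
// Minimal consecutive-failure circuit breaker (illustrative sketch).
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openUntil = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // The clock is injectable so the cooldown can be tested without waiting.
  constructor(threshold = 3, cooldownMs = 60000, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True while the breaker is tripped and calls should fail fast.
  isOpen(): boolean {
    return this.now() < this.openUntil;
  }

  // Any success resets the consecutive-failure streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // The Nth consecutive failure opens the breaker for the cooldown window.
  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.threshold) {
      this.openUntil = this.now() + this.cooldownMs;
      this.consecutiveFailures = 0;
    }
  }

  // Wraps a call: rejects immediately while open, otherwise tracks the outcome.
  async run<T>(call: () => Promise<T>): Promise<T> {
    if (this.isOpen()) throw new Error('circuit open: failing fast');
    try {
      const result = await call();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

Failing fast matters here because the GPU droplet is on-demand: when it is down, callers should reach their fallbacks immediately instead of waiting out a timeout on every request.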

```typescript
import { VllmClient, TASK_REGISTRY, buildDraftPrompt, DRAFT_SCHEMA, parseDraft } from '@prospector/ai-harness';

// Env vars are read here, in the caller; the package itself never touches process.env.
const vllm = new VllmClient({
  url: process.env.GPU_INFERENCE_URL ?? null,
  model: process.env.GPU_LLM_MODEL ?? null,
});

const t = TASK_REGISTRY['prospect.draft'];

// handle, archetype, and inbound are supplied by the calling code.
const { system, user } = buildDraftPrompt(handle, archetype, inbound);

const out = await vllm.chatJson({
  systemPrompt: system,
  messages: [{ role: 'user', content: user }],
  model: '', // left empty; presumably falls back to the client's configured model
  task: t.key,
  priority: t.priority,
  schema: DRAFT_SCHEMA,
  schemaName: t.schemaName,
  parse: parseDraft,
  timeoutMs: t.timeoutMs,
});
```
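On the calling side, the fail-soft contract (VllmError becomes null) might look like the wrapper below. This is a sketch under assumptions: `failSoft` is a hypothetical helper, the `VllmError` class here is a local stand-in so the example runs on its own, and rethrowing non-VllmError exceptions is a choice made for this example rather than something the README specifies.

```typescript
// Local stand-in for the package's VllmError so the sketch is self-contained.
class VllmError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'VllmError';
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: runs an inference call and maps VllmError to null.
async function failSoft<T>(call: () => Promise<T>): Promise<T | null> {
  try {
    return await call();
  } catch (err) {
    if (err instanceof VllmError) {
      // Inference is unavailable or returned garbage: the caller falls back.
      return null;
    }
    // Assumption: anything else is a programming error and should surface.
    throw err;
  }
}
```

A caller would then write something like `const draft = await failSoft(() => vllm.chatJson(opts));` and treat `null` as the signal to use its fallback path.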

Build

npm run build --workspace @prospector/ai-harness emits dist/ + .d.ts (the backend resolves the package from there). Run npm run typecheck and npm test for the co-located checks.

Config (read at the backend wiring layer, not here)

Two variables: GPU_INFERENCE_URL (the vLLM base URL; replaces the retired MODEL_BOSS_URL) and GPU_LLM_MODEL (the served model name fallback).
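Because the package itself never reads the environment, the wiring layer has to turn those two variables into plain values. A minimal sketch of that step, assuming a generic key lookup function: `readInferenceConfig` and `InferenceConfig` are hypothetical names, and treating blank strings as absent is a choice made for this example.

```typescript
// Plain config shape handed to the client; mirrors the { url, model } options
// used in the README example.
interface InferenceConfig {
  url: string | null;
  model: string | null;
}

// Hypothetical wiring helper: `get` can wrap any config source,
// for example a ConfigService lookup or a test fixture.
function readInferenceConfig(get: (key: string) => string | undefined): InferenceConfig {
  // Treat undefined and whitespace-only values as "not configured".
  const clean = (value: string | undefined): string | null => {
    const trimmed = value?.trim();
    return trimmed ? trimmed : null;
  };
  return {
    url: clean(get('GPU_INFERENCE_URL')),
    model: clean(get('GPU_LLM_MODEL')),
  };
}
```

In the backend this could be called as `readInferenceConfig((key) => configService.get(key))`, where `configService` is whatever configuration object the wiring layer already has; a null `url` then leaves the client disabled.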

Promotable

Today prospector-local; promotable to a shared @ct package — @applications/onlyfans carries a parallel inference-layer of the same shape, so the direct-vLLM client + task registry + prompts want to live once and be consumed by both. Keep it framework-agnostic to make that move cheap.