Natalie 77244e5a7b
refactor(ai-harness): @prospector/ai-harness → @cocotte/ai-harness (publishable)
Rename the shared inference lib to the @cocotte scope + drop private so it
publishes to Verdaccio; prospector now depends on it as a real registry package
(resolves on a clean standalone deploy, e.g. ct.prod). engine stays a relative
../engine import (it was never an aliased package — the @prospector/engine ref
was only a docstring). No tsc-alias needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 06:53:44 -04:00

@cocotte/ai-harness: self-hosted inference layer

Prospector's inference layer: a direct vLLM client, a typed task registry, and the Quinn-voice prompt builders. It talks directly to the vLLM server (OpenAI-compatible /v1/chat/completions) on the on-demand DigitalOcean GPU droplet over GPU_INFERENCE_URL. There is no model-boss coordinator (it has been retired). Inference is self-hosted only: a managed model such as Claude is never in the runtime loop.
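For orientation, here is a minimal sketch of the kind of request that ends up at /v1/chat/completions. It assumes the OpenAI-style response_format json_schema field, which recent vLLM versions accept for structured output (older versions used a guided_json extra parameter instead). buildChatRequest, ChatRequestBody and the temperature of 0 are illustrative choices, not the package's actual wire format.

```typescript
// Hypothetical shapes for illustration; not types exported by the package.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
  temperature: number;
  // OpenAI-style structured output; assumed to be honoured by the vLLM server.
  response_format: {
    type: 'json_schema';
    json_schema: { name: string; schema: Record<string, unknown> };
  };
}

// Build the URL and fetch() init for a strict-JSON chat completion.
function buildChatRequest(
  baseUrl: string,
  model: string,
  systemPrompt: string,
  userContent: string,
  schemaName: string,
  schema: Record<string, unknown>,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  const body: ChatRequestBody = {
    model,
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userContent },
    ],
    temperature: 0, // assumption: deterministic output for classification/drafting
    response_format: { type: 'json_schema', json_schema: { name: schemaName, schema } },
  };
  return {
    // Strip trailing slashes so "http://gpu:8000/" and "http://gpu:8000" behave the same.
    url: `${baseUrl.replace(/\/+$/, '')}/v1/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}
```

The returned pair can be handed straight to fetch(url, init); keeping request construction pure makes it testable without a running droplet.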

Framework-agnostic: no NestJS, no process.env, no I/O beyond fetch. The backend constructs VllmClient from ConfigService and wires it as a Nest provider (src/gpu/gpu.module.ts); the GPU droplet lifecycle (provision / teardown / idle-shutdown) stays in the backend's src/gpu module.
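Because the package does no I/O beyond fetch, its network calls can be exercised with a stubbed fetch and no Nest container. Below is a minimal sketch of a health probe in that style. It assumes vLLM's GET /health endpoint (200 when the server is ready). probeHealth, HealthResult and FetchLike are hypothetical names, and the real InferenceHealth type in types.ts may look different.

```typescript
// Hypothetical result shape; the package's real InferenceHealth may differ.
interface HealthResult {
  ok: boolean;
  status: number | null; // HTTP status, or null if the request never completed
  latencyMs: number;
}

// Only the slice of fetch this probe needs, so tests can pass a stub.
type FetchLike = (url: string, init?: { method?: string }) => Promise<{ status: number; ok: boolean }>;

async function probeHealth(
  baseUrl: string,
  fetchImpl: FetchLike,
  now: () => number = Date.now,
): Promise<HealthResult> {
  const started = now();
  try {
    // Assumed endpoint: vLLM's OpenAI-compatible server exposes GET /health.
    const res = await fetchImpl(`${baseUrl.replace(/\/+$/, '')}/health`, { method: 'GET' });
    return { ok: res.ok, status: res.status, latencyMs: now() - started };
  } catch {
    // Connection refused, DNS failure, droplet torn down, and so on.
    return { ok: false, status: null, latencyMs: now() - started };
  }
}
```

In the backend the global fetch satisfies FetchLike, e.g. probeHealth(url, fetch); in tests a two-line stub does.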

Files (one concern each — SRP)

| File | Concern |
| --- | --- |
| vllm-client.ts | VllmClient.chatJson (strict-JSON completion), health, and the per-instance circuit breaker. Co-located test. |
| task-registry.ts | TASK_REGISTRY: typed prospect.classify / prospect.draft / prospect.judge (placeholder for the alignment gate), each with its own priority, timeout, and schema name. |
| prompts.ts | Pure prompt builders: buildClassifyPrompt (22-atom), buildDraftPrompt (Quinn voice verbatim), plus DRAFT_SCHEMA / parseDraft. |
| types.ts | ChatJsonOpts, ChatPriority, InferenceHealth. |
| index.ts | Public API manifest. |
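A minimal sketch of how a typed registry like TASK_REGISTRY can be declared so that task lookups are checked at compile time. The three task keys come from the table above; the ChatPriority levels, timeouts and schema names are placeholder values, not the package's real configuration.

```typescript
// Hypothetical priority levels; the real ChatPriority in types.ts may differ.
type ChatPriority = 'high' | 'normal' | 'low';

interface TaskSpec {
  key: string;
  priority: ChatPriority;
  timeoutMs: number;
  schemaName: string;
}

// Keys come from the README; every value below is a placeholder.
const TASK_REGISTRY = {
  'prospect.classify': { key: 'prospect.classify', priority: 'normal', timeoutMs: 15_000, schemaName: 'classify' },
  'prospect.draft': { key: 'prospect.draft', priority: 'high', timeoutMs: 30_000, schemaName: 'draft' },
  // Placeholder for the alignment gate.
  'prospect.judge': { key: 'prospect.judge', priority: 'low', timeoutMs: 20_000, schemaName: 'judge' },
} as const satisfies Record<string, TaskSpec>;

type TaskKey = keyof typeof TASK_REGISTRY;

// A misspelled key such as 'prospect.drfat' is rejected by the compiler.
function getTask(key: TaskKey): TaskSpec {
  return TASK_REGISTRY[key];
}
```

Using `as const satisfies` keeps the literal key type for lookups while still checking every entry against TaskSpec.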

Contract

VllmClient is fail-soft by construction. When url is absent, isEnabled() returns false and callers skip straight to their fast/pastebin fallbacks. Any transport or parse error throws VllmError, which the enrich path catches and turns into null, so the classify/draft contract is unchanged from the model-boss era. A per-instance circuit breaker trips after 3 consecutive connection failures and then fails fast for 60 s.
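The breaker behaviour described above can be sketched as a small wrapper. This is an illustration under stated assumptions, not the code in vllm-client.ts: CircuitBreaker, CircuitOpenError and the countsAsFailure predicate are hypothetical names, and the choice to reset the streak on a non-connection error (the server did answer) is this sketch's own.

```typescript
// Hypothetical error thrown while open; the real client throws VllmError instead.
class CircuitOpenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CircuitOpenError';
  }
}

class CircuitBreaker {
  private consecutiveFailures = 0;
  private openUntil = 0; // epoch ms; the breaker is open while now() < openUntil
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;
  private readonly countsAsFailure: (err: unknown) => boolean;

  constructor(opts: {
    threshold?: number;
    cooldownMs?: number;
    now?: () => number;
    countsAsFailure?: (err: unknown) => boolean; // e.g. "is this a connection error?"
  } = {}) {
    this.threshold = opts.threshold ?? 3;
    this.cooldownMs = opts.cooldownMs ?? 60_000;
    this.now = opts.now ?? Date.now;
    this.countsAsFailure = opts.countsAsFailure ?? (() => true);
  }

  isOpen(): boolean {
    return this.now() < this.openUntil;
  }

  async run<T>(op: () => Promise<T>): Promise<T> {
    // Fail fast without touching the network while the cooldown is running.
    if (this.isOpen()) throw new CircuitOpenError('vLLM circuit open; failing fast');
    try {
      const result = await op();
      this.consecutiveFailures = 0; // any success ends the failure streak
      return result;
    } catch (err) {
      if (this.countsAsFailure(err)) {
        this.consecutiveFailures += 1;
        if (this.consecutiveFailures >= this.threshold) {
          this.openUntil = this.now() + this.cooldownMs;
          this.consecutiveFailures = 0; // count afresh once the cooldown ends
        }
      } else {
        // The server answered (e.g. malformed JSON), so the connection streak is broken.
        this.consecutiveFailures = 0;
      }
      throw err;
    }
  }
}
```

Injecting now() keeps the 60 s window testable with a fake clock instead of real timers.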

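```typescript
import { VllmClient, TASK_REGISTRY, buildDraftPrompt, DRAFT_SCHEMA, parseDraft } from '@cocotte/ai-harness';

// url: null leaves the client disabled (isEnabled() returns false).
const vllm = new VllmClient({ url: process.env.GPU_INFERENCE_URL ?? null, model: process.env.GPU_LLM_MODEL ?? null });
const t = TASK_REGISTRY['prospect.draft'];

// handle, archetype and inbound come from the caller's enrich context.
const { system, user } = buildDraftPrompt(handle, archetype, inbound);

// Throws VllmError on any transport or parse failure (see Contract).
const out = await vllm.chatJson({
  systemPrompt: system,
  messages: [{ role: 'user', content: user }],
  model: '', // empty: defer to the client-level model (GPU_LLM_MODEL fallback)
  task: t.key,
  priority: t.priority,
  schema: DRAFT_SCHEMA,
  schemaName: t.schemaName,
  parse: parseDraft,
  timeoutMs: t.timeoutMs,
});
```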
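The example above lets VllmError propagate. The fail-soft pattern the enrich path applies around such a call can be sketched as a tiny generic helper. enrichOrNull and EnablementCheck are hypothetical names for illustration, not package exports; the helper only relies on isEnabled(), which the Contract section names.

```typescript
// Hypothetical: the only part of VllmClient this helper needs.
interface EnablementCheck {
  isEnabled(): boolean;
}

// Fail-soft wrapper: null means "use the fast/pastebin fallback".
async function enrichOrNull<T>(
  client: EnablementCheck,
  call: () => Promise<T>,
  onError: (err: unknown) => void = () => {},
): Promise<T | null> {
  // No inference URL configured: skip the model entirely.
  if (!client.isEnabled()) return null;
  try {
    return await call();
  } catch (err) {
    // Transport, parse, and circuit-open failures all collapse to null,
    // preserving the classify/draft contract; onError is a logging hook.
    onError(err);
    return null;
  }
}
```

Usage would look like enrichOrNull(vllm, () => vllm.chatJson({ ... })), with the caller branching to its fallback on null.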

Build

npm run build --workspace @cocotte/ai-harness emits dist/ plus the .d.ts declarations (the backend resolves the package from there). Run npm run typecheck and npm test for the co-located checks.

Config (read at the backend wiring layer, not here)

GPU_INFERENCE_URL: the vLLM base URL (replaces the retired MODEL_BOSS_URL).
GPU_LLM_MODEL: the fallback served-model name.
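A minimal sketch of that wiring-layer read, assuming the { url, model } constructor options shown in the usage example. resolveVllmOptions is a hypothetical helper; the getter stands in for process.env or Nest's ConfigService.get. Treating a blank value as unset, so an empty variable keeps the client disabled, is this sketch's choice rather than documented behaviour.

```typescript
// Hypothetical: mirrors the { url, model } options passed to VllmClient above.
interface VllmClientOptions {
  url: string | null;
  model: string | null;
}

function resolveVllmOptions(get: (key: string) => string | undefined): VllmClientOptions {
  // Blank or whitespace-only values count as unset.
  const read = (key: string): string | null => {
    const v = get(key)?.trim();
    return v ? v : null;
  };
  return {
    url: read('GPU_INFERENCE_URL'), // null => isEnabled() false, callers fall back
    model: read('GPU_LLM_MODEL'),
  };
}
```

In the backend this could be called as resolveVllmOptions((k) => config.get<string>(k)) with Nest's ConfigService, or with (k) => process.env[k] in a plain script.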

Promotable

Today prospector-local; promotable to a shared @ct package: @applications/onlyfans carries a parallel inference layer of the same shape, so the direct-vLLM client, task registry, and prompts should live in one place and be consumed by both. Keep the package framework-agnostic so that move stays cheap.