Commit graph

11 commits

Author SHA1 Message Date
Lilith
1d8ab202dc chore(config): 🔧 Update configuration setup in README.md and config.py 2026-01-17 14:57:21 -08:00
Lilith
5d1fcc47b4 chore(core): 🔧 Update .env.example file 2026-01-17 14:57:21 -08:00
Lilith
68bb0e3c4b chore(config): 🔧 Update configuration settings in config.py 2026-01-17 14:41:53 -08:00
Lilith
98922286a4 chore(config): 🔧 Update configuration setup in README.md and config.py 2026-01-17 14:36:32 -08:00
Lilith
650a503da3 chore(core): 🔧 Update .env.example file 2026-01-17 14:36:32 -08:00
Lilith
ceee609eed fix(server): prevent orphaned llama-server processes with PR_SET_PDEATHSIG
Use preexec_fn to set PR_SET_PDEATHSIG on subprocess so llama-server
dies when llama-http dies. This prevents orphaned processes consuming
VRAM after crashes or restarts.

Changes:
- Add ctypes import for libc.prctl call
- Replace start_new_session=True with preexec_fn=set_pdeathsig
- Simplify stop() to use process.terminate() instead of killpg()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 12:38:33 -08:00
Lilith
3ecb96d921 chore(llama_http): 🔧 Update server.py configuration 2026-01-17 12:28:00 -08:00
Lilith
6ce52377d4 chore(deps): 🔧 📦️ Update 7 py files in deps 2026-01-17 11:27:02 -08:00
Lilith
6e14d7f424 feat(config): Add .env file support and increase default context size
- Added pydantic .env file support in config.py
- Created .env.example with 128k token context size config
- Supports large codebases in auto-commit pipeline
- Removed redundant systemd environment variables from service file

Context size increase (4096 → 131072) handles auto-commit requests
with ~465k tokens (file contents + diffs + RAG + reasoning)
2026-01-16 05:33:28 -08:00
Lilith
f8eecfe100 fix(core): fix ASGI 'coroutine not callable' error
- Made main() async and properly awaited create_app()
- Changed from factory mode to manual app creation with await
- Fixed TypeError that caused 500 Internal Server Error on all requests
- Service now starts correctly and responds to health checks

This resolves the critical bug that prevented llama-http from serving any HTTP requests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-14 10:48:15 -08:00
Lilith
ebb97d8cd9 feat(llama-http): initial service for Mistral-family GGUF inference
HTTP API service wrapping native llama-server for GGUF model inference
with GPU acceleration. Solves llama-cpp-python compatibility issues.

Features:
- Subprocess management for native llama-server binary
- OpenAI-compatible chat completions API (/v1/chat/completions)
- Model resolution via lilith-model-boss
- GPU tests verifying [THINK] chain-of-thought reasoning
- Streaming support via SSE

Supported models:
- ministral-3b-instruct (3.4GB, fast)
- ministral-14b-reasoning (7.7GB, chain-of-thought)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 07:47:10 -08:00