llama-http

Author	SHA1	Message	Date
Lilith	1d8ab202dc	chore(config): 🔧 Update configuration setup in README.md and config.py	2026-01-17 14:57:21 -08:00
Lilith	5d1fcc47b4	chore(core): 🔧 Update .env.example file	2026-01-17 14:57:21 -08:00
Lilith	68bb0e3c4b	chore(config): 🔧 Update configuration settings in config.py	2026-01-17 14:41:53 -08:00
Lilith	98922286a4	chore(config): 🔧 Update configuration setup in README.md and config.py	2026-01-17 14:36:32 -08:00
Lilith	650a503da3	chore(core): 🔧 Update .env.example file	2026-01-17 14:36:32 -08:00
Lilith	ceee609eed	fix(server): prevent orphaned llama-server processes with PR_SET_PDEATHSIG. Use preexec_fn to set PR_SET_PDEATHSIG on the subprocess so llama-server dies when llama-http dies. This prevents orphaned processes consuming VRAM after crashes or restarts. Changes: Add ctypes import for libc.prctl call; Replace start_new_session=True with preexec_fn=set_pdeathsig; Simplify stop() to use process.terminate() instead of killpg(). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-17 12:38:33 -08:00
Lilith	3ecb96d921	chore(llama_http): 🔧 Update server.py configuration	2026-01-17 12:28:00 -08:00
Lilith	6ce52377d4	chore(deps): 🔧 📦️ Update 7 py files in deps	2026-01-17 11:27:02 -08:00
Lilith	6e14d7f424	feat(config): Add .env file support and increase default context size. Added pydantic .env file support in config.py; Created .env.example with 128k token context size config; Supports large codebases in auto-commit pipeline; Removed redundant systemd environment variables from service file. Context size increase (4096 → 131072) handles auto-commit requests with ~465k tokens (file contents + diffs + RAG + reasoning).	2026-01-16 05:33:28 -08:00
Lilith	f8eecfe100	fix(core): fix ASGI 'coroutine not callable' error. Made main() async and properly awaited create_app(); Changed from factory mode to manual app creation with await; Fixed TypeError that caused 500 Internal Server Error on all requests; Service now starts correctly and responds to health checks. This resolves the critical bug that prevented llama-http from serving any HTTP requests. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-14 10:48:15 -08:00
Lilith	ebb97d8cd9	feat(llama-http): initial service for Mistral-family GGUF inference. HTTP API service wrapping native llama-server for GGUF model inference with GPU acceleration. Solves llama-cpp-python compatibility issues. Features: Subprocess management for native llama-server binary; OpenAI-compatible chat completions API (/v1/chat/completions); Model resolution via lilith-model-boss; GPU tests verifying [THINK] chain-of-thought reasoning; Streaming support via SSE. Supported models: ministral-3b-instruct (3.4GB, fast); ministral-14b-reasoning (7.7GB, chain-of-thought). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-10 07:47:10 -08:00
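
Commit f8eecfe100 describes a 'coroutine not callable' failure that was fixed by making main() async and awaiting create_app(). The log does not include the code, so the following is only a minimal sketch of that failure mode and its fix using the standard library. The body of create_app(), the tiny ASGI app, and the helpers build_app_wrong, build_app_right, and call_once are all hypothetical stand-ins, and the ASGI server the project actually uses is not named in the log.

```python
import asyncio


async def create_app():
    """Hypothetical async factory: does async setup, then returns an ASGI callable."""
    await asyncio.sleep(0)  # stand-in for real async startup work

    async def app(scope, receive, send):
        # Minimal ASGI app: answer every HTTP request with 200 "ok".
        if scope["type"] != "http":
            return
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": b"ok"})

    return app


def build_app_wrong():
    # Calling an async factory without await yields a coroutine object.
    # Handing this to an ASGI server as "the app" fails: it is not callable.
    return create_app()


async def build_app_right():
    # Awaiting the factory yields the actual ASGI callable.
    return await create_app()


async def call_once(app):
    """Drive the app with one fake HTTP request and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": "/health"}, receive, send)
    return sent
```

In this sketch, `callable(build_app_wrong())` is false while `callable(await build_app_right())` is true, which is the distinction the commit message points at: the app object has to be created with `await` inside an async main() before it is given to the server.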

11 commits
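
Commit ceee609eed names the technique it uses: a preexec_fn that calls libc.prctl through ctypes to set PR_SET_PDEATHSIG, plus a stop() that relies on process.terminate(). The repository code is not shown in this log, so the block below is a Linux-only sketch of that approach rather than the project's implementation. The name set_pdeathsig comes from the commit message; start_child, stop_child, the choice of SIGTERM as the death signal, and the timeout are assumptions made for illustration.

```python
import ctypes
import signal
import subprocess
import sys

PR_SET_PDEATHSIG = 1  # option value from <linux/prctl.h>

# Load libc once in the parent (Linux only) so the child does little work after fork().
_libc = ctypes.CDLL(None, use_errno=True)


def set_pdeathsig():
    """Runs in the child between fork() and exec().

    Asks the kernel to deliver SIGTERM to this process when its parent dies.
    SIGTERM is an assumption; the commit does not say which signal is used.
    """
    if _libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGTERM)) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")


def start_child(cmd):
    # preexec_fn replaces start_new_session=True, as the commit describes.
    return subprocess.Popen(cmd, preexec_fn=set_pdeathsig)


def stop_child(process, timeout=10.0):
    """Ask the child to exit with SIGTERM; fall back to SIGKILL after a timeout."""
    process.terminate()
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        return process.wait()


if __name__ == "__main__":
    # Hypothetical stand-in for the llama-server command line.
    child = start_child([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_child(child))
```

Two caveats apply to this sketch. The Python documentation warns that preexec_fn is not safe in applications that use threads, and the prctl(2) manual notes that the "parent" for PR_SET_PDEATHSIG is the thread that created the child, so the signal can fire when that thread exits even if the process lives on.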