
@transquinnftw/ml-directory-semantic

Directory semantic understanding service using local ML embeddings and Redis vector search.

Features

  • Scan and analyze directory structure
  • Extract content from supported file types (TypeScript, JavaScript, Python, Go, Rust)
  • Generate embeddings using local LlamaCpp models (nomic-embed-text-v1.5)
  • Store vectors in Redis with HNSW indexing
  • Semantic search across directory content
  • Find similar files
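As a rough illustration of how the supported file types might be recognized during a scan, the sketch below maps file extensions to languages. The extension table and the names `extensionOf` and `detectLanguage` are assumptions made for this example, not the package's actual implementation.

```typescript
type SupportedLanguage = 'typescript' | 'javascript' | 'python' | 'go' | 'rust';

// Hypothetical extension table; the package's real list may differ.
const EXTENSION_LANGUAGES: Record<string, SupportedLanguage> = {
  '.ts': 'typescript',
  '.tsx': 'typescript',
  '.js': 'javascript',
  '.jsx': 'javascript',
  '.mjs': 'javascript',
  '.cjs': 'javascript',
  '.py': 'python',
  '.go': 'go',
  '.rs': 'rust',
};

// Returns the lowercased extension (including the dot), or '' when there is
// none. Dotfiles such as ".gitignore" are treated as having no extension.
function extensionOf(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}

// Returns the language for a path, or undefined for unsupported files.
function detectLanguage(filePath: string): SupportedLanguage | undefined {
  return EXTENSION_LANGUAGES[extensionOf(filePath)];
}
```

A scanner could use `detectLanguage` to skip files that return `undefined` before any content extraction happens.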

Requirements

Model File

This package requires a local GGUF embedding model. By default it looks for the file at:

/var/mnt/bigdisk/_/models/embeddings/nomic-ai/nomic-embed-text-v1.5.Q8_0.gguf

The model produces 768-dimensional embeddings optimized for semantic similarity.
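Semantic similarity between two embeddings is commonly measured with cosine similarity. The following is a minimal sketch of that computation for intuition only; `cosineSimilarity` is a hypothetical helper and is not part of this package's API.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// The result lies in [-1, 1]; higher means the vectors point in closer directions.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Two embeddings produced by the same model must have the same length (768 here), which is why the sketch rejects mismatched dimensions.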

Redis with RediSearch

Requires Redis 7+ with the RediSearch module loaded; RediSearch provides the vector index.

# Docker example
docker run -p 6379:6379 redis/redis-stack:latest
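Before indexing, it can help to confirm that the server actually has the search module loaded. The sketch below parses the reply of the Redis `MODULE LIST` command. It assumes the RESP2 reply shape (one flat key/value array per module) and that RediSearch registers under the module name `search`; `moduleNames`, `hasRediSearch`, and the `CommandClient` interface are hypothetical names for this example.

```typescript
// Minimal client shape used by this sketch. ioredis exposes a generic
// `call()` method that is expected to fit it.
interface CommandClient {
  call(command: string, ...args: Array<string | number>): Promise<unknown>;
}

// Extracts module names from a MODULE LIST reply, where each module is a
// flat array such as ['name', 'search', 'ver', 20810].
function moduleNames(reply: unknown): string[] {
  if (!Array.isArray(reply)) return [];
  const names: string[] = [];
  for (const entry of reply) {
    if (!Array.isArray(entry)) continue;
    for (let i = 0; i + 1 < entry.length; i += 2) {
      if (String(entry[i]) === 'name') names.push(String(entry[i + 1]));
    }
  }
  return names;
}

// Resolves to true when a module named "search" is loaded.
async function hasRediSearch(client: CommandClient): Promise<boolean> {
  const reply = await client.call('MODULE', 'LIST');
  return moduleNames(reply).some((name) => name.toLowerCase() === 'search');
}
```

With the Docker image above, `await hasRediSearch(redis)` would be expected to resolve to `true`; on a plain Redis server without modules it should resolve to `false`.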

Installation

pnpm add @transquinnftw/ml-directory-semantic

Usage

import Redis from 'ioredis';
import { createDirectorySemanticService } from '@transquinnftw/ml-directory-semantic';

const redis = new Redis();

// Create service
const service = createDirectorySemanticService(redis, {
  embeddingDimensions: 768, // nomic-embed-text-v1.5
  gpuLayers: 999,           // All layers on GPU (0 for CPU-only)
});

// Initialize - fails fast if model unavailable
await service.initialize();

// Index a directory
const result = await service.index('/path/to/project');
console.log(`Indexed ${result.filesIndexed} files, ${result.chunksCreated} chunks`);

// Search for content
const results = await service.search('/path/to/project', 'authentication handler');
for (const r of results) {
  console.log(`${r.path} (score: ${r.score.toFixed(3)})`);
}

// Find similar files
const similar = await service.findSimilar('/path/to/project', 'src/auth.ts');

Configuration

interface DirectorySemanticConfig {
  /** Embedding dimensions (default: 768 for nomic-embed-text-v1.5) */
  embeddingDimensions?: number;

  /** Batch size for embedding operations (default: 10) */
  batchSize?: number;

  /** Path to GGUF embedding model (uses default if not specified) */
  modelPath?: string;

  /** GPU layers - 999 for all on GPU, 0 for CPU only (default: 999) */
  gpuLayers?: number;

  /** Enable verbose logging (default: false) */
  verbose?: boolean;
}
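For reference, the documented defaults can be resolved against user overrides as sketched below. This is an illustration under stated assumptions, not the package's code: `DEFAULT_CONFIG` and `resolveConfig` are hypothetical names, and the default model path is the one listed under Requirements.

```typescript
// Mirrors the DirectorySemanticConfig interface documented in this section.
interface DirectorySemanticConfig {
  embeddingDimensions?: number;
  batchSize?: number;
  modelPath?: string;
  gpuLayers?: number;
  verbose?: boolean;
}

// Defaults as stated in this README.
const DEFAULT_CONFIG: Required<DirectorySemanticConfig> = {
  embeddingDimensions: 768,
  batchSize: 10,
  modelPath: '/var/mnt/bigdisk/_/models/embeddings/nomic-ai/nomic-embed-text-v1.5.Q8_0.gguf',
  gpuLayers: 999,
  verbose: false,
};

// Fills in every unspecified option. `??` is used instead of `||` so that
// meaningful falsy values such as `gpuLayers: 0` (CPU-only) are preserved.
function resolveConfig(
  overrides: DirectorySemanticConfig = {},
): Required<DirectorySemanticConfig> {
  return {
    embeddingDimensions: overrides.embeddingDimensions ?? DEFAULT_CONFIG.embeddingDimensions,
    batchSize: overrides.batchSize ?? DEFAULT_CONFIG.batchSize,
    modelPath: overrides.modelPath ?? DEFAULT_CONFIG.modelPath,
    gpuLayers: overrides.gpuLayers ?? DEFAULT_CONFIG.gpuLayers,
    verbose: overrides.verbose ?? DEFAULT_CONFIG.verbose,
  };
}
```

For example, `resolveConfig({ gpuLayers: 0 })` keeps `gpuLayers` at 0 while the other fields fall back to their defaults.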

Embedding Provider

Uses @transquinnftw/ml-llamacpp for local embedding generation:

  • Default model: nomic-embed-text-v1.5 (768 dimensions)
  • Alternative: all-MiniLM-L6-v2 (384 dimensions, faster)
  • Backend: node-llama-cpp with CUDA acceleration

import { createLlamaCppEmbeddingProvider } from '@transquinnftw/ml-directory-semantic';

const embedder = createLlamaCppEmbeddingProvider({
  dimensions: 768,
  gpuLayers: 999,
});

await embedder.initialize(); // Throws if model unavailable

const embedding = await embedder.embed('Hello world');
console.log(`Embedding: ${embedding.length} dimensions`);

API Reference

DirectorySemanticService

Main service class for directory indexing and search.

Method                                  Description
initialize()                            Initialize embedding provider (required, fail-fast)
index(dirPath, options?)                Index directory content
search(dirPath, query, options?)        Semantic search
findSimilar(dirPath, filePath, limit?)  Find similar files
getSummary(dirPath)                     Get directory index summary
delete(dirPath)                         Remove directory from index

LlamaCppEmbeddingProvider

Local embedding generation using GGUF models.

Method          Description
initialize()    Load model (required before embed)
embed(text)     Generate single embedding
embedBatch(texts)  Generate batch embeddings
isAvailable()   Check if model file exists
dispose()       Release resources
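When embedding many texts, the inputs can be split into groups no larger than the configured batch size (10 by default) and passed to `embedBatch` one group at a time. The sketch below shows one way to do that. `toBatches`, `embedAll`, and the `BatchEmbedder` interface are hypothetical, and the `Promise<number[][]>` return type of `embedBatch` is an assumption, since this README does not state it.

```typescript
// Splits items into consecutive groups of at most `batchSize` elements.
function toBatches<T>(items: readonly T[], batchSize = 10): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Assumed minimal shape of the provider; the return type is a guess.
interface BatchEmbedder {
  embedBatch(texts: string[]): Promise<number[][]>;
}

// Embeds all texts batch by batch, preserving input order in the output.
async function embedAll(
  embedder: BatchEmbedder,
  texts: readonly string[],
  batchSize = 10,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(texts, batchSize)) {
    vectors.push(...(await embedder.embedBatch(batch)));
  }
  return vectors;
}
```

Processing batches sequentially keeps memory use bounded to one batch at a time, at the cost of not overlapping work across batches.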

Error Handling

The service uses fail-fast initialization: initialize() throws if the model file is missing or fails to load, so handle the error at startup.

try {
  await service.initialize();
} catch (error) {
  // Model file not found or failed to load
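  // `error` is `unknown` in strict TypeScript, so narrow it before reading `.message`
  const message = error instanceof Error ? error.message : String(error);
  console.error('Embedding model unavailable:', message);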
  process.exit(1);
}

Vector Storage

Uses Redis with RediSearch for efficient vector similarity search:

  • Index type: HNSW (Hierarchical Navigable Small World)
  • Distance metric: Cosine (RediSearch COSINE)
  • Default config: M=16, efConstruction=200, efRuntime=10
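The settings above correspond to arguments of the RediSearch `FT.CREATE` command. The sketch below builds such an argument list and encodes a vector as FLOAT32 bytes. It illustrates the RediSearch syntax rather than this package's internals: the index name, key prefix, field name, the choice of a HASH index, and the helper names `buildCreateIndexArgs` and `encodeVector` are all assumptions.

```typescript
interface HnswIndexOptions {
  indexName: string;   // hypothetical, e.g. 'idx:demo'
  keyPrefix: string;   // hypothetical, e.g. 'doc:'
  vectorField: string; // hypothetical, e.g. 'embedding'
  dimensions: number;
  m?: number;
  efConstruction?: number;
  efRuntime?: number;
}

// Builds the arguments that follow FT.CREATE for an HNSW vector field.
function buildCreateIndexArgs(opts: HnswIndexOptions): Array<string | number> {
  const { m = 16, efConstruction = 200, efRuntime = 10 } = opts;
  const vectorAttrs: Array<string | number> = [
    'TYPE', 'FLOAT32',
    'DIM', opts.dimensions,
    'DISTANCE_METRIC', 'COSINE',
    'M', m,
    'EF_CONSTRUCTION', efConstruction,
    'EF_RUNTIME', efRuntime,
  ];
  return [
    opts.indexName, 'ON', 'HASH', 'PREFIX', 1, opts.keyPrefix,
    // VECTOR <algorithm> <attribute argument count> <attributes...>
    'SCHEMA', opts.vectorField, 'VECTOR', 'HNSW', vectorAttrs.length, ...vectorAttrs,
  ];
}

// Encodes a vector as little-endian FLOAT32 bytes, 4 bytes per component.
function encodeVector(vector: readonly number[]): Uint8Array {
  const bytes = new Uint8Array(vector.length * 4);
  const view = new DataView(bytes.buffer);
  vector.forEach((value, i) => view.setFloat32(i * 4, value, true));
  return bytes;
}
```

With ioredis, the index could then be created with `redis.call('FT.CREATE', ...buildCreateIndexArgs(options))`, and an encoded vector wrapped in `Buffer.from(...)` before being stored in a hash field.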

License

MIT