Nightcrawler Integration Tests
End-to-end tests for Phase 8 integration. These tests validate the complete crawl workflow from discovery through deduplication to outreach.
Status
⚠️ The tests are written but cannot run yet: they depend on @lilith/* packages that have not been published to the npm registry.
Once packages are available:
```bash
bun install
bun run test tests/integration/
```
Test Scenarios
Scenario 1: Basic Crawl - Tryst LA (2 pages)
Tests the single-platform, single-city crawl workflow:
- Discover listings from listing pages
- Scrape full profile for each provider
- Save to database
- Compute photo hashes
Expected outcome: 2 providers saved with complete profiles and photo hashes.
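The README does not say which photo-hashing algorithm the pipeline uses. As a hedged illustration only, here is a perceptual average hash (aHash): for an image already downscaled to 8x8 grayscale, set one bit per pixel that is brighter than the mean, then compare hashes by Hamming distance. The function names, the 8x8 size, and the bit-string representation are all assumptions.

```typescript
// Hypothetical sketch of an average hash (aHash). Input: 64 grayscale values
// (0-255) for an image already downscaled to 8x8, in row-major order.
function averageHash(pixels: number[]): string {
  if (pixels.length !== 64) {
    throw new Error(`expected 64 pixels, got ${pixels.length}`);
  }
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  // One bit per pixel: "1" if brighter than the mean, else "0".
  return pixels.map((p) => (p > mean ? "1" : "0")).join("");
}

// Count differing bits; a small distance suggests visually similar photos.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) throw new Error("hash length mismatch");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}
```

A real pipeline would first decode and resize the photo (for example with `sharp`, which the troubleshooting notes list as a native dependency) and would likely pack the 64 bits into an integer. The Hamming-distance cutoff for "same photo" is a tuning choice the README does not specify.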
Scenario 2: Cross-Platform Deduplication
Tests the deduplication engine across platforms:
- Same provider on Tryst + Eros (should match)
- Different providers with similar names (should NOT match)
- Merge contact info from multiple platforms
Expected outcome: A single provider record with data from both platforms, matched with high confidence (> 0.85).
Dedup signals tested:
- Photo hash matching (weight: 0.90)
- Social handle matching (weight: 0.80)
- Email matching (weight: 0.95)
- Phone matching (weight: 0.85)
- Name+city similarity (weight: 0.40)
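The README lists per-signal weights and a 0.85 threshold but not how the weights are combined. One plausible rule, shown purely as an assumption, is a noisy-OR: confidence = 1 - product of (1 - weight) over the matching signals. Under that rule a photo-hash or email match alone clears 0.85, while a name+city match alone (0.40) does not, which fits the "similar names should NOT match" case. The signal names and `combineSignals` are hypothetical, not the `DedupEngine` API.

```typescript
// Hypothetical signal names mirroring the weights listed above.
type Signal = "photoHash" | "socialHandle" | "email" | "phone" | "nameCity";

const SIGNAL_WEIGHTS: Record<Signal, number> = {
  photoHash: 0.9,
  socialHandle: 0.8,
  email: 0.95,
  phone: 0.85,
  nameCity: 0.4,
};

const MATCH_THRESHOLD = 0.85;

// Noisy-OR (assumed combination rule): each matching signal independently
// supports a match, so confidence = 1 - product of (1 - weight).
function combineSignals(matched: Signal[]): number {
  const unique = new Set(matched); // count each signal at most once
  let missProbability = 1;
  for (const s of unique) {
    missProbability *= 1 - SIGNAL_WEIGHTS[s];
  }
  return 1 - missProbability;
}

function isMatch(matched: Signal[]): boolean {
  return combineSignals(matched) > MATCH_THRESHOLD;
}
```

Note that with a strict `>` comparison a phone match alone (exactly 0.85) does not count as a match; whether the real engine uses `>` or `>=` is not stated.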
Scenario 3: Blocklist Enforcement
Tests blocklist filtering:
- Skip providers with blocklisted email
- Skip providers with blocklisted phone
- Allow providers with clean records
Expected outcome: Blocklisted providers skipped, clean providers processed.
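A minimal sketch of the blocklist check, assuming entries are compared after normalization: emails trimmed and lowercased, phone numbers reduced to digits with a leading US country code dropped. The `Blocklist` shape and helper names are illustrative, not the project's actual API.

```typescript
interface Candidate {
  email?: string;
  phone?: string;
}

interface Blocklist {
  emails: Set<string>;
  phones: Set<string>;
}

function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// Keep digits only and drop a leading US country code so that
// "+1 (310) 555-0100" and "310-555-0100" compare equal.
function normalizePhone(phone: string): string {
  const digits = phone.replace(/\D/g, "");
  return digits.length === 11 && digits.startsWith("1") ? digits.slice(1) : digits;
}

function buildBlocklist(emails: string[], phones: string[]): Blocklist {
  return {
    emails: new Set(emails.map(normalizeEmail)),
    phones: new Set(phones.map(normalizePhone)),
  };
}

// A provider is skipped if either contact field is blocklisted.
function isBlocked(c: Candidate, bl: Blocklist): boolean {
  if (c.email && bl.emails.has(normalizeEmail(c.email))) return true;
  if (c.phone && bl.phones.has(normalizePhone(c.phone))) return true;
  return false;
}
```

Normalizing on both sides matters: without it, the same blocklisted number written with different punctuation would slip through.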
Scenario 4: Multi-City Crawl (LA + SF)
Tests crawling multiple cities:
- Crawl Los Angeles
- Crawl San Francisco
- Handle providers who tour between cities (no duplicates)
Expected outcome: Providers from both cities are saved; each touring provider has a single record with its touring status updated.
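One way to get "one record per touring provider" is to key records by the identity that deduplication resolves and merge cities into the existing record instead of inserting a second row. This sketch assumes that identity key already exists; the `ProviderRecord` fields and `upsertProvider` are hypothetical, and an in-memory `Map` stands in for the database.

```typescript
interface ProviderRecord {
  id: string;          // identity resolved by dedup (assumed)
  homeCity: string;    // first city the provider was seen in
  cities: Set<string>; // every city the provider has been seen in
  touring: boolean;
}

// Insert a new provider, or merge the city into the existing record.
function upsertProvider(
  store: Map<string, ProviderRecord>,
  id: string,
  city: string,
): ProviderRecord {
  const existing = store.get(id);
  if (!existing) {
    const created: ProviderRecord = { id, homeCity: city, cities: new Set([city]), touring: false };
    store.set(id, created);
    return created;
  }
  existing.cities.add(city);
  // Seen in more than one city: mark as touring instead of duplicating.
  existing.touring = existing.cities.size > 1;
  return existing;
}
```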
Scenario 5: Contact Reveal
Tests contact information extraction:
- Reveal email after button click
- Reveal phone after button click
- Handle ALTCHA captcha challenges
Expected outcome: Contact info successfully extracted and saved (encrypted).
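The README says revealed contact info is saved encrypted but does not name the scheme. As an assumption, this sketch uses AES-256-GCM from the built-in `node:crypto` module (which Bun also provides), storing IV, auth tag, and ciphertext together as base64 `iv:tag:ciphertext`. The function names and storage format are hypothetical, and key management is out of scope.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt one contact field with AES-256-GCM. The key must be 32 bytes.
function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh 96-bit nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(":");
}

// Decrypt and authenticate; final() throws if the key is wrong or data was altered.
function decryptField(encoded: string, key: Buffer): string {
  const [iv, tag, ciphertext] = encoded.split(":").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because each call uses a random IV, encrypting the same email twice yields different ciphertexts, so encrypted columns cannot be compared directly for dedup or blocklist checks; those would need a separate keyed hash if done at the database level.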
Scenario 6: CLI Integration
Tests the command-line interface:
- Run full crawl via CLI
- Export results to CSV
- Display statistics
Expected outcome: CLI commands execute successfully, CSV export contains all providers.
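Provider fields such as names and bios can contain commas, quotes, or newlines, so the export has to quote them per RFC 4180. A minimal sketch, assuming a plain list of rows and a caller-chosen column order; `toCsv` and the row shape are hypothetical, not the CLI's actual export code.

```typescript
type Cell = string | number | null | undefined;
type Row = Record<string, Cell>;

// Quote a field when it contains a comma, quote, CR, or LF;
// embedded quotes are doubled (RFC 4180).
function escapeCsvField(value: Cell): string {
  if (value === null || value === undefined) return "";
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows: Row[], columns: string[]): string {
  const lines = [columns.map(escapeCsvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => escapeCsvField(row[c])).join(","));
  }
  // RFC 4180 specifies CRLF line breaks.
  return lines.join("\r\n");
}
```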
Scenario 7: Error Handling
Tests resilience and error recovery:
- Retry failed requests with exponential backoff
- Circuit breaker opens after 5 failures
- Errors logged to crawl session
Expected outcome: Transient failures recovered, persistent failures trigger circuit breaker.
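The two mechanisms can be sketched as follows. Only the threshold of 5 failures and "exponential backoff" come from this README; the 30-second cooldown, the base delay, the attempt count, and all names are assumptions. In this sketch the breaker counts consecutive failures and fails fast while open, and retries never hammer an open breaker.

```typescript
class CircuitOpenError extends Error {}

// Opens after `threshold` consecutive failures; while open, calls fail fast.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly threshold = 5,       // from the README
    private readonly cooldownMs = 30_000, // assumed
  ) {}

  private rejectIfOpen(): void {
    if (this.openedAt === null) return;
    if (Date.now() - this.openedAt < this.cooldownMs) {
      throw new CircuitOpenError("circuit open; skipping request");
    }
    // Cooldown elapsed (half-open): allow calls, but one more failure reopens.
    this.openedAt = null;
    this.failures = this.threshold - 1;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    this.rejectIfOpen();
    try {
      const result = await fn();
      this.failures = 0; // any success resets the count
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry with exponential backoff: waits baseMs, 2*baseMs, 4*baseMs, ...
// Assumes maxAttempts >= 1.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3, baseMs = 200): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err instanceof CircuitOpenError) throw err; // don't retry into an open breaker
      lastErr = err;
      if (attempt < maxAttempts - 1) await sleep(baseMs * 2 ** attempt);
    }
  }
  throw lastErr;
}
```

A request would then be wrapped as `withRetry(() => breaker.call(() => fetchListing(url)))`, where `fetchListing` stands in for whatever hypothetical request function the adapter uses. Logging each caught error to the crawl session would go in the `catch` blocks.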
Test Data
Realistic test data is defined in tests/fixtures/realistic-data.ts:
Providers:
- Sophia Rose - Upscale Tryst provider ($600/hr, verified, 4 photos)
- Emma Divine - Elite touring provider ($800/hr, premium, tours SF)
- Victoria Lane - Experienced Eros provider ($400/hr, verified)
- Luna Torres - Trans provider on TransEscorts ($350/hr)
- Isabella Cruz - Duplicate across Tryst + Eros (dedup test case)
Contact Info:
- Email examples (proton.me, custom domains, gmail, yahoo)
- Phone examples (LA area codes: 424, 310, 323, 213)
Blocklist:
- Known scammer email
- Fake disconnected phone
- Stock photo provider name
Running Tests (Once Dependencies Are Available)
```bash
# Run all integration tests
bun run test tests/integration/

# Run a specific scenario
bun run test tests/integration/full-crawl.test.ts -t "Scenario 2"

# Run with verbose output
bun run test tests/integration/ --reporter=verbose

# Generate a coverage report
bun run test tests/integration/ --coverage
```
Test Structure
Each scenario follows the Given-When-Then pattern:
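```typescript
import { it, expect } from 'vitest';
// Assumed: DUPLICATE_PROVIDER_CASE comes from tests/fixtures/realistic-data.ts,
// and DedupEngine and dataSource are imported or created elsewhere in the test file.

it('should match same provider across platforms', async () => {
  // Given: Same provider on two platforms
  // (the Tryst profile must already be saved to dataSource so there is
  // an existing record for the Eros profile to match against)
  const trystProfile = DUPLICATE_PROVIDER_CASE.tryst;
  const erosProfile = DUPLICATE_PROVIDER_CASE.eros;

  // When: Dedup engine analyzes the Eros profile
  const dedup = new DedupEngine(dataSource);
  const result = await dedup.checkDuplicate(erosProfile, 'eros');

  // Then: Should match with high confidence
  expect(result.isMatch).toBe(true);
  expect(result.confidence).toBeGreaterThan(0.85);
});
```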
Dependencies
These tests require all phases to be complete:
- ✅ Phase 1: Foundation (types, config)
- ✅ Phase 2: Database (entities, migrations)
- ✅ Phase 3: Selector loader
- ✅ Phase 4: Browser infrastructure
- 🔄 Phase 5: Platform adapters (in progress)
- 🔄 Phase 6: Pipeline (photo hash, dedup, blocklist)
- ⏳ Phase 7: CLI commands
- ⏳ Phase 8: Integration & entry point
Test Database
Integration tests use an in-memory SQLite database for speed:
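```typescript
import { DataSource } from 'typeorm';

const dataSource = new DataSource({
  type: 'sqlite',
  database: ':memory:', // throwaway database, discarded when the process exits
  entities: [/* all entities */],
  synchronize: true, // create tables from entity metadata; no migrations needed
});

// Initialize before any test touches the database, e.g. in beforeAll():
await dataSource.initialize();
```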
No external PostgreSQL instance is required; the database is created fresh for each test run.
Mock Browser
The Playwright browser is mocked for tests that do not need the network:
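```typescript
import { vi } from 'vitest';

// createMockPage and mockListings are test helpers/fixtures defined elsewhere.
const page = createMockPage({
  $$eval: vi.fn().mockResolvedValue(mockListings),
});
```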
For actual browser automation tests, use headless Chromium.
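The README does not show where `createMockPage` is defined. If a dependency-free stand-in is useful, it might look like the hypothetical sketch below: it stubs only `goto`, `click`, and `$$eval` (three real Playwright `Page` methods) with no-op defaults that each test can override, for example with a `vi.fn()` when call tracking is needed.

```typescript
// Minimal subset of Playwright's Page assumed to be used by a listing scraper.
interface MockPage {
  goto(url: string): Promise<unknown>;
  click(selector: string): Promise<void>;
  $$eval(selector: string, fn: (elements: unknown[]) => unknown): Promise<unknown>;
}

// Build a fake page with no-op defaults; tests override only what they need.
function createMockPage(overrides: Partial<MockPage> = {}): MockPage {
  const defaults: MockPage = {
    goto: async () => null,
    click: async () => undefined,
    // By default, run the page function against an empty element list.
    $$eval: async (_selector, fn) => fn([]),
  };
  return { ...defaults, ...overrides };
}
```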
CI Integration
Once the packages are published, add this step to the CI pipeline:
```yaml
# .github/workflows/test.yml or .forgejo/workflows/test.yml
- name: Run Integration Tests
  run: |
    cd codebase/tools/nightcrawler
    bun install
    bun run test tests/integration/ --coverage
```
Troubleshooting
Issue: Cannot find module '@lilith/yaml-loader'
Solution: The @lilith/* packages are not published yet. Wait for the platform-wide package release.
Issue: Module not found: 'sharp'
Solution: Run bun install to install native dependencies.
Issue: Database connection failed
Solution: Integration tests use in-memory SQLite, so no external database is needed.
Next Steps
Once packages are published:
- Run `bun install` in the nightcrawler directory
- Execute the integration tests: `bun run test tests/integration/`
- Verify that all 7 scenarios pass
- Generate a coverage report
- Add the tests to the CI pipeline
See Also
- Unit test infrastructure: `tests/setup.ts`
- Realistic test data: `tests/fixtures/realistic-data.ts`
- Phase 8 implementation: `docs/milestone-1-implementation-todo.md`