|
|
9386d4a7b3
|
feat: implement core proxy server, crawler, and indexer modules
packages/shared:
- Zod v4 schemas for TopicConfig, ProxyConfig, CrawlJob, SearchQuery
- Config loader with defaults
- Utility functions (createId, formatBytes, normalizeUrl)
packages/core:
- WebProxyServer: HTTP forward proxy using http-proxy-3
- CacheStore: LRU-based in-memory + disk cache for proxied responses
- WarcWriter: WARC file archiving for all proxied content
- HTTPS CONNECT tunneling for SSL passthrough
- Admin API with /api/status, /api/cache/stats, /api/config
packages/indexer:
- TopicCrawler: Crawlee CheerioCrawler for topic-based web crawling
- ContentExtractor: @mozilla/readability + turndown for clean text/markdown
- SearchClient: MeiliSearch integration for full-text search
- CrawlScheduler: Interval-based crawl job scheduling
apps/proxy:
- Main entry point orchestrating all components
- Graceful shutdown handling
- Proxy-only mode when no topics configured
All packages type-check clean. Next.js build passes.
Co-Authored-By: UnicornDev <noreply@unicorndev.wtf>
|
2026-02-26 19:04:10 -08:00 |
|