Jeremy Meyer 9386d4a7b3 feat: implement core proxy server, crawler, and indexer modules
packages/shared:
- Zod v4 schemas for TopicConfig, ProxyConfig, CrawlJob, SearchQuery
- Config loader with defaults
- Utility functions (createId, formatBytes, normalizeUrl)

packages/core:
- WebProxyServer: HTTP forward proxy using http-proxy-3
- CacheStore: LRU-based in-memory + disk cache for proxied responses
- WarcWriter: WARC file archiving for all proxied content
- HTTPS CONNECT tunneling for SSL passthrough
- Admin API with /api/status, /api/cache/stats, /api/config

packages/indexer:
- TopicCrawler: Crawlee CheerioCrawler for topic-based web crawling
- ContentExtractor: @mozilla/readability + turndown for clean text/markdown
- SearchClient: MeiliSearch integration for full-text search
- CrawlScheduler: Interval-based crawl job scheduling

apps/proxy:
- Main entry point orchestrating all components
- Graceful shutdown handling
- Proxy-only mode when no topics configured
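The graceful-shutdown wiring described above can be sketched as a small signal-handler pattern. This is a minimal illustration, not the entry point's actual code; `onShutdown` and the closer ordering are assumptions about how such an orchestrator is typically structured:

```typescript
// Minimal graceful-shutdown sketch (hypothetical helpers, not the real
// apps/proxy code): components register cleanup callbacks, and the first
// SIGINT/SIGTERM runs them in reverse registration order.
type Closer = () => void;

const closers: Closer[] = [];
let shuttingDown = false;

// Components (proxy server, WARC writer, scheduler, ...) register here.
function onShutdown(fn: Closer): void {
  closers.push(fn);
}

function shutdown(signal: string): void {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  console.log(`received ${signal}, shutting down`);
  // Tear down in reverse order so dependents close before dependencies.
  for (const close of [...closers].reverse()) close();
}

for (const sig of ["SIGINT", "SIGTERM"] as const) {
  process.on(sig, () => shutdown(sig));
}

export { onShutdown, shutdown };
```

Running cleanup in reverse registration order mirrors how the components were started, so, for example, the proxy server stops accepting traffic before the WARC writer flushes.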

All packages type-check clean. Next.js build passes.

Co-Authored-By: UnicornDev <noreply@unicorndev.wtf>
2026-02-26 19:04:10 -08:00

/**
 * @file config
 * @description Configuration loader
 * @layer Shared
 */
import { readFileSync, existsSync } from "node:fs";
import { resolve } from "node:path";

import { ProxyConfigSchema, type ProxyConfig } from "./types/config.js";

export const defaultConfig: ProxyConfig = ProxyConfigSchema.parse({});

export function loadConfig(configPath?: string): ProxyConfig {
  const path = configPath ?? resolve(process.cwd(), "webproxy.config.json");
  if (!existsSync(path)) {
    return defaultConfig;
  }
  const raw = readFileSync(path, "utf-8");
  const json = JSON.parse(raw);
  return ProxyConfigSchema.parse(json);
}
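Since `ProxyConfigSchema.parse({})` succeeds, every field in the schema presumably carries a default, so a partial config file only needs to override the keys it changes. A zod-free sketch of that fallback behavior, where `MiniConfig` and its fields are illustrative assumptions rather than the real schema:

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Hypothetical minimal config shape; the real fields live in ProxyConfigSchema.
interface MiniConfig {
  port: number;
  host: string;
}

const defaults: MiniConfig = { port: 8080, host: "127.0.0.1" };

// Sketch of loadConfig's fallback behavior: a missing file yields the
// defaults, and a partial file only overrides the keys it sets (the real
// loader gets this merge from per-field defaults inside the zod schema).
function loadMiniConfig(path: string): MiniConfig {
  if (!existsSync(path)) {
    return defaults;
  }
  const parsed = JSON.parse(readFileSync(path, "utf-8")) as Partial<MiniConfig>;
  return { ...defaults, ...parsed };
}

const path = join(tmpdir(), "webproxy.config.json");
writeFileSync(path, JSON.stringify({ port: 3128 }));
console.log(loadMiniConfig(path)); // port overridden, host falls back to default
```

One difference worth noting: the spread merge above is shallow, whereas a zod schema validates each field as it fills in defaults, so malformed values fail loudly instead of leaking through.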