webproxy/specs/webproxy.md
Jeremy Meyer 17bba2d040 feat: initial monorepo setup with Next.js landing page
- pnpm workspaces monorepo with apps/ and packages/
- Next.js 16 landing page (apps/web) with dark theme, feature overview
- Package stubs: @webproxy/core, @webproxy/indexer, @webproxy/shared
- Proxy server placeholder (apps/proxy)
- Project spec, architecture docs, and deployment guide
- Gitea remote configured at 185.191.239.154:3000

Co-Authored-By: UnicornDev <noreply@unicorndev.wtf>
2026-02-26 18:24:28 -08:00

# WebProxy - Local Internet Indexing Layer
## Overview
WebProxy is a self-hosted program that runs on a local device and acts as a local internet indexing layer. It crawls, caches, and indexes web content for configured topics of interest, then serves that cached content to other devices on the local network as if it were the live internet.
## Problem Statement
- Internet access can be slow, metered, unreliable, or censored
- Multiple devices on a network redundantly fetch the same content
- No local control over what content is available or prioritized
- Search results depend on external providers with their own agendas
## Solution
A local proxy/indexer that:
1. **Crawls & Indexes** - Fetches web pages, search results, and content based on configured topics of interest
2. **Caches Locally** - Stores all fetched content in a local database/filesystem
3. **Serves to Network** - Acts as a proxy/DNS for other devices, serving cached content as if it were the live internet
4. **Stays Fresh** - Periodically re-crawls to keep content updated based on configurable schedules
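
As a rough illustration of the "Stays Fresh" step, the sketch below decides which cached pages are due for a re-crawl. The `TopicConfig` and `CachedPage` shapes, their field names, and the per-topic refresh interval are all assumptions made for this example; the spec does not define a configuration schema yet.

```typescript
// Hypothetical shape of one configured topic (not defined in the spec).
interface TopicConfig {
  name: string;
  seedUrls: string[];
  refreshIntervalMinutes: number;
}

// Hypothetical record of a page already stored in the local cache.
interface CachedPage {
  url: string;
  fetchedAt: Date;
}

// Returns true when the page is at least as old as the topic's refresh interval.
function isStale(page: CachedPage, topic: TopicConfig, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - page.fetchedAt.getTime();
  return ageMs >= topic.refreshIntervalMinutes * 60 * 1000;
}

// Selects the cached pages that a scheduled re-crawl should fetch again.
function pagesToRecrawl(pages: CachedPage[], topic: TopicConfig, now: Date = new Date()): CachedPage[] {
  return pages.filter((page) => isStale(page, topic, now));
}
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to test a scheduler built on top of it.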
## Architecture
### Monorepo Structure
```
webproxy/
├── apps/
│   ├── web/          # Next.js landing page & admin dashboard
│   └── proxy/        # Core proxy server (serves content to network devices)
├── packages/
│   ├── core/         # Proxy engine & HTTP handling
│   ├── indexer/      # Web crawling & indexing engine
│   └── shared/       # Shared types, utils, config schemas
├── docs/             # Deployment & usage documentation
└── deploy/           # Deployment scripts & configs
```
### Key Components
1. **Proxy Server** (`apps/proxy`) - HTTP/HTTPS proxy that intercepts requests and checks the local cache, serving cached content on a hit or fetching fresh content on a miss
2. **Indexer** (`packages/indexer`) - Crawls configured topics, indexes content, stores in local DB
3. **Web Dashboard** (`apps/web`) - Next.js app for configuration, monitoring, topic management
4. **Core Engine** (`packages/core`) - Shared proxy logic, caching strategies, content transformation
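
One caching concern the core engine would likely need to handle is deciding when two requests refer to the same cached entry. The sketch below shows one possible approach: normalize the URL with the standard `URL` API and combine it with the HTTP method. The `cacheKey` function and its normalization rules (drop the fragment, sort query parameters) are assumptions for illustration, not part of the spec.

```typescript
// Hypothetical helper: builds a cache key from an HTTP method and a URL.
// The normalization rules here are illustrative assumptions only.
function cacheKey(method: string, rawUrl: string): string {
  // The URL parser lowercases the scheme and host and drops default ports.
  const url = new URL(rawUrl);
  // Fragments are never sent to the server, so they should not affect the key.
  url.hash = "";
  // Sort query parameters so that ?a=1&b=2 and ?b=2&a=1 share one entry.
  url.searchParams.sort();
  return `${method.toUpperCase()} ${url.toString()}`;
}
```

Sorting query parameters assumes their order carries no meaning for the origin server, which holds for most sites but not all. A real cache would also need to account for request headers that change the response.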
### Data Flow
```
[Network Device] → [WebProxy Proxy Server] → [Local Cache]
                                                   ↓ (cache miss)
                                            [Live Internet]
                                                   ↓
                                            [Cache & Index]
                                                   ↓
                                           [Serve to Device]
```
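
The flow above can be sketched as a small in-memory cache that falls back to a live fetch on a miss. The `ProxyCache` class, the `Fetcher` type, and the `source` field are hypothetical names chosen for this example. The live fetch is injected as a function so the sketch runs without network access, and indexing is left out.

```typescript
// Hypothetical types; the spec does not define a cache or fetch interface.
interface CachedResponse {
  body: string;
  fetchedAt: Date;
}

// Stand-in for whatever performs the real upstream request.
type Fetcher = (url: string) => Promise<string>;

class ProxyCache {
  private readonly entries = new Map<string, CachedResponse>();
  private readonly fetchLive: Fetcher;

  constructor(fetchLive: Fetcher) {
    this.fetchLive = fetchLive;
  }

  // Serves from the cache on a hit; on a miss, fetches live, stores, then serves.
  async handle(url: string): Promise<{ body: string; source: "cache" | "live" }> {
    const hit = this.entries.get(url);
    if (hit !== undefined) {
      return { body: hit.body, source: "cache" };
    }
    const body = await this.fetchLive(url);
    this.entries.set(url, { body, fetchedAt: new Date() });
    return { body, source: "live" };
  }
}
```

In this sketch the first request for a URL reaches the injected fetcher, and later requests for the same URL are answered from the map. A persistent store and an expiry policy would replace the `Map` in practice.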
## User Stories
- As a network admin, I want to configure topics of interest so the proxy pre-fetches relevant content
- As a device user, I want to browse the web through the proxy and get fast cached responses
- As a network admin, I want to see what content is cached and manage storage
- As a device user, I want search results that include locally cached content
## Acceptance Criteria
### Landing Page (Phase 1 - Current)
- [x] Monorepo structure with pnpm workspaces
- [ ] Next.js landing page explaining the product
- [ ] Documentation for deployment
- [ ] Gitea remote configured for CI/CD
- [ ] Clean, professional landing page with feature overview
### Core Proxy (Phase 2 - Future)
- [ ] HTTP proxy server that intercepts requests
- [ ] Local content cache with configurable storage
- [ ] Topic-based crawling configuration
- [ ] Web dashboard for management
## Deployment
- Self-hosted via Gitea on `185.191.239.154`
- Git-based deployment workflow
- Docker support planned for Phase 2