EpistineFiles
EpistineFiles is a distributed document processing system designed to manage, analyze, and extract insights from heterogeneous document collections. It provides a scalable architecture where a central controller orchestrates specialized agents that perform various document processing tasks, with a modern webserver offering a unified management interface.
Architecture Overview
The system follows a modular, service‑oriented design:
┌─────────────────────────────────────────────────────────────┐
│                   EpistineFiles Webserver                   │
│  (FastAPI + React + Nginx) – User interface & API gateway   │
└──────────────────────────────┬──────────────────────────────┘
                               │ HTTP/WebSocket
┌──────────────────────────────▼──────────────────────────────┐
│                    Controller (FastAPI)                     │
│  – Task orchestration & scheduling                          │
│  – Agent registration & health monitoring                   │
│  – Database persistence (PostgreSQL)                        │
│  – Centralized logging & API key management                 │
└──────────────────────────────┬──────────────────────────────┘
                               │ REST API
┌──────────────────────────────▼──────────────────────────────┐
│                Agents (Specialized Workers)                 │
│  – OCR processing                                           │
│  – PDF text extraction                                      │
│  – Video content analysis                                   │
│  – Metadata enrichment                                      │
│  – Document scoring & summarization                         │
└─────────────────────────────────────────────────────────────┘
Core Components
1. Controller (controller/)
The central brain of the system. It exposes a REST API for managing agents, tasks, and system configuration, and maintains a PostgreSQL database for persistent state.
Key Features:
- FastAPI‑based REST API with automatic OpenAPI documentation
- Role‑based API key authentication with fine‑grained scopes
- PostgreSQL integration with connection pooling and schema management
- Configurable logging system with file rotation
- CORS middleware supporting cross‑origin requests from the webserver
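As an illustration of the size-based log rotation mentioned above, here is a minimal sketch using Python's standard `logging.handlers.RotatingFileHandler`. The logger name, file name, size limit, and backup count are assumptions for the example, not values taken from the controller's code.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_dir: str, max_bytes: int = 1_000_000, backups: int = 5) -> logging.Logger:
    """Return a logger whose file is rotated once it would exceed max_bytes."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("controller")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = RotatingFileHandler(
            Path(log_dir) / "controller.log",  # hypothetical file name
            maxBytes=max_bytes,
            backupCount=backups,  # older files are kept as controller.log.1, .2, ...
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `build_logger("/logs")` would write to the directory mounted as a volume in the `docker run` example, assuming the controller logs to that path.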
Deployment: Available as a pre‑built Docker image from the Gitea container registry:
gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest
2. Webserver (epistine-webserver/)
A modern web interface built with FastAPI (backend) and React (frontend), providing a unified dashboard for monitoring agents, tasks, and logs.
Key Features:
- Real‑time dashboard with WebSocket updates
- Agent and task management interfaces
- Interactive log viewer with filtering
- Settings configuration panel
- Responsive UI built with React Bootstrap
Deployment: The full stack can be started via Docker Compose; pre‑built images are available from the Gitea registry.
3. Agents (Planned)
Specialized workers that register with the controller and execute specific document‑processing tasks. Each agent declares its capabilities, allowing the controller to assign appropriate work.
Planned Agent Types:
- OCR Agent: Text extraction from images and scanned documents
- PDF Agent: PDF text extraction, metadata parsing, and page analysis
- Video Agent: Video content analysis, frame extraction, and transcription
- Metadata Agent: Document metadata enrichment and normalization
- Scoring Agent: Document relevance scoring and ranking
- Summarization Agent: Automatic document summarization
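Since the agent framework is still planned, the following is only a sketch of how capability-based assignment could work. The `Agent` fields, the capability strings, and the least-loaded selection policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Agent:
    agent_id: str
    capabilities: Set[str] = field(default_factory=set)  # task types the agent declares
    active_tasks: int = 0                                 # current load (assumed metric)


def pick_agent(agents: List[Agent], task_type: str) -> Optional[Agent]:
    """Return the least-loaded agent that declares task_type, or None if none does."""
    capable = [a for a in agents if task_type in a.capabilities]
    return min(capable, key=lambda a: a.active_tasks, default=None)
```

A controller following this approach would call `pick_agent(registered_agents, "ocr")` when dispatching an OCR task and leave the task queued if the result is `None`.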
Task System
Tasks are the unit of work in EpistineFiles. They follow a standardized template that includes:
- Task type (OCR, PDF, video, etc.)
- Input parameters (document references, processing options)
- Priority and scheduling constraints
- Expected output format
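The template fields listed above can be sketched as a small data class plus a priority queue. The field names, the "lower number is more urgent" convention, and the default values are assumptions; the actual task schema lives in the controller.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(order=True)
class Task:
    priority: int   # lower value = more urgent (assumed convention)
    sequence: int   # tie-breaker so equal priorities are served first-in, first-out
    task_type: str = field(compare=False)
    inputs: Dict[str, Any] = field(compare=False, default_factory=dict)
    output_format: str = field(compare=False, default="json")


class TaskQueue:
    """In-memory queue that hands out the most urgent task first."""

    def __init__(self) -> None:
        self._heap: List[Task] = []
        self._counter = itertools.count()

    def submit(self, task_type: str, inputs: Dict[str, Any],
               priority: int = 5, output_format: str = "json") -> Task:
        task = Task(priority, next(self._counter), task_type, inputs, output_format)
        heapq.heappush(self._heap, task)
        return task

    def next_task(self) -> Optional[Task]:
        return heapq.heappop(self._heap) if self._heap else None
```

The sequence counter keeps ordering deterministic: two tasks with the same priority are never compared by their payload, and they leave the queue in submission order.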
Supported Task Types
| Task Type | Description |
|---|---|
| OCR tasks | Text extraction from image‑based documents using optical character recognition |
| PDF handling tasks | PDF text extraction, metadata processing, and page‑level analysis |
| Video tasks | Video content analysis, frame extraction, and transcription |
| Non‑OCR image tasks | Management of image files that do not require OCR (resizing, tagging, etc.) |
| Tokenize tasks | Text tokenization for search and indexing pipelines |
| Metadata enrich tasks | Document metadata enrichment from external sources |
| Scoring tasks | Document scoring and ranking based on configurable metrics |
| Summarize tasks | Automatic document summarization using extractive or abstractive methods |
Getting Started
Prerequisites
- Docker and Docker Compose
- (Optional) Node.js and Python for local development
Running the System
1. Controller
The controller is available as a pre‑built Docker image. To run it:
docker run -p 8000:8000 \
-e POSTGRES_SUPERUSER_NAME=your_superuser \
-e POSTGRES_SUPERUSER_PASSWORD=your_secure_password \
-e POSTGRES_IP=localhost \
-e POSTGRES_PORT=5432 \
-e SCHEMA_FILE_PATH=./data/schema/version1.sql \
-e MASTER_API_KEY=your_master_api_key \
-v /path/to/certs:/app/certs:ro \
-v /path/to/logs:/logs \
gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest
See controller/README.md for detailed configuration and Docker Compose examples.
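To show how the environment variables in the command above might be consumed, here is a hedged sketch that reads them into a typed configuration object and fails early when one is missing. The `ControllerConfig` class, the decision to treat all six variables as required, and the default database name are assumptions, not the controller's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote

# Variable names are taken from the docker run example in this README.
REQUIRED = (
    "POSTGRES_SUPERUSER_NAME",
    "POSTGRES_SUPERUSER_PASSWORD",
    "POSTGRES_IP",
    "POSTGRES_PORT",
    "SCHEMA_FILE_PATH",
    "MASTER_API_KEY",
)


@dataclass(frozen=True)
class ControllerConfig:
    pg_user: str
    pg_password: str
    pg_host: str
    pg_port: int
    schema_file: str
    master_api_key: str

    def dsn(self, database: str = "postgres") -> str:
        """Build a PostgreSQL connection URL; the database name is an assumption."""
        user = quote(self.pg_user, safe="")
        password = quote(self.pg_password, safe="")
        return f"postgresql://{user}:{password}@{self.pg_host}:{self.pg_port}/{database}"


def load_config(env: Mapping[str, str] = os.environ) -> ControllerConfig:
    """Read the settings from env and raise if any variable is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return ControllerConfig(
        pg_user=env["POSTGRES_SUPERUSER_NAME"],
        pg_password=env["POSTGRES_SUPERUSER_PASSWORD"],
        pg_host=env["POSTGRES_IP"],
        pg_port=int(env["POSTGRES_PORT"]),
        schema_file=env["SCHEMA_FILE_PATH"],
        master_api_key=env["MASTER_API_KEY"],
    )
```

One practical point when filling in these values: inside a container, `localhost` refers to the container itself, so `POSTGRES_IP` usually needs to point at a host or service name where PostgreSQL is actually reachable, unless host networking is used.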
2. Webserver
The webserver stack can be started with Docker Compose:
cd epistine-webserver
docker-compose up -d
The application will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Nginx: http://localhost
Note: The Docker Compose configuration builds images locally. For production use, pre‑built images are available from gitea.kareemhorstink.me.
Refer to epistine-webserver/README.md for more details.
Development
Project Structure
EpistineFiles/
├── controller/ # Central controller service
│ ├── app/ # FastAPI application code
│ ├── data/schema/ # Database schema migrations
│ └── README.md # Controller‑specific documentation
├── epistine-webserver/ # Web interface (FastAPI + React)
│ ├── backend/ # FastAPI backend
│ ├── frontend/ # React frontend
│ ├── nginx/ # Nginx configuration
│ └── README.md # Webserver‑specific documentation
├── plan/ # Architectural plans and design documents
└── README.md # This file
Building from Source
While pre‑built Docker images are recommended for deployment, you can build the components locally:
# Build controller
cd controller
docker build -t epistine-controller .
# Build webserver
cd ../epistine-webserver
docker-compose build
Configuration
Each component uses environment variables and configuration files for setup:
- Controller: .env file for secrets, settings.toml for structured configuration
- Webserver: .env file for backend, frontend/.env for React environment variables
Example configuration files are provided as .env.example in each directory.
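For readers unfamiliar with the `.env` format, the sketch below parses simple `KEY=VALUE` lines. It is a deliberately minimal illustration and not the loader the components use; it ignores features such as `export` prefixes, inline comments, and variable expansion.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines, # comments, and lines without '='."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

Copying `.env.example` to `.env` and filling in the values is the usual starting point; a parser like this is mainly useful for validating that file before starting the containers.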
Monitoring & Logging
- Controller: Logs are written to /logs (configurable) with rotation based on file size.
- Webserver: Backend logs are captured via Docker; frontend errors are reported to the browser console.
- Health endpoints: Both controller (/health) and webserver (/api/health) expose health‑check endpoints.
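The health endpoints can be polled from a script or a monitoring job. The sketch below uses only the standard library; the helper name, the timeout, and the rule that only HTTP 200 counts as healthy are assumptions, and the host and port must match the actual deployment.

```python
import urllib.request


def check_health(url: str, timeout: float = 3.0, opener=urllib.request.urlopen) -> bool:
    """Return True when the endpoint answers with HTTP 200, False otherwise."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection failures and urllib's URLError/HTTPError,
        # which are subclasses of OSError.
        return False
```

For example, `check_health("http://localhost:8000/health")` would probe a controller published on port 8000 as in the `docker run` example. The `opener` parameter exists so the function can be exercised without a network connection.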
Security
- API Key Authentication: All controller endpoints require a valid API key with appropriate scopes.
- CORS Configuration: Controller CORS settings are configurable to allow only trusted origins (e.g., the webserver).
- Database Security: PostgreSQL connections use superuser credentials stored as environment variables.
- HTTPS Support: Controller can be configured with SSL certificates for secure communication.
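To make the idea of scoped API keys concrete, here is a minimal sketch of a scope check. The in-memory key store, the example keys, the scope names such as `tasks:read`, and the `*` wildcard for a master key are hypothetical; the controller's real storage and scope vocabulary may differ.

```python
import hashlib
import hmac
from typing import Dict, Set


def _digest(key: str) -> str:
    """Hash a key so the store never holds it in plain text."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Hypothetical key store: SHA-256 digest of the key -> granted scopes.
KEYS: Dict[str, Set[str]] = {
    _digest("example-webserver-key"): {"tasks:read", "agents:read"},
    _digest("example-master-key"): {"*"},
}


def is_authorized(api_key: str, required_scope: str,
                  keys: Dict[str, Set[str]] = KEYS) -> bool:
    """Return True if api_key is known and grants required_scope."""
    presented = _digest(api_key)
    for stored, scopes in keys.items():
        # Constant-time comparison avoids leaking match position via timing.
        if hmac.compare_digest(stored, presented):
            return "*" in scopes or required_scope in scopes
    return False
```

In a FastAPI application a check like this would typically sit in a dependency that reads the key from a request header and rejects the request when `is_authorized` returns `False`.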
Project Status
EpistineFiles is under active development. The controller and webserver are functional and deployed via Docker images. The agent framework is planned for future implementation.
Support
For questions or issues:
- Check the component‑specific README files
- Review the architectural plans in the plan/ directory
- Contact the development team
This project is developed for internal use and is not publicly licensed.