EpistineFiles

EpistineFiles is a distributed document processing system designed to manage, analyze, and extract insights from heterogeneous document collections. It provides a scalable architecture where a central controller orchestrates specialized agents that perform various document processing tasks, with a modern webserver offering a unified management interface.

Architecture Overview

The system follows a modular, service-oriented design:

┌─────────────────────────────────────────────────────────────┐
│                   EpistineFiles Webserver                   │
│  (FastAPI + React + Nginx) - User interface & API gateway   │
└──────────────────────────────┬──────────────────────────────┘
                               │ HTTP/WebSocket
┌──────────────────────────────▼──────────────────────────────┐
│                    Controller (FastAPI)                     │
│  • Task orchestration & scheduling                          │
│  • Agent registration & health monitoring                   │
│  • Database persistence (PostgreSQL)                        │
│  • Centralized logging & API key management                 │
└──────────────────────────────┬──────────────────────────────┘
                               │ REST API
┌──────────────────────────────▼──────────────────────────────┐
│                Agents (Specialized Workers)                 │
│  • OCR processing                                           │
│  • PDF text extraction                                      │
│  • Video content analysis                                   │
│  • Metadata enrichment                                      │
│  • Document scoring & summarization                         │
└─────────────────────────────────────────────────────────────┘
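The controller's agent registration and health monitoring can be pictured as a heartbeat registry: an agent registers once, reports in periodically, and counts as unhealthy after missing a deadline. The following Python sketch is illustrative only; AgentRegistry, the heartbeat model, and the 30-second timeout are assumptions rather than the controller's actual API, and the registry is kept in memory for brevity.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentRecord:
    agent_id: str
    capabilities: set[str]
    last_heartbeat: float


class AgentRegistry:
    """In-memory registry for illustration only (hypothetical, not the controller's API)."""

    def __init__(self, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s      # assumed heartbeat deadline in seconds
        self._clock = clock             # injectable so tests can fake the passage of time
        self._agents: dict[str, AgentRecord] = {}

    def register(self, agent_id: str, capabilities: set[str]) -> None:
        # Registration doubles as the agent's first heartbeat
        self._agents[agent_id] = AgentRecord(agent_id, set(capabilities), self._clock())

    def heartbeat(self, agent_id: str) -> None:
        # Raises KeyError for agents that never registered
        self._agents[agent_id].last_heartbeat = self._clock()

    def healthy_agents(self) -> list[str]:
        """Agents whose last heartbeat is within timeout_s of now, sorted by id."""
        now = self._clock()
        return sorted(
            a.agent_id for a in self._agents.values()
            if now - a.last_heartbeat <= self.timeout_s
        )
```

Injecting the clock keeps the timeout logic testable without real sleeps.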

Core Components

1. Controller (controller/)

The central brain of the system. It exposes a REST API for managing agents, tasks, and system configuration, and maintains a PostgreSQL database for persistent state.

Key Features:

  • FastAPI-based REST API with automatic OpenAPI documentation
  • Role-based API key authentication with fine-grained scopes
  • PostgreSQL integration with connection pooling and schema management
  • Configurable logging system with file rotation
  • CORS middleware supporting cross-origin requests from the webserver
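Role-based API key authentication with scopes usually comes down to two checks: the presented key matches a stored key, and that key's scopes cover what the endpoint requires. A minimal sketch, assuming keys are stored as SHA-256 digests; the key store, the demo keys, and scope names such as tasks:read are hypothetical, since the controller's real key format and scope names are not documented here.

```python
import hashlib
import hmac


def hash_key(raw_key: str) -> str:
    """Store only a digest of each API key, never the raw key (assumed scheme)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Hypothetical key store: digest -> granted scopes
KEY_STORE: dict[str, frozenset[str]] = {
    hash_key("demo-reader-key"): frozenset({"tasks:read", "agents:read"}),
    hash_key("demo-admin-key"): frozenset(
        {"tasks:read", "tasks:write", "agents:read", "agents:write"}
    ),
}


def authorize(raw_key: str, required: set[str]) -> bool:
    """Return True only if the key exists and grants every required scope."""
    digest = hash_key(raw_key)
    for stored_digest, scopes in KEY_STORE.items():
        # Constant-time comparison avoids leaking digest prefixes through timing
        if hmac.compare_digest(digest, stored_digest):
            return required <= scopes
    return False
```

In a FastAPI service this check would typically sit in a dependency that reads the key from a request header; which header the controller expects is not specified in this README.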

Deployment: Available as a prebuilt Docker image from the Gitea container registry:
gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest

2. Webserver (epistine-webserver/)

A modern web interface built with FastAPI (backend) and React (frontend), providing a unified dashboard for monitoring agents, tasks, and logs.

Key Features:

  • Real-time dashboard with WebSocket updates
  • Agent and task management interfaces
  • Interactive log viewer with filtering
  • Settings configuration panel
  • Responsive UI built with React Bootstrap
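WebSocket-driven dashboards generally need a fan-out step on the backend: each connected client gets its own queue, and every event is copied into all of them. A minimal asyncio sketch, assuming dictionary-shaped events; EventBroadcaster and the event fields are hypothetical and not taken from the webserver code.

```python
import asyncio
from typing import Any


class EventBroadcaster:
    """Copies each published event into every subscriber's queue (hypothetical sketch)."""

    def __init__(self, max_pending: int = 100) -> None:
        self._max_pending = max_pending
        self._subscribers: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        # One bounded queue per connected dashboard client
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._max_pending)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, event: dict[str, Any]) -> None:
        for queue in list(self._subscribers):
            try:
                queue.put_nowait(event)
            except asyncio.QueueFull:
                # Drop the event for a slow client rather than block every other client
                pass
```

In a FastAPI WebSocket endpoint, each connection could call subscribe(), loop on await websocket.send_json(await queue.get()), and call unsubscribe() on disconnect. Bounded queues that drop events for slow clients keep one stalled browser tab from holding up the others.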

Deployment: The full stack can be started via Docker Compose; prebuilt images are available from the Gitea registry.

3. Agents (Planned)

Specialized workers that register with the controller and execute specific document-processing tasks. Each agent declares its capabilities, allowing the controller to assign appropriate work.
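Capability-based assignment can be sketched as a filter followed by a choice: keep the agents that declare the capability a task needs, then pick one. This sketch assumes least-loaded selection and hypothetical capability strings such as "ocr"; the README does not describe the controller's actual assignment policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    agent_id: str
    capabilities: frozenset[str]   # hypothetical names, e.g. frozenset({"ocr", "pdf"})
    active_tasks: int = 0


def pick_agent(agents: list[Agent], required_capability: str) -> Optional[Agent]:
    """Return the least-loaded agent that declares the capability, or None if none can."""
    eligible = [a for a in agents if required_capability in a.capabilities]
    if not eligible:
        return None
    # Ties on load break on agent_id so the choice is deterministic
    return min(eligible, key=lambda a: (a.active_tasks, a.agent_id))
```

Returning None instead of raising lets the caller keep the task queued until a capable agent registers.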

Planned Agent Types:

  • OCR Agent: Text extraction from images and scanned documents
  • PDF Agent: PDF text extraction, metadata parsing, and page analysis
  • Video Agent: Video content analysis, frame extraction, and transcription
  • Metadata Agent: Document metadata enrichment and normalization
  • Scoring Agent: Document relevance scoring and ranking
  • Summarization Agent: Automatic document summarization

Task System

Tasks are the unit of work in EpistineFiles. They follow a standardized template that includes:

  • Task type (OCR, PDF, video, etc.)
  • Input parameters (document references, processing options)
  • Priority and scheduling constraints
  • Expected output format
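The template fields above map naturally onto a small record type, and priority scheduling onto a heap. A sketch under stated assumptions: Task, TaskQueue, the field names, the default values, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Task:
    task_type: str                  # e.g. "ocr", "pdf", "video"
    params: dict[str, Any]          # document references, processing options
    priority: int = 5               # assumed: lower number = more urgent
    output_format: str = "json"     # expected output format (assumed default)


class TaskQueue:
    """Priority queue that keeps submission order among tasks of equal priority."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, Task]] = []
        self._counter = itertools.count()   # unique tie-breaker for equal priorities

    def push(self, task: Task) -> None:
        heapq.heappush(self._heap, (task.priority, next(self._counter), task))

    def pop(self) -> Optional[Task]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The counter tie-breaker does two jobs: it keeps tasks of equal priority in submission order, and it prevents heapq from ever comparing two Task objects, which would raise TypeError because dataclasses are not orderable by default.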

Supported Task Types

Task Type              Description
OCR tasks              Text extraction from image-based documents using optical character recognition
PDF handling tasks     PDF text extraction, metadata processing, and page-level analysis
Video tasks            Video content analysis, frame extraction, and transcription
Non-OCR image tasks    Management of image files that do not require OCR (resizing, tagging, etc.)
Tokenize tasks         Text tokenization for search and indexing pipelines
Metadata enrich tasks  Document metadata enrichment from external sources
Scoring tasks          Document scoring and ranking based on configurable metrics
Summarize tasks        Automatic document summarization using extractive or abstractive methods

Getting Started

Prerequisites

  • Docker and Docker Compose
  • (Optional) Node.js and Python for local development

Running the System

1. Controller

The controller is available as a prebuilt Docker image. To run it:

docker run -p 8000:8000 \
  -e POSTGRES_SUPERUSER_NAME=your_superuser \
  -e POSTGRES_SUPERUSER_PASSWORD=your_secure_password \
  -e POSTGRES_IP=localhost \
  -e POSTGRES_PORT=5432 \
  -e SCHEMA_FILE_PATH=./data/schema/version1.sql \
  -e MASTER_API_KEY=your_master_api_key \
  -v /path/to/certs:/app/certs:ro \
  -v /path/to/logs:/logs \
  gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest

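See controller/README.md for detailed configuration and Docker Compose examples. Inside the container, localhost refers to the container itself, so unless the container runs with host networking, set POSTGRES_IP to an address the container can reach (for example, the database service name on a shared Docker network).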
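The docker run example configures the controller entirely through environment variables, so a startup check that fails fast on missing values saves debugging later. This sketch reuses the variable names from that command; ControllerSettings, load_settings, the choice of required variables, and the fallback defaults are assumptions for illustration.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field

# Assumed to be mandatory; the others fall back to defaults below
REQUIRED = (
    "POSTGRES_SUPERUSER_NAME",
    "POSTGRES_SUPERUSER_PASSWORD",
    "POSTGRES_IP",
    "MASTER_API_KEY",
)


@dataclass(frozen=True)
class ControllerSettings:
    postgres_user: str
    postgres_password: str = field(repr=False)   # keep secrets out of repr/logs
    postgres_host: str = "localhost"
    postgres_port: int = 5432
    schema_file_path: str = "./data/schema/version1.sql"
    master_api_key: str = field(default="", repr=False)


def load_settings(env: Mapping[str, str] = os.environ) -> ControllerSettings:
    """Build settings from the environment, failing fast if anything required is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return ControllerSettings(
        postgres_user=env["POSTGRES_SUPERUSER_NAME"],
        postgres_password=env["POSTGRES_SUPERUSER_PASSWORD"],
        postgres_host=env["POSTGRES_IP"],
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),              # assumed default
        schema_file_path=env.get("SCHEMA_FILE_PATH", "./data/schema/version1.sql"),
        master_api_key=env["MASTER_API_KEY"],
    )
```

Marking the password and master key with repr=False keeps them out of log lines that print the settings object.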

2. Webserver

The webserver stack can be started with Docker Compose:

cd epistine-webserver
docker-compose up -d

The application will be available at:

Note: The Docker Compose configuration builds images locally. For production use, prebuilt images are available from gitea.kareemhorstink.me.

Refer to epistine-webserver/README.md for more details.

Development

Project Structure

EpistineFiles/
├── controller/                 # Central controller service
│   ├── app/                   # FastAPI application code
│   ├── data/schema/           # Database schema migrations
│   └── README.md              # Controller-specific documentation
├── epistine-webserver/        # Web interface (FastAPI + React)
│   ├── backend/               # FastAPI backend
│   ├── frontend/              # React frontend
│   ├── nginx/                 # Nginx configuration
│   └── README.md              # Webserver-specific documentation
├── plan/                      # Architectural plans and design documents
└── README.md                  # This file

Building from Source

While prebuilt Docker images are recommended for deployment, you can build the components locally:

# Build controller
cd controller
docker build -t epistine-controller .

# Build webserver
cd ../epistine-webserver
docker-compose build

Configuration

Each component uses environment variables and configuration files for setup:

  • Controller: .env file for secrets, settings.toml for structured configuration
  • Webserver: .env file for backend, frontend/.env for React environment variables

Example configuration files are provided as .env.example in each directory.

Monitoring & Logging

  • Controller: Logs are written to /logs (configurable) with rotation based on file size.
  • Webserver: Backend logs are captured via Docker; frontend errors are reported to the browser console.
  • Health endpoints: Both the controller (/health) and the webserver (/api/health) expose health-check endpoints.
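Because both services expose health endpoints, a deployment script can wait for the stack to come up before sending work. A minimal standard-library sketch that treats any HTTP 200 response as healthy and makes no assumptions about the response body; is_healthy and wait_until_healthy are hypothetical helper names.

```python
import time
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET url answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError subclass OSError: refused connections,
        # DNS failures, timeouts, TLS errors, and 4xx/5xx all count as unhealthy
        return False


def wait_until_healthy(url: str, deadline_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll url until it reports healthy or deadline_s seconds have elapsed."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if is_healthy(url):
            return True
        time.sleep(interval_s)
    return False
```

For example, wait_until_healthy("http://localhost:8000/health") after the docker run command above, assuming the controller is published on port 8000 and serves plain HTTP; with HTTPS enabled and a self-signed certificate, urllib's default certificate verification would make the check report unhealthy.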

Security

  • API Key Authentication: All controller endpoints require a valid API key with appropriate scopes.
  • CORS Configuration: Controller CORS settings are configurable to allow only trusted origins (e.g., the webserver).
  • Database Security: PostgreSQL connections use superuser credentials stored as environment variables.
  • HTTPS Support: Controller can be configured with SSL certificates for secure communication.

Project Status

EpistineFiles is under active development. The controller and webserver are functional and deployed via Docker images. The agent framework is planned for future implementation.

Support

For questions or issues:

  • Check the component-specific README files
  • Review the architectural plans in the plan/ directory
  • Contact the development team

This project is developed for internal use and is not publicly licensed.
