docs: update project documentation with architecture overview and deployment details
- Rewrite README.md with comprehensive architecture overview, component descriptions, and deployment instructions
- Update controller/README.md to focus on Docker deployment with pre-built images from Gitea registry
- Simplify epistine-webserver/README.md by removing development setup and focusing on Docker Compose deployment
- Add detailed system diagrams, task type tables, and configuration examples
- Remove redundant development workflow instructions in favor of containerized deployment
219
README.md
@@ -1,35 +1,202 @@
# EpistineFiles

This project consists of a **controller** and multiple **agents** that manage task runners.
EpistineFiles is a distributed document processing system designed to manage, analyze, and extract insights from heterogeneous document collections. It provides a scalable architecture where a central **controller** orchestrates specialized **agents** that perform various document processing tasks, with a modern **webserver** offering a unified management interface.

## Project Structure
## Architecture Overview

- **controller/**: The central control module that manages all different nodes.
  - Controls coordination between agents
  - Task distribution and monitoring
  - API key handling
  - Database interactions
  - Logging functionality
The system follows a modular, service‑oriented design:

- **agents/**: (Not yet present in project structure) Agents will be task runners that perform specific operations like:
  - OCR processing
  - Advanced document operations
  - Other specialized tasks
```
┌─────────────────────────────────────────────────────────────┐
│                   EpistineFiles Webserver                   │
│  (FastAPI + React + Nginx) – User interface & API gateway   │
└──────────────────────────────┬──────────────────────────────┘
                               │ HTTP/WebSocket
┌──────────────────────────────▼──────────────────────────────┐
│                    Controller (FastAPI)                     │
│  – Task orchestration & scheduling                          │
│  – Agent registration & health monitoring                   │
│  – Database persistence (PostgreSQL)                        │
│  – Centralized logging & API key management                 │
└──────────────────────────────┬──────────────────────────────┘
                               │ REST API
┌──────────────────────────────▼──────────────────────────────┐
│                Agents (Specialized Workers)                 │
│  – OCR processing                                           │
│  – PDF text extraction                                      │
│  – Video content analysis                                   │
│  – Metadata enrichment                                      │
│  – Document scoring & summarization                         │
└─────────────────────────────────────────────────────────────┘
```

Each agent will list its capabilities, allowing the controller to assign appropriate tasks.
## Core Components

## Tasks
### 1. Controller (`controller/`)
The central brain of the system. It exposes a REST API for managing agents, tasks, and system configuration, and maintains a PostgreSQL database for persistent state.

### Task Templates
- Standardized format for task definitions
- Agents use templates to understand what operations they can perform
**Key Features:**
- FastAPI‑based REST API with automatic OpenAPI documentation
- Role‑based API key authentication with fine‑grained scopes
- PostgreSQL integration with connection pooling and schema management
- Configurable logging system with file rotation
- CORS middleware supporting cross‑origin requests from the webserver

### Specific Tasks
- **OCR tasks**: Text extraction from image-based documents
- **PDF handling tasks**: PDF text extraction and metadata processing
- **Video tasks**: Video content analysis and processing
- **Non-OCR image tasks**: Managing image files that don't require OCR
- **Tokenize tasks**: Text tokenization for search/indexing
- **Metadata enrich tasks**: Document metadata enrichment
- **Scoring tasks**: Document scoring/ranking
- **Summarize tasks**: Document summarization
**Deployment:** Available as a pre‑built Docker image from the Gitea container registry:
`gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest`

### 2. Webserver (`epistine-webserver/`)
A modern web interface built with FastAPI (backend) and React (frontend), providing a unified dashboard for monitoring agents, tasks, and logs.

**Key Features:**
- Real‑time dashboard with WebSocket updates
- Agent and task management interfaces
- Interactive log viewer with filtering
- Settings configuration panel
- Responsive UI built with React Bootstrap

**Deployment:** The full stack can be started via Docker Compose; pre‑built images are available from the Gitea registry.

### 3. Agents (Planned)
Specialized workers that register with the controller and execute specific document‑processing tasks. Each agent declares its capabilities, allowing the controller to assign appropriate work.

**Planned Agent Types:**
- **OCR Agent**: Text extraction from images and scanned documents
- **PDF Agent**: PDF text extraction, metadata parsing, and page analysis
- **Video Agent**: Video content analysis, frame extraction, and transcription
- **Metadata Agent**: Document metadata enrichment and normalization
- **Scoring Agent**: Document relevance scoring and ranking
- **Summarization Agent**: Automatic document summarization
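
As a rough illustration of capability‑based assignment, the sketch below matches a task type against the capabilities each agent declares. The `Agent` class, the `pick_agent` helper, and the capability strings are all hypothetical; the agent framework is still planned, so its real registration format may look quite different.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """Hypothetical agent record; the real registration payload is not specified."""
    name: str
    capabilities: set[str] = field(default_factory=set)


def pick_agent(agents: list[Agent], task_type: str) -> Optional[Agent]:
    """Return the first registered agent that declares the requested capability."""
    for agent in agents:
        if task_type in agent.capabilities:
            return agent
    return None  # no registered agent can handle this task type


# Illustrative registrations; names and capability labels are assumptions.
agents = [
    Agent("ocr-1", {"ocr"}),
    Agent("pdf-1", {"pdf", "metadata"}),
]
chosen = pick_agent(agents, "metadata")
print(chosen.name if chosen else "unassigned")
```

A first‑match policy is the simplest possible choice; a real controller would presumably also weigh agent health and current load.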

## Task System

Tasks are the unit of work in EpistineFiles. They follow a standardized template that includes:

- **Task type** (OCR, PDF, video, etc.)
- **Input parameters** (document references, processing options)
- **Priority and scheduling constraints**
- **Expected output format**
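
The four template fields listed above could be modelled roughly as follows. This is only a sketch: the `TaskTemplate` class, its field names, and the "higher priority runs first" convention are assumptions, not the controller's actual schema.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TaskTemplate:
    """Hypothetical task template; every field name here is an assumption."""
    task_type: str                     # e.g. "ocr", "pdf", "video"
    inputs: dict[str, Any]             # document references and processing options
    priority: int = 0                  # assumed convention: higher runs first
    not_before: Optional[str] = None   # scheduling constraint as an ISO 8601 timestamp
    output_format: str = "json"        # expected output format


def next_tasks(tasks: list[TaskTemplate]) -> list[TaskTemplate]:
    """Order tasks by descending priority; the sort is stable, so ties keep submission order."""
    return sorted(tasks, key=lambda t: t.priority, reverse=True)


queue = next_tasks([
    TaskTemplate("pdf", {"document_id": "doc-1"}),
    TaskTemplate("ocr", {"document_id": "doc-2"}, priority=5),
])
print([t.task_type for t in queue])
```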

### Supported Task Types

| Task Type | Description |
|-----------|-------------|
| **OCR tasks** | Text extraction from image‑based documents using optical character recognition |
| **PDF handling tasks** | PDF text extraction, metadata processing, and page‑level analysis |
| **Video tasks** | Video content analysis, frame extraction, and transcription |
| **Non‑OCR image tasks** | Management of image files that do not require OCR (resizing, tagging, etc.) |
| **Tokenize tasks** | Text tokenization for search and indexing pipelines |
| **Metadata enrich tasks** | Document metadata enrichment from external sources |
| **Scoring tasks** | Document scoring and ranking based on configurable metrics |
| **Summarize tasks** | Automatic document summarization using extractive or abstractive methods |

## Getting Started

### Prerequisites
- Docker and Docker Compose
- (Optional) Node.js and Python for local development

### Running the System

#### 1. Controller
The controller is available as a pre‑built Docker image. To run it:

```bash
docker run -p 8000:8000 \
  -e POSTGRES_SUPERUSER_NAME=your_superuser \
  -e POSTGRES_SUPERUSER_PASSWORD=your_secure_password \
  -e POSTGRES_IP=localhost \
  -e POSTGRES_PORT=5432 \
  -e SCHEMA_FILE_PATH=./data/schema/version1.sql \
  -e MASTER_API_KEY=your_master_api_key \
  -v /path/to/certs:/app/certs:ro \
  -v /path/to/logs:/logs \
  gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest
```

See [`controller/README.md`](controller/README.md) for detailed configuration and Docker Compose examples.

#### 2. Webserver
The webserver stack can be started with Docker Compose:

```bash
cd epistine-webserver
docker-compose up -d
```

The application will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Nginx: http://localhost

> **Note**: The Docker Compose configuration builds images locally. For production use, pre‑built images are available from `gitea.kareemhorstink.me`.

Refer to [`epistine-webserver/README.md`](epistine-webserver/README.md) for more details.

## Development

### Project Structure
```
EpistineFiles/
├── controller/             # Central controller service
│   ├── app/                # FastAPI application code
│   ├── data/schema/        # Database schema migrations
│   └── README.md           # Controller‑specific documentation
├── epistine-webserver/     # Web interface (FastAPI + React)
│   ├── backend/            # FastAPI backend
│   ├── frontend/           # React frontend
│   ├── nginx/              # Nginx configuration
│   └── README.md           # Webserver‑specific documentation
├── plan/                   # Architectural plans and design documents
└── README.md               # This file
```

### Building from Source
While pre‑built Docker images are recommended for deployment, you can build the components locally:

```bash
# Build controller
cd controller
docker build -t epistine-controller .

# Build webserver
cd ../epistine-webserver
docker-compose build
```

## Configuration

Each component uses environment variables and configuration files for setup:

- **Controller**: `.env` file for secrets, `settings.toml` for structured configuration
- **Webserver**: `.env` file for backend, `frontend/.env` for React environment variables

Example configuration files are provided as `.env.example` in each directory.

## Monitoring & Logging

- **Controller**: Logs are written to `/logs` (configurable) with rotation based on file size.
- **Webserver**: Backend logs are captured via Docker; frontend errors are reported to the browser console.
- **Health endpoints**: Both controller (`/health`) and webserver (`/api/health`) expose health‑check endpoints.
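
A small poller for the two health endpoints might look like the sketch below. The `/health` and `/api/health` paths come from the list above; the helper names, the base URLs, and the assumption that a healthy service answers with HTTP 200 are illustrative only. The demonstration uses a stub fetcher so it runs without any services started.

```python
import urllib.error
import urllib.request
from typing import Callable

# Health paths as documented above.
HEALTH_PATHS = {"controller": "/health", "webserver": "/api/health"}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Fetch a URL and return its HTTP status code (0 if the host is unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def check_health(
    base_urls: dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> dict[str, bool]:
    """Map each component to True when its health endpoint returns 200 (assumed convention)."""
    return {
        name: fetch(base.rstrip("/") + HEALTH_PATHS[name]) == 200
        for name, base in base_urls.items()
    }


# Offline demonstration: the stub pretends only the webserver is healthy.
def stub(url: str) -> int:
    return 200 if url.endswith("/api/health") else 503


result = check_health(
    {"controller": "http://localhost:8000", "webserver": "http://localhost"},
    fetch=stub,
)
print(result)
```

Dropping the `fetch=stub` argument makes the same call go over the network; the base URLs would then need to match the actual deployment, including `https://` if the controller is configured with certificates.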

## Security

- **API Key Authentication**: All controller endpoints require a valid API key with appropriate scopes.
- **CORS Configuration**: Controller CORS settings are configurable to allow only trusted origins (e.g., the webserver).
- **Database Security**: PostgreSQL connections use superuser credentials stored as environment variables.
- **HTTPS Support**: Controller can be configured with SSL certificates for secure communication.
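
To make the idea of scoped API keys concrete, here is a minimal sketch of a scope check. The in‑memory key store, the example keys, and the scope names such as `tasks:read` are hypothetical; the controller's real key storage and scope vocabulary are not described here.

```python
import hmac

# Hypothetical in-memory key store: API key -> granted scopes.
API_KEYS = {
    "example-master-key": {"agents:read", "agents:write", "tasks:read", "tasks:write"},
    "example-viewer-key": {"agents:read", "tasks:read"},
}


def is_authorized(presented_key: str, required_scope: str) -> bool:
    """Return True when the key is known and carries the required scope."""
    for known_key, scopes in API_KEYS.items():
        # compare_digest avoids leaking key contents through timing differences.
        if hmac.compare_digest(presented_key.encode(), known_key.encode()):
            return required_scope in scopes
    return False


print(is_authorized("example-viewer-key", "tasks:read"))
print(is_authorized("example-viewer-key", "tasks:write"))
```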

## Project Status

EpistineFiles is under active development. The controller and webserver are functional and deployed via Docker images. The agent framework is planned for future implementation.

## Support

For questions or issues:
- Check the component‑specific README files
- Review the architectural plans in the `plan/` directory
- Contact the development team

---

*This project is developed for internal use and is not publicly licensed.*

@@ -60,28 +60,32 @@ The controller exposes the following REST API endpoints:

## Running the Controller

### Locally
The controller is available as a Docker image from the Gitea container registry. Use the following command to run it:

```bash
uvicorn run app.py --host 0.0.0.0 --port 8000 \
  --ssl-keyfile certs/controller.local-key.pem \
  --ssl-certfile certs/controller.local.pem
docker run -p 8000:8000 \
  -e POSTGRES_SUPERUSER_NAME=your_superuser \
  -e POSTGRES_SUPERUSER_PASSWORD=your_secure_password \
  -e POSTGRES_IP=localhost \
  -e POSTGRES_PORT=5432 \
  -e SCHEMA_FILE_PATH=./schema/version1.sql \
  -e MASTER_API_KEY=your_master_api_key \
  -e SSL_KEYFILE=/app/certs/controller.local-key.pem \
  -e SSL_CERTFILE=/app/certs/controller.local.pem \
  -e CONTROLLER_PORT=8000 \
  -e STREAMLIT_ORIGIN=http://localhost:8501 \
  -v /path/to/certs:/app/certs:ro \
  -v /path/to/logs:/logs \
  gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest
```

### Docker

```bash
docker build -t epistine-controller .
docker run -p 8000:8000 --env-file .env epistine-controller
```

### Docker Compose
Alternatively, you can use Docker Compose with the pre‑built image:

```yaml
version: '3.8'
services:
  controller:
    build: ./controller
    image: gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest
    ports:
      - "8000:8000"
    env_file: .env
@@ -90,6 +94,8 @@ services:
      - /path/to/your/logs:/logs
```

> **Note**: Replace the environment variables and volume paths with your actual configuration. The image is automatically updated when changes are pushed to the repository.

## Dependencies

All required dependencies are listed in `requirements.txt`:

@@ -62,36 +62,23 @@ epistine-webserver/
### Prerequisites

- Docker and Docker Compose
- Node.js (for development)

### Development Setup
### Running the Application

The webserver components are available as pre‑built Docker images from the Gitea container registry. To start the full stack:

1. Clone the repository
2. Navigate to the project directory
3. Start the development environment:
3. Start the services using Docker Compose:
   ```bash
   docker-compose up --build
   docker-compose up -d
   ```
4. The application will be available at:
   - Frontend: http://localhost:3000
   - Backend API: http://localhost:8000
   - Nginx: http://localhost

### Development Workflow

1. Backend development:
   ```bash
   cd backend
   pip install -r requirements.txt
   uvicorn main:app --reload
   ```

2. Frontend development:
   ```bash
   cd frontend
   npm install
   npm start
   ```
> **Note**: The Docker Compose configuration builds images locally. For production use, pre‑built images are available from `gitea.kareemhorstink.me`.

## API Endpoints

@@ -178,38 +165,6 @@ The application uses PostgreSQL with the following main tables:
- `REACT_APP_API_URL` - Backend API URL
- `REACT_APP_WS_URL` - WebSocket URL

## Testing

### Backend Tests
```bash
cd backend
tox
```

### Frontend Tests
```bash
cd frontend
npm test
```

## Deployment

### Production
1. Build the Docker images:
   ```bash
   docker-compose -f docker-compose.prod.yml up --build -d
   ```

2. Configure environment variables in `.env` file

3. Set up SSL certificates for HTTPS

### Monitoring

The application includes built-in monitoring endpoints:
- `/api/health` - Health check
- `/metrics` - Performance metrics
- `/status` - System status

## Security

@@ -229,17 +184,6 @@ The application includes built-in monitoring endpoints:
- XSS protection
- CSRF protection

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Support
