docs: update project documentation with architecture overview and deployment details
Some checks failed
Build Master / build-controller (push) Successful in 33s
Build Master / build-webserver-backend (push) Successful in 1m17s
Build Master / build-webserver-frontend (push) Failing after 18s

- Rewrite README.md with comprehensive architecture overview, component descriptions, and deployment instructions
- Update controller/README.md to focus on Docker deployment with pre-built images from Gitea registry
- Simplify epistine-webserver/README.md by removing development setup and focusing on Docker Compose deployment
- Add detailed system diagrams, task type tables, and configuration examples
- Remove redundant development workflow instructions in favor of containerized deployment
2026-02-14 01:09:46 +07:00
parent 18a94b5d8a
commit f9d609f6e0
3 changed files with 218 additions and 101 deletions

219
README.md

@@ -1,35 +1,202 @@
# EpistineFiles
This project consists of a **controller** and multiple **agents** that manage task runners.
EpistineFiles is a distributed document processing system designed to manage, analyze, and extract insights from heterogeneous document collections. It provides a scalable architecture where a central **controller** orchestrates specialized **agents** that perform various document processing tasks, with a modern **webserver** offering a unified management interface.
## Project Structure
## Architecture Overview
- **controller/**: The central control module that manages all different nodes.
- Controls coordination between agents
- Task distribution and monitoring
- API key handling
- Database interactions
- Logging functionality
The system follows a modular, service-oriented design:
- **agents/**: (Not yet present in project structure) Agents will be task runners that perform specific operations like:
- OCR processing
- Advanced document operations
- Other specialized tasks
```
┌─────────────────────────────────────────────────────────────┐
│  EpistineFiles Webserver                                    │
│  (FastAPI + React + Nginx) User interface & API gateway     │
└──────────────────────────────┬──────────────────────────────┘
                               │ HTTP/WebSocket
┌──────────────────────────────▼──────────────────────────────┐
│  Controller (FastAPI)                                       │
│  - Task orchestration & scheduling                          │
│  - Agent registration & health monitoring                   │
│  - Database persistence (PostgreSQL)                        │
│  - Centralized logging & API key management                 │
└──────────────────────────────┬──────────────────────────────┘
                               │ REST API
┌──────────────────────────────▼──────────────────────────────┐
│  Agents (Specialized Workers)                               │
│  - OCR processing                                           │
│  - PDF text extraction                                      │
│  - Video content analysis                                   │
│  - Metadata enrichment                                      │
│  - Document scoring & summarization                         │
└─────────────────────────────────────────────────────────────┘
```
Each agent will list its capabilities, allowing the controller to assign appropriate tasks.
## Core Components
## Tasks
### 1. Controller (`controller/`)
The central brain of the system. It exposes a REST API for managing agents, tasks, and system configuration, and maintains a PostgreSQL database for persistent state.
### Task Templates
- Standardized format for task definitions
- Agents use templates to understand what operations they can perform
**Key Features:**
- FastAPI-based REST API with automatic OpenAPI documentation
- Role-based API key authentication with fine-grained scopes
- PostgreSQL integration with connection pooling and schema management
- Configurable logging system with file rotation
- CORS middleware supporting cross-origin requests from the webserver
### Specific Tasks
- **OCR tasks**: Text extraction from image-based documents
- **PDF handling tasks**: PDF text extraction and metadata processing
- **Video tasks**: Video content analysis and processing
- **Non-OCR image tasks**: Managing image files that don't require OCR
- **Tokenize tasks**: Text tokenization for search/indexing
- **Metadata enrich tasks**: Document metadata enrichment
- **Scoring tasks**: Document scoring/ranking
- **Summarize tasks**: Document summarization
**Deployment:** Available as a prebuilt Docker image from the Gitea container registry:
`gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest`
### 2. Webserver (`epistine-webserver/`)
A modern web interface built with FastAPI (backend) and React (frontend), providing a unified dashboard for monitoring agents, tasks, and logs.
**Key Features:**
- Real-time dashboard with WebSocket updates
- Agent and task management interfaces
- Interactive log viewer with filtering
- Settings configuration panel
- Responsive UI built with React Bootstrap
**Deployment:** The full stack can be started via Docker Compose; prebuilt images are available from the Gitea registry.
### 3. Agents (Planned)
Specialized workers that register with the controller and execute specific document-processing tasks. Each agent declares its capabilities, allowing the controller to assign appropriate work.
**Planned Agent Types:**
- **OCR Agent**: Text extraction from images and scanned documents
- **PDF Agent**: PDF text extraction, metadata parsing, and page analysis
- **Video Agent**: Video content analysis, frame extraction, and transcription
- **Metadata Agent**: Document metadata enrichment and normalization
- **Scoring Agent**: Document relevance scoring and ranking
- **Summarization Agent**: Automatic document summarization
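Because the agent framework is still planned, the following is only a minimal sketch of how capability-based assignment could work. The `Agent` class, the `eligible_agents` helper, and the capability strings are hypothetical names chosen for illustration; they are not taken from the controller's code.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical record of a registered agent (not the project's real model)."""
    name: str
    capabilities: set[str] = field(default_factory=set)


def eligible_agents(agents: list[Agent], task_type: str) -> list[Agent]:
    """Return the agents that declared support for the given task type."""
    return [agent for agent in agents if task_type in agent.capabilities]


# Illustrative registrations; names and capability strings are made up.
agents = [
    Agent("ocr-1", {"ocr"}),
    Agent("pdf-1", {"pdf", "metadata_enrich"}),
    Agent("multi-1", {"ocr", "summarize"}),
]

print([a.name for a in eligible_agents(agents, "ocr")])  # prints ['ocr-1', 'multi-1']
```

A real scheduler would likely also weigh agent health and current load, both of which the controller is described as tracking.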
## Task System
Tasks are the unit of work in EpistineFiles. They follow a standardized template that includes:
- **Task type** (OCR, PDF, video, etc.)
- **Input parameters** (document references, processing options)
- **Priority and scheduling constraints**
- **Expected output format**
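The template fields listed above could be modelled roughly as follows. This is a sketch under assumptions: the `TaskTemplate` class, its field names, the default values, and the "higher priority runs first" convention are all invented for illustration and may differ from the actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskTemplate:
    """Hypothetical task template; field names are illustrative only."""
    task_type: str                                # e.g. "ocr", "pdf", "video"
    inputs: dict = field(default_factory=dict)    # document references, options
    priority: int = 0                             # assumed: higher runs first
    output_format: str = "json"                   # expected output format


# Example task; the document ID and option names are placeholders.
task = TaskTemplate(
    task_type="ocr",
    inputs={"document_id": "doc-123", "language": "eng"},
    priority=5,
)

# Serialize the template the way it might be sent to the controller's API.
print(json.dumps(asdict(task), indent=2))
```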
### Supported Task Types
| Task Type | Description |
|-----------|-------------|
| **OCR tasks** | Text extraction from image-based documents using optical character recognition |
| **PDF handling tasks** | PDF text extraction, metadata processing, and page-level analysis |
| **Video tasks** | Video content analysis, frame extraction, and transcription |
| **Non-OCR image tasks** | Management of image files that do not require OCR (resizing, tagging, etc.) |
| **Tokenize tasks** | Text tokenization for search and indexing pipelines |
| **Metadata enrich tasks** | Document metadata enrichment from external sources |
| **Scoring tasks** | Document scoring and ranking based on configurable metrics |
| **Summarize tasks** | Automatic document summarization using extractive or abstractive methods |
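One way to keep the eight task types in the table consistent across services is a shared enumeration. The sketch below assumes identifier strings such as `"ocr"` and `"metadata_enrich"`; these values and the `parse_task_type` helper are hypothetical and not confirmed by the project.

```python
from enum import Enum


class TaskType(Enum):
    """Hypothetical identifiers for the task types in the table above."""
    OCR = "ocr"
    PDF = "pdf"
    VIDEO = "video"
    IMAGE = "image"                      # non-OCR image tasks
    TOKENIZE = "tokenize"
    METADATA_ENRICH = "metadata_enrich"
    SCORING = "scoring"
    SUMMARIZE = "summarize"


def parse_task_type(raw: str) -> TaskType:
    """Normalize a string and map it to a TaskType, or raise ValueError."""
    try:
        return TaskType(raw.strip().lower())
    except ValueError:
        valid = ", ".join(t.value for t in TaskType)
        raise ValueError(
            f"Unknown task type {raw!r}; expected one of: {valid}"
        ) from None
```

Rejecting unknown types at the API boundary keeps malformed tasks from ever reaching an agent.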
## Getting Started
### Prerequisites
- Docker and Docker Compose
- (Optional) Node.js and Python for local development
### Running the System
#### 1. Controller
The controller is available as a prebuilt Docker image. To run it:
```bash
docker run -p 8000:8000 \
-e POSTGRES_SUPERUSER_NAME=your_superuser \
-e POSTGRES_SUPERUSER_PASSWORD=your_secure_password \
-e POSTGRES_IP=localhost \
-e POSTGRES_PORT=5432 \
-e SCHEMA_FILE_PATH=./data/schema/version1.sql \
-e MASTER_API_KEY=your_master_api_key \
-v /path/to/certs:/app/certs:ro \
-v /path/to/logs:/logs \
gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest
```
See [`controller/README.md`](controller/README.md) for detailed configuration and Docker Compose examples.
#### 2. Webserver
The webserver stack can be started with Docker Compose:
```bash
cd epistine-webserver
docker-compose up -d
```
The application will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Nginx: http://localhost
> **Note**: The Docker Compose configuration builds images locally. For production use, prebuilt images are available from `gitea.kareemhorstink.me`.
Refer to [`epistine-webserver/README.md`](epistine-webserver/README.md) for more details.
## Development
### Project Structure
```
EpistineFiles/
├── controller/             # Central controller service
│   ├── app/                # FastAPI application code
│   ├── data/schema/        # Database schema migrations
│   └── README.md           # Controller-specific documentation
├── epistine-webserver/     # Web interface (FastAPI + React)
│   ├── backend/            # FastAPI backend
│   ├── frontend/           # React frontend
│   ├── nginx/              # Nginx configuration
│   └── README.md           # Webserver-specific documentation
├── plan/                   # Architectural plans and design documents
└── README.md               # This file
```
### Building from Source
While prebuilt Docker images are recommended for deployment, you can build the components locally:
```bash
# Build controller
cd controller
docker build -t epistine-controller .
# Build webserver
cd ../epistine-webserver
docker-compose build
```
## Configuration
Each component uses environment variables and configuration files for setup:
- **Controller**: `.env` file for secrets, `settings.toml` for structured configuration
- **Webserver**: `.env` file for backend, `frontend/.env` for React environment variables
Example configuration files are provided as `.env.example` in each directory.
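As a rough illustration of environment-driven configuration, the sketch below reads a few of the variables shown in the `docker run` example (`POSTGRES_IP`, `POSTGRES_PORT`, `MASTER_API_KEY`). The `ControllerSettings` class, the `load_settings` helper, and the fallback port of 5432 are assumptions made for this example, not the controller's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ControllerSettings:
    """Hypothetical subset of the controller's settings."""
    postgres_ip: str
    postgres_port: int
    master_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> ControllerSettings:
    """Build settings from environment variables and fail fast on gaps."""
    required = ("POSTGRES_IP", "MASTER_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return ControllerSettings(
        postgres_ip=env["POSTGRES_IP"],
        # Assumed fallback: 5432 is PostgreSQL's conventional default port.
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        master_api_key=env["MASTER_API_KEY"],
    )
```

Accepting the environment as a parameter makes the loader easy to exercise in tests without touching the real process environment.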
## Monitoring & Logging
- **Controller**: Logs are written to `/logs` (configurable) with rotation based on file size.
- **Webserver**: Backend logs are captured via Docker; frontend errors are reported to the browser console.
- **Health endpoints**: Both the controller (`/health`) and the webserver (`/api/health`) expose health-check endpoints.
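A small probe script can poll both health paths after a deployment. The sketch below uses only the Python standard library; the paths come from this README, while the `is_healthy` and `check_all` helpers and the base URLs in the closing comment are placeholders that need adapting. It also assumes a healthy service answers with HTTP 200.

```python
from urllib.request import urlopen


def is_healthy(url: str, timeout: float = 3.0, opener=urlopen) -> bool:
    """Return True when the endpoint answers with HTTP 200 (assumed convention)."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError both derive from OSError, so this covers
        # refused connections, timeouts, and HTTP error statuses.
        return False


def check_all(controller_base: str, webserver_base: str, opener=urlopen) -> dict:
    """Probe both documented health paths and report the result per service."""
    return {
        "controller": is_healthy(f"{controller_base}/health", opener=opener),
        "webserver": is_healthy(f"{webserver_base}/api/health", opener=opener),
    }


# Example call (base URLs are placeholders for your deployment):
# check_all("https://controller.example.internal:8000", "http://localhost")
```

The `opener` parameter exists so the probe can be tested with a stub instead of a live network call.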
## Security
- **API Key Authentication**: All controller endpoints require a valid API key with appropriate scopes.
- **CORS Configuration**: Controller CORS settings are configurable to allow only trusted origins (e.g., the webserver).
- **Database Security**: PostgreSQL connections use superuser credentials stored as environment variables.
- **HTTPS Support**: Controller can be configured with SSL certificates for secure communication.
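To make the idea of scoped API keys concrete, here is a minimal sketch of a scope check. The in-memory `API_KEYS` mapping, the key strings, and scope names such as `tasks:read` are all hypothetical; the controller's real key storage and scope vocabulary are not described here.

```python
import hmac

# Hypothetical in-memory key store used only for this illustration.
API_KEYS = {
    "key-for-dashboard": {"agents:read", "tasks:read"},
    "key-for-admin": {"agents:read", "agents:write", "tasks:read", "tasks:write"},
}


def authorize(presented_key: str, required_scope: str) -> bool:
    """Return True if the key is known and carries the required scope."""
    for known_key, scopes in API_KEYS.items():
        # compare_digest avoids leaking key contents through timing differences.
        if hmac.compare_digest(presented_key.encode(), known_key.encode()):
            return required_scope in scopes
    return False


print(authorize("key-for-dashboard", "tasks:read"))   # prints True
print(authorize("key-for-dashboard", "tasks:write"))  # prints False
```

In a production service, keys would normally be stored hashed rather than in plain text.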
## Project Status
EpistineFiles is under active development. The controller and webserver are functional and deployed via Docker images. The agent framework is planned for future implementation.
## Support
For questions or issues:
- Check the component-specific README files
- Review the architectural plans in the `plan/` directory
- Contact the development team
---
*This project is developed for internal use and is not publicly licensed.*

controller/README.md

@@ -60,28 +60,32 @@ The controller exposes the following REST API endpoints:
## Running the Controller
### Locally
The controller is available as a Docker image from the Gitea container registry. Use the following command to run it:
```bash
uvicorn run app.py --host 0.0.0.0 --port 8000 \
--ssl-keyfile certs/controller.local-key.pem \
--ssl-certfile certs/controller.local.pem
docker run -p 8000:8000 \
-e POSTGRES_SUPERUSER_NAME=your_superuser \
-e POSTGRES_SUPERUSER_PASSWORD=your_secure_password \
-e POSTGRES_IP=localhost \
-e POSTGRES_PORT=5432 \
-e SCHEMA_FILE_PATH=./schema/version1.sql \
-e MASTER_API_KEY=your_master_api_key \
-e SSL_KEYFILE=/app/certs/controller.local-key.pem \
-e SSL_CERTFILE=/app/certs/controller.local.pem \
-e CONTROLLER_PORT=8000 \
-e STREAMLIT_ORIGIN=http://localhost:8501 \
-v /path/to/certs:/app/certs:ro \
-v /path/to/logs:/logs \
gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest
```
### Docker
```bash
docker build -t epistine-controller .
docker run -p 8000:8000 --env-file .env epistine-controller
```
### Docker Compose
Alternatively, you can use Docker Compose with the prebuilt image:
```yaml
version: '3.8'
services:
  controller:
    build: ./controller
    image: gitea.kareemhorstink.me/imrayya/epistinefiles/controller:latest
    ports:
      - "8000:8000"
    env_file: .env
@@ -90,6 +94,8 @@ services:
- /path/to/your/logs:/logs
```
> **Note**: Replace the environment variables and volume paths with your actual configuration. The image is automatically updated when changes are pushed to the repository.
## Dependencies
All required dependencies are listed in `requirements.txt`:

epistine-webserver/README.md

@@ -62,36 +62,23 @@ epistine-webserver/
### Prerequisites
- Docker and Docker Compose
- Node.js (for development)
### Development Setup
### Running the Application
The webserver components are available as prebuilt Docker images from the Gitea container registry. To start the full stack:
1. Clone the repository
2. Navigate to the project directory
3. Start the development environment:
3. Start the services using Docker Compose:
```bash
docker-compose up --build
docker-compose up -d
```
4. The application will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Nginx: http://localhost
### Development Workflow
1. Backend development:
```bash
cd backend
pip install -r requirements.txt
uvicorn main:app --reload
```
2. Frontend development:
```bash
cd frontend
npm install
npm start
```
> **Note**: The Docker Compose configuration builds images locally. For production use, prebuilt images are available from `gitea.kareemhorstink.me`.
## API Endpoints
@@ -178,38 +165,6 @@ The application uses PostgreSQL with the following main tables:
- `REACT_APP_API_URL` - Backend API URL
- `REACT_APP_WS_URL` - WebSocket URL
## Testing
### Backend Tests
```bash
cd backend
tox
```
### Frontend Tests
```bash
cd frontend
npm test
```
## Deployment
### Production
1. Build the Docker images:
```bash
docker-compose -f docker-compose.prod.yml up --build -d
```
2. Configure environment variables in `.env` file
3. Set up SSL certificates for HTTPS
### Monitoring
The application includes built-in monitoring endpoints:
- `/api/health` - Health check
- `/metrics` - Performance metrics
- `/status` - System status
## Security
@@ -229,17 +184,6 @@ The application includes built-in monitoring endpoints:
- XSS protection
- CSRF protection
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Support