An automated, privacy-focused, GPU-accelerated pipeline to translate manga and comics locally.
This project aims to provide a full-stack solution (Frontend, Backend, and AI Worker) to detect text bubbles, perform OCR, translate contextually using LLMs, and typeset the result back into the original image—all without external APIs or recurring costs.
The project follows a microservices architecture so that heavy AI processing doesn't block the web server.
The backend API has been significantly enhanced with production-ready features:
- Real-time SSE updates showing page-by-page translation progress
- Instant feedback with proper Python stdout unbuffering
- Reliable broadcasting via a Redis pub/sub architecture (see the sketch after this list)
- Connection stability with proper resource cleanup and error handling
- Automatic extraction of original and translated archives
- Subdirectory preservation - maintains complex folder structures
- Instant page counting - displays total pages immediately on upload
- Smart path handling - supports nested directories and Unicode filenames
- Proper SSE lifecycle with deferred cleanup in goroutines
- Wildcard routing for flexible file serving
- Enhanced logging with detailed progress tracking
- Type-safe callbacks throughout the translation pipeline
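The SSE and pub/sub bullets above describe a relay: the worker publishes progress events to Redis, and each connected client's handler subscribes and forwards frames over the open HTTP response. The actual backend implements this in Go/Fiber; the Python sketch below only illustrates the shape of the pattern, and the channel name and payload format are assumptions, not the project's protocol:

```python
import redis  # requires the redis-py package


def sse_events(job_id: str):
    """Yield SSE-formatted frames for one job's progress channel."""
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)
    pubsub = r.pubsub()
    pubsub.subscribe(f"job:{job_id}:progress")  # channel name is an assumption
    try:
        for msg in pubsub.listen():
            if msg["type"] != "message":
                continue  # skip subscribe confirmations
            yield f"data: {msg['data']}\n\n"  # SSE wire format: data line + blank line
    finally:
        pubsub.close()  # mirrors the deferred cleanup the Go handler performs
```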
The core engine is currently fully operational:

- 29 pages/minute
- ~1,700 pages/hour
- Batch processing (.zip native)
- 100% Local & Uncensored: Powered by llama.cpp and Abliterated models. No moralizing, just translation.
- Smart Detection: Uses YOLOv8 fine-tuned on Manga109 to detect speech bubbles.
  - Smart Box Merging automatically consolidates fragmented vertical text bubbles (see the merging sketch after this list).
- Specialized OCR: Uses MangaOCR to handle vertical Japanese text and handwritten fonts.
- Contextual Translation:
  - Uses Qwen 2.5 7B (Instruction tuned).
  - Custom prompt engineering to handle "subject-less" Japanese sentences.
  - "Anti-Thinking" regex filters to remove internal LLM monologues (see the cleanup sketch after this list).
- Smart Typesetting:
  - NEW (V10): Intelligent Masked Inpainting: uses OpenCV threshold detection and cv2.inpaint to remove only the dark text pixels, preserving artwork and backgrounds even when bounding boxes overlap (sketched after the examples below).
  - Pixel-Perfect Wrapping: a custom algorithm that measures the exact pixel width of words to avoid overflow (see the wrapping sketch after this list).
  - Sanitization: filters out unsupported characters (emojis, math symbols) to prevent font rendering glitches.
- Batch Processing: Native support for .zip archives (extract → translate → repack); see the repack sketch after this list.
- Modular Architecture: Clean, maintainable codebase with separation of concerns for easy customization and extension.
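Detectors often split one tall vertical bubble into stacked fragments. A minimal sketch of the consolidation idea, assuming simple axis-aligned boxes; the thresholds and the greedy strategy are illustrative, not the project's tuned logic:

```python
def merge_vertical_boxes(boxes, min_x_overlap=0.5, max_y_gap=20):
    """Greedily merge detections that look like fragments of one vertical
    bubble: strong horizontal overlap plus a small vertical gap.
    `boxes` holds (x1, y1, x2, y2) tuples."""
    merged = []
    for x1, y1, x2, y2 in sorted(boxes, key=lambda b: b[1]):  # top to bottom
        for i, (mx1, my1, mx2, my2) in enumerate(merged):
            overlap = min(x2, mx2) - max(x1, mx1)   # shared width in pixels
            narrower = min(x2 - x1, mx2 - mx1)      # width of the thinner box
            if (narrower > 0
                    and overlap / narrower >= min_x_overlap
                    and y1 - my2 <= max_y_gap):     # touching or nearly touching
                merged[i] = (min(x1, mx1), min(y1, my1),
                             max(x2, mx2), max(y2, my2))  # grow the merged box
                break
        else:
            merged.append((x1, y1, x2, y2))
    return merged
```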
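The anti-thinking filter and the character sanitization are both simple text passes over the LLM output. A hedged sketch, assuming a `<think>...</think>`-style leak and an ASCII-renderable comic font; the project's actual patterns and character set may differ:

```python
import re

# Both patterns are assumptions for illustration.
THINKING = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
NON_RENDERABLE = re.compile(r"[^\x20-\x7E]")  # keep printable ASCII only


def clean_output(text: str) -> str:
    """Drop leaked LLM monologue, strip characters the comic font cannot
    render (emoji, math symbols), then normalize whitespace."""
    text = THINKING.sub("", text)
    text = NON_RENDERABLE.sub("", text)
    return " ".join(text.split())
```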
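Pixel-perfect wrapping replaces character counting with measured glyph widths. A sketch using Pillow's `ImageDraw.textlength`; the font path and sizes in the usage note are assumptions:

```python
from PIL import Image, ImageDraw, ImageFont


def wrap_to_width(text: str, font: ImageFont.FreeTypeFont, max_px: int) -> list[str]:
    """Greedy word wrap driven by measured pixel widths, not character counts."""
    draw = ImageDraw.Draw(Image.new("RGB", (1, 1)))  # scratch surface for metrics
    lines, line = [], ""
    for word in text.split():
        candidate = f"{line} {word}".strip()
        if draw.textlength(candidate, font=font) <= max_px or not line:
            line = candidate  # word fits (or is alone and must stay)
        else:
            lines.append(line)
            line = word
    if line:
        lines.append(line)
    return lines


# Usage sketch (font path is hypothetical):
# font = ImageFont.truetype("fonts/comic.ttf", 24)
# for row in wrap_to_width("A long translated sentence ...", font, 180):
#     print(row)
```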
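The extract → translate → repack flow can be sketched with the standard library. Here `translate_page` is a stand-in for the real pipeline call, and the actual worker may extract to a temporary directory rather than streaming entries; the point is that the archive's internal layout is preserved:

```python
import zipfile
from pathlib import Path


def repack(src_zip: str, translate_page) -> str:
    """Translate every image inside an archive, keeping subdirectories intact.
    `translate_page` takes image bytes and returns translated image bytes."""
    out = Path(src_zip).with_suffix(".translated.zip")
    with zipfile.ZipFile(src_zip) as zin, zipfile.ZipFile(out, "w") as zout:
        for name in zin.namelist():
            data = zin.read(name)
            if name.lower().endswith((".png", ".jpg", ".jpeg", ".webp")):
                data = translate_page(data)
            zout.writestr(name, data)  # same path inside the new archive
    return str(out)
```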
See the V10 intelligent masked inpainting in action! These examples showcase the ability to preserve artwork while cleanly removing text.
Side-by-side example pairs: Original (Japanese) | Translated (English)
V10 Improvements Demonstrated:
- Clean text removal without damaging background artwork
- Preserved bubble borders and shading
- Accurate text positioning and sizing
- No artifacts in overlapping bubble regions
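The core of the V10 approach, as described above: threshold the dark ink inside each detected box, then inpaint only that mask so the surrounding artwork survives. The threshold value, dilation kernel, and inpaint radius below are illustrative, not the project's tuned settings:

```python
import cv2
import numpy as np


def erase_text(page_bgr: np.ndarray, box: tuple[int, int, int, int]) -> np.ndarray:
    """Masked inpainting sketch: remove only dark text pixels inside `box`."""
    x1, y1, x2, y2 = box
    roi = page_bgr[y1:y2, x1:x2]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    # Dark pixels (ink) become the mask; lighter background stays untouched.
    _, mask = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY_INV)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8))  # cover anti-aliased edges
    page_bgr[y1:y2, x1:x2] = cv2.inpaint(roi, mask, 3, cv2.INPAINT_TELEA)
    return page_bgr
```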
Before starting, download the required AI models:
📦 Download Models (Google Drive)
Required files:
- `Qwen2.5-7B-Instruct-abliterated-v2.Q4_K_M.gguf` (~4.6 GB) - LLM for translation
- `manga-text-detector.pt` - YOLO model for text bubble detection
Place these files in the ai-worker/models/ directory.
Two launcher scripts are provided at the project root. They handle everything: Docker services, the Go worker, and opening your browser automatically.
Prerequisites: Docker Desktop, Go 1.23+, Python 3.10+, CUDA 12.x
```bash
# Clone the repository
git clone <repository-url>
cd manga-translator

# Set up the Python AI worker environment (once)
cd ai-worker
python -m venv venv
# Windows:
venv\Scripts\activate
# Linux/Mac:
# source venv/bin/activate
pip install -r requirements.txt
cd ..
```

Windows:

```bash
run.bat
```

Linux / Mac:

```bash
chmod +x run.sh
./run.sh
```

Both scripts will:
- Start all Docker services (PostgreSQL, Redis, Go API, Next.js frontend, Asynqmon)
- Launch the Go worker in a separate terminal window (uses your GPU via the local Python venv)
- Open http://localhost:3000 in your default browser
| Service | URL |
|---|---|
| Frontend | http://localhost:3000 |
| Backend API | http://localhost:8080 |
| Asynq Monitor | http://localhost:8081 |
Why a hybrid setup? The AI pipeline (llama-cpp-python, PyTorch CUDA) requires direct GPU access, which Docker on Windows cannot provide without the NVIDIA Container Toolkit. The Go worker runs natively on the host and spawns Python as a subprocess, while all other services run in Docker for easy reproducibility.
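This subprocess design is also why the "instant feedback" bullet earlier mentions stdout unbuffering: the Go parent reads progress line-by-line as it happens. A sketch of what the Python side of such a handshake can look like; the line format here is an assumption, not the project's actual protocol:

```python
import json
import sys


def report(page: int, total: int) -> None:
    """Emit one machine-readable progress line per translated page."""
    print(json.dumps({"event": "progress", "page": page, "total": total}),
          flush=True)
    # flush=True (or running `python -u`) keeps stdout unbuffered, so the
    # parent process sees each line immediately instead of after exit.
```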
Run each component separately for development:

```bash
cd backend-api
docker-compose up -d postgres redis
```

```bash
cd ../ai-worker
python -m venv venv
venv\Scripts\activate           # Windows
# or: source venv/bin/activate  # Linux/Mac
pip install -r requirements.txt
```

```bash
cd ../backend-api
cp .env.example .env
# Edit .env to configure paths (especially PYTHON_PATH)

# Run migrations
migrate -path ./migrations -database "postgres://manga_user:secure_pass@localhost:5432/manga_translator?sslmode=disable" up

# Start API server
go run ./cmd/api

# In another terminal, start worker
go run ./cmd/api --mode=worker
```

```bash
cd ../frontend
npm install                      # or: pnpm install
cp .env.local.example .env.local
npm run dev                      # or: pnpm dev
```

Use just the AI worker for command-line batch translation:
```bash
cd ai-worker
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

# Translate a single image or ZIP file
python main.py path/to/manga_chapter.zip
```

- GPU: NVIDIA GPU with 6GB+ VRAM (Recommended: 8GB+)
- CUDA: CUDA Toolkit 12.x
- Python: 3.10+
- Go: 1.23+ (for backend development)
- Node.js: 20+ (for frontend development)
- Docker: Docker Desktop (for containerized deployment)
- Core AI Pipeline (Detection, OCR, Translation, Inpainting)
- GPU Optimization (VRAM management, 4-bit quantization)
- Smart Typesetting (Pixel wrapping, box merging)
- Modular Code Architecture (Config, Services, Utils separation)
- Go/Fiber HTTP server with hexagonal architecture
- PostgreSQL database with migrations
- Asynq + Redis job queue
- Python worker subprocess integration
- File upload and validation
- SSE real-time progress tracking
- Redis pub/sub for event broadcasting
- Docker multi-stage build
- Production Docker Compose orchestration
- Unit & integration tests (future)
- Modern UI with Next.js 16 and Tailwind CSS
- Drag-and-drop file upload zone
- API integration with backend
- Real-time SSE progress tracking
- Translation status dashboard
- Interactive result viewer (original/translated toggle)
- Thumbnail generation (future)
- User authentication (future)
- Docker Compose (one-command full stack deployment)
- PostgreSQL + Redis services
- Multi-container orchestration (API + Worker + Frontend)
- Asynq monitoring UI
- CI/CD pipeline (future)
- Prometheus/Grafana monitoring (future)
This project showcases a comprehensive full-stack development skillset with modern technologies and architectural patterns:
- Go: High-performance API with Fiber v3 framework, clean architecture principles
- PostgreSQL: Database design, migrations, complex queries with pgx driver
- Redis: Pub/sub messaging, caching, session management
- Queue Systems: Asynq for distributed job processing and background tasks
- Real-time Communication: Server-Sent Events (SSE) implementation with proper lifecycle management
- File Processing: ZIP extraction, multi-format image handling, recursive directory operations
- Concurrency: Goroutines, channels, context management, proper resource cleanup
- Next.js 16: Modern React framework with App Router, TypeScript
- Real-time UI: EventSource API integration, live progress tracking, state management
- Responsive Design: Tailwind CSS, component architecture, dark mode support
- API Integration: RESTful client, error handling, file upload/download flows
- Python: Pipeline architecture, object-oriented design, type hints
- Deep Learning: PyTorch, YOLO object detection, custom model inference
- LLM Integration: llama.cpp, GGUF quantization, prompt engineering
- Computer Vision: OpenCV, image processing, inpainting algorithms, threshold detection
- OCR: MangaOCR integration, text detection, language processing
- Docker: Multi-stage builds, docker-compose orchestration, container networking
- CI/CD Ready: Structured for automated deployment pipelines
- Environment Management: Configuration patterns, secret handling, multi-environment support
- Service Architecture: Microservices, inter-service communication, process orchestration
- Architecture: Hexagonal/Clean Architecture, separation of concerns, SOLID principles
- API Design: RESTful conventions, proper HTTP semantics, error handling patterns
- Code Quality: Type safety (Go, TypeScript), linting (Ruff, golangci-lint), modular design
- Documentation: Comprehensive README files, inline comments, changelog management
- Version Control: Git workflows, semantic versioning, project organization
- GPU Acceleration: CUDA integration, VRAM management, 4-bit quantization
- Streaming: Chunked processing, real-time progress reporting, buffering strategies
- Database: Query optimization, indexing, connection pooling
- Caching: Redis caching strategies, file system optimization
We welcome contributions from the community! Whether you want to fix bugs, add features, improve documentation, or optimize performance, your help is appreciated.
Before contributing:
- 📖 Read our CONTRIBUTING.md guide
- 💬 Open an Issue to discuss significant changes (especially for /ai-worker modifications)
- ✅ Follow code standards: Ruff (Python), golangci-lint (Go), ESLint (Frontend)
- 🧪 Include tests and documentation with your changes
Languages: Contributions can be made in French or English.
This project is licensed under a Custom Non-Commercial Open Source License.
You are free to:
- ✅ Use, modify, and distribute for personal, educational, or research purposes
- ✅ Fork and create derivative works (non-commercial)
- ✅ Contribute back to the project
Restrictions:
- ❌ Commercial use requires explicit permission from the author
- 📧 For commercial licensing: contact@antoinerospars.dev
See LICENSE for full terms.
Copyright (c) 2026 P4ST4S / Antoine Rospars
- Models: Qwen (Alibaba Cloud), YOLOv8 (Ultralytics), MangaOCR (kha-white).
- Tech: Llama.cpp, PyTorch, Pillow.
Current Version: V10 (Stable) - Intelligent Masked Inpainting
See CHANGELOG for detailed version history.





