Manga AI Translator

An automated, privacy-focused, GPU-accelerated pipeline to translate manga and comics locally.

This project aims to provide a full-stack solution (Frontend, Backend, and AI Worker) to detect text bubbles, perform OCR, translate contextually using LLMs, and typeset the result back into the original image—all without external APIs or recurring costs.

Demo

Demo

Architecture

The project follows a Microservices architecture to ensure the heavy AI processing doesn't block the web server.

Architecture Diagram

Project Structure

Module       | Status | Description
/ai-worker   | v10.0  | The core Python engine. Handles computer vision, OCR, and LLM inference on the GPU.
/backend-api | v2.0   | High-performance Go API with real-time SSE progress, Redis pub/sub, ZIP extraction, and nested file support.
/frontend    | v1.0   | Modern web UI (Next.js 16) for drag-and-drop uploads and reading translated chapters.

What's New in Backend v2.0

The backend API has been significantly enhanced with production-ready features:

🔴 Live Progress Streaming

  • Real-time SSE updates showing page-by-page translation progress
  • Instant feedback via unbuffered Python stdout (see the sketch after this list)
  • Reliable broadcasting via Redis pub/sub architecture
  • Connection stability with proper resource cleanup and error handling
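
The key detail on the worker side is unbuffered stdout: each progress event must reach the Go API the moment a page finishes, so it can be published on Redis and pushed to SSE subscribers. A minimal Python sketch, assuming a simple JSON-lines format (the event name and fields are illustrative, not the project's actual protocol):

import json

def report_progress(current: int, total: int, page: str) -> None:
    # One JSON line per translated page. flush=True keeps stdout unbuffered so the
    # Go worker sees the line immediately and can publish it on Redis for SSE clients.
    event = {"event": "page_done", "current": current, "total": total, "page": page}
    print(json.dumps(event), flush=True)

if __name__ == "__main__":
    pages = ["001.jpg", "002.jpg", "003.jpg"]
    for i, name in enumerate(pages, start=1):
        report_progress(i, len(pages), name)

Starting the interpreter with python -u (or setting PYTHONUNBUFFERED=1) achieves the same unbuffering globally.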

📦 Enhanced ZIP Support

  • Automatic extraction of original and translated archives
  • Subdirectory preservation - maintains complex folder structures
  • Instant page counting - displays total pages immediately on upload (see the sketch after this list)
  • Smart path handling - supports nested directories and Unicode filenames
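
The backend implements this in Go; the Python sketch below shows the same two ideas in a language-agnostic way: count image entries up front so a page total can be displayed immediately, and extract while preserving nested folders. The file extensions and paths are assumptions.

import zipfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def count_pages(archive: Path) -> int:
    # Count image entries without extracting, so a total can be shown right after upload.
    with zipfile.ZipFile(archive) as zf:
        return sum(1 for info in zf.infolist()
                   if not info.is_dir() and Path(info.filename).suffix.lower() in IMAGE_EXTS)

def extract_preserving_tree(archive: Path, dest: Path) -> list[Path]:
    # Extract everything, keeping nested folders (e.g. vol01/ch003/p001.png) intact.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return sorted(p for p in dest.rglob("*") if p.suffix.lower() in IMAGE_EXTS)

if __name__ == "__main__":
    chapter = Path("manga_chapter.zip")
    print(f"{count_pages(chapter)} pages detected")
    for page in extract_preserving_tree(chapter, Path("extracted")):
        print(page)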

🏗️ Architecture Improvements

  • Proper SSE lifecycle with deferred cleanup in goroutines
  • Wildcard routing for flexible file serving
  • Enhanced logging with detailed progress tracking
  • Type-safe callbacks throughout the translation pipeline

Key Features (AI Worker V10)

The core engine is currently fully operational.

Performance (RTX 2060 12 GB):

  • 29 pages/minute (~1,700 pages/hour), with native .zip batch processing

Features:

  • 100% Local & Uncensored: Powered by llama.cpp and abliterated models. No moralizing, just translation.

  • Smart Detection: Uses YOLOv8 fine-tuned on Manga109 to detect speech bubbles.

    • Smart Box Merging automatically consolidates fragmented vertical text bubbles.
  • Specialized OCR: Uses MangaOCR to handle vertical Japanese text and handwritten fonts.

  • Context-Aware Translation:

    • Uses Qwen 2.5 7B (Instruction tuned).
    • Custom prompt engineering to handle "Subject-less" Japanese sentences.
    • "Anti-Thinking" regex filters to remove internal LLM monologues.
  • Advanced Typesetting:

    • NEW (V10): Intelligent Masked Inpainting - Uses OpenCV threshold detection and cv2.inpaint to remove ONLY dark text pixels, preserving artwork and backgrounds even when bounding boxes overlap (see the sketch after this list).
    • Pixel-Perfect Wrapping: Custom algorithm measuring exact pixel width of words to avoid overflow.
    • Sanitization: Filters out unsupported characters (emojis, math symbols) to prevent font rendering glitches.
  • Batch Processing: Native support for .zip archives (extract → translate → repack).

  • Modular Architecture: Clean, maintainable codebase with separation of concerns for easy customization and extension.
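
To make the masked-inpainting step concrete, here is a minimal OpenCV sketch of the approach described above: only dark, text-like pixels inside each detected box go into the inpainting mask, so bubble borders and surrounding artwork are left alone. The threshold value and box format are illustrative assumptions, not the project's exact parameters.

import cv2
import numpy as np

def remove_text(image_bgr: np.ndarray,
                boxes: list[tuple[int, int, int, int]],
                dark_thresh: int = 100) -> np.ndarray:
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    mask = np.zeros(gray.shape, dtype=np.uint8)

    for x1, y1, x2, y2 in boxes:
        roi = gray[y1:y2, x1:x2]
        # Treat dark pixels inside the box as glyph strokes.
        text_pixels = (roi < dark_thresh).astype(np.uint8) * 255
        # Slight dilation so anti-aliased edges are covered too.
        text_pixels = cv2.dilate(text_pixels, np.ones((3, 3), np.uint8), iterations=1)
        mask[y1:y2, x1:x2] = cv2.bitwise_or(mask[y1:y2, x1:x2], text_pixels)

    # Inpaint only the masked strokes; everything outside the mask is untouched.
    return cv2.inpaint(image_bgr, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)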

Examples

See the V10 intelligent masked inpainting in action! These examples showcase the ability to preserve artwork while cleanly removing text.

Example 1: Naruto

Original Naruto page

Original (Japanese)

Translated Naruto page

Translated (English)

Example 2: One Piece

Original One Piece page

Original (Japanese)

Translated One Piece page

Translated (English)

V10 Improvements Demonstrated:

  • Clean text removal without damaging background artwork
  • Preserved bubble borders and shading
  • Accurate text positioning and sizing (see the wrapping sketch below)
  • No artifacts in overlapping bubble regions
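
The accurate positioning and sizing above comes down to measuring rendered text in pixels rather than counting characters, as in the "Pixel-Perfect Wrapping" feature. A minimal Pillow sketch of that idea (the font path and target width are placeholders):

from PIL import ImageFont

def wrap_to_width(text: str, font: ImageFont.FreeTypeFont, max_width: int) -> list[str]:
    # Greedy wrap: keep adding words while the rendered line still fits in max_width pixels.
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if font.getlength(candidate) <= max_width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

if __name__ == "__main__":
    font = ImageFont.truetype("fonts/animeace.ttf", size=24)  # placeholder font path
    for line in wrap_to_width("I will become the strongest ninja in the village!", font, 180):
        print(line)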

Download Models

Before starting, download the required AI models:

📦 Download Models (Google Drive)

Required files:

  • Qwen2.5-7B-Instruct-abliterated-v2.Q4_K_M.gguf (~4.6 GB) - LLM for translation
  • manga-text-detector.pt - YOLO model for text bubble detection

Place these files in the ai-worker/models/ directory.


Quick Start

Option 1: One-command start (Recommended)

Two launcher scripts are provided at the project root. They handle everything: Docker services, the Go worker, and opening your browser automatically.

Prerequisites: Docker Desktop, Go 1.23+, Python 3.10+, CUDA 12.x

First-time setup

# Clone the repository
git clone <repository-url>
cd manga-translator

# Set up the Python AI worker environment (once)
cd ai-worker
python -m venv venv

# Windows:
venv\Scripts\activate
# Linux/Mac:
# source venv/bin/activate

pip install -r requirements.txt
cd ..

Launch

Windows:

run.bat

Linux / Mac:

chmod +x run.sh
./run.sh

Both scripts will:

  1. Start all Docker services (PostgreSQL, Redis, Go API, Next.js frontend, Asynqmon)
  2. Launch the Go worker in a separate terminal window (uses your GPU via the local Python venv)
  3. Open http://localhost:3000 in your default browser

Service       | URL
Frontend      | http://localhost:3000
Backend API   | http://localhost:8080
Asynq Monitor | http://localhost:8081

Why a hybrid setup? The AI pipeline (llama-cpp-python, PyTorch CUDA) requires direct GPU access, which Docker on Windows cannot provide without the NVIDIA Container Toolkit. The Go worker runs natively on the host and spawns Python as a subprocess, while all other services run in Docker for easy reproducibility.

Option 2: Local Development

Run each component separately for development:

1. Start Database Services

cd backend-api
docker-compose up -d postgres redis

2. Set Up AI Worker

cd ../ai-worker
python -m venv venv
venv\Scripts\activate  # Windows
# or: source venv/bin/activate  # Linux/Mac
pip install -r requirements.txt

3. Run Backend

cd ../backend-api
cp .env.example .env
# Edit .env to configure paths (especially PYTHON_PATH)

# Run migrations
migrate -path ./migrations -database "postgres://manga_user:secure_pass@localhost:5432/manga_translator?sslmode=disable" up

# Start API server
go run ./cmd/api

# In another terminal, start worker
go run ./cmd/api --mode=worker

4. Run Frontend

cd ../frontend
npm install  # or: pnpm install
cp .env.local.example .env.local
npm run dev  # or: pnpm dev

Option 3: AI Worker Only (CLI)

Use just the AI worker for command-line batch translation (a simplified sketch of the internal flow follows the commands):

cd ai-worker
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

# Translate a single image or ZIP file
python main.py path/to/manga_chapter.zip
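
The command above drives the full chapter pipeline. For orientation, here is a simplified, hypothetical sketch of the extract → translate → repack flow; translate_page is a placeholder, not the project's real module API:

import shutil
import zipfile
from pathlib import Path

def translate_page(src: Path, dst: Path) -> None:
    # Placeholder stage: the real worker runs detection (YOLO), OCR (MangaOCR),
    # LLM translation, inpainting, and typesetting here.
    shutil.copy(src, dst)

def translate_chapter(archive: Path, out_zip: Path) -> None:
    work, out_dir = Path("work"), Path("out")
    with zipfile.ZipFile(archive) as zf:                      # extract
        zf.extractall(work)
    for page in sorted(p for p in work.rglob("*") if p.suffix.lower() in {".jpg", ".png"}):
        target = out_dir / page.relative_to(work)
        target.parent.mkdir(parents=True, exist_ok=True)
        translate_page(page, target)                          # translate
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for page in sorted(out_dir.rglob("*")):               # repack
            if page.is_file():
                zf.write(page, page.relative_to(out_dir))

if __name__ == "__main__":
    translate_chapter(Path("manga_chapter.zip"), Path("manga_chapter_translated.zip"))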

System Requirements

  • GPU: NVIDIA GPU with 6GB+ VRAM (Recommended: 8GB+)
  • CUDA: CUDA Toolkit 12.x
  • Python: 3.10+
  • Go: 1.23+ (for backend development)
  • Node.js: 20+ (for frontend development)
  • Docker: Docker Desktop (for containerized deployment)

Roadmap

AI Worker

  • Core AI Pipeline (Detection, OCR, Translation, Inpainting)
  • GPU Optimization (VRAM management, 4-bit quantization)
  • Smart Typesetting (Pixel wrapping, box merging)
  • Modular Code Architecture (Config, Services, Utils separation)

Backend API (v2.0 - Complete ✅)

  • Go/Fiber HTTP server with hexagonal architecture
  • PostgreSQL database with migrations
  • Asynq + Redis job queue
  • Python worker subprocess integration
  • File upload and validation
  • SSE real-time progress tracking
  • Redis pub/sub for event broadcasting
  • Docker multi-stage build
  • Production Docker Compose orchestration
  • Unit & integration tests (future)

Frontend (v1.0 - Complete ✅)

  • Modern UI with Next.js 16 and Tailwind CSS
  • Drag-and-drop file upload zone
  • API integration with backend
  • Real-time SSE progress tracking
  • Translation status dashboard
  • Interactive result viewer (original/translated toggle)
  • Thumbnail generation (future)
  • User authentication (future)

Infrastructure (Complete ✅)

  • Docker Compose (one-command full stack deployment)
  • PostgreSQL + Redis services
  • Multi-container orchestration (API + Worker + Frontend)
  • Asynq monitoring UI
  • CI/CD pipeline (future)
  • Prometheus/Grafana monitoring (future)

Technical Skills Demonstrated

This project showcases a comprehensive full-stack development skillset with modern technologies and architectural patterns:

Backend Development

  • Go: High-performance API with Fiber v3 framework, clean architecture principles
  • PostgreSQL: Database design, migrations, complex queries with pgx driver
  • Redis: Pub/sub messaging, caching, session management
  • Queue Systems: Asynq for distributed job processing and background tasks
  • Real-time Communication: Server-Sent Events (SSE) implementation with proper lifecycle management
  • File Processing: ZIP extraction, multi-format image handling, recursive directory operations
  • Concurrency: Goroutines, channels, context management, proper resource cleanup

Frontend Development

  • Next.js 16: Modern React framework with App Router, TypeScript
  • Real-time UI: EventSource API integration, live progress tracking, state management
  • Responsive Design: Tailwind CSS, component architecture, dark mode support
  • API Integration: RESTful client, error handling, file upload/download flows

AI/ML & Computer Vision

  • Python: Pipeline architecture, object-oriented design, type hints
  • Deep Learning: PyTorch, YOLO object detection, custom model inference
  • LLM Integration: llama.cpp, GGUF quantization, prompt engineering (see the sketch after this list)
  • Computer Vision: OpenCV, image processing, inpainting algorithms, threshold detection
  • OCR: MangaOCR integration, text detection, language processing
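
As an illustration of the LLM Integration point above, the sketch below loads the Qwen GGUF model with llama-cpp-python and strips leaked "thinking" text with a regex. The prompt, sampling settings, and regex are illustrative assumptions, not the project's actual prompt engineering.

import re
from llama_cpp import Llama

llm = Llama(
    model_path="models/Qwen2.5-7B-Instruct-abliterated-v2.Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=4096,
    verbose=False,
)

# Example filter for leaked internal monologue, e.g. <think>...</think> blocks.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)

SYSTEM = ("Translate Japanese manga dialogue into natural English. "
          "Output only the translation, nothing else.")

def translate(japanese: str) -> str:
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": japanese},
        ],
        max_tokens=256,
        temperature=0.3,
    )
    text = out["choices"][0]["message"]["content"]
    return THINK_RE.sub("", text).strip()

print(translate("やれやれ…また朝か。"))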

DevOps & Infrastructure

  • Docker: Multi-stage builds, docker-compose orchestration, container networking
  • CI/CD Ready: Structured for automated deployment pipelines
  • Environment Management: Configuration patterns, secret handling, multi-environment support
  • Service Architecture: Microservices, inter-service communication, process orchestration

Software Engineering Practices

  • Architecture: Hexagonal/Clean Architecture, separation of concerns, SOLID principles
  • API Design: RESTful conventions, proper HTTP semantics, error handling patterns
  • Code Quality: Type safety (Go, TypeScript), linting (Ruff, golangci-lint), modular design
  • Documentation: Comprehensive README files, inline comments, changelog management
  • Version Control: Git workflows, semantic versioning, project organization

Performance Optimization

  • GPU Acceleration: CUDA integration, VRAM management, 4-bit quantization
  • Streaming: Chunked processing, real-time progress reporting, buffering strategies
  • Database: Query optimization, indexing, connection pooling
  • Caching: Redis caching strategies, file system optimization

Contributing

We welcome contributions from the community! Whether you want to fix bugs, add features, improve documentation, or optimize performance, your help is appreciated.

Before contributing:

  • 📖 Read our CONTRIBUTING.md guide
  • 💬 Open an Issue to discuss significant changes (especially for /ai-worker modifications)
  • ✅ Follow code standards: Ruff (Python), golangci-lint (Go), ESLint (Frontend)
  • 🧪 Include tests and documentation with your changes

Languages: Contributions can be made in French or English.

License

This project is licensed under a Custom Non-Commercial Open Source License.

You are free to:

  • ✅ Use, modify, and distribute for personal, educational, or research purposes
  • ✅ Fork and create derivative works (non-commercial)
  • ✅ Contribute back to the project

Restrictions:

  • ❌ Commercial use is not permitted under this license

See LICENSE for full terms.

Copyright (c) 2026 P4ST4S / Antoine Rospars

Credits

  • Models: Qwen (Alibaba Cloud), YOLOv8 (Ultralytics), MangaOCR (kha-white).
  • Tech: Llama.cpp, PyTorch, Pillow.

Current Version: V10 (Stable) - Intelligent Masked Inpainting

See CHANGELOG for detailed version history.