100% Local • Zero Cost • Complete Privacy
A pure Python implementation of Retrieval-Augmented Generation (RAG) that runs entirely on your machine. No API keys, no cloud services, no data leaving your computer.
- Python 3.9+
- 4GB+ RAM (8GB recommended)
- Windows, macOS, or Linux
Windows (PowerShell):

```powershell
.\scripts\windows\setup_complete.ps1
```

macOS/Linux:

```bash
./scripts/unix/setup_complete.sh
```

This will:
- Install Ollama (local LLM server)
- Download TinyLlama model (1.1B parameters, runs on any machine)
- Download Nomic embedding model
- Set up Python environment
- Install all dependencies
1. Install Ollama:
   - Download from ollama.ai
   - Or use the bundled script:

     ```powershell
     .\scripts\windows\install_ollama_windows.ps1
     ```

2. Pull the models:

   ```bash
   ollama pull tinyllama
   ollama pull nomic-embed-text
   ```

3. Install the Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```
Run the interactive CLI:

```bash
python src/cli.py
```

Or use the Python API:

```python
from src.rag_pipeline_local import LocalRAGPipeline

# Initialize
rag = LocalRAGPipeline()

# Add documents
rag.add_documents([
    "Python is a versatile programming language.",
    "RAG combines retrieval and generation."
])

# Query
response = rag.query("What is Python?")
print(response['answer'])
```

Or open the example notebook:

```bash
jupyter notebook rag_example.ipynb
```

Project layout:

```
python_example/
├── src/                        # Core RAG implementation
│   ├── rag_pipeline_local.py   # Main pipeline
│   ├── llm_local.py            # Ollama LLM wrapper
│   ├── embeddings_local.py     # Local embeddings
│   ├── vector_store_lancedb.py # Vector storage
│   ├── chunking.py             # Text chunking
│   └── cli.py                  # Interactive CLI
├── scripts/                    # Setup & utility scripts
│   ├── windows/                # Windows scripts
│   └── unix/                   # macOS/Linux scripts
├── tests/                      # Test suite
├── docs/                       # Detailed documentation
├── config/                     # Configuration files
└── requirements.txt            # Python dependencies
```
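The modules above compose a simple flow: chunk documents, embed the chunks, store the vectors, retrieve the nearest chunks for a query, and hand them to the LLM as context. A minimal sketch of that flow, with toy stand-ins for each stage (the real modules use Ollama embeddings and LanceDB, and every name here is illustrative, not the project's actual API):

```python
# Toy end-to-end RAG flow mirroring the roles of the src/ modules.

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """chunking.py's role: split text into overlapping windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(piece: str) -> list[float]:
    """embeddings_local.py's role: stand-in bag-of-letters embedding."""
    vec = [0.0] * 26
    for ch in piece.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def retrieve(query: str, store: list[tuple[list[float], str]], k: int = 2) -> list[str]:
    """vector_store_lancedb.py's role: nearest chunks by dot-product similarity."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: -sum(a * b for a, b in zip(q, item[0])))
    return [text for _, text in ranked[:k]]

# rag_pipeline_local.py's role: index once, then retrieve + prompt per query.
docs = ["Python is a versatile programming language.",
        "RAG combines retrieval and generation."]
store = [(embed(c), c) for d in docs for c in chunk(d)]
context = retrieve("What is Python?", store)
prompt = f"Answer using this context:\n{context}\n\nQuestion: What is Python?"
```

The real pipeline follows the same shape; only the embedding model, vector store, and LLM call differ.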
- 100% Local: Everything runs on your machine
- Zero Cost: No API fees, ever
- Private: Your data never leaves your computer
- Fast: Optimized for local inference
- Simple: Clean API, easy to understand
- Extensible: Modular design for customization
Create a `.env` file (optional):

```env
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=tinyllama:latest
OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest
CHUNK_SIZE=500
CHUNK_OVERLAP=50
```

Recommended model by available RAM:

| RAM | Model | Quality | Speed |
|---|---|---|---|
| 4GB | tinyllama | Good | Fast |
| 8GB | mistral | Better | Good |
| 16GB | llama2:13b | Great | Moderate |
| 32GB+ | mixtral | Best | Slower |
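`CHUNK_SIZE` and `CHUNK_OVERLAP` control how documents are split before embedding: each chunk is at most `CHUNK_SIZE` characters, and consecutive chunks share `CHUNK_OVERLAP` characters so sentences spanning a boundary stay retrievable. A small sketch of reading these settings with the defaults above (the `split_text` helper is illustrative, not the project's `chunking.py`):

```python
import os

# Read chunking settings from the environment, falling back to the
# defaults shown in the .env example above.
chunk_size = int(os.getenv("CHUNK_SIZE", "500"))
chunk_overlap = int(os.getenv("CHUNK_OVERLAP", "50"))

def split_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` chars, each overlapping the next by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = split_text("x" * 1200, chunk_size, chunk_overlap)
# With size=500 and overlap=50, a 1200-char text yields chunks at offsets 0, 450, 900.
```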
Run the test suite:

```bash
python tests/test_local_rag.py
```

- Architecture - System design and components
- Setup Guide - Detailed installation instructions
- Learning RAG - Understanding RAG concepts
- Migration Guide - Upgrading from older versions
UV is a fast Python package manager:
```bash
# Install uv
.\scripts\windows\setup_uv.ps1   # Windows
./scripts/unix/setup_uv.sh       # Unix

# Run with uv
uv run python src/cli.py
```

Development tools:

```bash
# Format code
black src/

# Type checking
mypy src/

# Linting
pylint src/
```

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License - Use freely in your projects!
```bash
# Start Ollama
ollama serve

# Check if running
curl http://localhost:11434/api/tags
```

```bash
# List models
ollama list

# Pull missing model
ollama pull tinyllama
```

- Use smaller models (tinyllama instead of llama2)
- Reduce `CHUNK_SIZE` in your config
- Close other applications
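The same `curl` check above can be done from Python if you want to fail fast before building the pipeline. A sketch using only the standard library, assuming Ollama's default port 11434 (the `ollama_models` helper is illustrative, not part of this project):

```python
import json
import urllib.request
from urllib.error import URLError

def ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return installed model names, or an empty list if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            data = json.load(resp)
    except (URLError, OSError):
        return []
    return [m["name"] for m in data.get("models", [])]

models = ollama_models()
if not models:
    print("Ollama is not reachable - run `ollama serve` first")
elif not any(name.startswith("tinyllama") for name in models):
    print("tinyllama missing - run `ollama pull tinyllama`")
```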
- Privacy First: Your documents, your queries, your hardware
- No Vendor Lock-in: Not dependent on any cloud service
- Cost Effective: One-time setup, unlimited usage
- Fast Iteration: No network latency
- Full Control: Customize everything
Built with ❤️ for the local-first community