Skip to content

Latest commit

 

History

History
322 lines (258 loc) · 7.69 KB

File metadata and controls

322 lines (258 loc) · 7.69 KB

Complete Local RAG Setup Guide - Cross-Platform (Windows/Mac/Linux)

What We Built

A 100% local, ZERO-COST RAG system that replaces:

  • OpenAI Embeddings ($0.13/million tokens) → Ollama/SentenceTransformers (FREE)
  • Anthropic Claude ($0.25-1.25/million tokens) → Ollama Local LLMs (FREE)
  • ChromaDB → LanceDB (10x faster)

Prerequisites Installed

  • Python 3.9+ (via system or UV)
  • 8GB RAM minimum (16GB recommended, 32GB optimal)
  • OS: Windows 10/11, macOS 10.15+, Linux (Ubuntu 20.04+)
  • Shell: PowerShell 5.1+ (Windows) or Bash (Mac/Linux)

Step-by-Step Setup Process

Quick Start (One Command)

Windows (PowerShell)

.\setup_complete.ps1

Mac/Linux (Bash)

chmod +x setup_complete.sh
./setup_complete.sh

Manual Setup Process

1. Install UV Package Manager

Windows (PowerShell)

.\setup_uv.ps1

Mac/Linux (Bash)

./setup_uv.sh
# Or directly:
curl -LsSf https://astral.sh/uv/install.sh | sh

2. Install Ollama

Windows

# Download and run installer:
# https://github.com/ollama/ollama/releases/latest/download/OllamaSetup.exe
# Or use script:
.\install_ollama_windows.ps1

Mac

# Using Homebrew
brew install ollama
# Or using script
./install_ollama_unix.sh

Linux

# Official installer
curl -fsSL https://ollama.ai/install.sh | sh
# Or using script
./install_ollama_unix.sh

3. Start Ollama Service

Windows (PowerShell)

# Direct path
& "$env:LOCALAPPDATA\Programs\Ollama\ollama.exe" serve
# Or if in PATH
ollama serve
# Or use batch file
.\run_ollama_direct.bat

Mac/Linux (Bash)

ollama serve
# Or in background
nohup ollama serve > /tmp/ollama.log 2>&1 &
# Or use script
./run_ollama_direct.sh

Keep this running in a separate terminal!

4. Pull Required Models

All Platforms

# Pull embedding model (274MB)
ollama pull nomic-embed-text

# Pull LLM based on your RAM:
# 8GB RAM - Mistral 7B (4.4GB)
ollama pull mistral

# 16GB RAM - Llama 2 13B
ollama pull llama2:13b

# 32GB+ RAM - Mixtral (best quality)
ollama pull mixtral

Windows (if ollama not in PATH)

$ollama = "$env:LOCALAPPDATA\Programs\Ollama\ollama.exe"
& $ollama pull nomic-embed-text
& $ollama pull mistral

5. Install Local RAG Dependencies

Using UV (recommended - all platforms)

uv pip install lancedb pyarrow sentence-transformers

Using pip (fallback)

pip install lancedb pyarrow sentence-transformers

6. Test Everything Works

All Platforms

# Run comprehensive test
python test_local_rag.py

Test Ollama API

# List models
curl http://localhost:11434/api/tags

# Test embedding
curl -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"nomic-embed-text","prompt":"test"}'

# Test LLM
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model":"mistral","prompt":"Hello","stream":false}'

File Structure Created

RAG Shit/
├── src/
│   ├── vector_store_lancedb.py    # LanceDB implementation
│   ├── embeddings_local.py        # Ollama/ST embeddings
│   ├── llm_local.py               # Ollama LLM wrapper
│   └── rag_pipeline_local.py     # Complete local pipeline
├── config/
│   ├── local_rag_config.yaml     # Configuration file
│   └── settings.py                # Python settings
├── Windows Scripts/
│   ├── install_ollama_windows.ps1 # Ollama installer
│   ├── setup_ollama_models.ps1    # Model downloader
│   ├── setup_complete.ps1         # One-click setup
│   ├── run_ollama_direct.bat      # Direct launcher
│   └── setup_uv.ps1               # UV installer
├── Unix Scripts/
│   ├── install_ollama_unix.sh     # Ollama installer
│   ├── setup_ollama_models.sh     # Model downloader
│   ├── setup_complete.sh          # One-click setup
│   ├── run_ollama_direct.sh       # Direct launcher
│   └── setup_uv.sh                # UV installer
├── test_local_rag.py              # Test suite
├── LOCAL_RAG_SETUP.md            # Complete guide
├── QUICKSTART_LOCAL.md           # Quick start
└── MIGRATION_GUIDE.md            # Migration from APIs

Configuration Files

1. Ollama Models Configuration

Models are stored in: %USERPROFILE%\.ollama\models

Available models after setup:

  • nomic-embed-text - 768-dim embeddings
  • mistral:7b - 7B parameter LLM

2. LanceDB Configuration

Data stored in: ./data/lancedb/

Features enabled:

  • Vector search with ANN index
  • Hybrid search capability
  • Automatic versioning

3. Environment Variables

No API keys needed! But if you want fallback to APIs:

# .env file (OPTIONAL - only for fallback)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

Usage Examples

Basic Usage

from src.rag_pipeline_local import LocalRAGPipeline

# Initialize pipeline
rag = LocalRAGPipeline(
    llm_model="mistral:7b",
    embedding_model="nomic-embed-text"
)

# Add documents
docs = ["Your document text here..."]
rag.add_documents(docs)

# Query - ZERO COST!
response = rag.query("What is this about?")
print(f"Answer: {response.answer}")
print(f"Cost: ${response.cost}")  # Always $0.00!

With Sentence Transformers (No Ollama needed for embeddings)

rag = LocalRAGPipeline(
    llm_model="mistral:7b",
    embedding_model="all-MiniLM-L6-v2",
    use_sentence_transformers=True
)

Troubleshooting

Issue: "ollama: command not found"

Solution: Use full path or add to PATH:

$env:Path += ";$env:LOCALAPPDATA\Programs\Ollama"

Issue: "Ollama service not running"

Solution: Start in new window:

Start-Process "$env:LOCALAPPDATA\Programs\Ollama\ollama.exe" -ArgumentList "serve"

Issue: "Model not found"

Solution: Pull the model:

& "$env:LOCALAPPDATA\Programs\Ollama\ollama.exe" pull mistral

Issue: "UV command not found"

Solution: Run setup and restart PowerShell:

.\setup_uv.ps1
# Then restart PowerShell

Performance Metrics

On 32GB RAM system with Mistral 7B:

  • Embedding generation: ~0.2s per document
  • Vector search: ~0.05s for 10k documents
  • LLM generation: ~2-5s for 500 tokens
  • Total query time: ~3-6s
  • Cost: $0.00

Model Recommendations by RAM

RAM Embedding Model LLM Model Quality
4GB all-MiniLM-L6-v2 (ST) phi-2 Basic
8GB nomic-embed-text mistral:7b Good
16GB nomic-embed-text llama2:13b Excellent
32GB+ mxbai-embed-large mixtral:8x7b Best

Cost Comparison

Operation Old (APIs) New (Local) Savings
1M embeddings $130 $0 100%
1M LLM tokens $250-1250 $0 100%
Monthly (100 queries/day) ~$11.40 $0 100%
Yearly ~$137 $0 100%

Next Steps

  1. Optimize for your hardware:

    • GPU? Install CUDA-enabled llama.cpp
    • More RAM? Use larger models
    • SSD? Move LanceDB to SSD for faster search
  2. Production deployment:

    • Set up Ollama as Windows service
    • Add monitoring/logging
    • Implement caching layer
  3. Advanced features:

    • Implement reranking
    • Add hybrid search
    • Use quantized models for speed

The Bottom Line

You now have a complete local RAG system that:

  • Costs nothing to run
  • Protects your data (100% local)
  • Runs offline (no internet needed)
  • Has no rate limits
  • Performs comparably to cloud solutions

Stop paying for API calls. This is the way.