Skip to content

Nebula-Block-Data/rag-example

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

10 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

NebulaRAG

A production-ready RAG (Retrieval-Augmented Generation) pipeline designed to work with NebulaBlock's Inference API. This project demonstrates how to build a complete RAG system with document indexing, semantic search, state-of-the-art reranking, and answer generation.

πŸš€ Features

  • Production-Ready: Robust error handling, compression support, and browser-like headers
  • State-of-the-Art Models: BAAI/bge-reranker-v2-m3 for superior reranking performance
  • Lightweight: Minimal dependencies, no heavy ML frameworks
  • Configurable: Environment-based configuration for all endpoints and models
  • OpenAI-Compatible: Works with OpenAI-compatible APIs
  • Complete Pipeline: Document splitting β†’ embedding β†’ retrieval β†’ reranking β†’ generation
  • CLI Interface: Easy-to-use command-line interface with comprehensive options
  • In-Memory Store: Fast vector similarity search with cosine similarity
  • Compression Support: Handles Brotli and Gzip compression automatically
  • Cloudflare Bypass: Browser-like headers to avoid security blocks

πŸ“‹ Requirements

  • Python 3.8+
  • NebulaBlock API access
  • Internet connection for API calls

πŸ› οΈ Installation

Option 1: Development Installation (Recommended)

# Clone the repository
git clone <repository-url>
cd rag-example

# Install in development mode
pip install -e .

Option 2: Direct Usage

# Clone the repository
git clone <repository-url>
cd rag-example

# Install dependencies
pip install -r requirements.txt

# Run directly
python -m nebularag.cli.main --help

βš™οΈ Configuration

Environment Variables

Create a .env file in the project root with the following variables:

# Required
NEBULABLOCK_BASE_URL=https://inference.nebulablock.com/v1
NEBULABLOCK_API_KEY=sk-your-api-key-here

# Optional (defaults shown)
NEBULABLOCK_EMBEDDINGS_PATH=/embeddings
NEBULABLOCK_RERANK_PATH=/rerank
NEBULABLOCK_CHAT_PATH=/chat/completions

# Models (optimized for performance)
NEBULABLOCK_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B
NEBULABLOCK_RERANKER_MODEL=BAAI/bge-reranker-v2-m3
NEBULABLOCK_CHAT_MODEL=mistralai/Mistral-Small-3.2-24B-Instruct-2506

Default Models

  • Embedding: Qwen/Qwen3-Embedding-8B - High-quality 4096-dimensional embeddings
  • Reranker: BAAI/bge-reranker-v2-m3 - State-of-the-art reranking model for superior relevance scoring
  • Chat: mistralai/Mistral-Small-3.2-24B-Instruct-2506 - Powerful instruction-following model

πŸ“ Project Structure

rag-example/
β”œβ”€β”€ nebularag/                    # Main package
β”‚   β”œβ”€β”€ cli/                      # Command-line interface
β”‚   β”‚   └── main.py              # CLI entry point
β”‚   β”œβ”€β”€ clients/                  # External API clients
β”‚   β”‚   └── nebula_client.py     # NebulaBlock API client
β”‚   β”œβ”€β”€ config/                   # Configuration management
β”‚   β”‚   └── settings.py          # Environment settings
β”‚   β”œβ”€β”€ core/                     # Core RAG components
β”‚   β”‚   β”œβ”€β”€ rag_pipeline.py      # Main RAG pipeline
β”‚   β”‚   └── vector_store.py      # In-memory vector store
β”‚   └── utils/                    # Utility functions
β”‚       β”œβ”€β”€ file_utils.py        # File operations
β”‚       └── text_processing.py   # Text splitting utilities
β”œβ”€β”€ tests/                        # Test suite
β”‚   └── test_api.py              # API connectivity tests
β”œβ”€β”€ examples/                     # Usage examples
β”‚   └── basic_usage.py           # Programmatic usage example
β”œβ”€β”€ docs/                         # Sample documents
β”‚   └── sample.md                # Example markdown file
β”œβ”€β”€ setup.py                      # Package configuration
β”œβ”€β”€ requirements.txt              # Python dependencies
β”œβ”€β”€ .env.example                  # Environment template
β”œβ”€β”€ .gitignore                    # Git ignore rules
└── README.md                     # This file

πŸš€ Usage

Basic Usage

  1. Prepare your documents: Place .txt, .md, or .pdf files in a directory (e.g., docs/)

  2. Set up environment: Copy .env.example to .env and fill in your API credentials

  3. Run the RAG pipeline:

    python -m nebularag.cli.main --docs docs --question "Why machine learning with nebula block?"

Advanced Usage

# Custom chunk size and overlap
python -m nebularag.cli.main \
  --docs docs \
  --question "Why machine learning with nebula block?" \
  --chunk-size 1000 \
  --chunk-overlap 150 \
  --top-k 15 \
  --rerank-k 8

CLI Options

Option Description Default
--docs Path to documents directory Required
--question Question to ask Required
--chunk-size Size of text chunks 800
--chunk-overlap Overlap between chunks 120
--top-k Number of candidates to retrieve 12
--rerank-k Number of candidates after reranking 6

Programmatic Usage

from nebularag import RAGPipeline, NebulaBlockClient, read_text_files
from nebularag.config import get_settings

# Initialize the RAG pipeline
client = NebulaBlockClient()
rag = RAGPipeline(
    client=client,
    chunk_size=800,
    chunk_overlap=120,
    top_k=12,
    rerank_k=6
)

# Load and index documents
docs = read_text_files('docs')
rag.index_texts(docs)

# Ask questions
result = rag.answer("What is the main topic?")
print(f"Answer: {result['answer']}")
print(f"Sources: {len(result['sources'])} chunks")

Testing the API

Test your NebulaBlock API connection:

python tests/test_api.py

πŸ”§ How It Works

RAG Pipeline Flow

  1. Document Processing:

    • Reads .txt, .md, and .pdf files from the specified directory
    • Extracts text content from PDFs using PyPDF2
    • Splits documents into overlapping chunks (default: 800 chars, 120 overlap)
  2. Indexing:

    • Generates embeddings for each chunk using Qwen/Qwen3-Embedding-8B
    • Stores embeddings in an in-memory vector store with cosine similarity
  3. Retrieval:

    • Embeds the user question
    • Retrieves top-K most similar chunks by cosine similarity
  4. Reranking:

    • Sends retrieved candidates to BAAI/bge-reranker-v2-m3
    • Reranks based on relevance to the question with superior accuracy
    • Selects top rerank-K candidates
  5. Generation:

    • Combines reranked chunks as context
    • Sends context + question to Mistral-Small-3.2-24B-Instruct-2506
    • Returns the generated answer with source citations

API Compatibility

The client assumes OpenAI/Cohere-like JSON structures but keeps endpoints configurable:

  • Embeddings: POST /embeddings with {"model": "...", "input": [...]}
  • Reranking: POST /rerank with {"model": "...", "query": "...", "documents": [...]}
  • Chat: POST /chat/completions with {"model": "...", "messages": [...]}

Advanced Features

  • Compression Support: Automatically handles Brotli and Gzip compression
  • Cloudflare Bypass: Uses browser-like headers to avoid security blocks
  • Error Handling: Comprehensive error handling with retries and fallbacks
  • Unicode Support: Robust text encoding with UTF-8 and Latin-1 fallbacks

πŸ§ͺ Examples

Example 1: Basic Question Answering

# With sample documents
python -m nebularag.cli.main \
  --docs docs \
  --question "Why machine learning with nebula block?"

Example 2: Multi-Question Demo

# Run the comprehensive demo
python examples/basic_usage.py

Example 3: Using OpenAI SDK (Alternative)

If you prefer the official OpenAI client:

from openai import OpenAI
import os

client = OpenAI(
    base_url=os.environ["NEBULABLOCK_BASE_URL"],
    api_key=os.environ["NEBULABLOCK_API_KEY"]
)

# Embedding
response = client.embeddings.create(
    model=os.environ["NEBULABLOCK_EMBEDDING_MODEL"],
    input=["hello world"]
)

# Chat
response = client.chat.completions.create(
    model=os.environ["NEBULABLOCK_CHAT_MODEL"],
    messages=[{"role": "user", "content": "Hi"}]
)

πŸ› οΈ Development

Running Tests

# Test API connectivity
python tests/test_api.py

# Test imports
python -c "from nebularag import NebulaBlockClient, RAGPipeline; print('Import successful!')"

# Run the full demo
python examples/basic_usage.py

Adding New Features

  1. New Vector Store: Implement the interface in nebularag/core/vector_store.py
  2. New Splitters: Add functions to nebularag/utils/text_processing.py
  3. New Clients: Extend nebularag/clients/nebula_client.py or create new client classes

πŸ“ Notes

  • This is a production-ready implementation with robust error handling
  • For production use, consider:
    • Persistent vector databases (Pinecone, Weaviate, etc.)
    • Semantic chunking strategies
    • Caching mechanisms
    • Rate limiting and retry logic
  • The reranker uses BAAI/bge-reranker-v2-m3 for superior performance
  • All API calls are synchronous; async support can be added for better performance
  • Compression and encoding issues are handled automatically

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Troubleshooting

Common Issues

  1. ModuleNotFoundError: Make sure you've installed the package with pip install -e .
  2. API Key Error: Verify your NEBULABLOCK_API_KEY is set correctly
  3. Connection Error: Check your NEBULABLOCK_BASE_URL and internet connection
  4. Empty Results: Ensure your documents directory contains .txt, .md, or .pdf files
  5. Compression Error: Install Brotli with pip install brotli>=1.0.9
  6. Cloudflare Block: The client automatically uses browser-like headers to bypass this

Getting Help

  • Check the Issues page
  • Review the API documentation for NebulaBlock
  • Test your API connection with python tests/test_api.py

🎯 Performance

  • Embedding Model: Qwen/Qwen3-Embedding-8B provides 4096-dimensional embeddings
  • Reranker: BAAI/bge-reranker-v2-m3 offers state-of-the-art relevance scoring
  • Chat Model: Mistral-Small-3.2-24B-Instruct-2506 delivers high-quality responses
  • Vector Search: Cosine similarity with in-memory storage for fast retrieval
  • Compression: Automatic Brotli/Gzip handling for efficient data transfer

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages