Persistent memory for AI agents. Search, think, recall — across every conversation you've ever had.
Your AI agent forgets everything between sessions. Every architecture decision, every debugging session, every preference you've expressed — gone. You repeat yourself constantly.
BrainLayer fixes this. It's a local-first memory layer that gives any MCP-compatible AI agent the ability to remember, think, and recall across conversations.
"What approach did I use for auth last month?" → brain_search
"Show me everything about this file's history" → brain_recall
"What was I working on yesterday?" → brain_recall
"Remember this decision for later" → brain_store
```bash
pip install brainlayer
brainlayer init    # Interactive setup wizard
brainlayer index   # Index your Claude Code conversations
```

Then add to your editor's MCP config:
Claude Code (~/.claude.json):
```json
{
  "mcpServers": {
    "brainlayer": {
      "command": "brainlayer-mcp"
    }
  }
}
```

Other editors (Cursor, Zed, VS Code):
Cursor (MCP settings):
```json
{
  "mcpServers": {
    "brainlayer": {
      "command": "brainlayer-mcp"
    }
  }
}
```

Zed (settings.json):
```json
{
  "context_servers": {
    "brainlayer": {
      "command": { "path": "brainlayer-mcp" }
    }
  }
}
```

VS Code (.vscode/mcp.json):
```json
{
  "servers": {
    "brainlayer": {
      "command": "brainlayer-mcp"
    }
  }
}
```

That's it. Your agent now has persistent memory across every conversation.
```mermaid
graph LR
    A["Claude Code / Cursor / Zed"] -->|MCP| B["BrainLayer MCP Server<br/>3 tools"]
    B --> C["Hybrid Search<br/>semantic + keyword (RRF)"]
    C --> D["SQLite + sqlite-vec<br/>single .db file"]
    E["Claude Code JSONL<br/>conversations"] --> F["Pipeline"]
    F -->|extract → classify → chunk → embed| D
    G["Local LLM<br/>Ollama / MLX"] -->|enrich| D
```
Everything runs locally. No cloud accounts, no API keys, no Docker, no database servers.
| Component | Implementation |
|---|---|
| Storage | SQLite + sqlite-vec (single .db file, WAL mode) |
| Embeddings | bge-large-en-v1.5 via sentence-transformers (1024 dims, runs on CPU/MPS) |
| Search | Hybrid: vector similarity + FTS5 keyword, merged with Reciprocal Rank Fusion |
| Enrichment | Local LLM via Ollama or MLX — 10-field metadata per chunk |
| MCP Server | stdio-based, MCP SDK v1.26+, compatible with any MCP client |
| Clustering | HDBSCAN + UMAP for brain graph visualization (optional) |
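The search row above merges the vector and FTS5 rankings with Reciprocal Rank Fusion: each list contributes 1/(k + rank) to a document's score. A minimal sketch, where k = 60 is the constant from the original RRF paper and the chunk IDs are illustrative (BrainLayer's actual parameters may differ):

```python
# Reciprocal Rank Fusion: each ranked list adds 1 / (k + rank) per document,
# so items ranked highly in *either* list float to the top of the merged list.

def rrf_merge(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["chunk_42", "chunk_7", "chunk_13"]   # semantic ranking
keyword_hits = ["chunk_7", "chunk_99", "chunk_42"]  # FTS5 keyword ranking
merged = rrf_merge([vector_hits, keyword_hits])
# chunk_7 wins: it appears near the top of both lists.
```

RRF needs only ranks, not scores, which is why it works well for fusing a cosine-similarity list with a BM25-style keyword list that have incompatible score scales.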
| Tool | Description |
|---|---|
| `brain_search` | Semantic search — unified search across query, file_path, chunk_id, filters. |
| `brain_store` | Persist memories — ideas, decisions, learnings, mistakes. Auto-type/auto-importance. |
| `brain_recall` | Proactive retrieval — current context, sessions, session summaries. |
Old brainlayer_* names still work as aliases.
- `brain_search` aliases: `brainlayer_search`, `brainlayer_context`, `brainlayer_stats`, `brainlayer_list_projects`, `brainlayer_file_timeline`, `brainlayer_operations`, `brainlayer_regression`, `brainlayer_plan_links`, `brainlayer_think`
- `brain_store` alias: `brainlayer_store`
- `brain_recall` aliases: `brainlayer_recall`, `brainlayer_current_context`, `brainlayer_sessions`, `brainlayer_session_summary`
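Since the server speaks stdio MCP, a client invokes these tools with standard JSON-RPC 2.0 `tools/call` requests. A sketch of what a `brain_search` call might look like on the wire — the `"query"` argument name is an assumption here; a real client would discover the tool's input schema via `tools/list`:

```python
import json

# JSON-RPC 2.0 request shape used by MCP's tools/call method.
# The "query" argument name is illustrative, not a confirmed schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "brain_search",
        "arguments": {"query": "What approach did I use for auth last month?"},
    },
}
wire = json.dumps(request)  # sent as a single line over the server's stdin
```

In practice your editor's MCP client builds and sends these frames for you; the config snippets above are all the wiring you need.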
BrainLayer enriches each chunk with 10 structured metadata fields using a local LLM:
| Field | Example |
|---|---|
| `summary` | "Debugging Telegram bot message drops under load" |
| `tags` | "telegram, debugging, performance" |
| `importance` | 8 (architectural decision) vs 2 (directory listing) |
| `intent` | debugging, designing, implementing, configuring, deciding, reviewing |
| `primary_symbols` | "TelegramBot, handleMessage, grammy" |
| `resolved_query` | "How does the Telegram bot handle rate limiting?" |
| `epistemic_level` | hypothesis, substantiated, validated |
| `version_scope` | "grammy 1.32, Node 22" |
| `debt_impact` | introduction, resolution, none |
| `external_deps` | "grammy, Supabase, Railway" |
Two local LLM backends:
| Backend | Best for | Speed |
|---|---|---|
| MLX (Apple Silicon) | M1/M2/M3 Macs | 21-87% faster than Ollama |
| Ollama | Any platform | ~1s/chunk (short), ~13s (long) |
```bash
brainlayer enrich                        # Default backend (auto-detects)
BRAINLAYER_ENRICH_BACKEND=mlx brainlayer enrich --batch-size=100
```

| | BrainLayer | Mem0 | Zep/Graphiti | Letta | LangChain Memory |
|---|---|---|---|---|---|
| MCP native | 3 tools | 1 server | 1 server | No | No |
| Think / Recall | Yes | No | No | No | No |
| Local-first | SQLite | Cloud-first | Cloud-only | Docker+PG | Framework |
| Zero infra | pip install | API key | API key | Docker | Multiple deps |
| Multi-source | 6 sources | API only | API only | API only | API only |
| Enrichment | 10 fields | Basic | Temporal | Self-write | None |
| Session analysis | Yes | No | No | No | No |
| Open source | Apache 2.0 | Apache 2.0 | Source-available | Apache 2.0 | MIT |
BrainLayer is the only memory layer that:
- Thinks before answering — categorizes past knowledge by intent (decisions, bugs, patterns) instead of raw search results
- Runs on a single file — no database servers, no Docker, no cloud accounts
- Works with every MCP client — 3 tools, instant integration, zero SDK
```bash
brainlayer init              # Interactive setup wizard
brainlayer index             # Index new conversations
brainlayer search "query"    # Semantic + keyword search
brainlayer enrich            # Run LLM enrichment on new chunks
brainlayer enrich-sessions   # Session-level analysis (decisions, learnings)
brainlayer stats             # Database statistics
brainlayer brain-export      # Generate brain graph JSON
brainlayer export-obsidian   # Export to Obsidian vault
brainlayer dashboard         # Interactive TUI dashboard
```

All configuration is via environment variables:
| Variable | Default | Description |
|---|---|---|
| `BRAINLAYER_DB` | `~/.local/share/brainlayer/brainlayer.db` | Database file path |
| `BRAINLAYER_ENRICH_BACKEND` | auto-detect (MLX on Apple Silicon, else Ollama) | Enrichment LLM backend |
| `BRAINLAYER_ENRICH_MODEL` | `glm-4.7-flash` | Ollama model name |
| `BRAINLAYER_MLX_MODEL` | `mlx-community/Qwen2.5-Coder-14B-Instruct-4bit` | MLX model identifier |
| `BRAINLAYER_OLLAMA_URL` | `http://127.0.0.1:11434/api/generate` | Ollama API endpoint |
| `BRAINLAYER_MLX_URL` | `http://127.0.0.1:8080/v1/chat/completions` | MLX server endpoint |
| `BRAINLAYER_STALL_TIMEOUT` | `300` | Seconds before killing a stuck enrichment chunk |
| `BRAINLAYER_HEARTBEAT_INTERVAL` | `25` | Log progress every N chunks during enrichment |
| `BRAINLAYER_SANITIZE_EXTRA_NAMES` | (empty) | Comma-separated names to redact from indexed content |
| `BRAINLAYER_SANITIZE_USE_SPACY` | `true` | Use spaCy NER for PII detection |
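An environment-variable-with-defaults scheme like this is easy to mirror in a wrapper script. A sketch of resolving a few of the documented settings — the `setting` helper is illustrative, not BrainLayer's configuration code:

```python
import os

# Illustrative lookup mirroring the documented defaults above;
# not BrainLayer's actual configuration code.
DEFAULTS = {
    "BRAINLAYER_OLLAMA_URL": "http://127.0.0.1:11434/api/generate",
    "BRAINLAYER_STALL_TIMEOUT": "300",
    "BRAINLAYER_HEARTBEAT_INTERVAL": "25",
}

def setting(name):
    """An exported environment variable wins; otherwise fall back to the default."""
    return os.environ.get(name, DEFAULTS[name])

stall_timeout = int(setting("BRAINLAYER_STALL_TIMEOUT"))
ollama_url = setting("BRAINLAYER_OLLAMA_URL")
```

Because everything is plain environment variables, per-invocation overrides work with standard shell syntax, e.g. `BRAINLAYER_STALL_TIMEOUT=600 brainlayer enrich`.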
pip install "brainlayer[brain]" # Brain graph visualization (HDBSCAN + UMAP)
pip install "brainlayer[cloud]" # Cloud backfill (Gemini Batch API)
pip install "brainlayer[youtube]" # YouTube transcript indexing
pip install "brainlayer[ast]" # AST-aware code chunking (tree-sitter)
pip install "brainlayer[dev]" # Development: pytest, ruffBrainLayer can index conversations from multiple sources:
| Source | Format | Indexer |
|---|---|---|
| Claude Code | JSONL (`~/.claude/projects/`) | `brainlayer index` |
| Claude Desktop | JSON export | `brainlayer index --source desktop` |
| WhatsApp | Exported .txt chat | `brainlayer index --source whatsapp` |
| YouTube | Transcripts via yt-dlp | `brainlayer index --source youtube` |
| Markdown | Any .md files | `brainlayer index --source markdown` |
| Manual | Via MCP tool | `brain_store` |
pip install -e ".[dev]"
pytest tests/ # Full suite (268 tests)
pytest tests/ -m "not integration" # Unit tests only (fast)
ruff check src/ # LintingContributions welcome! See CONTRIBUTING.md for dev setup, testing, and PR guidelines.
Apache 2.0 — see LICENSE.
BrainLayer was originally developed as "Zikaron" (Hebrew: memory) inside a personal AI agent ecosystem. It was extracted into a standalone project because every developer deserves persistent AI memory — not just the ones building their own agent systems.