GitHub - agentralabs/agentic-vision: Persistent visual memory for AI agents — capture screenshots, embed with CLIP ViT-B/32, compare, recall. MCP server + Rust core library.

Quickstart · Why · Benchmarks · How It Works · Install · Full Install Guide · Paper

AI agents can't see across sessions.

Your agent takes a screenshot, analyzes it, and forgets. Next session — blank slate. It can't compare what a page looks like now versus yesterday. It can't recall what the error dialog said three conversations ago. It can't search its own visual history.

Text-based memory exists. Visual memory doesn't — until now.

AgenticVision gives AI agents persistent visual memory. Capture images, embed them with CLIP ViT-B/32, store them in a compact binary format, and query them by similarity, time, or description. Every capture is a first-class MCP resource that any LLM can access.

cargo install agentic-vision-mcp

One binary. 11 MCP tools. Persistent .avis files. Works with Claude Desktop, VS Code, Cursor, Windsurf, and any MCP-compatible client.

Benchmarks

Rust core. CLIP ViT-B/32 via ONNX Runtime. Binary .avis format. Real numbers from cargo test --release:

Operation	Time	Notes
Image capture (file → embed → store)	47 ms	CLIP ViT-B/32, 512-dim
Similarity search (top-5)	1-2 ms	Brute-force cosine, f64 precision
Visual diff (pixel-level)	<1 ms	8×8 grid region detection
MCP tool round-trip	7.2 ms	Including process startup (~6.1 ms)
Storage per capture	~4.26 KB	Embedding + JPEG thumbnail
Capacity per GB	~250K	Observations

All benchmarks on Apple M4, macOS 26.2, Rust 1.90.0 --release. ONNX Runtime for CLIP inference. Fallback mode available when ONNX model is not present.

Why AgenticVision

Agents need visual continuity. A debugging agent should remember what the UI looked like before and after a code change. A monitoring agent should detect visual regressions. A research agent should build a visual knowledge base over time.

Capture once, query forever. Every image is embedded into a 512-dimensional CLIP vector and stored with its JPEG thumbnail, timestamp, and description. Query by cosine similarity, time range, or text search — in milliseconds.

Binary format, not a database. The .avis file is a single portable binary — 64-byte header, JSON payload, JPEG thumbnails. Copy it, share it, back it up. No server, no database, no dependencies.

Works with every MCP client. AgenticVision-MCP exposes 11 tools, 6 resources, and 4 prompts via the Model Context Protocol. Any LLM that speaks MCP gains visual memory automatically.

Links to AgenticMemory. The vision_link tool connects visual captures to AgenticMemory cognitive graph nodes — bridging what an agent sees with what it knows.

How It Works

Capture — vision_capture accepts images from files, base64, screenshots, or the system clipboard. Each image is resized, embedded via CLIP ViT-B/32 into a 512-dimensional vector, compressed to JPEG thumbnail, and stored in the .avis binary file. Screenshots support optional region capture; clipboard reads the current image from the OS clipboard.
Query — vision_query retrieves captures by time range, description, recency, and quality constraints (min_quality, sort_by). Results include capture metadata, quality scores, thumbnails, and similarity scores.
Compare — vision_compare places two captures side-by-side for LLM analysis. vision_diff performs pixel-level differencing with 8×8 grid region detection to identify exactly what changed.
Link — vision_link connects captures to AgenticMemory nodes, bridging visual observations with the agent's cognitive graph. An agent can recall "what did the UI look like when I made that decision?"

The .avis binary format uses a 64-byte fixed header (magic 0x41564953, version, counts, timestamps) followed by a JSON payload containing captures with embedded JPEG thumbnails and 512-dim float vectors. Single-file, portable, no external dependencies.

MCP surface area

11 Tools:

Tool	Description
`vision_capture`	Capture and embed an image (file, base64, screenshot, clipboard), with metadata redaction and quality scoring
`vision_compare`	Side-by-side comparison of two captures
`vision_query`	Query captures by time, description, recency
`vision_ocr`	Extract text from a captured image
`vision_similar`	Find visually similar captures (cosine similarity)
`vision_track`	Track visual changes to a target over time
`vision_diff`	Pixel-level diff between two captures
`vision_health`	Quality + staleness + memory-link coverage summary
`vision_link`	Link a capture to an AgenticMemory node
`session_start`	Begin a named observation session
`session_end`	End the current session

6 Resources:

URI	Description
`avis://capture/{id}`	Single capture with metadata and thumbnail
`avis://session/{id}`	All captures in a session
`avis://timeline/{start}/{end}`	Captures within a time range
`avis://similar/{id}`	Visually similar captures
`avis://stats`	Storage statistics and counts
`avis://recent`	Most recent captures

4 Prompts:

Prompt	Description
`observe`	Guided visual observation workflow
`compare`	Structured comparison between captures
`track`	Change tracking over time
`describe`	Detailed image description

Install

One-liner (desktop profile, backwards-compatible):

curl -fsSL https://agentralabs.tech/install/vision | bash

Environment profiles (one command per environment):

# Desktop MCP clients (auto-merge Claude Desktop + Claude Code when detected)
curl -fsSL https://agentralabs.tech/install/vision/desktop | bash

# Terminal-only (no desktop config writes)
curl -fsSL https://agentralabs.tech/install/vision/terminal | bash

# Remote/server hosts (no desktop config writes)
curl -fsSL https://agentralabs.tech/install/vision/server | bash

Channel	Command	Result
GitHub installer (official)	`curl -fsSL https://agentralabs.tech/install/vision \| bash`	Installs release binaries when available, otherwise source fallback; merges MCP config
GitHub installer (desktop profile)	`curl -fsSL https://agentralabs.tech/install/vision/desktop \| bash`	Explicit desktop profile behavior
GitHub installer (terminal profile)	`curl -fsSL https://agentralabs.tech/install/vision/terminal \| bash`	Installs binaries only; no desktop config writes
GitHub installer (server profile)	`curl -fsSL https://agentralabs.tech/install/vision/server \| bash`	Installs binaries only; server-safe behavior
crates.io + Cargo deps (official)	`cargo install agentic-vision-mcp` + `cargo add agentic-vision`	Installs MCP server binary and adds the core library crate to your project

Server auth and artifact sync

For cloud/server runtime:

export AGENTIC_TOKEN="$(openssl rand -hex 32)"

All MCP clients must send Authorization: Bearer <same-token>. If .avis/.amem/.acb files are on another machine, sync them to the server first.

MCP Server (for Claude Desktop, VS Code, Cursor, Windsurf):

cargo install agentic-vision-mcp

Core library (for Rust projects):

cargo add agentic-vision

Configure Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "vision": {
      "command": "agentic-vision-mcp",
      "args": ["--vision", "~/.vision.avis", "serve"]
    }
  }
}

See INSTALL.md for full installation guide, VS Code / Cursor configuration, build from source, and troubleshooting.

Do not use /tmp for vision files — macOS and Linux clear this directory periodically. Use ~/.vision.avis for persistent storage.

Deployment Model

Standalone by default: AgenticVision is independently installable and operable. Integration with AgenticMemory or AgenticCodebase is optional, never required.
Autonomic operations by default: daemon/runtime maintenance uses safe profile-based defaults with cache hygiene, migration safeguards, and health-ledger snapshots.

Area	Default behavior	Controls
Autonomic profile	Conservative local-first posture	`CORTEX_AUTONOMIC_PROFILE=desktop
Cache + registry maintenance	Periodic expiry cleanup and registry GC	`CORTEX_MAINTENANCE_TICK_SECS`, `CORTEX_REGISTRY_GC_EVERY_TICKS`, `CORTEX_REGISTRY_GC_KEEP_DELTAS`
Storage migration	Policy-gated with checkpointed auto-safe path	`CORTEX_STORAGE_MIGRATION_POLICY=auto-safe
Storage budget policy	20-year projection + capture rollup under pressure	`CORTEX_STORAGE_BUDGET_MODE=auto-rollup
Maintenance throttling	SLA-aware under sustained cache pressure	`CORTEX_SLA_MAX_CACHE_ENTRIES_BEFORE_GC_THROTTLE`
Health ledger	Periodic operational snapshots (default: `~/.agentra/health-ledger`)	`CORTEX_HEALTH_LEDGER_DIR`, `AGENTRA_HEALTH_LEDGER_DIR`, `CORTEX_HEALTH_LEDGER_EMIT_SECS`

Quickstart

MCP (Claude Desktop, VS Code, Cursor)

After configuring the MCP server (see Install), ask your agent:

"Take a screenshot and remember it."

The LLM calls vision_capture automatically. Then later:

"What did the screen look like earlier?"

The LLM calls vision_query to retrieve and display past captures.

Rust API

use agentic_vision::{VisionStore, CaptureSource};

let mut store = VisionStore::open("observations.avis")?;

// Capture from file
let id = store.capture(
    CaptureSource::File("screenshot.png"),
    "Homepage after deploy"
)?;

// Find similar
let matches = store.similar(id, 5)?;
for m in matches {
    println!("  {} (similarity: {:.3})", m.description, m.score);
}

Validation

Suite	Tests	Notes
Rust core (`agentic-vision`)	38	Unit + integration (includes screenshot/clipboard)
Python SDK tests	47	Edge cases, format validation
MCP integration suite	3	Python → Rust stdio transport
Multi-agent suite	3	Shared file, vision-memory linking, rapid handoff
Total	91	All passing

Two research papers:

Repository Structure

This is a Cargo workspace monorepo containing the core library and MCP server.

agentic-vision/
├── Cargo.toml                    # Workspace root
├── crates/
│   ├── agentic-vision/           # Core library (crates.io: agentic-vision v0.1.0)
│   └── agentic-vision-mcp/       # MCP server (crates.io: agentic-vision-mcp v0.1.0)
├── tests/                        # Integration tests (Python → Rust, multi-agent)
├── models/                       # ONNX model directory (CLIP ViT-B/32)
├── publication/                  # Research papers (I, II)
├── assets/                       # SVG diagrams and visuals
└── docs/                         # Guides and reference

Running Tests

# All workspace tests (unit + integration)
cargo test --workspace

# Core library only
cargo test -p agentic-vision

# MCP server only
cargo test -p agentic-vision-mcp

# Python integration tests
python tests/integration/test_mcp_clients.py
python tests/integration/test_multi_agent.py

MCP Server Quick Start

cargo install agentic-vision-mcp

Configure Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "vision": {
      "command": "agentic-vision-mcp",
      "args": ["--vision", "~/.vision.avis", "serve"]
    }
  }
}

agentic-vision-mcp supports both line-delimited JSON-RPC and Content-Length framed MCP stdio messages.

Roadmap: v0.2.0 — Remote Server Support

The next release is planned to add HTTP/SSE transport for remote deployments. Track progress in #2.

Feature	Status
`--token` bearer auth	Planned
`--multi-tenant` per-user vision files	Planned
`/health` endpoint	Planned
`--tls-cert` / `--tls-key` native HTTPS	Planned
OCR with Tesseract (`--features ocr`)	Planned
Clipboard TIFF fix	Planned
`delete` / `export` / `compact` CLI commands	Planned
Docker image + compose	Planned
Remote deployment docs	Planned

Planned CLI shape (not available in current release):

agentic-vision-mcp serve-http --port 8081 --token "<token>"
agentic-vision-mcp serve-http --multi-tenant --data-dir /data/users --port 8081 --token "<token>"

Contributing

See CONTRIBUTING.md. The fastest ways to help:

Try it and file issues
Add an MCP tool — extend the visual memory surface
Write an example — show a real use case
Improve docs — every clarification helps someone

_{Built by Agentra Labs}

Name		Name	Last commit message	Last commit date
Latest commit History 127 Commits
.github		.github
assets		assets
benchmarks		benchmarks
clients		clients
crates		crates
docker		docker
docs		docs
examples		examples
extractors		extractors
integrations		integrations
models		models
packaging		packaging
publication		publication
runtime		runtime
scripts		scripts
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.toml		Cargo.toml
GUIDE.md		GUIDE.md
INSTALL.md		INSTALL.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AI agents can't see across sessions.

Benchmarks

Why AgenticVision

How It Works

Install

Server auth and artifact sync

Deployment Model

Quickstart

MCP (Claude Desktop, VS Code, Cursor)

Rust API

Validation

Repository Structure

Running Tests

MCP Server Quick Start

Roadmap: v0.2.0 — Remote Server Support

Contributing

About

Uh oh!

Releases 5

Sponsor this project

Uh oh!

Packages

Contributors 2

Uh oh!

Languages

Uh oh!

License

agentralabs/agentic-vision

Folders and files

Latest commit

History

Repository files navigation

AI agents can't see across sessions.

Benchmarks

Why AgenticVision

How It Works

Install

Server auth and artifact sync

Deployment Model

Quickstart

MCP (Claude Desktop, VS Code, Cursor)

Rust API

Validation

Repository Structure

Running Tests

MCP Server Quick Start

Roadmap: v0.2.0 — Remote Server Support

Contributing

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 5

Sponsor this project

Uh oh!

Packages 0

Contributors 2

Uh oh!

Languages

Packages