Benchmarking framework for comparing Node.js vector search libraries that run locally on Linux, Windows, and macOS.
Designed to simulate real-world media management use cases with synthetic datasets that mimic the cluster structure of real embeddings.
I've been trying to find optimal configuration defaults for each library that balance indexing speed, query speed, storage, and recall, but since the user may not know the final size of the corpus, there's a bit of a chicken-and-egg problem here.
The current approach is to run both USearch and LanceDB with their default settings and rely on whatever adaptive behavior those defaults provide.
In any event, I'm quite sure that the current settings are suboptimal. If you have experience with these libraries and suggestions for better configuration parameters either during index build time or query time, please open an issue or PR!
| | sqlite-vec | USearch | LanceDB | DuckDB VSS |
|---|---|---|---|---|
| Algorithm | Brute force | HNSW | IVF_FLAT / IVF_PQ | HNSW |
| Recall | 100% (exact) | High, consistent across scales | Degrades at scale without tuning | High |
| Query speed | Slow at scale | Fast | Fast with tuned index | Moderate |
| Build speed | Fast (insert only) | Slow (graph construction) | Fast (centroid training) | Slow |
| Memory | Disk-backed (SQLite pager) | Full index in RAM | Memory-mapped on-disk | Aggressive RAM usage |
| Set and forget | Yes | Yes | No (needs per-scale `numPartitions`/`nprobes`) | Yes |
| Large scale | Impractical above ~100k | Works well | Works if tuned | Excluded (OOM risk) |
| Best for | Small datasets needing exact results | Read-heavy workloads with build time budget | Large datasets that exceed RAM | Small datasets only |
```bash
npm install
npm run prepare   # generate synthetic datasets + ground truth

npm run bench:xs  # 1k vectors, 128d
npm run bench:s   # 10k vectors, 512d
npm run bench:m   # 100k vectors, 512d
npm run bench:l   # 500k vectors, 512d
npm run bench:xl  # 1M vectors, 512d
npm run bench:xxl # 2M vectors, 512d
```

Dimensions match real-world embedding models used in self-hosted media management.
| Size | Index vectors | Dim | Model class | Held-out queries | k values |
|---|---|---|---|---|---|
| xs | 1k | 128 | dlib faces | 200 | 10 |
| s | 10k | 512 | CLIP / FaceNet | 200 | 1, 10 |
| m | 100k | 512 | CLIP at scale | 200 | 1, 10 |
| l | 500k | 512 | CLIP large library | 200 | 1, 10 |
| xl | 1M | 512 | CLIP large collection | 200 | 1, 10 |
| xxl | 2M | 512 | CLIP power user | 200 | 1, 10 |
Custom profiles: create a JSON file in `profiles/` and run `npm run bench -- profiles/my-profile.json`.
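For illustration only, a custom profile might look roughly like the sketch below. The field names here are guesses rather than the real schema; copy an existing file from `profiles/` to get the keys the harness actually reads.

```json
{
  "name": "my-profile",
  "vectors": 50000,
  "dimensions": 512,
  "clusters": 200,
  "queries": 200,
  "k": [1, 10],
  "libraries": ["sqlite-vec", "usearch", "lancedb"],
  "usearch": {},
  "lancedb": {}
}
```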
Charts are auto-generated after each benchmark run.
- GMM synthetic data: Vectors are generated with a Gaussian Mixture Model rather than as uniform random points on the unit sphere. Cluster counts scale proportionally with dataset size (~250 vectors per cluster) to maintain consistent difficulty across profiles. See `profiles/*.json` for per-profile cluster counts and `src/dataset.ts` for the generator (a sketch of the approach follows this list). Uniform-on-sphere data is pathological for ANN algorithms: IVF and HNSW both plateau well below 90% recall because there is no exploitable cluster structure.
- Default configurations: Each library runs with out-of-the-box defaults (no per-scale tuning). This tests what a user gets without manual optimization. Library-specific settings are in the profile JSONs under `usearch`, `lancedb`, etc.
- Held-out queries: Query vectors are held out from the same generated distribution (a tail split, like ann-benchmarks' train/test split), so queries share the cluster structure of the index but are never present in it. This avoids self-match bias while keeping ANN recall meaningful.
- Ground truth: Exact brute-force search via sqlite-vec (L2 distance). Recall is only computed when the ANN library's metric is compatible with L2; mismatches are rejected at startup.
- MaybePromise runners: Runner methods return `T | Promise<T>`. Sync runners (sqlite-vec, USearch) avoid async overhead; async runners (LanceDB) return Promises. The harness awaits all calls uniformly.
- 64-bit PRNG: Uses BigInt-based splitmix64 (full 2^64 period) for vector generation, sufficient for XL datasets (512M+ random draws).
- Index construction: USearch supports multi-threaded batch insertion via its C++ bindings. LanceDB and sqlite-vec insertions are single-threaded; sqlite-vec uses a transaction wrapper for batch commits.
- Reproducibility: Deterministic seed (42) for all vector generation. Index and query vectors are split from a single generation pass.
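The following is a rough TypeScript sketch of that generation approach, not the code in `src/dataset.ts`; the function names, the `sigma` noise parameter, and the sampling details are illustrative assumptions.

```typescript
// Illustrative only: the real generator lives in src/dataset.ts and
// almost certainly differs in structure and parameter names.
const MASK64 = (1n << 64n) - 1n;

// splitmix64 over BigInt state: full 2^64 period, cheap, and seedable.
function splitmix64(state: bigint): { next: bigint; value: bigint } {
  const next = (state + 0x9e3779b97f4a7c15n) & MASK64;
  let z = next;
  z = ((z ^ (z >> 30n)) * 0xbf58476d1ce4e5b9n) & MASK64;
  z = ((z ^ (z >> 27n)) * 0x94d049bb133111ebn) & MASK64;
  return { next, value: (z ^ (z >> 31n)) & MASK64 };
}

// Map the top 53 bits of a 64-bit draw to a double in [0, 1).
function toUnit(v: bigint): number {
  return Number(v >> 11n) / 2 ** 53;
}

// Draw one vector from a Gaussian mixture: pick a random centroid,
// then add isotropic Gaussian noise around it (Box-Muller transform).
function sampleGmmVector(
  centroids: Float32Array[],
  sigma: number,
  rng: { state: bigint },
): Float32Array {
  const uniform = (): number => {
    const r = splitmix64(rng.state);
    rng.state = r.next;
    return toUnit(r.value);
  };
  const centroid = centroids[Math.floor(uniform() * centroids.length)];
  const out = new Float32Array(centroid.length);
  for (let i = 0; i < out.length; i += 2) {
    const u1 = Math.max(uniform(), Number.EPSILON); // avoid log(0)
    const u2 = uniform();
    const mag = sigma * Math.sqrt(-2 * Math.log(u1));
    out[i] = centroid[i] + mag * Math.cos(2 * Math.PI * u2);
    if (i + 1 < out.length) {
      out[i + 1] = centroid[i + 1] + mag * Math.sin(2 * Math.PI * u2);
    }
  }
  return out;
}
```

Because the stream is seeded deterministically, the tail of the same generation pass can serve as the held-out query set, which is how the index/query split described above stays reproducible.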
zvec (`@zvec/zvec` on npm) is Alibaba's in-process vector database built on Proxima. As of February 2026, it only supports Linux (x86_64, ARM64) and macOS (ARM64), with no Windows support. PhotoStructure needs to run on all three platforms, so zvec is excluded.
DuckDB with its VSS extension provides HNSW indexing via `CREATE INDEX ... USING HNSW`. We evaluated it (`@duckdb/node-api`, MIT license) and found three problems:
- Memory consumption: DuckDB is an OLAP engine designed to use all available RAM for analytical queries. At 500k vectors (512d), it nearly OOMed a development machine. For a desktop app running on consumer hardware alongside other applications, this is disqualifying.
- Query overhead: Each vector search requires serializing the query vector as a SQL array literal (`[1.0,2.0,...]::FLOAT[512]`, roughly 3 KB of text per query). The `@duckdb/node-api` bindings don't yet support binding `Float32Array` directly to `FLOAT[]` parameters (issue #182). At 100k vectors, DuckDB VSS achieved 39 QPS vs USearch at 367 QPS. A sketch of this serialization appears below.
- Index bloat: HNSW persistence is experimental (`SET hnsw_enable_experimental_persistence = true`). The on-disk index was 2.5x larger than the other libraries' (488 MB vs ~200 MB at 100k vectors).
The runner is still included in xs/s/m profiles for reference, but excluded from l/xl/xxl to avoid OOM crashes.
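As a rough illustration of the query-overhead point above (the table and column names here are invented; only the literal format comes from the benchmark), each search builds a few kilobytes of SQL text:

```typescript
// Sketch: @duckdb/node-api cannot yet bind a Float32Array to a FLOAT[]
// parameter, so the query vector is rendered as a SQL array literal.
function toFloatArrayLiteral(query: Float32Array): string {
  // For 512 dimensions this is roughly 3 KB of text per query.
  return `[${Array.from(query).join(',')}]::FLOAT[${query.length}]`;
}

// Hypothetical k-NN query; "vectors" and "embedding" are illustrative names.
function buildKnnSql(query: Float32Array, k: number): string {
  return `
    SELECT id, array_distance(embedding, ${toFloatArrayLiteral(query)}) AS dist
    FROM vectors
    ORDER BY dist
    LIMIT ${k}`;
}
```

DuckDB VSS can use the HNSW index to accelerate this `ORDER BY ... LIMIT` pattern, but the per-query string construction and SQL parsing overhead remains.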
LanceDB's Node.js `table.add()` accepts either row objects (`Record<string, unknown>[]`) or Apache Arrow tables. However, passing Arrow tables constructed from the project's own `apache-arrow` module silently fails: searches return empty results. This is because `@lancedb/lancedb` bundles its own `apache-arrow` instance, and the IPC serialization path can't handle `Data` objects from a different module. The `instanceof Table` duck-type check passes, but the underlying class hierarchies differ.
There are no parallelism, concurrency, or batch-size options exposed through the Node.js API. The NAPI binding serializes everything into a single Arrow IPC buffer per `add()` call. Concurrent `add()` calls from JS would serialize at LanceDB's internal write lock.
The runner uses row objects with `vectors.subarray()` (zero-copy view) fed to `Array.from()` (unavoidable float-to-number boxing), batched at 10k rows.
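A condensed sketch of that insertion path follows; it is simplified, and the column names, id scheme, and batch handling are illustrative rather than the runner's exact code.

```typescript
import * as lancedb from '@lancedb/lancedb';

// Sketch: insert vectors as plain row objects, sidestepping the
// cross-module apache-arrow Table pitfall described above.
async function insertVectors(
  table: lancedb.Table,
  vectors: Float32Array, // all index vectors, concatenated row-major
  dim: number,
  batchSize = 10_000,
): Promise<void> {
  const count = vectors.length / dim;
  for (let start = 0; start < count; start += batchSize) {
    const end = Math.min(start + batchSize, count);
    const rows: Record<string, unknown>[] = [];
    for (let i = start; i < end; i++) {
      // subarray() is a zero-copy view; Array.from() boxes each float
      // into a JS number, which row objects make unavoidable.
      const view = vectors.subarray(i * dim, (i + 1) * dim);
      rows.push({ id: i, vector: Array.from(view) });
    }
    await table.add(rows);
  }
}
```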
- Create a runner in `src/runners/` extending `BenchmarkRunner` from `src/runners/base.ts`
- Implement `setup()`, `buildIndex()`, `search()`, and `cleanup()` (sync or async)
- Register it in `src/harness.ts`
- Add it to your profile's `libraries` array
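A runner skeleton might look like the sketch below; the method signatures are assumptions inferred from the list above, so check `src/runners/base.ts` for the real abstract interface.

```typescript
import { BenchmarkRunner } from './base';

// Hypothetical runner for an imaginary "mylib" index. The base-class
// method signatures shown here are assumptions, not the real interface.
export class MyLibRunner extends BenchmarkRunner {
  setup(): void {
    // Open database handles, create tables, allocate the index, etc.
  }

  buildIndex(vectors: Float32Array, dim: number): void {
    // Insert every vector. Sync is fine here; returning a Promise also
    // works because the harness awaits the MaybePromise result.
  }

  async search(query: Float32Array, k: number): Promise<number[]> {
    // Return the ids of the k approximate nearest neighbours.
    return [];
  }

  cleanup(): void {
    // Close handles and remove temporary files.
  }
}
```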