
feat(echo-cas): content-addressed blob store (Phase 1) #263

Open

flyingrobots wants to merge 2 commits into main from echo-cas-phase1

Conversation

@flyingrobots
Owner

Summary

  • Adds echo-cas leaf crate with BlobStore trait and MemoryTier implementation
  • Content-addressed storage keyed by BLAKE3 hash — no dependency on warp-core
  • Advisory byte budget tracking for browser contexts (enforcement deferred to Phase 3 GC)
  • Set-based pin/unpin for retention roots (pre-pin on missing blobs is legal)
  • 14 unit tests covering round-trips, idempotence, hash verification, pinning invariants, and an 8 MB payload smoke test
  • WASM-compatible (wasm32-unknown-unknown target checks clean)
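
A minimal usage sketch of the Phase 1 API summarized above (method names are taken from this PR; the exact signatures of has, pin, unpin, and is_pinned are assumptions):

use echo_cas::{blob_hash, BlobStore, MemoryTier};

fn demo() {
    // Advisory 8 MB budget; enforcement is deferred to Phase 3 GC.
    let mut store = MemoryTier::with_limits(8 * 1024 * 1024);

    // put() hashes the bytes with BLAKE3 and stores them idempotently.
    let hash = store.put(b"hello, echo");
    assert!(store.has(&hash));
    assert_eq!(store.get(&hash).as_deref(), Some(&b"hello, echo"[..]));

    // put_verified() rejects payloads whose BLAKE3 hash differs from the expected one.
    let expected = blob_hash(b"hello, echo");
    store.put_verified(expected, b"hello, echo").expect("hash matches");

    // Pins are set-based retention roots; pre-pinning a missing blob is legal.
    store.pin(&hash);
    assert!(store.is_pinned(&hash));
    store.unpin(&hash);
}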

What's deferred

  • DiskTier / ColdTier (Phase 3)
  • Wire protocol: WANT / PROVIDE / FRAME (Phase 3)
  • GC sweep / eviction (Phase 3)
  • serde derives, async methods, iteration API

Test plan

  • cargo check -p echo-cas
  • cargo test -p echo-cas — 14/14 pass
  • cargo clippy -p echo-cas -- -D warnings — zero warnings
  • cargo check -p echo-cas --target wasm32-unknown-unknown
  • cargo doc -p echo-cas --no-deps
  • Pre-push hook passes (fmt, clippy, tests, rustdoc, patterns, determinism)

…oryTier)

Introduces echo-cas, a leaf crate providing the BlobStore trait and an
in-memory implementation (MemoryTier) for content-addressed storage keyed
by BLAKE3 hash. Sufficient for the in-browser website demo; disk/cold
tiers and GC are deferred to Phase 3.
@coderabbitai
Contributor

coderabbitai bot commented Feb 7, 2026

Summary by CodeRabbit

  • New Features

    • Added a content-addressed blob store with BLAKE3-based hashing and integrity verification.
    • Implemented an in-memory storage tier with idempotent put/get, verified uploads, and lifecycle operations (pin/unpin).
    • Added optional byte-budget tracking with over-budget detection and observable storage metrics.
  • Documentation

    • Included README and package metadata for the new blob store.

Walkthrough

A new echo-cas crate is added to the workspace, providing a content‑addressed blob store: a 32‑byte BLAKE3 BlobHash type, blob_hash() helper, CasError enum, a sync object‑safe BlobStore trait (put / put_verified / get / has / pin / unpin), and an in‑memory MemoryTier implementation with optional byte budgeting and tests.
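
For orientation, the public surface described above probably looks roughly like the following sketch (the put, put_verified, and get signatures match snippets quoted later in this review; the remaining signatures and the error message format are assumptions):

use std::sync::Arc;
use thiserror::Error;

// 32-byte BLAKE3 digest identifying a blob.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub struct BlobHash(pub [u8; 32]);

// Hash `bytes` with BLAKE3 and wrap the digest.
pub fn blob_hash(bytes: &[u8]) -> BlobHash {
    BlobHash(*blake3::hash(bytes).as_bytes())
}

#[derive(Debug, Error)]
pub enum CasError {
    // Exact message wording/formatting is assumed.
    #[error("hash mismatch: expected {expected:?}, computed {computed:?}")]
    HashMismatch { expected: BlobHash, computed: BlobHash },
}

// Sync, object-safe blob store (Phase 1).
pub trait BlobStore {
    fn put(&mut self, bytes: &[u8]) -> BlobHash;
    fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError>;
    fn get(&self, hash: &BlobHash) -> Option<Arc<[u8]>>;
    fn has(&self, hash: &BlobHash) -> bool;  // assumed signature
    fn pin(&mut self, hash: &BlobHash);      // assumed signature
    fn unpin(&mut self, hash: &BlobHash);    // assumed signature
}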

Changes

  • Workspace Configuration (Cargo.toml): Added crates/echo-cas to [workspace] members and added echo-cas = { version = "0.1.0", path = "crates/echo-cas" } to [workspace.dependencies].
  • Crate Manifest & Docs (crates/echo-cas/Cargo.toml, crates/echo-cas/README.md): New crate manifest with metadata and dependencies (blake3, thiserror); README with license/description for echo-cas.
  • Core API (crates/echo-cas/src/lib.rs): Introduces BlobHash (32-byte wrapper + Display + as_bytes), blob_hash() helper, CasError::HashMismatch, and the BlobStore trait. Re-exports MemoryTier.
  • In-Memory Tier (crates/echo-cas/src/memory.rs): Adds MemoryTier implementing BlobStore using HashMap<BlobHash, Arc<[u8]>> and HashSet for pins; optional max_bytes advisory budgeting; observables (len, byte_count, is_over_budget, pinned_count, is_pinned); comprehensive unit tests.
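
A rough sketch of the internal shape implied by the in-memory tier item above (the container types and the byte_count/max_bytes names appear elsewhere in this review; the pins field name is an assumption):

use std::collections::{HashMap, HashSet};
use std::sync::Arc;
use crate::BlobHash;

pub struct MemoryTier {
    blobs: HashMap<BlobHash, Arc<[u8]>>, // content-addressed payloads
    pins: HashSet<BlobHash>,             // set-based retention roots
    byte_count: usize,                   // total stored bytes (monotonic until Phase 3 GC)
    max_bytes: Option<usize>,            // advisory budget; None = no limit
}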

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~23 minutes

Poem

🧩 A tiny crate with hashes bright,
Stores each blob by blake3 light,
Pin and proof, verify with care,
MemoryTier holds bytes in air —
Neat, tested, tidy; review beware.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (✅ Passed): The title 'feat(echo-cas): content-addressed blob store (Phase 1)' directly describes the main change: adding a new echo-cas crate with a content-addressed blob store implementation, matching the PR's primary objective.
  • Description check (✅ Passed): The description provides a comprehensive summary of the changeset, covering the BlobStore trait, MemoryTier implementation, testing approach, deferred work, and verification performed—all directly related to the PR's scope.
  • Docstring coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@coderabbitai coderabbitai bot left a comment (Contributor)

Actionable comments posted: 12

🤖 Fix all issues with AI agents
In `@crates/echo-cas/Cargo.toml`:
- Line 19: Remove the dangling empty [dev-dependencies] header in Cargo.toml or
make it intentional: either delete the entire [dev-dependencies] section if
there are no dev dependencies, or populate it (e.g., add proptest/criterion) or
replace the header with a commented TODO (e.g., "# TODO: add dev-dependencies")
so the presence of the section is explicit; locate the orphaned
"[dev-dependencies]" token in the Cargo.toml and apply one of these options.
- Line 17: Update the thiserror dependency in Cargo.toml from the 1.x line to
the 2.x line: replace the dependency spec `thiserror = "1"` with `thiserror =
"2"` (or a caret/semver range like `^2`) and then run cargo update / cargo build
to verify proc-macro derives still compile for wasm32; if there is a documented
downstream blocker, add a comment next to the thiserror entry explaining why v2
cannot be used.

In `@crates/echo-cas/README.md`:
- Line 1: The README's SPDX header ("Apache-2.0 OR MIND-UCAL-1.0") conflicts
with the workspace Cargo.toml which uses license.workspace = true and the root
workspace license = "Apache-2.0"; fix by making the license declarations
consistent: either update the workspace/root Cargo.toml to include the
dual-license string "Apache-2.0 OR MIND-UCAL-1.0" (so all crates inherit the OR
license) or remove "MIND-UCAL-1.0" from the README SPDX header; check and update
the crate's Cargo.toml/license or the root workspace license field accordingly
to ensure Cargo.toml, license.workspace, and README.md match.

In `@crates/echo-cas/src/lib.rs`:
- Around line 60-62: Change BlobHash to hide its inner field (make the [u8; 32]
private) so callers cannot construct it directly; keep #[repr(transparent)] and
the derives, expose a single safe public constructor blob_hash(bytes: [u8;32])
(already present) for normal creation, add a from_bytes([u8;32]) constructor
intended for deserialization/wire use with a doc comment "caller asserts this
came from BLAKE3", and keep an as_bytes(&self) -> &[u8;32] accessor for
read-only access; if you intentionally allowed public construction, add a clear
doc warning on the BlobHash struct that constructing from non-BLAKE3 bytes is
legal but semantically meaningless.
- Around line 130-132: The current trait method signature fn get(&self, hash:
&BlobHash) -> Option<Arc<[u8]>> bakes an Arc-backed in-memory representation
into the trait contract; add a FIXME comment above the get signature (reference
symbols: get, BlobHash, Arc<[u8]>) stating FIXME(phase3): avoid forcing
Arc<[u8]> — consider returning bytes::Bytes or using an associated type (e.g.,
type Blob: AsRef<[u8]>) or another zero-copy representation in Phase 3, so
future DiskTier/ColdTier implementations can choose mmap/Bytes/Cow/streams
instead of allocating an Arc.
- Around line 71-78: The Display impl for BlobHash (fn fmt on BlobHash) does 32
separate write! calls via the Formatter; instead, build the full hex string
first and do a single write to the Formatter. In fmt, allocate a String with
capacity self.0.len()*2, use std::fmt::Write::write_fmt or write! into that
String in the loop to append each byte as "{:02x}", then call
f.write_str(&hex_string) once; this keeps the existing Display behavior but
reduces Formatter invocations while remaining safe (no unsafe code).

In `@crates/echo-cas/src/memory.rs`:
- Around line 223-234: Add a unit test that verifies pin idempotence: create a
MemoryTier via MemoryTier::new(), put some data with put(...) to get a hash,
call pin(&hash) twice, assert pinned_count() == 1, then unpin(&hash) and assert
is_pinned(&hash) is false; this ensures pin uses set semantics rather than a
reference count (refer to MemoryTier::new, put, pin, pinned_count, unpin,
is_pinned).
- Around line 154-168: Add a new unit test that exercises the happy path and
idempotence of MemoryTier::put_verified: create a MemoryTier via
MemoryTier::new, compute the correct hash for some test data using blob_hash,
call put_verified(hash, data) and assert it returns Ok, then assert store.len()
== 1, store.byte_count() == data.len(), and that store.get(&hash) returns the
stored bytes; finally call put_verified again with the same hash/data and assert
it still returns Ok and that len() and byte_count() have not increased
(idempotence). Ensure the test is named (e.g., put_verified_happy_path) and
mirrors the style of existing tests.
- Around line 27-32: Add a doc comment on MemoryTier explaining that byte_count
is a monotonically increasing tally of stored bytes for Phase 1 (no GC) and
therefore is_over_budget() will remain true once exceeded until explicit blob
removals are implemented in Phase 3; when implementing remove/evict in the
future you must decrement byte_count by the removed blob's length (compute via
the Arc<[u8]>::len() at removal time) so the budget can recover, and update the
is_over_budget() doc to state this invariant and recovery condition.
- Around line 98-106: The put method currently calls blob_hash then probes the
blobs map twice via contains_key and insert; change both put and put_verified to
use the HashMap entry API (e.g., blobs.entry(hash)) to perform a single lookup:
on Vacant(entry) insert Arc::from(bytes) and increment byte_count only when you
actually insert, and on Occupied return the existing hash without modifying
byte_count; apply the same entry-based logic in put_verified to avoid the double
lookup and ensure byte_count stays correct.
- Around line 108-118: The put_verified function currently does a double lookup
by calling self.blobs.contains_key(&computed) then insert, which can be replaced
with the HashMap entry API to do a single lookup and update byte_count only when
a new blob is actually inserted. Change put_verified to use
self.blobs.entry(computed) and on Vacant insert Arc::from(bytes) while
incrementing self.byte_count only for the new insertion (avoid incrementing when
the entry is Occupied); reference function name put_verified and map self.blobs
to locate the code to update.
- Around line 86-88: The is_over_budget method currently uses strict >
(self.max_bytes.is_some_and(|max| self.byte_count > max)), so a stored
byte_count equal to max_bytes is considered within budget; update the API docs
to explicitly state the budget is inclusive (i.e., exactly max_bytes is NOT over
budget) and mention the related symbols is_over_budget, with_limits, max_bytes,
and byte_count so callers understand the off-by-one behavior; if you prefer
exclusive semantics instead, change the comparison to >= in is_over_budget and
update tests/docs accordingly.

@@ -0,0 +1,6 @@
<!-- SPDX-License-Identifier: Apache-2.0 OR MIND-UCAL-1.0 -->

⚠️ Potential issue | 🟠 Major

License mismatch: README declares Apache-2.0 OR MIND-UCAL-1.0 but Cargo.toml inherits workspace license Apache-2.0 only.

This is a compliance inconsistency. Cargo.toml sets license.workspace = true → the root workspace declares license = "Apache-2.0". This README introduces a second license (MIND-UCAL-1.0) that doesn't appear anywhere in the manifest chain. Either:

  1. Update the workspace/crate license field to match the dual-license, or
  2. Drop MIND-UCAL-1.0 from the README header.

Shipping contradictory license declarations is a legal landmine.


Comment on lines +60 to +62
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub struct BlobHash(pub [u8; 32]);

🛠️ Refactor suggestion | 🟠 Major

BlobHash(pub [u8; 32]) — public inner field undermines type-level integrity guarantees.

Anyone can write BlobHash([0u8; 32]) and hand it to get, pin, or put_verified. The type doesn't encode "this is a real BLAKE3 hash" — it's just a bag of bytes wearing a trenchcoat. Today the only consequence is a None from get or a spurious pin on a phantom hash, but once GC or network protocol code trusts BlobHash as proof-of-work, unvalidated construction becomes a footgun.

Consider making the inner field private and offering:

  • blob_hash(bytes) as the only public constructor (already exists).
  • A from_bytes([u8; 32]) for deserialization / wire protocol (Phase 3), documented as "caller asserts this came from BLAKE3".

You can still expose as_bytes() and keep #[repr(transparent)] for FFI. The cost is one extra function; the payoff is that BlobHash in a function signature means something.

If this is a deliberate "value type, no invariants" choice (like NodeId in warp-core), at minimum add a doc warning on the struct: "Constructing a BlobHash from non-BLAKE3 bytes is legal but semantically meaningless."
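
A sketch of the suggested encapsulated shape (constructor and accessor names follow this comment; doc wording is illustrative):

// 32-byte BLAKE3 digest. Construct via blob_hash(); from_bytes() is for
// deserialization / wire use, where the caller asserts the bytes came from BLAKE3.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub struct BlobHash([u8; 32]); // inner field now private

impl BlobHash {
    // Wrap raw bytes; caller asserts this came from BLAKE3.
    pub fn from_bytes(bytes: [u8; 32]) -> Self {
        Self(bytes)
    }

    // Read-only access to the digest bytes (e.g., for FFI or hex encoding).
    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.0
    }
}

// The sole invariant-preserving constructor: hash the payload with BLAKE3.
pub fn blob_hash(bytes: &[u8]) -> BlobHash {
    BlobHash(*blake3::hash(bytes).as_bytes())
}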


Comment on lines +71 to +78
impl std::fmt::Display for BlobHash {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        for byte in &self.0 {
            write!(f, "{byte:02x}")?;
        }
        Ok(())
    }
}

🧹 Nitpick | 🔵 Trivial

Display impl: 32 individual write! calls per hash.

Each write!(f, "{byte:02x}") is a formatting + write call. For a 32-byte hash that's 32 calls through the Formatter machinery. Not catastrophic, but if you ever log hashes in a hot path (and you will), this adds up.

A single-pass approach:

Proposed alternative
 impl std::fmt::Display for BlobHash {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        for byte in &self.0 {
-            write!(f, "{byte:02x}")?;
-        }
-        Ok(())
+        let mut buf = [0u8; 64];
+        for (i, &b) in self.0.iter().enumerate() {
+            let hi = b >> 4;
+            let lo = b & 0x0F;
+            buf[i * 2] = HEX_CHARS[hi as usize];
+            buf[i * 2 + 1] = HEX_CHARS[lo as usize];
+        }
+        // SAFETY: hex chars are always valid UTF-8.
+        f.write_str(unsafe { std::str::from_utf8_unchecked(&buf) })
     }
 }
+
+const HEX_CHARS: &[u8; 16] = b"0123456789abcdef";

...but you forbid(unsafe_code). So the safe equivalent:

Safe single-allocation approach
 impl std::fmt::Display for BlobHash {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        for byte in &self.0 {
-            write!(f, "{byte:02x}")?;
-        }
-        Ok(())
+        let hex: String = self.0.iter().map(|b| format!("{b:02x}")).collect();
+        f.write_str(&hex)
     }
 }

Not a hill to die on for Phase 1, but flag it for when you're profiling.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
impl std::fmt::Display for BlobHash {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        for byte in &self.0 {
            write!(f, "{byte:02x}")?;
        }
        Ok(())
    }
}

impl std::fmt::Display for BlobHash {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let hex: String = self.0.iter().map(|b| format!("{b:02x}")).collect();
        f.write_str(&hex)
    }
}

Comment on lines +86 to +88
pub fn is_over_budget(&self) -> bool {
    self.max_bytes.is_some_and(|max| self.byte_count > max)
}

🛠️ Refactor suggestion | 🟠 Major

is_over_budget uses > not >= — at exactly max_bytes, you're NOT over budget.

with_limits(10) + exactly 10 bytes stored → is_over_budget() returns false. This is a deliberate choice (budget is inclusive), but it's worth calling out because it's a classic off-by-one ambiguity. The doc should state whether the budget is inclusive or exclusive.

Suggested doc clarification
     /// Returns `true` if `byte_count` exceeds the configured budget.
     ///
-    /// Always returns `false` if no budget was set.
+    /// The budget is inclusive: exactly `max_bytes` bytes is within budget.
+    /// Always returns `false` if no budget was set.
     pub fn is_over_budget(&self) -> bool {

Comment on lines +98 to +106
fn put(&mut self, bytes: &[u8]) -> BlobHash {
    let hash = blob_hash(bytes);
    if self.blobs.contains_key(&hash) {
        return hash;
    }
    self.byte_count += bytes.len();
    self.blobs.insert(hash, Arc::from(bytes));
    hash
}

🧹 Nitpick | 🔵 Trivial

put computes hash then probes the map twice on the cold path.

contains_key followed by insert is two lookups. Use the entry API:

Proposed fix
 fn put(&mut self, bytes: &[u8]) -> BlobHash {
     let hash = blob_hash(bytes);
-    if self.blobs.contains_key(&hash) {
-        return hash;
-    }
-    self.byte_count += bytes.len();
-    self.blobs.insert(hash, Arc::from(bytes));
+    if let std::collections::hash_map::Entry::Vacant(e) = self.blobs.entry(hash) {
+        self.byte_count += bytes.len();
+        e.insert(Arc::from(bytes));
+    }
     hash
 }

Same applies to put_verified (Lines 113-115).


- Upgrade thiserror 1 → 2 (leaf crate, no compatibility concern)
- Add put_verified happy-path test (store, retrieve, idempotence)
- Add pin idempotence test (set semantics: double-pin keeps count at 1)
- Document is_over_budget budget semantics (exclusive) and byte_count
  monotonic invariant (no removal until Phase 3 GC)
- Remove empty [dev-dependencies] section
- Update bytes 1.11.0 → 1.11.1 (RUSTSEC-2026-0007, workspace-wide)
@coderabbitai coderabbitai bot left a comment (Contributor)

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@crates/echo-cas/src/memory.rs`:
- Around line 286-296: Add a test that asserts the critical boundary where
stored bytes equal the configured limit remains within budget: create a
MemoryTier via MemoryTier::with_limits(n), call put to store exactly n bytes,
assert !store.is_over_budget(), then put one more byte and assert
store.is_over_budget(); reference the existing with_limits, put, and
is_over_budget methods so the test will catch off-by-one regressions (e.g., `>`
vs `>=`) and ensure the budget semantics remain exclusive.
- Around line 110-120: In put_verified, avoid computing blob_hash(bytes) when
the expected blob is already present: first check
self.blobs.contains_key(&expected) and return Ok(()) if found; only then compute
let computed = blob_hash(bytes) and compare to expected, returning
CasError::HashMismatch on mismatch; if absent, insert into self.blobs and update
self.byte_count — this also removes the redundant second contains_key probe
before insert. Use the existing symbols put_verified, expected, blob_hash,
self.blobs, and self.byte_count to locate and change the logic.

Comment on lines +110 to +120
fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError> {
    let computed = blob_hash(bytes);
    if computed != expected {
        return Err(CasError::HashMismatch { expected, computed });
    }
    if !self.blobs.contains_key(&computed) {
        self.byte_count += bytes.len();
        self.blobs.insert(computed, Arc::from(bytes));
    }
    Ok(())
}

🧹 Nitpick | 🔵 Trivial

put_verified hashes unconditionally — skip the BLAKE3 computation when the blob already exists.

On the wire-protocol ingestion path (Phase 3), put_verified will be the primary entry point. Currently it computes the full BLAKE3 hash of every incoming payload before checking membership. For large blobs that are already stored (idempotent retransmits, dedup hits), this is wasted work. An early contains_key on the expected hash short-circuits the entire function:

♻️ Proposed fix (also folds in the entry API fix from the prior review)
 fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError> {
+    // Fast path: blob already stored — skip hashing entirely.
+    if self.blobs.contains_key(&expected) {
+        return Ok(());
+    }
     let computed = blob_hash(bytes);
     if computed != expected {
         return Err(CasError::HashMismatch { expected, computed });
     }
-    if !self.blobs.contains_key(&computed) {
-        self.byte_count += bytes.len();
-        self.blobs.insert(computed, Arc::from(bytes));
-    }
+    self.byte_count += bytes.len();
+    self.blobs.insert(computed, Arc::from(bytes));
     Ok(())
 }

This is semantically safe: if we already hold the blob for expected, the caller's bytes are irrelevant — the store is content-addressed and the hash is the identity. The second contains_key + insert double-probe also disappears because after the early return, we know the key is absent.


Comment on lines +286 to +296
#[test]
fn with_limits_and_over_budget() {
    let mut store = MemoryTier::with_limits(10);
    assert!(!store.is_over_budget());
    store.put(b"12345"); // 5 bytes, within budget
    assert!(!store.is_over_budget());
    store.put(b"1234567"); // +7 = 12, over budget
    assert!(store.is_over_budget());
    // Put still succeeds — budget is advisory.
    assert_eq!(store.len(), 2);
}

⚠️ Potential issue | 🟡 Minor

Missing the most critical boundary: exactly max_bytes stored.

You went to the trouble of documenting that the budget is exclusive (exactly max_bytes is within budget), but the test jumps from 5 (under) to 12 (over) — it never asserts the boundary itself. If someone later changes > to >= in is_over_budget, this test suite won't catch it. That's the one off-by-one this doc note exists to protect against.

🧪 Proposed boundary assertion (add inside the existing test or as a new test)
 #[test]
 fn with_limits_and_over_budget() {
     let mut store = MemoryTier::with_limits(10);
     assert!(!store.is_over_budget());
     store.put(b"12345"); // 5 bytes, within budget
     assert!(!store.is_over_budget());
+    store.put(b"67890"); // +5 = 10, exactly at budget (exclusive → NOT over)
+    assert!(!store.is_over_budget());
     store.put(b"1234567"); // +7 = 12, over budget
+    // Note: byte_count is now 17 (5 + 5 + 7), not 12 — three distinct blobs.
     assert!(store.is_over_budget());
     // Put still succeeds — budget is advisory.
-    assert_eq!(store.len(), 2);
+    assert_eq!(store.len(), 3);
 }

Or as a standalone test:

#[test]
fn budget_boundary_exactly_at_limit() {
    let mut store = MemoryTier::with_limits(4);
    store.put(b"abcd"); // 4 bytes == max_bytes
    assert!(!store.is_over_budget(), "exactly max_bytes must be within budget (exclusive)");
    store.put(b"e"); // 5 bytes > max_bytes
    assert!(store.is_over_budget());
}
