12 changes: 10 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -24,6 +24,7 @@ members = [
"crates/echo-wasm-bindings",
"crates/echo-wesley-gen",
"crates/echo-dry-tests",
"crates/echo-cas",
"xtask"
]
resolver = "2"
@@ -35,6 +36,7 @@ rust-version = "1.90.0"

[workspace.dependencies]
echo-app-core = { version = "0.1.0", path = "crates/echo-app-core" }
echo-cas = { version = "0.1.0", path = "crates/echo-cas" }
echo-config-fs = { version = "0.1.0", path = "crates/echo-config-fs" }
echo-dind-tests = { version = "0.1.0", path = "crates/echo-dind-tests" }
echo-dry-tests = { version = "0.1.0", path = "crates/echo-dry-tests" }
17 changes: 17 additions & 0 deletions crates/echo-cas/Cargo.toml
@@ -0,0 +1,17 @@
# SPDX-License-Identifier: Apache-2.0
# © James Ross Ω FLYING•ROBOTS <https://github.com/flyingrobots>
[package]
name = "echo-cas"
version = "0.1.0"
edition = "2021"
license.workspace = true
repository.workspace = true
rust-version.workspace = true
description = "Content-addressed blob store for Echo"
readme = "README.md"
keywords = ["echo", "cas", "content-addressed"]
categories = ["data-structures"]

[dependencies]
blake3 = "1.5"
thiserror = "2"
6 changes: 6 additions & 0 deletions crates/echo-cas/README.md
@@ -0,0 +1,6 @@
<!-- SPDX-License-Identifier: Apache-2.0 OR MIND-UCAL-1.0 -->
⚠️ Potential issue | 🟠 Major

License mismatch: README declares Apache-2.0 OR MIND-UCAL-1.0 but Cargo.toml inherits workspace license Apache-2.0 only.

This is a compliance inconsistency. Cargo.toml sets license.workspace = true, and the root workspace declares license = "Apache-2.0". This README introduces a second license (MIND-UCAL-1.0) that doesn't appear anywhere in the manifest chain. Either:

  1. Update the workspace/crate license field to match the dual-license, or
  2. Drop MIND-UCAL-1.0 from the README header.

Shipping contradictory license declarations is a legal landmine.

🤖 Prompt for AI Agents
In `@crates/echo-cas/README.md` at line 1, The README's SPDX header ("Apache-2.0
OR MIND-UCAL-1.0") conflicts with the workspace Cargo.toml which uses
license.workspace = true and the root workspace license = "Apache-2.0"; fix by
making the license declarations consistent: either update the workspace/root
Cargo.toml to include the dual-license string "Apache-2.0 OR MIND-UCAL-1.0" (so
all crates inherit the OR license) or remove "MIND-UCAL-1.0" from the README
SPDX header; check and update the crate's Cargo.toml/license or the root
workspace license field accordingly to ensure Cargo.toml, license.workspace, and
README.md match.

<!-- © James Ross Ω FLYING•ROBOTS <https://github.com/flyingrobots> -->

# echo-cas

Content-addressed blob store for Echo.
145 changes: 145 additions & 0 deletions crates/echo-cas/src/lib.rs
@@ -0,0 +1,145 @@
// SPDX-License-Identifier: Apache-2.0
// © James Ross Ω FLYING•ROBOTS <https://github.com/flyingrobots>
//! Content-addressed blob store for Echo.
//!
//! `echo-cas` provides a [`BlobStore`] trait for content-addressed storage keyed by
//! BLAKE3 hash. Phase 1 ships [`MemoryTier`] — sufficient for the in-browser website
//! demo. Disk/cold tiers, wire protocol, and GC come in Phase 3.
//!
//! # Hash Domain Policy
//!
//! CAS hash is content-only: `BLAKE3(bytes)` with no domain prefix. Two blobs with
//! identical bytes are the same CAS blob regardless of semantic type. This is by
//! design — deduplication is a feature, not a bug. Domain separation happens at the
//! typed-reference layer above (`TypedRef`: `schema_hash` + `type_id` + `layout_hash` +
//! `value_hash`).
//!
//! # Determinism Invariant
//!
//! No public API exposes store iteration order. CAS determinism is content-level
//! (same bytes → same hash), not collection-level. Any future `list`/`iter` API must
//! return results sorted by [`BlobHash`].
#![forbid(unsafe_code)]
#![deny(missing_docs, rust_2018_idioms, unused_must_use)]
#![deny(
clippy::all,
clippy::pedantic,
clippy::nursery,
clippy::cargo,
clippy::unwrap_used,
clippy::expect_used,
clippy::panic,
clippy::todo,
clippy::unimplemented,
clippy::dbg_macro,
clippy::print_stdout,
clippy::print_stderr
)]
#![allow(
clippy::must_use_candidate,
clippy::return_self_not_must_use,
clippy::unreadable_literal,
clippy::missing_const_for_fn,
clippy::suboptimal_flops,
clippy::redundant_pub_crate,
clippy::many_single_char_names,
clippy::module_name_repetitions,
clippy::use_self
)]

mod memory;
pub use memory::MemoryTier;

use std::sync::Arc;

/// A 32-byte BLAKE3 content hash.
///
/// Thin newtype over `[u8; 32]` following the `NodeId`/`TypeId` pattern from
/// `warp-core`. The inner bytes are public for zero-cost access; the `Display`
/// impl renders lowercase hex for logging and error messages.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub struct BlobHash(pub [u8; 32]);
Comment on lines +60 to +62

🛠️ Refactor suggestion | 🟠 Major

BlobHash(pub [u8; 32]) — public inner field undermines type-level integrity guarantees.

Anyone can write BlobHash([0u8; 32]) and hand it to get, pin, or put_verified. The type doesn't encode "this is a real BLAKE3 hash" — it's just a bag of bytes wearing a trenchcoat. Today the only consequence is a None from get or a spurious pin on a phantom hash, but once GC or network protocol code trusts BlobHash as proof that the value was computed by BLAKE3 over real content, unvalidated construction becomes a footgun.

Consider making the inner field private and offering:

  • blob_hash(bytes) as the only public constructor (already exists).
  • A from_bytes([u8; 32]) for deserialization / wire protocol (Phase 3), documented as "caller asserts this came from BLAKE3".

You can still expose as_bytes() and keep #[repr(transparent)] for FFI. The cost is one extra function; the payoff is that BlobHash in a function signature means something.

If this is a deliberate "value type, no invariants" choice (like NodeId in warp-core), at minimum add a doc warning on the struct: "Constructing a BlobHash from non-BLAKE3 bytes is legal but semantically meaningless."
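A minimal sketch of the private-field shape described above, using only the standard library. The module name `cas` is hypothetical and exists only to create a privacy boundary; `from_bytes` is the constructor proposed in this comment, not an existing API. In the real crate, `blob_hash(bytes)` would remain the normal way to obtain a value.

```rust
mod cas {
    /// A 32-byte content hash whose inner bytes are private.
    ///
    /// `#[repr(transparent)]` and the derives are kept from the original type.
    #[repr(transparent)]
    #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
    pub struct BlobHash([u8; 32]);

    impl BlobHash {
        /// Wrap raw bytes received from storage or the wire.
        ///
        /// The caller asserts these bytes came from BLAKE3; nothing is
        /// verified here.
        pub const fn from_bytes(bytes: [u8; 32]) -> Self {
            Self(bytes)
        }

        /// Read-only view of the underlying 32-byte array.
        pub const fn as_bytes(&self) -> &[u8; 32] {
            &self.0
        }
    }
}
```

Outside `cas`, the tuple-struct literal `cas::BlobHash([0u8; 32])` no longer compiles, so every construction site has to go through a named function that documents what the caller is asserting.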

🤖 Prompt for AI Agents
In `@crates/echo-cas/src/lib.rs` around lines 60 - 62, Change BlobHash to hide its
inner field (make the [u8; 32] private) so callers cannot construct it directly;
keep #[repr(transparent)] and the derives, expose a single safe public
constructor blob_hash(bytes: &[u8]) (already present) for normal creation, add
a from_bytes([u8;32]) constructor intended for deserialization/wire use with a
doc comment "caller asserts this came from BLAKE3", and keep an as_bytes(&self)
-> &[u8;32] accessor for read-only access; if you intentionally allowed public
construction, add a clear doc warning on the BlobHash struct that constructing
from non-BLAKE3 bytes is legal but semantically meaningless.


impl BlobHash {
/// View the hash as a byte slice.
pub fn as_bytes(&self) -> &[u8; 32] {
&self.0
}
}

impl std::fmt::Display for BlobHash {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
for byte in &self.0 {
write!(f, "{byte:02x}")?;
}
Ok(())
}
}
Comment on lines +71 to +78

🧹 Nitpick | 🔵 Trivial

Display impl: 32 individual write! calls per hash.

Each write!(f, "{byte:02x}") is a formatting + write call. For a 32-byte hash that's 32 calls through the Formatter machinery. Not catastrophic, but if you ever log hashes in a hot path (and you will), this adds up.

A single-pass approach:

Proposed alternative
 impl std::fmt::Display for BlobHash {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        for byte in &self.0 {
-            write!(f, "{byte:02x}")?;
-        }
-        Ok(())
+        let mut buf = [0u8; 64];
+        for (i, &b) in self.0.iter().enumerate() {
+            let hi = b >> 4;
+            let lo = b & 0x0F;
+            buf[i * 2] = HEX_CHARS[hi as usize];
+            buf[i * 2 + 1] = HEX_CHARS[lo as usize];
+        }
+        // SAFETY: hex chars are always valid UTF-8.
+        f.write_str(unsafe { std::str::from_utf8_unchecked(&buf) })
     }
 }
+
+const HEX_CHARS: &[u8; 16] = b"0123456789abcdef";

...but you forbid(unsafe_code). So the safe equivalent:

Safe approach with a single Formatter write (note that format! still allocates a temporary String per byte)
 impl std::fmt::Display for BlobHash {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        for byte in &self.0 {
-            write!(f, "{byte:02x}")?;
-        }
-        Ok(())
+        let hex: String = self.0.iter().map(|b| format!("{b:02x}")).collect();
+        f.write_str(&hex)
     }
 }

Not a hill to die on for Phase 1, but flag it for when you're profiling.
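For profiling time, a possible middle ground between the two variants above: fill the 64-byte stack buffer as in the first proposal, but convert it with the checked `std::str::from_utf8` instead of the unchecked call. This is a sketch, not a benchmarked replacement. The `BlobHash` below is a local stand-in for the crate's type, and `HEX_CHARS` is the same table as in the first proposal.

```rust
use std::fmt;

/// Lowercase hex digits, indexed by nibble value.
const HEX_CHARS: &[u8; 16] = b"0123456789abcdef";

/// Local stand-in for the crate's `BlobHash`.
pub struct BlobHash(pub [u8; 32]);

impl fmt::Display for BlobHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Two hex digits per byte, written into a fixed stack buffer.
        let mut buf = [0u8; 64];
        for (i, &b) in self.0.iter().enumerate() {
            buf[i * 2] = HEX_CHARS[usize::from(b >> 4)];
            buf[i * 2 + 1] = HEX_CHARS[usize::from(b & 0x0f)];
        }
        // Checked conversion keeps `forbid(unsafe_code)` intact. It cannot
        // fail in practice because the buffer holds only ASCII hex digits.
        let hex = std::str::from_utf8(&buf).map_err(|_| fmt::Error)?;
        f.write_str(hex)
    }
}
```

This does one `write_str` call and no heap allocation, at the price of a UTF-8 validation pass over 64 bytes. Like the original loop, it ignores width and padding flags.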

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-impl std::fmt::Display for BlobHash {
-    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        for byte in &self.0 {
-            write!(f, "{byte:02x}")?;
-        }
-        Ok(())
-    }
-}
+impl std::fmt::Display for BlobHash {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        let hex: String = self.0.iter().map(|b| format!("{b:02x}")).collect();
+        f.write_str(&hex)
+    }
+}
🤖 Prompt for AI Agents
In `@crates/echo-cas/src/lib.rs` around lines 71 - 78, The Display impl for
BlobHash (fn fmt on BlobHash) does 32 separate write! calls via the Formatter;
instead, build the full hex string first and do a single write to the Formatter.
In fmt, allocate a String with capacity self.0.len()*2, use
std::fmt::Write::write_fmt or write! into that String in the loop to append each
byte as "{:02x}", then call f.write_str(&hex_string) once; this keeps the
existing Display behavior but reduces Formatter invocations while remaining safe
(no unsafe code).


/// Compute the BLAKE3 content hash of `bytes`.
///
/// No domain prefix — the content IS the identity. See module-level docs for
/// hash domain policy.
pub fn blob_hash(bytes: &[u8]) -> BlobHash {
let hash = blake3::hash(bytes);
BlobHash(*hash.as_bytes())
}

/// Errors that can occur during CAS operations.
#[derive(Debug, Clone, PartialEq, Eq, thiserror::Error)]
pub enum CasError {
/// Blob bytes did not match the declared hash.
#[error("[CAS_HASH_MISMATCH] expected {expected}, computed {computed}")]
HashMismatch {
/// The hash that was declared/expected.
expected: BlobHash,
/// The hash actually computed from the bytes.
computed: BlobHash,
},
}

/// Content-addressed blob store.
///
/// Implementations store opaque byte blobs keyed by their BLAKE3 hash. The trait
/// is intentionally synchronous and object-safe for Phase 1. Async methods will be
/// added (likely as a separate `AsyncBlobStore` trait) when disk/network tiers
/// demand it.
///
/// # Absence Semantics
///
/// [`get`](BlobStore::get) returns `None` for missing blobs — this is **not** an
/// error. CAS is a lookup table: missing blobs are expected (not-yet-fetched,
/// GC'd, never stored). Error variants are reserved for integrity violations.
pub trait BlobStore {
/// Compute hash and store. Returns the content hash.
fn put(&mut self, bytes: &[u8]) -> BlobHash;

/// Store with a pre-computed hash. Rejects if `BLAKE3(bytes) != expected`.
///
/// On mismatch the store is unchanged and a [`CasError::HashMismatch`] is
/// returned. This method exists for receivers of `WANT`/`PROVIDE` messages
/// who already possess the hash.
///
/// # Errors
///
/// Returns [`CasError::HashMismatch`] if the computed hash differs from
/// `expected`.
fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError>;

/// Retrieve blob by hash. Returns `None` if not stored — absence is not an
/// error.
fn get(&self, hash: &BlobHash) -> Option<Arc<[u8]>>;
Comment on lines +130 to +132

🧹 Nitpick | 🔵 Trivial

get returns Arc<[u8]> — this bakes the in-memory representation into the trait contract.

When DiskTier or ColdTier arrive in Phase 3, they may want to return Bytes, Cow<[u8]>, memory-mapped slices, or streaming readers. Returning Arc<[u8]> forces every tier to allocate an Arc even if the data is already in a zero-copy buffer.

Options to future-proof:

  1. Return Bytes (from the bytes crate) — cheaply cloneable, backed by Arc or mmap.
  2. Use an associated type: type Blob: AsRef<[u8]> — each tier picks its optimal representation.
  3. Accept the coupling for Phase 1 and add a // FIXME(phase3): ... comment so it's tracked.

Option 3 is the minimum viable action right now. Just don't let this silently calcify.
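To make option 2 concrete, here is a rough sketch of what an associated blob type could look like. Every name in it (`BlobRead`, `Hash32`, `ArcTier`, `VecTier`, `blob_len`) is hypothetical and chosen for illustration. The key is a plain `[u8; 32]` so the example needs only the standard library.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for the crate's `BlobHash` in this sketch.
type Hash32 = [u8; 32];

/// Read side of a store whose blob representation is chosen per tier.
pub trait BlobRead {
    /// Whatever the tier can hand out cheaply, as long as it views as bytes.
    type Blob: AsRef<[u8]>;

    /// Returns `None` when the blob is absent; absence is not an error.
    fn get(&self, hash: &Hash32) -> Option<Self::Blob>;
}

/// In-memory tier: returns cheap `Arc` clones, as the current trait does.
pub struct ArcTier {
    blobs: HashMap<Hash32, Arc<[u8]>>,
}

impl BlobRead for ArcTier {
    type Blob = Arc<[u8]>;

    fn get(&self, hash: &Hash32) -> Option<Self::Blob> {
        self.blobs.get(hash).cloned()
    }
}

/// Hypothetical tier that returns owned buffers instead of `Arc`s.
pub struct VecTier {
    blobs: HashMap<Hash32, Vec<u8>>,
}

impl BlobRead for VecTier {
    type Blob = Vec<u8>;

    fn get(&self, hash: &Hash32) -> Option<Self::Blob> {
        self.blobs.get(hash).cloned()
    }
}

/// Generic caller: works with any tier without naming its blob type.
pub fn blob_len<S: BlobRead>(store: &S, hash: &Hash32) -> Option<usize> {
    store.get(hash).map(|blob| blob.as_ref().len())
}
```

One trade-off to weigh: with an associated type, a trait object has to name the blob type (`dyn BlobRead<Blob = Arc<[u8]>>`), so heterogeneous tiers can no longer sit behind one `dyn` type. That pulls against the "object-safe for Phase 1" goal stated in the trait docs.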

🤖 Prompt for AI Agents
In `@crates/echo-cas/src/lib.rs` around lines 130 - 132, The current trait method
signature fn get(&self, hash: &BlobHash) -> Option<Arc<[u8]>> bakes an
Arc-backed in-memory representation into the trait contract; add a FIXME comment
above the get signature (reference symbols: get, BlobHash, Arc<[u8]>) stating
FIXME(phase3): avoid forcing Arc<[u8]> — consider returning bytes::Bytes or
using an associated type (e.g., type Blob: AsRef<[u8]>) or another zero-copy
representation in Phase 3, so future DiskTier/ColdTier implementations can
choose mmap/Bytes/Cow/streams instead of allocating an Arc.


/// Check existence without retrieving.
fn has(&self, hash: &BlobHash) -> bool;

/// Mark hash as a retention root.
///
/// Legal on missing blobs (pre-pin intent). Pin semantics are set-based (not
/// reference-counted) in Phase 1.
fn pin(&mut self, hash: &BlobHash);

/// Remove retention root. No-op if not pinned or not stored.
fn unpin(&mut self, hash: &BlobHash);
}
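
The pin documentation above says Phase 1 pins are set-based rather than reference-counted. The observable difference is sketched below with a hypothetical `PinSet` helper. It is not part of the PR, `is_pinned` is not a method of `BlobStore`, and the `[u8; 32]` key stands in for `BlobHash`.

```rust
use std::collections::HashSet;

/// Stand-in for the crate's `BlobHash` in this sketch.
type Hash32 = [u8; 32];

/// Set-based retention roots: a hash is either pinned or it is not.
#[derive(Default)]
pub struct PinSet {
    pinned: HashSet<Hash32>,
}

impl PinSet {
    /// Pinning is idempotent and does not require the blob to be stored.
    pub fn pin(&mut self, hash: &Hash32) {
        self.pinned.insert(*hash);
    }

    /// A single unpin removes the root, however many times it was pinned.
    pub fn unpin(&mut self, hash: &Hash32) {
        self.pinned.remove(hash);
    }

    /// Hypothetical query used only to observe the state in this sketch.
    pub fn is_pinned(&self, hash: &Hash32) -> bool {
        self.pinned.contains(hash)
    }
}
```

Under these semantics, two callers that each pin the same hash do not hold independent claims: the first `unpin` releases the root for both of them. Reference counting would be needed if independent owners are expected later.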