feat: add ZarrsArray#147

Draft
LDeakin wants to merge 4 commits into main from ld/feat/array_impl
Conversation

@LDeakin (Member) commented Feb 8, 2026

This is a POC for using zarrs' Array from Python. Native zarrs performance in Python 🚀🚀🚀. It simply layers over zarr.Array and replaces __getitem__ and __setitem__. The advantage over the codec pipeline is that all chunking logic and reassembly happens on the Rust side.
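The layering described above could look roughly like this; a minimal sketch, where the `_native_get`/`_native_set` names are hypothetical pure-Python placeholders standing in for the zarrs Rust entry points that would do the chunking and reassembly:

```python
import numpy as np

class ZarrsArray:
    """Sketch: delegate everything to the wrapped array, but route
    __getitem__/__setitem__ through 'native' entry points (here plain
    Python placeholders for the Rust-side zarrs calls)."""

    def __init__(self, inner):
        self._inner = inner  # the underlying zarr.Array-like object

    def __getattr__(self, name):
        # Everything except indexing falls through to the wrapped array.
        return getattr(self._inner, name)

    def __getitem__(self, selection):
        return self._native_get(selection)

    def __setitem__(self, selection, value):
        self._native_set(selection, value)

    # Placeholders: in the actual PR these would cross into Rust.
    def _native_get(self, selection):
        return self._inner[selection]

    def _native_set(self, selection, value):
        self._inner[selection] = value
```

For example, wrapping a plain ndarray still gives normal attribute access (`.shape`, `.dtype`) via the delegation, while reads and writes go through the replaced dunder methods.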

It'd be amazing if there were some way to make this usable by zarr-python (similarly to the codec pipeline) so that dask etc. can use it.

Also, maybe it should be lazy by default; currently it is opt-in with .lazy.

Round trip benchmark:

[benchmark image]

Disclosure: I haven't reviewed this thoroughly; Claude did most of the implementation.

@d-v-b commented Feb 8, 2026

I love this!

> Also maybe it should be lazy by default, currently opt-in with .lazy

See zarr-developers/zarr-python#3678. Telling Claude to copy tensorstore works great!
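For reference, tensorstore-style lazy semantics might look something like this sketch (all names are hypothetical): indexing returns a lightweight view, and data is only materialized when read() or np.asarray is called.

```python
import numpy as np

class LazyView:
    """A deferred selection over a source array; nothing is read
    until the caller explicitly materializes it."""

    def __init__(self, source, selection):
        self._source = source
        self._selection = selection

    def read(self):
        # Materialize only on demand.
        return self._source[self._selection]

    def __array__(self, dtype=None, copy=None):
        # np.asarray(view) also triggers the read.
        out = self.read()
        return out if dtype is None else out.astype(dtype)

class LazyArray:
    """Lazy-by-default wrapper: __getitem__ returns a view, not data."""

    def __init__(self, source):
        self._source = source

    def __getitem__(self, selection):
        return LazyView(self._source, selection)
```

This is the inverse of the PR's current opt-in .lazy: here eager reads would be the thing you ask for explicitly.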

@ilan-gold (Collaborator) commented Feb 12, 2026

I am very much in favor of this, because I really abuse this chunk-iteration API and do notice the performance hit. Do you think the performance hit comes from creating many small Rust classes? Or does it happen in zarr-python, before we even see the individual chunk selections?

Just wondering out loud about the current state of things: could we make https://github.com/zarrs/zarrs-python/blob/main/python/zarrs/utils.py simpler? I wonder whether the creation of a list of selection/out-shape objects (or the selections, outs, and shapes individually) could be Rust-ified, if we could guarantee from zarr-python that the input is always a tuple[slice, ...] and describe it as Vec<PySlice> or similar (and then the same idea for outputs and shapes). I guess I'm just wondering how "strong" we can make the Python side, so that batch_info in our make_chunk_info_for_rust_with_indices function could be turned into one big object that the Rust understands directly, instead of our creating many small ones.
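One possible shape for that "one big object", assuming the inputs really are tuple[slice, ...] per chunk: flatten every selection into plain (start, stop, step) triples plus an offsets table, which a Rust binding could consume in a single call as e.g. Vec<(usize, usize, isize)>. The function name and layout here are hypothetical, not the PR's actual API:

```python
def flatten_batch_info(batch_info, shapes):
    """Hypothetical sketch: turn per-chunk selections (tuple[slice, ...])
    into one flat list of (start, stop, step) triples plus an offsets
    table marking where each chunk's triples begin, so the Rust side
    receives a single homogeneous structure rather than many small
    Python objects."""
    flat = []
    offsets = []
    for selection, shape in zip(batch_info, shapes):
        offsets.append(len(flat))
        for sl, dim in zip(selection, shape):
            # slice.indices normalizes None/negative bounds to concrete
            # values for a dimension of the given length.
            start, stop, step = sl.indices(dim)
            flat.append((start, stop, step))
    return flat, offsets
```

Whether this actually wins would depend on where the time goes (Python object creation vs. the FFI boundary), but it keeps the per-chunk work on the Rust side.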

I would be very open to dropping the numpy.ndarray support in service of that goal, especially since it predates our fallback support, IIRC.

Similar/related to this: the base CodecPipeline in zarr-python could expose an API that lets a codec pipeline construct its own indexer, which then gets passed down to CodecPipeline.read, i.e., a custom batch_info (or some similar change).

Directly to the question of

> It'd be amazing if there was some way to make this usable by zarr-python (similarly to the codec pipeline) so dask etc can use it

one option could be just monkey-patching the Array import from zarr-python, i.e., replacing the import-time object: something like setattr(zarr, "Array", OurArray). Very evil and maybe dangerous.
