feat: add ZarrsArray#147

Draft
LDeakin wants to merge 4 commits into main from ld/feat/array_impl
Conversation

@LDeakin (Member) commented Feb 8, 2026

This is a POC for using zarrs' Array from Python. Native zarrs performance in Python 🚀🚀🚀. It simply layers over zarr.Array and replaces __getitem__ and __setitem__. The advantage over the codec pipeline is that all chunking logic and reassembly happens on the Rust side.
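The layering described above could look roughly like this; a minimal sketch, where the `_native_get`/`_native_set` names are hypothetical pure-Python placeholders standing in for the zarrs Rust entry points that would do the chunking and reassembly:

```python
import numpy as np

class ZarrsArray:
    """Sketch: delegate everything to the wrapped array, but route
    __getitem__/__setitem__ through 'native' entry points (here plain
    Python placeholders for the Rust-side zarrs calls)."""

    def __init__(self, inner):
        self._inner = inner  # the underlying zarr.Array-like object

    def __getattr__(self, name):
        # Everything except indexing falls through to the wrapped array.
        return getattr(self._inner, name)

    def __getitem__(self, selection):
        return self._native_get(selection)

    def __setitem__(self, selection, value):
        self._native_set(selection, value)

    # Placeholders: in the actual PR these would cross into Rust.
    def _native_get(self, selection):
        return self._inner[selection]

    def _native_set(self, selection, value):
        self._inner[selection] = value
```

For example, wrapping a plain ndarray still gives normal attribute access (`.shape`, `.dtype`) via the delegation, while reads and writes go through the replaced dunder methods.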

It'd be amazing if there were some way to make this usable by zarr-python (similarly to the codec pipeline) so that dask etc. can use it.

Also, maybe it should be lazy by default; currently it is opt-in with .lazy.

Round trip benchmark:

[benchmark image]

Disclosure: I haven't reviewed this thoroughly; Claude did most of the implementation.

@d-v-b commented Feb 8, 2026

I love this!

> Also maybe it should be lazy by default, currently opt-in with .lazy

See zarr-developers/zarr-python#3678. Telling Claude to copy tensorstore works great!
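For reference, tensorstore-style lazy semantics might look something like this sketch (all names are hypothetical): indexing returns a lightweight view, and data is only materialized when read() or np.asarray is called.

```python
import numpy as np

class LazyView:
    """A deferred selection over a source array; nothing is read
    until the caller explicitly materializes it."""

    def __init__(self, source, selection):
        self._source = source
        self._selection = selection

    def read(self):
        # Materialize only on demand.
        return self._source[self._selection]

    def __array__(self, dtype=None, copy=None):
        # np.asarray(view) also triggers the read.
        out = self.read()
        return out if dtype is None else out.astype(dtype)

class LazyArray:
    """Lazy-by-default wrapper: __getitem__ returns a view, not data."""

    def __init__(self, source):
        self._source = source

    def __getitem__(self, selection):
        return LazyView(self._source, selection)
```

This is the inverse of the PR's current opt-in .lazy: here eager reads would be the thing you ask for explicitly.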

@ilan-gold (Collaborator) commented Feb 12, 2026

I am very much in favor of this, because I really abuse this chunk-iteration API and do notice the performance hit. Do you think the performance hit comes from creating many small Rust classes? Or does it happen in zarr-python, before we even see the individual chunk selections?

Just wondering out loud about the current state of things: could we make https://github.com/zarrs/zarrs-python/blob/main/python/zarrs/utils.py simpler? I wonder whether the creation of a list of selection/out-shape objects (or the selections, outs, and shapes individually) could be Rust-ified, if we could guarantee from zarr-python that the input is always a tuple[slice, ...] and describe it as Vec<PySlice> or similar (and then the same idea for outputs and shapes). I guess I'm just wondering how "strong" we can make the Python side, so that batch_info in our make_chunk_info_for_rust_with_indices function could be turned into one big object that the Rust understands directly, instead of our creating many small ones.
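One possible shape for that "one big object", assuming the inputs really are tuple[slice, ...] per chunk: flatten every selection into plain (start, stop, step) triples plus an offsets table, which a Rust binding could consume in a single call as e.g. Vec<(usize, usize, isize)>. The function name and layout here are hypothetical, not the PR's actual API:

```python
def flatten_batch_info(batch_info, shapes):
    """Hypothetical sketch: turn per-chunk selections (tuple[slice, ...])
    into one flat list of (start, stop, step) triples plus an offsets
    table marking where each chunk's triples begin, so the Rust side
    receives a single homogeneous structure rather than many small
    Python objects."""
    flat = []
    offsets = []
    for selection, shape in zip(batch_info, shapes):
        offsets.append(len(flat))
        for sl, dim in zip(selection, shape):
            # slice.indices normalizes None/negative bounds to concrete
            # values for a dimension of the given length.
            start, stop, step = sl.indices(dim)
            flat.append((start, stop, step))
    return flat, offsets
```

Whether this actually wins would depend on where the time goes (Python object creation vs. the FFI boundary), but it keeps the per-chunk work on the Rust side.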

I would be very open to dropping the numpy.ndarray support in service of that goal, especially since it predates our fallback support, IIRC.

Similar/related to this: the base CodecPipeline in zarr-python could expose an API that lets a codec pipeline construct its own indexer, which then gets passed down to CodecPipeline.read, i.e., a custom batch_info (or some similar change).

Directly to the question of

> It'd be amazing if there was some way to make this usable by zarr-python (similarly to the codec pipeline) so dask etc can use it

one option could be just monkey-patching the Array import from zarr-python, i.e., replacing the import-time object: something like setattr(zarr, "Array", OurArray). Very evil and maybe dangerous.
