Replies: 15 comments 2 replies
-
I'll share our use-case at Open Climate Fix:
-
Our use case at WEBKNOSSOS:
-
Camus Energy is reading NOAA weather data to forecast electric load and generation. Here is a fragment of a kerchunk-style reference we use for HRRR GRIB2 data:
"t/.zarray": "{\"chunks\":[1,1059,1799],\"compressor\":null,\"dtype\":\"<f8\",\"fill_value\":9999.0,\"filters\":[{\"dtype\":\"float64\",\"id\":\"grib\",\"var\":\"t\"}],\"order\":\"C\",\"shape\":[1,1059,1799],\"zarr_format\":2}",
"t/.zattrs": "{\"NV\":0,\"_ARRAY_DIMENSIONS\":[\"valid_time\",\"x\",\"y\"],\"cfName\":\"air_temperature\",\"cfVarName\":\"t\",\"dataDate\":20210601,\"dataTime\":0,\"dataType\":\"fc\",\"endStep\":0,\"gridDefinitionDescription\":\"Lambert Conformal can be
secant or tangent, conical or bipolar\",\"gridType\":\"lambert\",\"missingValue\":9999,\"name\":\"Temperature\",\"numberOfPoints\":1905141,\"paramId\":130,\"shortName\":\"t\",\"stepType\":\"instant\",\"stepUnits\":1,\"typeOfLevel\":\"surface\",\"units\":\"
K\"}",
"t/0.0.0": [
"gcs://high-resolution-rapid-refresh/hrrr.20210601/conus/hrrr.t00z.wrfsfcf00.grib2",
33399570,
1357877
],
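For readers unfamiliar with this layout: the mapping above is a kerchunk reference, where each Zarr chunk key points at a byte range inside the original GRIB2 file. Below is a minimal sketch of opening such a reference lazily with fsspec and xarray; the reference file name is hypothetical and the storage options depend on your setup.

```python
# Sketch only: open a kerchunk reference file (e.g. one built with kerchunk.grib2)
# as a lazy xarray Dataset. "hrrr_refs.json" is a hypothetical file containing
# a mapping like the fragment above.
import xarray as xr

ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": "hrrr_refs.json",          # the reference JSON
            "remote_protocol": "gcs",        # chunks are byte ranges in GRIB2 files on GCS
            "remote_options": {"token": "anon"},
        },
    },
)
print(ds["t"])  # reading values fetches only the referenced byte range
```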
I think the overall direction of unblocking the Python GIL at key points in this stack using Rust is the way to go.
-
OME-Zarr might be of interest. It is the new standard that everyone in microscopy seems to be converging on. It defines a specific structure and metadata for storing microscope images. More info: https://github.com/ome/ngff The Zarr implementation: https://github.com/ome/ome-zarr-py
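A minimal sketch of reading an OME-Zarr image with ome-zarr-py, following the reader pattern from its documentation (the path is a placeholder):

```python
# Sketch only: lazily read a (hypothetical) OME-Zarr image with ome-zarr-py.
from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

reader = Reader(parse_url("path/to/image.ome.zarr"))
nodes = list(reader())
pyramid = nodes[0].data   # list of dask arrays, one per resolution level
print(pyramid[0].shape)   # highest-resolution level
```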
-
RPS Ocean Science is using zarr to archive, distribute, and visualize model data as a part of IOOS and RPS' Next Generation Data Management Project.
This stack is working really well for us; the next step is figuring out more efficient ways to fetch the data from the cloud as part of web service backends while handling many requests at once. Coordinating async chunk fetching and caching is tough with xarray and dask for many reasons, but figuring out how to improve this is vital to improving our web services.
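One possible direction (a sketch under assumptions, not RPS' actual setup): wrapping the remote store in one of fsspec's caching filesystems so repeated requests against a web backend can reuse already-fetched chunks.

```python
# Sketch only: bucket name and cache directory are hypothetical.
import fsspec
import xarray as xr

fs = fsspec.filesystem(
    "blockcache",                            # block-level read cache
    target_protocol="s3",
    target_options={"anon": True},
    cache_storage="/tmp/zarr-chunk-cache",
)
store = fs.get_mapper("my-bucket/model-output.zarr")
ds = xr.open_zarr(store, consolidated=True)  # chunk reads go through the cache
```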
-
Thanks so much for all these use-cases! This will be super-useful, and it's especially useful to see the kerchunk use-cases. I must admit I had been largely focused on improving the performance of datasets stored natively as Zarr on disk, but we clearly also need to think about how to speed up reading GRIB2 and NetCDF. (Which is fine: we were already thinking of splitting our work into at least two parts: a general-purpose, fast, async, batched IO backend, and a "Zarr front end". The IO backend could potentially also be used for reading GRIB and NetCDF.) I've added two more questions to the list (q9 and q10), but please don't feel obliged to update existing answers!
-
In my experience, the speed of reading NetCDF and GRIB tends to be reasonable: a single GFS GRIB chunk decode using cfgrib typically takes on the order of 30 ms uncached. This is on the slower side, too; spatially smaller datasets are much faster, often decoding in under 10 ms. (I am also working on gribberish, experimenting with improving GRIB reading and decoding with Rust, which is showing some promise of being faster for most types of GRIB packing.) We are serving map tiles of the data, though, so we are reading chunks covering the entire world to make small pictures, which is not efficient, but we are locked to using source data as distributed by NOAA or others for robustness reasons. So the bottleneck for us is usually IO, as we are forced to read chunks of at least 1 MB or larger. I hope this is illustrative and helpful!
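For reference, the per-chunk decode step described above looks roughly like this with cfgrib through xarray (the file name and filter keys are illustrative assumptions):

```python
# Sketch only: decode a single surface-temperature field from a GFS GRIB2 file.
import xarray as xr

ds = xr.open_dataset(
    "gfs.t00z.pgrb2.0p25.f000",   # hypothetical local GRIB2 file
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"typeOfLevel": "surface", "shortName": "t"}},
)
field = ds["t"].values  # the actual GRIB message decode happens here
```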
-
For completeness: also see this comment about "super-zarr" parallelism, especially dask. Perhaps our benchmarking workloads should include workloads which use dask.
-
Here is another use case: ML on digital pathology whole slide images
-
Our use case: Marine Heatwaves web API
A brief description of your use-case:
Performance requirements (if any):
Chunk sizes: (auto) date: 24, lat: 720, lon: 1440
-
Distributed Data Processing with Cubed
Processing large datasets using Cubed. Cubed is a serverless distributed framework that drops into xarray in place of dask, and tries to limit RAM usage in computations by writing intermediate steps to persistent storage (an idea taken from Pangeo's Rechunker).
We haven't done much benchmarking (there is some comparison to using dask for the same problem in this blog post), but any IO speedup should make a big difference for Cubed (xref cubed-dev/cubed#187).
We've been recommending ~100 MB chunks, but I don't think this is a requirement in general.
No strong opinions AFAIK.
In production Cubed is primarily designed to be deployed on serverless infrastructure, so Linux.
Blob storage in the cloud. The whole idea of Cubed is to have many serverless workers reading and writing to cloud storage in parallel.
We can call Cubed from xarray using cubed-xarray; see the blog post for details (and the minimal sketch after this list).
Could be anything that you want to perform numpy-like operations on - Cubed is a domain-agnostic processing framework like dask.
Potentially both. We want to support the whole xarray API.
Mostly similar to the initial read step that is typical when using dask. So we likely read either the entire dataset, or some large subset of it, in a pattern that corresponds to the analysis the user wants to do (e.g. a spatially-contiguous subset, or downsampling in time). Then, as Cubed performs reductions, the intermediate stores we create generally get smaller, and we read and write those in their entirety at each step. cc @tomwhite
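A minimal sketch of the cubed-xarray usage mentioned above (store paths, variable names and Spec values are hypothetical; see the cubed-xarray README for the supported options):

```python
# Sketch only: use Cubed instead of dask as xarray's chunked-array backend.
import cubed
import xarray as xr
import cubed_xarray  # noqa: F401  # usually registered via entry points; import shown for clarity

spec = cubed.Spec(work_dir="s3://my-bucket/cubed-tmp", allowed_mem="2GB")

ds = xr.open_dataset(
    "s3://my-bucket/input.zarr",
    engine="zarr",
    chunks={},
    chunked_array_type="cubed",
    from_array_kwargs={"spec": spec},
)
result = ds["temperature"].mean(dim="time")  # builds a lazy Cubed plan
result.compute()                             # executes, spilling intermediates to work_dir
```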
-
The biggest limiter is conserving metadata and custom grids, and writing the data into an absurd amount of storage and inodes.
-
We use Distributed Temperature Sensing (DTS) and Distributed Acoustic Sensing (DAS) technology, which produces arrays with a dimension of 2,000 x 5,000 each second. Data processing is done close to data production, and a first step is normalisation and standardisation of the data into an intermediate "buffer", from which new pipelines are started for further processing and feature extraction. Both this buffer and the final processed data are stored in Zarr format. A third dimension is added in some cases where we convert to the frequency domain. Due to limitations on storage space, and because there's no need to keep our buffers indefinitely, we have extended Zarr to support "trimmable" stores, i.e. stores where you can trim the start of the time axis while extending the end of the time axis as new data is appended and older data becomes obsolete. We're planning to open-source this code. An API exposes the various datasets.
Our production system should be able to store 35 MB/s of data from its raw format into the buffer store, while also running the various postprocessing pipelines. We've chosen a high-end AMD Ryzen 5 with 128 GB of RAM.
20,000 (time, 10 s) x 1,000 (positions) for the preprocessing and frequency conversion. The time size is defined by the incoming data size, which dictates further processing in chunks of 10 seconds. Since the interrogator switches between fibers, there's a lot of time without data for the other fibers. Zarr does not need to write the empty chunks, which saves a lot of space, although it does require some creative indexing and seeking.
We use the standard Blosc compressor, which seems fast enough for our needs.
Debian Linux (Bullseye)
For one application, we have 10 x 16 TB Seagate IronWolf drives in RAID10 with a Highpoint Rocket 720 controller, giving a single BTRFS volume of about 80 TB.
We use xarray in combination with some custom code. Currently dask is not used in this real-time pipeline, only for data analysis (where we do use it). We use Celery to parallelise processing.
From a DAS or DTS interrogator as HDF5 files. We also want to be able to process streamed data, e.g. via Apache Kafka in the AVRO format.
We use integer array indexes with our custom implementation of the trimmable/shifted Zarr store (which also handles the timestamp-to-integer conversion under the hood).
We read slices (aligned with our chunk definition) for our data processing, and random slices for API access.
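This is not their custom trimmable-store code, but a minimal sketch of the two stock Zarr features the description relies on: appending along the time axis and skipping empty chunks (chunk sizes match the numbers above; the path is hypothetical, zarr-python v2 API assumed):

```python
# Sketch only: append 10 s blocks along time without materialising empty chunks.
import numpy as np
import zarr

z = zarr.open(
    "das_buffer.zarr",
    mode="a",
    shape=(0, 1000),
    chunks=(20_000, 1000),     # 20,000 time samples x 1,000 positions
    dtype="f4",
    fill_value=0.0,
    write_empty_chunks=False,  # all-fill-value chunks are never written, saving space
)

block = np.random.standard_normal((20_000, 1000)).astype("f4")  # one incoming 10 s block
z.append(block, axis=0)                                         # extend the time axis
```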
-
Fantastic thread! Very interesting read - thanks everyone for sharing. Here is our use case.
At Oikolab, we provide weather & climate data (ERA5/ERA5Land/GFS/HRRR etc.) to analysts - our target users are climate analysts or Excel/MATLAB warriors who might have a need for climate data and know enough to request a specific dataset or parameters to feed into their workflow, but don't have time to go to the primary sources or learn about all the tools (reference). We also provide weather parameters that are not normally in the primary dataset, such as wet-bulb temperature, which are calculated on the fly. Most users look for location-based time-series data, say 100m wind data or solar data over 10 years, but we also have users who are looking to download a regional area (e.g. CONUS), say in NetCDF, and just looking for a faster way to do this than going to the primary sources - in seconds or minutes rather than hours. Business users tend to track data for multiple locations, so we provide the ability to query with a list of lat/lon pairs, usually up to several hundred at a time.
We run these on commodity Ubuntu servers, so we need to run our scripts within reasonable memory. For raw data processing, we're normally limited by the network bandwidth (10 Gbps) for fetching/uploading data, but we've had some issues in the past with a memory leak (dask/distributed#5960) where memory is not released back to the OS after each request, which is a problem for running API servers.
We found 10-20 MB to be a sweet spot for most of our datasets, although this needs to be balanced against the total number of chunks. There is some waste here as we typically read more data than needed, but this is not the bottleneck for our use-case.
Nothing fancy here - we use the default as we’re mostly concerned with compression/decompression speed.
Ubuntu/Debian mostly, and Mac for dev.
We use a combination of cloud-hosted S3 services (Wasabi, Digital Ocean etc.) and locally attached volumes. We recently tried out R2 but found it to have lower rate limits than the others in terms of the number of objects written per second.
Yes.
National weather agencies such as NCEP and ECMWF. Most of these are in GRIB2 format, so we run cron jobs to process the data. In our pipeline we also use tools such as wgrib2 and CDO for working with GRIB formats, as we found them easier than going via cfgrib/Python.
We use datetimes.
For our API servers, entire datasets are read lazily on start-up and applicable subsets are loaded per user request - which could be a single point, multiple points, or a region. They are almost always sliced by time and sometimes by a bounding box, as per the request.
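As a rough illustration of the multi-point query pattern described above (the store path, variable and dimension names are assumptions, not Oikolab's actual setup):

```python
# Sketch only: extract a 10-year time series for a list of lat/lon pairs.
import xarray as xr

ds = xr.open_zarr("s3://weather-archive/era5.zarr", consolidated=True)

lats = xr.DataArray([51.5, 40.7, 35.7], dims="points")
lons = xr.DataArray([-0.1, -74.0, 139.7], dims="points")

# Vectorised nearest-neighbour selection: one series per requested point.
series = (
    ds["t2m"]
    .sel(latitude=lats, longitude=lons, method="nearest")
    .sel(time=slice("2013-01-01", "2022-12-31"))
    .load()
)
```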
-
Please share your use-case(s) for Zarr, to help inform our performance benchmarking.
xref: #1479
If possible, please share:
(EDIT: I added questions 9 and 10 on the 7th August)
Two use-cases have already been shared in this thread.