[Docs] Improve virtual ref docs (#284)
* Improve virtual ref docs

* More detail
mpiannucci authored Oct 16, 2024
1 parent 91086b3 commit d124bed
Showing 3 changed files with 75 additions and 8 deletions.
13 changes: 7 additions & 6 deletions docs/docs/icechunk-python/configuration.md
@@ -1,8 +1,9 @@
# Configuration

When creating and opening Icechunk stores, there are two different sets of configuration to be aware of:
- `StorageConfig` - for configuring access to the object store or filesystem
- `StoreConfig` - for configuring the behavior of the Icechunk Store itself

- [`StorageConfig`](./reference.md#icechunk.StorageConfig) - for configuring access to the object store or filesystem
- [`StoreConfig`](./reference.md#icechunk.StoreConfig) - for configuring the behavior of the Icechunk Store itself

## Storage Config

@@ -15,7 +16,7 @@ When using Icechunk with s3 compatible storage systems, credentials must be prov
=== "From environment"

With this option, the credentials for connecting to S3 are detected automatically from your environment.
This is usually the best choice if you are connecting from within an AWS environment (e.g. from EC2).
This is usually the best choice if you are connecting from within an AWS environment (e.g. from EC2). [See the API](./reference.md#icechunk.StorageConfig.s3_from_env)

```python
icechunk.StorageConfig.s3_from_env(
@@ -26,7 +27,7 @@ When using Icechunk with s3 compatible storage systems, credentials must be prov

=== "Provide credentials"

With this option, you provide your credentials and other details explicitly.
With this option, you provide your credentials and other details explicitly. [See the API](./reference.md#icechunk.StorageConfig.s3_from_config)

```python
icechunk.StorageConfig.s3_from_config(
@@ -47,7 +48,7 @@ When using Icechunk with s3 compatible storage systems, credentials must be prov
=== "Anonymous"

With this option, you connect to S3 anonymously (without credentials).
This is suitable for public data.
This is suitable for public data. [See the API](./reference.md#icechunk.StorageConfig.s3_anonymous)

```python
icechunk.StorageConfig.s3_anonymous(
@@ -59,7 +60,7 @@ When using Icechunk with s3 compatible storage systems, credentials must be prov

### Filesystem Storage

Icechunk can also be used on a local filesystem by providing a path to the location of the store
Icechunk can also be used on a [local filesystem](./reference.md#icechunk.StorageConfig.filesystem) by providing a path to the location of the store

=== "Local filesystem"

45 changes: 44 additions & 1 deletion docs/docs/icechunk-python/virtual.md
@@ -156,4 +156,47 @@ Finally, let's make a plot of the sea surface temperature!
ds.sst.isel(time=26, zlev=0).plot(x='lon', y='lat', vmin=0)
```

![oisst](../assets/datasets/oisst.png)

## Virtual Reference API

While `VirtualiZarr` is the easiest way to create virtual datasets with Icechunk, the Store API that it uses to create those datasets is public. `IcechunkStore` provides a [`set_virtual_ref`](./reference.md#icechunk.IcechunkStore.set_virtual_ref) method that sets a virtual reference for a given chunk.

### Virtual Reference Storage Support

Currently, Icechunk supports two types of storage for virtual references:

#### S3 Compatible

References to files accessible via S3 compatible storage.

##### Example

Here is how we can set the chunk at key `c/0` to point to a file in an S3 bucket, `mybucket`, with the prefix `my/data/file.nc`:

```python
store.set_virtual_ref('c/0', 's3://mybucket/my/data/file.nc', offset=1000, length=200)
```
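
When one file holds several chunks, the same call can be repeated once per chunk key. The following is a minimal sketch, assuming only the call shape shown above. `RecordingStore` and `set_refs_for_file` are hypothetical names, the second byte range is made up for illustration, and the stand-in class merely records calls so that the snippet runs without Icechunk installed:

```python
from dataclasses import dataclass, field


@dataclass
class RecordingStore:
    """Hypothetical stand-in for an IcechunkStore; it only records calls."""

    refs: dict = field(default_factory=dict)

    def set_virtual_ref(self, key, location, offset, length):
        # Mirrors the call shape used in the example above.
        self.refs[key] = (location, offset, length)


def set_refs_for_file(store, location, byte_ranges):
    """Point each chunk key at its (offset, length) byte range in one file."""
    for key, (offset, length) in byte_ranges.items():
        if offset < 0 or length < 0:
            raise ValueError(f"invalid byte range for {key!r}: {offset}, {length}")
        store.set_virtual_ref(key, location, offset=offset, length=length)


store = RecordingStore()
set_refs_for_file(
    store,
    "s3://mybucket/my/data/file.nc",
    {"c/0": (1000, 200), "c/1": (1200, 200)},  # illustrative byte ranges
)
print(store.refs["c/0"])
```

With a real store, `store` would be the `IcechunkStore` instance and the loop body would stay the same.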

##### Configuration

S3 virtual references require configuring credentials so that the store can access the specified S3 bucket. See [the configuration docs](./configuration.md#virtual-reference-storage-config) for instructions.


#### Local Filesystem

References to files accessible via the local filesystem. At this time, all file paths must be **absolute**.

##### Example

Here is how we can set the chunk at key `c/0` to point to a file on the local filesystem located at `/path/to/my/file.nc`:

```python
store.set_virtual_ref('c/0', 'file:///path/to/my/file.nc', offset=20, length=100)
```

No extra configuration is necessary for local filesystem references.
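
Because paths must be absolute, it can help to validate a path before turning it into a `file://` location. This is a minimal sketch using only the Python standard library. `file_ref_location` is a hypothetical helper and not part of Icechunk, it assumes POSIX-style paths, and percent-encoding special characters is an assumption about what the store accepts:

```python
import posixpath
from urllib.parse import quote


def file_ref_location(path: str) -> str:
    """Build a file:// location from an absolute POSIX path."""
    if not posixpath.isabs(path):
        raise ValueError(f"virtual ref paths must be absolute, got {path!r}")
    # quote() leaves "/" untouched and percent-encodes characters such as spaces.
    return "file://" + quote(path)


location = file_ref_location("/path/to/my/file.nc")
print(location)
```

The resulting string can then be passed as the location argument of `set_virtual_ref`, as in the example above.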

### Virtual Reference File Format Support

Currently, Icechunk supports `HDF5` and `netcdf4` files for use in virtual references. See the [tracking issue](https://github.com/earth-mover/icechunk/issues/197) for more info.
25 changes: 24 additions & 1 deletion icechunk-python/python/icechunk/_icechunk_python.pyi
@@ -253,6 +253,8 @@ class KeyNotFound(Exception):
): ...

class StoreConfig:
    """Configuration for an IcechunkStore"""

    # The number of concurrent requests to make when fetching partial values
    get_partial_values_concurrency: int | None
    # The threshold at which to inline chunks in the store in bytes. When set,
@@ -270,7 +272,28 @@ class StoreConfig:
        inline_chunk_threshold_bytes: int | None = None,
        unsafe_overwrite_refs: bool | None = None,
        virtual_ref_config: VirtualRefConfig | None = None,
    ): ...
    ):
        """Create a StoreConfig object with the given configuration options

        Parameters
        ----------
        get_partial_values_concurrency: int | None
            The number of concurrent requests to make when fetching partial values
        inline_chunk_threshold_bytes: int | None
            The threshold at which to inline chunks in the store in bytes. When set,
            chunks smaller than this threshold will be inlined in the store. Default is
            512 bytes when not specified.
        unsafe_overwrite_refs: bool | None
            Whether to allow overwriting refs in the store. Default is False. Experimental.
        virtual_ref_config: VirtualRefConfig | None
            Configurations for virtual references such as credentials and endpoints

        Returns
        -------
        StoreConfig
            A StoreConfig object with the given configuration options
        """
        ...

async def async_pyicechunk_store_exists(storage: StorageConfig) -> bool: ...
def pyicechunk_store_exists(storage: StorageConfig) -> bool: ...
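
The `inline_chunk_threshold_bytes` rule described in the `StoreConfig` docstring above can be illustrated in a few lines. This is only a sketch of the documented behavior (chunks smaller than the threshold are inlined, with a 512-byte default when the option is unset). `would_inline` is a hypothetical helper and not how Icechunk implements the check:

```python
from typing import Optional

# Default stated in the StoreConfig docstring.
DEFAULT_INLINE_THRESHOLD_BYTES = 512


def would_inline(chunk_nbytes: int, threshold: Optional[int] = None) -> bool:
    """Return True if a chunk of this size falls under the inline threshold."""
    limit = DEFAULT_INLINE_THRESHOLD_BYTES if threshold is None else threshold
    # "Smaller than" is read strictly: a chunk of exactly `limit` bytes is not inlined.
    return chunk_nbytes < limit


print(would_inline(100))
print(would_inline(600, threshold=1024))
```

Treating the comparison as strict follows the docstring's wording ("smaller than"); the behavior at exactly the threshold is an assumption.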