merge
LouiseDck committed Sep 11, 2024
2 parents 9f070a4 + be3a78b commit 664dec6
Showing 14 changed files with 210 additions and 119 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -6,6 +6,6 @@ WORKDIR /app

# run some compilation / build task (if needed)
RUN pixi install -a
-RUN pixi run -e rverse bash scripts/setup.sh
+RUN pixi run -e rverse bash book/disk_based/scripts/setup.sh

CMD ["pixi", "run", "pipeline"]
71 changes: 15 additions & 56 deletions README.md
@@ -4,58 +4,35 @@ This book is a collection of notebooks and explanations for the workshop on **Po

## Installation

-For the best polyglot experience, we recommend using [Pixi](https://pixi.sh/latest/) to manage your development environment. Once setup and in a clean shell without any active Python (`deactivate`) or Conda environments (`conda deactivate`), you can install all dependencies with the following command:
+For the best polyglot experience on Linux, we recommend using [Pixi](https://pixi.sh/latest/) to manage your development environment. Environment creation support for Pixi on Windows and MacOS ARM is currently limited for R packages. Installation of the R dependencies in Pixi is more difficult, because Pixi does not support [post-link scripts](https://github.com/prefix-dev/pixi/issues/1573) and the bioconda channel for bioconductor packages does not yet support [osx-arm64](https://github.com/bioconda/bioconda-recipes/issues/33333).

```diff
-pixi i -e dev
```

-Installation of the R dependencies in Pixi is more difficult, because Pixi does not support [post-link scripts](https://github.com/prefix-dev/pixi/issues/1573) and the bioconda channel for bioconductor packages does not yet support [osx-arm64](https://github.com/bioconda/bioconda-recipes/issues/33333).
-
-To fully install the R dependencies used in the notebooks, use a script via the following command:
+In a clean Linux shell without any active Python (`deactivate`) or Conda environments (`conda deactivate`), you can install all dependencies with the following command:

```diff
-pixi r setup_R
+pixi install
```
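As a quick sanity check before installing (a minimal Python sketch, not an official part of the setup), you can confirm that no virtualenv or Conda environment is still active in the shell:

```python
import os

# A clean shell should have neither variable set:
# VIRTUAL_ENV is set by `source <venv>/bin/activate`,
# CONDA_PREFIX is set by `conda activate`.
active = {
    name: os.environ[name]
    for name in ("VIRTUAL_ENV", "CONDA_PREFIX")
    if os.environ.get(name)
}
if active:
    print("Deactivate these environments first:", active)
else:
    print("Shell looks clean; safe to run `pixi install`.")
```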

## Usage

To run the notebooks in this book yourself, you can use VSCode or RStudio. For VSCode, install the [Quarto extension in VSCode](https://quarto.org/docs/tools/vscode.html) to render the notebooks correctly. Click Run Cell and select the kernel `dev` located at the path `.pixi/envs/dev/bin/python`. For more information, see [this issue](https://github.com/prefix-dev/pixi/issues/411). For R, be sure to install the R extension and set the Rpath and Rterm in VSCode Settings to the correct path, e.g. `${workspaceFolder}/.pixi/envs/default/bin/R`.

For RStudio, you have to [install RStudio globally](https://quarto.org/docs/tools/rstudio.html) and start it using the Pixi task runner, or make sure that the `dev` environment installed in this project folder is used within RStudio:

```bash
pixi r rstudio
```
For MacOS ARM and Windows, we recommend using Docker. For R users, we recommend `renv` to manage the R and Python dependencies.

-## Development
+## Linux

````diff
-For development, some common tasks are available via the Pixi task runner. If you want more control, you can start a development shell using `pixi shell -e dev`.
-
-To run the tests in the test environment, you can use the following command:
-
-```bash
-pixi r pytest
-```
````

-To run a auto-reloading preview server of the workshop book, use the following to start quarto in the dev environment. If you instead use quarto globally, the jupyter python3 kernel will not point to the Pixi dev environment:
+To run the pipeline on Linux, use the following command:

```diff
-pixi r preview
+pixi run pipeline
```

-Use the Render option in the Quarto extension in VSCode when editing the slides for an auto-reloading preview.
+## Docker

-To render the slides correctly in the workshop book site using [embedio](https://github.com/coatless-quarto/embedio), use the following command to create the revealjs and pdf version of the slides. This requires Docker to be installed and running on your system:
+To run the pipeline with Docker, use the following command. The image is ~5GB and the pipeline can require a lot of working memory (~20GB), so make sure to increase the RAM allocated to Docker in your settings. Note that the usecase/data/ and scripts/ folders are mounted into the Docker container, so you can edit the scripts and access the data.

```diff
-pixi r render_slides
+docker pull berombau/polygloty-docker:latest
+docker run -it -v $(pwd)/usecase/data:/app/usecase/data -v $(pwd)/book/disk_based/scripts:/app/scripts berombau/polygloty-docker:latest pixi run pipeline
```

## renv

I'm currently experiencing issues with getting `pixi` to install `rpy2`. As a temporary workaround, I'm using `renv` to manage the R dependencies.

### First time setup

To install the R and Python dependencies, use the following command. Start a new R session with `R` or run within `RStudio`:
@@ -65,7 +42,7 @@

```r
install.packages("renv")
renv::restore()
```

-On MacOS ARM, you will need [extra configuration](https://firas.io/posts/r_macos/) and patience to be able to build some of the packages.
+On MacOS ARM, you will need [extra configuration](https://firas.io/posts/r_macos/) and patience to be able to build some of the packages. The Docker approach is recommended for MacOS ARM.

### Adding new packages

@@ -104,34 +81,16 @@

```bash
source renv/python/virtualenvs/renv-python-3.12/bin/activate
quarto render
```

````diff
-## Pixi and Docker
-
-Environment creation support for Pixi on Windows and MacOS ARM is currently limited for R packages. Only Linux and Docker are supported for the full pipeline.
-
-### Linux
-
-To run the pipeline on Linux, use the following command:
-
-```bash
-pixi run pipeline
-```
-
-### Docker
-
-To run the pipeline with Docker, use the following command. The image is ~5GB and the pipeline can require a lot of working memory ~20GB, so make sure to increase the RAM allocated to Docker in your settings. Note that the usecase/data/ and scripts/ folders are mounted to the Docker container, so you can edit the scripts and access the data.
-
-```bash
-docker pull berombau/polygloty-docker:latest
-docker run -it -v $(pwd)/usecase/data:/app/usecase/data -v $(pwd)/scripts:/app/scripts berombau/polygloty-docker:latest pixi run pipeline
-```
````
## Extra

-### Extra: building the Docker image yourself
+### Building the Docker image yourself

To edit and build the Docker image yourself, you can use the following command:

```diff
 docker build -t polygloty-docker .
-docker run -it -v $(pwd)/usecase/data:/app/usecase/data -v $(pwd)/scripts:/app/scripts polygloty-docker pixi run pipeline
+docker run -it -v $(pwd)/usecase/data:/app/usecase/data -v $(pwd)/book/disk_based/scripts:/app/scripts polygloty-docker pixi run pipeline
```

To publish it to Docker Hub, use the following command:
14 changes: 14 additions & 0 deletions book/disk_based/_general_file_formats.qmd
@@ -0,0 +1,14 @@
| File Format | Python | R | Sparse matrix | Large images | Lazy chunk loading | Remote storage |
|-------------|--------|---|---------------|-------------|--------------------|----------------|
| RDS |||||||
| Pickle |||||||
| CSV |||||||
| JSON |||||||
| TIFF |||||||
| [.npy](https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html#module-numpy.lib.format) |||||||
| [Parquet](https://parquet.apache.org/) |||||||
| [Feather](https://arrow.apache.org/docs/python/feather.html) |||||||
| [Lance](https://github.com/lancedb/lance) | ● | ○ | ● | ○ | ● | ● |
| [HDF5](https://www.hdfgroup.org/) |||||||
| [Zarr](https://zarr.readthedocs.io/en/stable/) |||||||
| [TileDB](https://tiledb.io/) |||||||
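To make the Python column of the table above concrete, here is a small stdlib-only sketch (illustrative, with made-up example data) of why Pickle stays within Python while JSON and CSV cross language boundaries:

```python
import csv
import io
import json
import pickle

record = {"gene": "CD8A", "counts": [0, 3, 1]}

# Pickle: Python-native binary format; R cannot read it without extra tooling.
blob = pickle.dumps(record)
assert pickle.loads(blob) == record

# JSON: plain text and language-agnostic; R can read it with e.g. jsonlite.
text = json.dumps(record)
assert json.loads(text) == record

# CSV: tabular text readable from both languages, but nesting and types are lost.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["gene", "count"])
for count in record["counts"]:
    writer.writerow([record["gene"], count])
rows = list(csv.reader(io.StringIO(buf.getvalue())))
assert rows[1] == ["CD8A", "0"]  # values come back as strings
```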
11 changes: 11 additions & 0 deletions book/disk_based/_specialized_file_formats.qmd
@@ -0,0 +1,11 @@
| File Format | Python | R | Sparse matrix | Large images | Lazy chunk loading | Remote storage |
|-------------|--------|---|---------------|-------------|--------------------|----------------|
| [Seurat RDS](https://satijalab.org/seurat/) | ○ | ● | ○ | ◐ | ○ | ○ |
| [Indexed OME-TIFF](http://viv.gehlenborglab.org/#data-preparation) |||||||
| [h5Seurat](https://mojaveazure.github.io/seurat-disk/index.html) |||||||
| [Loom HDF5](https://loompy.org/) |||||||
| [AnnData h5ad](https://anndata.readthedocs.io/en/latest/anndata.zarr.html) |||||||
| [AnnData Zarr](https://anndata.readthedocs.io/en/latest/anndata.zarr.html) |||||||
| [TileDB-SOMA](https://tiledb.com/open-source/life-sciences) |||||||
| [TileDB-BioImaging](https://tiledb.com/open-source/life-sciences) |||||||
| [SpatialData Zarr](https://spatialdata.scverse.org/en/stable/) |||||||
