Commit

update release workflow
ioangatop committed Mar 22, 2024
2 parents e8df175 + 4fa3e02 commit 4d46fc0
Showing 27 changed files with 316 additions and 278 deletions.
9 changes: 5 additions & 4 deletions .github/workflows/docs.yaml
@@ -1,8 +1,7 @@
---
name: Publish Develop Docs
name: Docs

on:
pull_request:
push:
branches:
- main
@@ -30,6 +29,8 @@ jobs:
run: |
git config user.email "[email protected]"
git config user.name "GitHub Action"
- name: Build Documentation
- name: Deploy Documentation
run: |
nox -s docs -- gh-deploy --force --remote-branch gh-pages
git fetch origin gh-pages:gh-pages
nox -s docs -- deploy --update-aliases dev
git push origin gh-pages
24 changes: 9 additions & 15 deletions .github/workflows/release.yaml
@@ -9,6 +9,9 @@ on:
jobs:
release-pypi:
runs-on: ubuntu-latest
permissions:
id-token: write
contents: write
steps:
- uses: actions/checkout@v4
- name: Setting up PDM
@@ -41,20 +44,11 @@ jobs:
git config --local user.name "GitHub Action"
git fetch origin gh-pages:gh-pages
tag="${{ github.ref_name }}"
DOC_VERSION="0.0.0dev5"
DOC_VERSION=${tag%.*}
nox -s deploy_docs -- --alias-type=copy --update-aliases "$DOC_VERSION" latest
git push origin gh-pages
# - name: Publish package distributions to PyPI
# run: pdm publish --no-build
# env:
# PDM_PUBLISH_USERNAME: ${{ secrets.PYPI_USERNAME }}
# PDM_PUBLISH_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
# - name: Create Release
# uses: actions/create-release@main
# env:
# GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# with:
# tag_name: ${{ github.ref }}
# release_name: v${{ github.ref }}
# draft: true
# prerelease: ${{ steps.check_version.outputs.PRERELEASE }}
- name: Publish package distributions to PyPI
run: nox -s publish -- --no-build
env:
PDM_PUBLISH_USERNAME: ${{ secrets.PYPI_USERNAME }}
PDM_PUBLISH_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
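The `DOC_VERSION=${tag%.*}` line in the workflow above uses POSIX shell parameter expansion to derive the docs version from the git tag by stripping the patch component. A minimal sketch (the tag values are hypothetical):

```sh
#!/bin/sh
# Hypothetical tag, as the workflow would receive it via github.ref_name
tag="0.1.7"

# ${tag%.*} removes the shortest suffix matching ".*",
# i.e. the final ".<patch>" component of the version
DOC_VERSION=${tag%.*}

echo "$DOC_VERSION"   # prints 0.1
```

This way a tag like `0.1.7` publishes docs under `0.1`, so patch releases update the same docs version instead of creating a new one.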
142 changes: 102 additions & 40 deletions README.md
@@ -1,85 +1,147 @@
<div align="center">

<img src="./docs/images/eva-logo.png" width="400">
<img src="https://github.com/kaiko-ai/eva/blob/main/docs/images/eva-logo.png?raw=true" width="400">

<br />

_Oncology FM Evaluation Framework by kaiko.ai_


<a href="https://www.apache.org/licenses/LICENSE-2.0">
<img src="https://img.shields.io/badge/License-Apache%202.0-blue?style=flat-square" />
</a>

<br />
<br />
[![PyPI](https://img.shields.io/pypi/v/kaiko-eva.svg?logo=python)](https://pypi.python.org/pypi/kaiko-eva)
[![CI](https://github.com/kaiko-ai/eva/workflows/CI/badge.svg)](https://github.com/kaiko-ai/eva/actions?query=workflow%3ACI)
[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg?labelColor=gray)](https://github.com/kaiko-ai/eva#license)

<p align="center">
<a href="#installation">Installation</a> •
<a href="#how-to-use">How To Use</a> •
<a href="#datasets">Datasets</a> •
<a href="#contributing">Contribute</a>
<a href="https://github.com/kaiko-ai/eva#installation">Installation</a> •
<a href="https://github.com/kaiko-ai/eva#how-to-use">How To Use</a> •
<a href="https://kaiko-ai.github.io/eva/">Documentation</a> •
<a href="https://kaiko-ai.github.io/eva/dev/datasets/">Datasets</a> •
<a href="https://github.com/kaiko-ai/eva#benchmarks">Benchmarks</a> <br>
<a href="https://github.com/kaiko-ai/eva#contributing">Contribute</a> •
<a href="https://github.com/kaiko-ai/eva#acknowledgements">Acknowledgements</a>
</p>

</div>

---

### _About_
<br />

`eva` is [kaiko.ai](https://kaiko.ai/)'s evaluation framework for oncology foundation models (FMs). Check out the documentation (LINK TO BE ADDED) for more information.
_`eva`_ is an evaluation framework for oncology foundation models (FMs) by [kaiko.ai](https://kaiko.ai/). Check out the [documentation](https://kaiko-ai.github.io/eva/) for more information.

### Highlights:
- Easy and reliable benchmarking of oncology FMs
- Automatic embedding inference and evaluation on downstream tasks
- Native support for popular medical [datasets](https://kaiko-ai.github.io/eva/dev/datasets/) and models
- Statistics produced over multiple evaluation fits and multiple metrics

## Installation

*Note: this section will be revised for the public package when publishing eva*
Simple installation from PyPI:
```sh
# to install the core version only
pip install kaiko-eva

- Create and activate a virtual environment with Python 3.10+
# to install the expanded `vision` version
pip install 'kaiko-eva[vision]'

- Install *eva* and the *eva-vision* package with:
# to install everything
pip install 'kaiko-eva[all]'
```

To install the latest version of the `main` branch:
```sh
pip install "kaiko-eva[all] @ git+https://github.com/kaiko-ai/eva.git"
```
pip install 'kaiko-eva[vision]'

You can verify that the installation was successful by executing:
```sh
eva --version
```

- To be able to use the existing configs, download them from the [*eva* GitHub repo](https://github.com/kaiko-ai/eva/tree/main) and move them to directory where you installed *eva*.
## How To Use

### Run *eva*
_eva_ can be used directly from the terminal as a CLI tool as follows:
```sh
eva {fit,predict,predict_fit} --config url/or/path/to/the/config.yaml
```

Now you can run a complete *eva* workflow, for example with:
For example, to perform a downstream evaluation of DINO ViT-S/16 on the BACH dataset with linear probing by first inferring the embeddings and performing 5 sequential fits, execute:
```sh
eva predict_fit --config https://raw.githubusercontent.com/kaiko-ai/eva/main/configs/vision/dino_vit/offline/bach.yaml
```
eva fit --config configs/vision/dino_vit/online/bach.yaml

> [!NOTE]
> All datasets in the repo that support automatic download have it disabled by default. To enable it, manually set `download=true`.
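As a hypothetical sketch, enabling automatic download in a dataset config could look like the following (the class path and key names are illustrative assumptions, not taken from the repo's configs):

```yaml
data:
  datasets:
    train:
      class_path: eva.vision.datasets.BACH   # illustrative class path
      init_args:
        root: ./data/bach
        download: true   # defaults to false; opt in explicitly
```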

To view all available options, execute:
```sh
eva --help
```
This will:

- Download and extract the dataset, if it has not been downloaded before.
- Fit a model consisting of the frozen FM-backbone and a classification head on the train split.
- Evaluate the trained model on the validation split and report the results.
For more information, please refer to the [documentation](https://kaiko-ai.github.io/eva/dev/user-guide/tutorials/offline_vs_online/) and [tutorials](https://kaiko-ai.github.io/eva/dev/user-guide/advanced/replicate_evaluations/).

For more information, documentation and tutorials, refer to the documentation (LINK TO BE ADDED).
## Benchmarks

## Datasets
This section presents model benchmarks generated with _eva_.

The following datasets are supported natively:
### Table I: WSI patch-level benchmark

### Vision
<br />

<div align="center">

| Model | BACH | CRC | MHIST | PCam/val | PCam/test |
|--------------------------------------------------|-------|-------|-------|----------|-----------|
| ViT-S/16 _(random)_ <sup>[1]</sup> | 0.410 | 0.617 | 0.501 | 0.753 | 0.728 |
| ViT-S/16 _(ImageNet)_ <sup>[1]</sup> | 0.695 | 0.935 | 0.831 | 0.864 | 0.849 |
| ViT-B/8 _(ImageNet)_ <sup>[1]</sup> | 0.710 | 0.939 | 0.814 | 0.870 | 0.856 |
| DINO<sub>(p=16)</sub> <sup>[2]</sup> | 0.801 | 0.934 | 0.768 | 0.889 | 0.895 |
| Phikon <sup>[3]</sup> | 0.725 | 0.935 | 0.777 | 0.912 | 0.915 |
| ViT-S/16 _(kaiko.ai)_ <sup>[4]</sup> | 0.797 | 0.943 | 0.828 | 0.903 | 0.893 |
| ViT-S/8 _(kaiko.ai)_ <sup>[4]</sup> | 0.834 | 0.946 | 0.832 | 0.897 | 0.887 |
| ViT-B/16 _(kaiko.ai)_ <sup>[4]</sup> | 0.810 | 0.960 | 0.826 | 0.900 | 0.898 |
| ViT-B/8 _(kaiko.ai)_ <sup>[4]</sup> | 0.865 | 0.956 | 0.809 | 0.913 | 0.921 |
| ViT-L/14 _(kaiko.ai)_ <sup>[4]</sup> | 0.870 | 0.930 | 0.809 | 0.908 | 0.898 |

_Table I: Linear probing evaluation of FMs on patch-level downstream datasets.<br> We report averaged balanced accuracy
over 5 runs, with an average standard deviation of ±0.003._

#### Patch-level pathology datasets:
- [BACH](./docs/datasets/bach.md)
- [CRC](./docs/datasets/crc.md)
- [MHIST](./docs/datasets/mhist.md)
- [PatchCamelyon](./docs/datasets/patch_camelyon.md)
</div>

<br />

#### Radiology datasets:
- [TotalSegmentator](./docs/datasets/total_segmentator.md)
_References_:
1. _"Emerging properties in self-supervised vision transformers”_
2. _"Benchmarking self-supervised learning on diverse pathology datasets”_
3. _"Scaling self-supervised learning for histopathology with masked image modeling”_
4. _"Towards Training Large-Scale Pathology Foundation Models: from TCGA to Hospital Scale”_

## Contributing

_eva_ is an open source project and welcomes contributions of all kinds. Please checkout the [developer](./docs/DEVELOPER_GUIDE.md) and [contributing guide](./docs/CONTRIBUTING.md) for help on how to do so.

All contributors must follow the [code of conduct](./docs/CODE_OF_CONDUCT.md).


## Acknowledgements

Our codebase is built on multiple open-source projects:

<div align="center">

[![python](https://img.shields.io/badge/-Python-blue?logo=python&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![pytorch](https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white)](https://pytorch.org/get-started/locally/)
[![lightning](https://img.shields.io/badge/-⚡️_Lightning-792ee5?logo=pytorchlightning&logoColor=white)](https://pytorchlightning.ai/)<br>
[![black](https://img.shields.io/badge/Code%20Style-Black-black.svg?labelColor=gray)](https://black.readthedocs.io/en/stable/)
[![isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Checked with pyright](https://microsoft.github.io/pyright/img/pyright_badge.svg)](https://microsoft.github.io/pyright/)<br>
[![pdm-managed](https://img.shields.io/badge/pdm-managed-blueviolet)](https://pdm-project.org)
[![Nox](https://img.shields.io/badge/%F0%9F%A6%8A-Nox-D85E00.svg)](https://github.com/wntrblm/nox)
[![Built with Material for MkDocs](https://img.shields.io/badge/Material_for_MkDocs-526CFE?logo=MaterialForMkDocs&logoColor=white)](https://squidfunk.github.io/mkdocs-material/)

</div>

---
<div align="center">
<img src="./docs/images/kaiko-logo.png" width="200">
<img src="https://github.com/kaiko-ai/eva/blob/main/docs/images/kaiko-logo.png?raw=true" width="200">
</div>
6 changes: 3 additions & 3 deletions docs/datasets/bach.md
@@ -1,6 +1,6 @@
# BACH

The BACH dataset consists of microscopy and WSI images, of which we use only the microscopy images. These are 408 labelled images from 4 classes ("Normal", "Benign", "Invasive", "InSitu"). This dataset was used for the "BACH Grand Challenge on Breast Cancer Histology images".
The BACH dataset consists of microscopy and WSI images, of which we use only the microscopy images. These are 408 labeled images from 4 classes ("Normal", "Benign", "Invasive", "InSitu"). This dataset was used for the "BACH Grand Challenge on Breast Cancer Histology images".


## Raw data
@@ -17,7 +17,7 @@ The BACH dataset consists of microscopy and WSI images, of which we use only the
| **Magnification (μm/px)** | 20x (0.42) |
| **Files format** | `.tif` images |
| **Number of images** | 408 (102 from each class) |
| **Splits in use** | one labelled split |
| **Splits in use** | one labeled split |


### Organization
@@ -26,7 +26,7 @@ The data `ICIAR2018_BACH_Challenge.zip` from [zenodo](https://zenodo.org/records

```
ICAR2018_BACH_Challenge
├── Photos # All labelled patches used by eva
├── Photos # All labeled patches used by eva
│ ├── Normal
│ │ ├── n032.tif
│ │ └── ...
10 changes: 5 additions & 5 deletions docs/datasets/crc.md
@@ -1,6 +1,6 @@
# CRC

The CRC-HE dataset consists of labelled patches (9 classes) from colorectal cancer (CRC) and normal tissue. We use the `NCT-CRC-HE-100K` dataset for training and validation and the `CRC-VAL-HE-7K for testing`.
The CRC-HE dataset consists of labeled patches (9 classes) from colorectal cancer (CRC) and normal tissue. We use the `NCT-CRC-HE-100K` dataset for training and validation and `CRC-VAL-HE-7K` for testing.

The `NCT-CRC-HE-100K-NONORM` consists of 100,000 images without applied color normalization. The `CRC-VAL-HE-7K` consists of 7,180 image patches from 50 patients without overlap with `NCT-CRC-HE-100K-NONORM`.

@@ -45,18 +45,18 @@ from [zenodo](https://zenodo.org/records/1214456) are organized as follows:

```
NCT-CRC-HE-100K # All images used for training
├── ADI # All labelled patches belonging to the 1st class
├── ADI # All labeled patches belonging to the 1st class
│ ├── ADI-AAAFLCLY.tif
│ ├── ...
├── BACK # All labelled patches belonging to the 2nd class
├── BACK # All labeled patches belonging to the 2nd class
│ ├── ...
└── ...
NCT-CRC-HE-100K-NONORM # All images used for training
├── ADI # All labelled patches belonging to the 1st class
├── ADI # All labeled patches belonging to the 1st class
│ ├── ADI-AAAFLCLY.tif
│ ├── ...
├── BACK # All labelled patches belonging to the 2nd class
├── BACK # All labeled patches belonging to the 2nd class
│ ├── ...
└── ...
4 changes: 2 additions & 2 deletions docs/datasets/patch_camelyon.md
@@ -1,7 +1,7 @@
# PatchCamelyon


The PatchCamelyon benchmark is a image classification dataset with 327,680 color images (96 x 96px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating presence of metastatic tissue.
The PatchCamelyon benchmark is an image classification dataset with 327,680 color images (96 x 96px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating presence of metastatic tissue.

## Raw data

@@ -23,7 +23,7 @@ The PatchCamelyon benchmark is a image classification color images

### Splits

The datasource provides train/validation/test splits
The data source provides train/validation/test splits

| Splits | Train | Validation | Test |
|---|---------------|--------------|--------------|
4 changes: 2 additions & 2 deletions docs/datasets/total_segmentator.md
@@ -14,7 +14,7 @@ The TotalSegmentator dataset is a radiology image-segmentation dataset with 1228
| **Image dimension** | ~300 x ~300 x ~350 (number of slices) x 1 (grey scale) * |
| **Files format** | `.nii` ("NIFTI") images |
| **Number of images** | 1228 |
| **Splits in use** | one labelled split |
| **Splits in use** | one labeled split |

/* image resolution and number of slices per image vary

@@ -37,7 +37,7 @@ Totalsegmentator_dataset_v201

- The dataset class `TotalSegmentator` supports downloading the data at runtime via the argument `download: bool = True`.
- For the multilabel classification task, every mask with at least one positive pixel is gets the label "1", all others get the label "0".
- For the multilabel classification task, every mask with at least one positive pixel gets the label "1", all others get the label "0".
- For the multilabel classification task, the `TotalSegmentator` class creates a manifest file with one row per slice and the columns `path`, `slice`, `split`, plus 117 additional columns, one per class.
- The 3D images are treated as 2D: every 25th slice is sampled and treated as an individual image.
- The splits with the following sizes are created after ordering images by filename:
