Add more details for Hugging Face PyTorch DLCs for Inference #83

22 changes: 16 additions & 6 deletions containers/pytorch/inference/README.md
@@ -25,7 +25,20 @@ Additionally, if you're willing to run the Docker container in GPUs you will nee

## Run

Before running this container, you will need to select any supported model from the [Hugging Face Hub offering for `transformers`](https://huggingface.co/models?library=transformers&sort=trending), as well as the task that the model runs as e.g. text-classification.
Before running this container, you will need to select any supported model from the Hugging Face Hub offering for [`transformers`](https://huggingface.co/models?library=transformers&sort=trending), [`diffusers`](https://huggingface.co/models?library=diffusers&sort=trending), and [`sentence-transformers`](https://huggingface.co/models?library=sentence-transformers&sort=trending), as well as the task that the model runs.

The Hugging Face PyTorch DLCs for Inference come with a pre-defined entrypoint, so to run them you only need to set two environment variables: `HF_MODEL_ID`, the model that you want to deploy, and `HF_TASK`, the task that the model runs. Beyond those, the [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit) supports a wide range of additional environment variables, detailed in [its README](https://github.com/huggingface/huggingface-inference-toolkit?tab=readme-ov-file#%EF%B8%8F-environment-variables).
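
As a minimal sketch of the two required variables, the snippet below writes them to an env file that can be handed to `docker run` through its `--env-file` flag. The helper name `write_hf_env` and the file name `hf.env` are hypothetical and only used for illustration; the model and task values are the example values from [TASKS.md](./TASKS.md).

```bash
# Write the two required variables to an env file readable by `docker run --env-file`.
# Arguments: $1 = model ID on the Hub, $2 = task name, $3 = output path (default: hf.env).
# NOTE: `write_hf_env` and `hf.env` are illustrative names, not part of the container.
write_hf_env() {
  local model_id="$1"
  local task="$2"
  local out="${3:-hf.env}"

  # Both values are mandatory, so fail early if either one is missing.
  if [ -z "$model_id" ] || [ -z "$task" ]; then
    echo "usage: write_hf_env <model-id> <task> [file]" >&2
    return 1
  fi

  # Docker env files use plain VAR=value lines without quotes.
  {
    printf 'HF_MODEL_ID=%s\n' "$model_id"
    printf 'HF_TASK=%s\n' "$task"
  } > "$out"
}

write_hf_env "BAAI/bge-m3" "sentence-embeddings" hf.env
cat hf.env
```

The resulting file can then be passed as `docker run --env-file hf.env ...`, together with the port mapping and the container image URI of your choice; passing each variable with `-e` works equally well.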

> [!NOTE]
> Because the [huggingface-inference-toolkit](https://github.com/huggingface/huggingface-inference-toolkit) is built to be fully compatible with Google Vertex AI, you can also set the environment variables defined by Vertex AI, such as `AIP_MODE=PREDICTION`, `AIP_HTTP_PORT=8080`, `AIP_PREDICT_ROUTE=/predict`, and `AIP_HEALTH_ROUTE=/health`, among others. For the full list of environment variables exposed by Vertex AI, see [Vertex AI Documentation - Custom container requirements for prediction](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).

### Supported Tasks

The list of supported tasks, each with a brief introduction, links to the documentation, and an example of how to use it within the Hugging Face PyTorch DLC for Inference, is available in [TASKS.md](./TASKS.md).

### Supported Hardware

The Hugging Face PyTorch DLCs for Inference are available for both CPU and GPU, and you can select the container based on the hardware you have available.

- **CPU**

@@ -47,12 +60,9 @@ Before running this container, you will need to select any supported model from
us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cu121.2-2.transformers.4-44.ubuntu2204.py311
```

## Test

Once the Docker container is running, you can start sending requests to the `/predict` endpoint which is the default endpoint exposed by the PyTorch Inference containers (unless overridden with `AIP_PREDICT_ROUTE` on build time).
Once the Docker container is running, you can send requests to the `/predict` endpoint, which is the default endpoint exposed by the Hugging Face PyTorch DLCs for Inference (unless overridden with `AIP_PREDICT_ROUTE` at run time).

```bash
curl http://0.0.0.0:5000/predict \
@@ -65,7 +75,7 @@ curl http://0.0.0.0:5000/predict \
```
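
Since loading a model may take a while, it can help to wait until the server responds before sending the first request. The sketch below polls a health route with `curl` until it answers. It assumes that the container serves `/health` (the value shown for `AIP_HEALTH_ROUTE` above) on the same host and port as `/predict`, which this README does not confirm; `wait_for_ready` is a hypothetical helper name.

```bash
# Poll the health route until the server answers successfully, or give up.
# Arguments: $1 = base URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 2).
# ASSUMPTION: the health route is `/health` on the same port as `/predict`.
wait_for_ready() {
  local base_url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors (>= 400); -s hides progress output.
    if curl -fs -o /dev/null "${base_url}/health"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  echo "server at ${base_url} not ready after ${attempts} attempts" >&2
  return 1
}
```

A typical call would be `wait_for_ready http://0.0.0.0:5000`, followed by the request to `/predict` once it returns successfully.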

> [!NOTE]
> The [huggingface-inference-toolkit](https://github.com/huggingface/huggingface-inference-toolkit) is powered by the `pipeline` method within `transformers`, that means that the payload will be different based on the model that you're deploying. So on, before sending requests to the deployed model, you will need to first check which is the task that the `pipeline` method and the model support and are running. To read more about the `pipeline` and the supported tasks please check [Transformers Documentation - Pipelines](https://huggingface.co/docs/transformers/en/main_classes/pipelines).
> The expected input and output payloads depend on the task, which is set through the `HF_TASK` environment variable in the `docker run` command. See [TASKS.md](./TASKS.md) for the payloads of each task.

## Optional

84 changes: 84 additions & 0 deletions containers/pytorch/inference/TASKS.md
@@ -0,0 +1,84 @@
## Hugging Face PyTorch DLC for Inference - Supported Tasks

Please find below all the supported tasks for each library at the time of writing this document:

### Transformers (WIP)

<details>
<summary>text-classification</summary>
</details>

### Sentence Transformers

<details>
<summary>sentence-similarity</summary>
Sentence Similarity is the task of determining how similar two texts are. Sentence similarity models convert input texts into vectors (embeddings) that capture semantic information and then calculate how close (similar) those vectors are to each other. This task is particularly useful for information retrieval and clustering/grouping.

It can be used via the [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit) (running on top of the `SentenceTransformer` class from the [`sentence-transformers`](https://github.com/UKPLab/sentence-transformers) library) by setting the `HF_TASK` environment variable to `sentence-similarity` and the `HF_MODEL_ID` to the model ID of the model you want to deploy.

Below you can find an example with the environment variable values:

```bash
HF_MODEL_ID=BAAI/bge-m3
HF_TASK=sentence-similarity
```

For more information about the sentence-similarity task, see [Hugging Face Documentation - Sentence Similarity](https://huggingface.co/tasks/sentence-similarity) and [Sentence Transformers Documentation - Sentence Transformer](https://sbert.net/docs/quickstart.html#sentence-transformer). You can also explore [all the supported sentence-similarity models on the Hugging Face Hub](https://huggingface.co/models?pipeline_tag=sentence-similarity&library=sentence-transformers&sort=trending).

</details>

<details>
<summary>sentence-embeddings</summary>
Sentence Embeddings is the task of converting input texts into vectors (embeddings) that capture semantic information. Sentence embedding models are useful for a wide range of tasks such as semantic textual similarity, semantic search, clustering, classification, paraphrase mining, and more.

It can be used via the [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit) (running on top of the `SentenceTransformer` class from the [`sentence-transformers`](https://github.com/UKPLab/sentence-transformers) library) by setting the `HF_TASK` environment variable to `sentence-embeddings` and the `HF_MODEL_ID` to the model ID of the model you want to deploy.

Below you can find an example with the environment variable values:

```bash
HF_MODEL_ID=BAAI/bge-m3
HF_TASK=sentence-embeddings
```

For more information about the sentence-embeddings task, see [Sentence Transformers Documentation - Sentence Transformer](https://sbert.net/docs/quickstart.html#sentence-transformer). You can also explore [all the supported sentence-embeddings models on the Hugging Face Hub](https://huggingface.co/models?library=sentence-transformers&sort=trending).

</details>

<details>
<summary>sentence-ranking</summary>
Sentence Ranking is the task of determining the relevance of a text to a query. Sentence ranking models (cross-encoders) take the query and a candidate text together as input and output a score for how relevant the text is to the query, rather than embedding each text separately. This task is particularly useful for information retrieval and search engines.

It can be used via the [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit) (running on top of the `CrossEncoder` class from the [`sentence-transformers`](https://github.com/UKPLab/sentence-transformers) library) by setting the `HF_TASK` environment variable to `sentence-ranking` and the `HF_MODEL_ID` to the model ID of the model you want to deploy.

Below you can find an example with the environment variable values:

```bash
HF_MODEL_ID=BAAI/bge-reranker-v2-m3
HF_TASK=sentence-ranking
```

For more information about the sentence-ranking task, see [Sentence Transformers Documentation - Cross Encoder](https://sbert.net/docs/quickstart.html#cross-encoder). You can also explore [all the supported sentence-ranking models on the Hugging Face Hub](https://huggingface.co/models?pipeline_tag=text-classification&library=sentence-transformers&sort=trending).

</details>

### Diffusers

<details>
<summary>text-to-image</summary>
Text-to-Image is a task that generates images from input text. These models can be used to generate and modify images based on text prompts.

It can be used via the [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit) (running on top of the `AutoPipelineForText2Image` from the [`diffusers`](https://github.com/huggingface/diffusers) library) by setting the `HF_TASK` environment variable to `text-to-image` and the `HF_MODEL_ID` to the model ID of the model you want to deploy.

Below you can find an example with the environment variable values:

```bash
HF_MODEL_ID=black-forest-labs/FLUX.1-dev
HF_TASK=text-to-image
```

For more information about the text-to-image task, see [Hugging Face Documentation - Text to Image](https://huggingface.co/tasks/text-to-image). You can also explore [all the supported text-to-image models on the Hugging Face Hub](https://huggingface.co/models?pipeline_tag=text-to-image&library=diffusers&sort=trending).

</details>

> [!NOTE]
> More tasks and models will be supported in the future, so please check [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit) for the latest updates.
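
The task and model pairs used as examples in this document can be collected in a small lookup, which may be handy when scripting deployments. This is only a sketch: `example_model_for_task` is a hypothetical helper, and the models it returns are just the examples listed above, not the only models supported for each task.

```bash
# Print the example model ID that this document pairs with a given HF_TASK value.
# Returns 1 for tasks that have no example model in this document.
# NOTE: `example_model_for_task` is an illustrative helper, not part of the toolkit.
example_model_for_task() {
  case "$1" in
    sentence-similarity|sentence-embeddings) echo "BAAI/bge-m3" ;;
    sentence-ranking) echo "BAAI/bge-reranker-v2-m3" ;;
    text-to-image) echo "black-forest-labs/FLUX.1-dev" ;;
    *)
      echo "no example model listed for task: $1" >&2
      return 1
      ;;
  esac
}

# Print the two environment variable assignments for a chosen task.
task="sentence-ranking"
if model_id="$(example_model_for_task "$task")"; then
  printf 'HF_MODEL_ID=%s\nHF_TASK=%s\n' "$model_id" "$task"
fi
```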