
Update README.md
osanseviero authored Sep 6, 2023
1 parent 4a21912 commit 433cc0f
Showing 1 changed file with 30 additions and 14 deletions.

README.md
@@ -16,7 +16,21 @@ to power Hugging Chat, the Inference API and Inference Endpoint.

</div>

## Table of contents

- [Get Started](#get-started)
- [API Documentation](#api-documentation)
- [Using a private or gated model](#using-a-private-or-gated-model)
- [A note on Shared Memory](#a-note-on-shared-memory-shm)
- [Distributed Tracing](#distributed-tracing)
- [Local Install](#local-install)
- [CUDA Kernels](#cuda-kernels)
- [Optimized architectures](#optimized-architectures)
- [Run Falcon](#run-falcon)
- [Run](#run)
- [Quantization](#quantization)
- [Develop](#develop)
- [Testing](#testing)

Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and [more](https://huggingface.co/docs/text-generation-inference/supported_models). TGI implements many features, such as:

@@ -63,19 +77,7 @@ To see all options to serve your models (in the [code](https://github.com/huggin
text-generation-launcher --help
```
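
For example, a minimal launch might look like this (a sketch only: the model id is an illustrative placeholder, and `--help` lists every available flag):

```shell
# Serve a model locally; the model id below is just an example
text-generation-launcher --model-id bigscience/bloom-560m --port 8080
```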

## Optimized architectures

TGI works out of the box to serve optimized models in [this list](https://huggingface.co/docs/text-generation-inference/supported_models).

Other architectures are supported on a best-effort basis using:

`AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")`

or

`AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")`

## API documentation
### API documentation

You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The
Swagger UI is also available at: [https://huggingface.github.io/text-generation-inference](https://huggingface.github.io/text-generation-inference).
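
Once a server is up, the REST API can be exercised directly. A minimal sketch, assuming the server is reachable on `127.0.0.1:8080`:

```shell
# Send a generation request to the /generate endpoint
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```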
@@ -164,6 +166,20 @@ the kernels by using the `DISABLE_CUSTOM_KERNELS=True` environment variable.

Be aware that the official Docker image has them enabled by default.
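
If you need to opt out while still using the official image, the variable can be passed through Docker. A minimal sketch, assuming a GPU host (the image tag and model id are illustrative placeholders):

```shell
# Run the official image with custom CUDA kernels disabled
docker run --gpus all --shm-size 1g -p 8080:80 \
    -e DISABLE_CUSTOM_KERNELS=True \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id bigscience/bloom-560m
```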

## Optimized architectures

TGI works out of the box to serve optimized models in [this list](https://huggingface.co/docs/text-generation-inference/supported_models).

Other architectures are supported on a best-effort basis using:

`AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")`

or

`AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")`



## Run Falcon

### Run
