Update READMEs badges and links (#1)
* init

* Update README.md

* Update README.md

* Update README.md
chunnienc authored May 14, 2024
1 parent 0df5b6b commit 62576a9
Showing 4 changed files with 12 additions and 12 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -80,7 +80,7 @@ PyPi Package (Linx) | [![](https://github.com/google-ai-edge/ai-edge-torch/ac
* Python versions: 3.9, 3.10, 3.11
* Operating system: Linux
* PyTorch: ![torch](https://img.shields.io/badge/torch-2.4.0.dev20240429-blue)
-* TensorFlow: [![tf-nightly](https://img.shields.io/badge/tf--nightly-2.17.0.dev20240430-blue)](https://pypi.org/project/tf-nightly/)
+* TensorFlow: [![tf-nightly](https://img.shields.io/badge/tf--nightly-2.17.0.dev20240509-blue)](https://pypi.org/project/tf-nightly/)

<!-- requirement badges are updated by ci/update_nightly_versions.py -->

10 changes: 5 additions & 5 deletions ai_edge_torch/generative/README.md
@@ -57,13 +57,13 @@ Once you re-author the model and validate its numerical accuracy, you can convert
For example, in the `generative/examples/test_models/toy_model_with_kv_cache.py`, you can define inputs for both signatures:

Sample inputs for the `prefill` signature:
-https://github.com/google-ai-edge/ai-edge-torch-archive/blob/1791dec62f1d3f60e7fe52138640d380f58b072d/ai_edge_torch/generative/examples/test_models/toy_model_with_kv_cache.py#L105-L108
+https://github.com/google-ai-edge/ai-edge-torch/blob/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/examples/test_models/toy_model_with_kv_cache.py#L105-L108

Sample inputs for the `decode` signature:
-https://github.com/google-ai-edge/ai-edge-torch-archive/blob/1791dec62f1d3f60e7fe52138640d380f58b072d/ai_edge_torch/generative/examples/test_models/toy_model_with_kv_cache.py#L111-L114
+https://github.com/google-ai-edge/ai-edge-torch/blob/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/examples/test_models/toy_model_with_kv_cache.py#L111-L114

Then export the model to TFLite with:
-https://github.com/google-ai-edge/ai-edge-torch-archive/blob/1791dec62f1d3f60e7fe52138640d380f58b072d/ai_edge_torch/generative/examples/test_models/toy_model_with_kv_cache.py#L133-L139
+https://github.com/google-ai-edge/ai-edge-torch/blob/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/examples/test_models/toy_model_with_kv_cache.py#L133-L139

Please note that following the `prefill` and `decode` method conventions is required for easy integration into the MediaPipe LLM Inference API.
<br/>
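For orientation, those sample inputs look roughly like the sketch below (sizes and values are illustrative; the linked lines in `toy_model_with_kv_cache.py` are authoritative):

```python
import torch

# Illustrative size; the real constant lives in the linked source.
PREFILL_SEQ_LEN = 100

# `prefill` takes a batch of prompt tokens plus their positions.
prefill_tokens = torch.zeros((1, PREFILL_SEQ_LEN), dtype=torch.long)
prefill_input_pos = torch.arange(0, PREFILL_SEQ_LEN)

# `decode` takes a single token plus its position in the sequence.
decode_token = torch.tensor([[0]], dtype=torch.long)
decode_input_pos = torch.tensor([PREFILL_SEQ_LEN], dtype=torch.int64)
```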
@@ -78,7 +78,7 @@ The user needs to implement the entire LLM Pipeline themselves, and call TFLite

This approach gives users the most control. For example, they can implement streaming, gain finer control over system memory, or add advanced features such as constrained grammar decoding and speculative decoding.

-A very simple text generation pipeline based on a decoder-only-transformer is provided [here](https://github.com/google-ai-edge/ai-edge-torch-archive/blob/main/ai_edge_torch/generative/examples/c%2B%2B/text_generator_main.cc) for reference. Note that this example serves as a starting point, and users are expected to implement their own pipelines based on their model's specific requirements.
+A very simple text generation pipeline based on a decoder-only-transformer is provided [here](https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/c%2B%2B/text_generator_main.cc) for reference. Note that this example serves as a starting point, and users are expected to implement their own pipelines based on their model's specific requirements.
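If you work in Python instead, a minimal greedy-decoding loop over the two exported signatures could look like the sketch below (the model path, decode input names, start token id, and dtypes are assumptions to check against your converted model):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="toy_model_with_kv_cache.tflite")
prefill_runner = interpreter.get_signature_runner("prefill")
decode_runner = interpreter.get_signature_runner("decode")

PREFILL_SEQ_LEN = 100  # must match the static shape the model was exported with
prompt = np.zeros((1, PREFILL_SEQ_LEN), dtype=np.int64)  # real token ids, padded
prefill_runner(prefill_tokens=prompt,
               prefill_input_pos=np.arange(PREFILL_SEQ_LEN, dtype=np.int64))

token = np.array([[1]], dtype=np.int64)  # assumed start token id
for pos in range(PREFILL_SEQ_LEN, PREFILL_SEQ_LEN + 32):
    # Input names `tokens`/`input_pos` are assumptions; see get_signature_list().
    outputs = decode_runner(tokens=token, input_pos=np.array([pos], dtype=np.int64))
    logits = next(iter(outputs.values()))  # single logits output; key name may vary
    token = np.argmax(logits, axis=-1).reshape(1, 1).astype(np.int64)
    print(token.item())
```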

#### Use MediaPipe LLM Inference API

@@ -105,7 +105,7 @@ model-explorer 'gemma-2b.tflite'

<img width="890" alt="Gemma-2b visualization demo" src="screenshots/gemma-tflite.png">

-For an end-to-end example showing how to author, convert, quantize and execute, please refer to the steps [here](https://github.com/google-ai-edge/ai-edge-torch-archive/blob/main/ai_edge_torch/generative/examples/README.md)
+For an end-to-end example showing how to author, convert, quantize and execute, please refer to the steps [here](https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/README.md)
<br/>

## What to expect
10 changes: 5 additions & 5 deletions ai_edge_torch/generative/examples/README.md
@@ -22,10 +22,10 @@ For each of the example models, we have a model definition file (e.g. tiny_llama.py)
Here we use `TinyLlama` as an example to walk you through the authoring steps.

#### Define model's structure
-https://github.com/google-ai-edge/ai-edge-torch-archive/blob/e54638dd4a91ec09115f9ded1bd5540f3f1a4e68/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py#L43-L74
+https://github.com/google-ai-edge/ai-edge-torch/blob/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py#L46-L77

#### Define model's forward function
-https://github.com/google-ai-edge/ai-edge-torch-archive/blob/e54638dd4a91ec09115f9ded1bd5540f3f1a4e68/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py#L79-L101
+https://github.com/google-ai-edge/ai-edge-torch/blob/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py#L79-L104

Now you will have an `nn.Module` named `TinyLlama`; the next step is to restore the weights from the original checkpoint into the new model.

@@ -37,12 +37,12 @@ place to simplify the `state_dict` mapping process (`utilities/loader.py`).
The user needs to provide a layer-name template (`TensorNames`) for the source
model. This template is then used to create an updated `state_dict` that works
with the mapped model. The tensor map includes the following fields:
-https://github.com/google-ai-edge/ai-edge-torch-archive/blob/3b753d80fdf00872baac523dc727b87b3dc271e7/ai_edge_torch/generative/utilities/loader.py#L120-L134
+https://github.com/google-ai-edge/ai-edge-torch/blob/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/utilities/loader.py#L94-L109

The fields that have a default value of `None` are optional and should only be
populated if they are relevant to the model architecture. For `TinyLlama`, we
will define the following `TENSOR_NAMES`:
-https://github.com/google-ai-edge/ai-edge-torch-archive/blob/e54638dd4a91ec09115f9ded1bd5540f3f1a4e68/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py#L27-L40
+https://github.com/google-ai-edge/ai-edge-torch/blob/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py#L30-L43
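Shape-wise, such a template maps the library's standard layer roles to the checkpoint's layer-name patterns, roughly as below (field names and patterns here are illustrative sketches; the linked `tiny_llama.py` lines are authoritative):

```python
from ai_edge_torch.generative.utilities import loader as loading_utils

# Illustrative field names and patterns; `{}` is filled with the layer index.
TENSOR_NAMES = loading_utils.ModelLoader.TensorNames(
    ff_up_proj="model.layers.{}.mlp.up_proj",
    ff_down_proj="model.layers.{}.mlp.down_proj",
    ff_gate_proj="model.layers.{}.mlp.gate_proj",
    attn_query_proj="model.layers.{}.self_attn.q_proj",
    attn_key_proj="model.layers.{}.self_attn.k_proj",
    attn_value_proj="model.layers.{}.self_attn.v_proj",
    attn_output_proj="model.layers.{}.self_attn.o_proj",
    pre_attn_norm="model.layers.{}.input_layernorm",
    embedding="model.embed_tokens",
    final_norm="model.norm",
    lm_head="lm_head",
)
```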

With the `TensorNames` defined, a user can simply use the loading utils to load
an instance of the mapped model. For instance:
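A hedged sketch of that call, assuming the `ModelLoader` utility from `utilities/loader.py` and the illustrative `TENSOR_NAMES` above:

```python
from ai_edge_torch.generative.utilities import loader as loading_utils

model = TinyLlama(config)  # the re-authored nn.Module from the authoring step
loader = loading_utils.ModelLoader("path/to/tiny_llama_checkpoint", TENSOR_NAMES)
loader.load(model)  # builds the remapped state_dict and applies it to the model
```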
@@ -59,7 +59,7 @@ using a few input samples before proceeding to the conversion step.

### Model conversion
In this step, we use `ai_edge_torch`'s standard multi-signature conversion API to convert the PyTorch `nn.Module` to a single TFLite flatbuffer for on-device execution. For example, in `tiny_llama/convert_to_tflite.py`, we use this Python code to convert the `TinyLlama` model to a multi-signature TFLite model:
-https://github.com/google-ai-edge/ai-edge-torch-archive/blob/3b753d80fdf00872baac523dc727b87b3dc271e7/ai_edge_torch/generative/examples/tiny_llama/convert_to_tflite.py#L22-L53
+https://github.com/google-ai-edge/ai-edge-torch/blob/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/examples/tiny_llama/convert_to_tflite.py#L26-L61
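The heart of that script is the chained multi-signature conversion; condensed, it amounts to the following (the `model` and sample tensors come from the earlier steps, and the output path is illustrative):

```python
import ai_edge_torch

edge_model = (
    ai_edge_torch.signature("prefill", model, (prefill_tokens, prefill_input_pos))
    .signature("decode", model, (decode_token, decode_input_pos))
    .convert()
)
edge_model.export("/tmp/tiny_llama.tflite")
```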

Once converted, you will get a `.tflite` model that is ready for on-device execution. Note that the generated `.tflite` model uses static shapes. Inside it, two signatures (two entrypoints to the model) are defined:
1) `prefill`: takes two tensor inputs, `prefill_tokens` and `prefill_input_pos`, with shapes `(BATCH_SIZE, PREFILL_SEQ_LEN)` and `(PREFILL_SEQ_LEN)` respectively.
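You can verify both entrypoints, including the `decode` signature, and their exact tensor names with the TFLite Python API (the printed structure below is indicative):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="/tmp/tiny_llama.tflite")
print(interpreter.get_signature_list())
# Indicative output:
# {'prefill': {'inputs': ['prefill_tokens', 'prefill_input_pos'], 'outputs': [...]},
#  'decode':  {'inputs': [...], 'outputs': [...]}}
```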
2 changes: 1 addition & 1 deletion ai_edge_torch/generative/layers/README.md
@@ -43,4 +43,4 @@ Currently, the library provides the following configuration class for you to customize

## High-Level function boundary for performance
We introduce High-Level Function Boundary (HLFB) as a way of annotating performance-critical pieces of the model (e.g. `scaled_dot_product_attention` or `KVCache`). HLFB allows the converter to lower the annotated blocks to performant TFLite custom ops. The following is an example of applying HLFB to `SDPA`:
-https://github.com/google-ai-edge/ai-edge-torch-archive/blob/3b753d80fdf00872baac523dc727b87b3dc271e7/ai_edge_torch/generative/layers/attention.py#L74-L122
+https://github.com/google-ai-edge/ai-edge-torch/blob/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/layers/attention.py#L74-L122
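In spirit, the annotation brackets the PyTorch computation with explicit boundary markers. A minimal sketch, assuming the `StableHLOCompositeBuilder` helper from `ai_edge_torch.hlfb` and an illustrative composite name:

```python
import torch.nn.functional as F
from ai_edge_torch.hlfb import StableHLOCompositeBuilder

def sdpa_with_hlfb(q, k, v, scale):
  # Everything marked between inputs and outputs is lowered as one composite
  # op, which the converter can map to a fused TFLite kernel.
  builder = StableHLOCompositeBuilder(
      name="odml.scaled_dot_product_attention", attr={"scale": scale})
  q, k, v = builder.mark_inputs(q, k, v)
  out = F.scaled_dot_product_attention(q, k, v, scale=scale)
  return builder.mark_outputs(out)
```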
