This document describes the end-to-end workflow for text-to-image generative AI models on the Neural Engine backend.
Supported text-to-image generative AI models:
- CompVis/stable-diffusion-v1-4
- runwayml/stable-diffusion-v1-5

The inference and accuracy of the above pretrained models are verified with the default configs.
Create a Python environment, optionally installing autoconf for jemalloc support.
conda create -n <env name> python=3.8 [autoconf]
conda activate <env name>
Note: Make sure the pip version is <= 23.2.2.
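One way to satisfy this constraint is to pin pip inside the environment, for example:
pip install "pip<=23.2.2"
pip --version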
Check that the gcc version is higher than 9.0.
gcc -v
Install Intel® Extension for Transformers; for details, please refer to the installation guide.
# Install from PyPI
pip install intel-extension-for-transformers
# Or, install from source code
cd <intel_extension_for_transformers_folder>
pip install -v .
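As a quick optional sanity check, you can verify that the package imports cleanly in the new environment:
python -c "import intel_extension_for_transformers"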
Install the required dependencies for this example:
cd <intel_extension_for_transformers_folder>/examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion
pip install -r requirements.txt
pip install transformers==4.28.1
# Preloading libjemalloc.so may improve performance when running inference with multiple instances.
conda install jemalloc==5.2.1 -c conda-forge -y
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libjemalloc.so
# Weight sharing can save memory and may improve performance when running multiple instances.
export WEIGHT_SHARING=1
export INST_NUM=<inst num>
Note: This step is optional.
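As an illustration only, multiple instances could be pinned to separate NUMA nodes with numactl. The core and memory bindings below are assumptions for a 2-socket, 56-cores-per-socket machine; adjust them to your topology. run_executor.py and its flags are described later in this document.
# Hypothetical two-instance launch with weight sharing enabled
export WEIGHT_SHARING=1
export INST_NUM=2
numactl -m 0 -C 0-55 python run_executor.py --ir_path=./bf16_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4 &
numactl -m 1 -C 56-111 python run_executor.py --ir_path=./bf16_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4 &
wait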
Stable Diffusion mainly consists of three sub-models:
- Text Encoder
- UNet
- VAE Decoder
Here we take CompVis/stable-diffusion-v1-4 as an example.
Export FP32 ONNX models from the Hugging Face diffusers module with the following command:
python prepare_model.py --input_model=CompVis/stable-diffusion-v1-4 --output_path=./model
Set --bf16 to export both FP32 and BF16 models:
python prepare_model.py --input_model=CompVis/stable-diffusion-v1-4 --output_path=./model --bf16
For the INT8 quantized mode, we only support runwayml/stable-diffusion-v1-5 for now. You need to obtain a quantized INT8 model first through QAT; please refer to the link. Then set --qat_int8 to export the INT8 model:
python prepare_model.py --input_model=runwayml/stable-diffusion-v1-5 --output_path=./model --qat_int8
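To confirm the export, you can list the ONNX files under the output path. The exact directory layout is an assumption and may differ between versions:
find ./model -name "*.onnx"
# expect ONNX files for the text encoder, unet and vae decoder sub-models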
Export the three FP32 ONNX sub-models of Stable Diffusion to Neural Engine IR.
# Run the following bash command to get all IRs.
bash export_model.sh --input_model=model --precision=fp32
Export the three BF16 ONNX sub-models of Stable Diffusion to Neural Engine IR.
# Run the following bash command to get all IRs.
bash export_model.sh --input_model=model --precision=bf16
Export mixed FP32 & dynamically quantized INT8 IR.
bash export_model.sh --input_model=model --precision=fp32 --cast_type=dynamic_int8
Export mixed BF16 & QAT-quantized INT8 IR.
bash export_model.sh --input_model=model --precision=qat_int8
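If the export succeeds, the IR directories referenced by the commands in the following sections should exist; a quick check (which directories appear depends on which exports you ran):
ls -d ./*_ir
# e.g. fp32_ir, bf16_ir, fp32_dynamic_int8_ir or qat_int8_ir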
Run the latency benchmark with the Python API as follows:
# FP32 IR
python run_executor.py --ir_path=./fp32_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4
# Mixed FP32 & dynamic quantized INT8 IR
python run_executor.py --ir_path=./fp32_dynamic_int8_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4
# BF16 IR
python run_executor.py --ir_path=./bf16_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4
# QAT INT8 IR
python run_executor.py --ir_path=./qat_int8_ir --mode=latency --input_model=runwayml/stable-diffusion-v1-5
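Optionally, if you have exported several IRs, they can be benchmarked back to back with a small shell loop (the IR directory names below assume the export commands above were run):
for ir in fp32_ir fp32_dynamic_int8_ir bf16_ir; do
    python run_executor.py --ir_path=./${ir} --mode=latency --input_model=CompVis/stable-diffusion-v1-4
done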
The Fréchet Inception Distance (FID) metric is used to evaluate accuracy. In this case we compare the FID scores between images generated by PyTorch and by the Neural Engine.
Set --accuracy to check the FID score. Python API commands as follows:
# FP32 IR
python run_executor.py --ir_path=./fp32_ir --mode=accuracy --input_model=CompVis/stable-diffusion-v1-4
# Mixed FP32 & dynamic quantized INT8 IR
python run_executor.py --ir_path=./fp32_dynamic_int8_ir --mode=accuracy --input_model=CompVis/stable-diffusion-v1-4
# BF16 IR
python run_executor.py --ir_path=./bf16_ir --mode=accuracy --input_model=CompVis/stable-diffusion-v1-4
# QAT INT8 IR
python run_executor.py --ir_path=./qat_int8_ir --mode=accuracy --input_model=runwayml/stable-diffusion-v1-5
Try using one sentence to create a picture!
# To run FP32 or BF16 models, just pass a different IR path.
# FP32 models
python run_executor.py --ir_path=./fp32_ir --input_model=CompVis/stable-diffusion-v1-4
# BF16 models
python run_executor.py --ir_path=./bf16_ir --input_model=CompVis/stable-diffusion-v1-4
Note:
- The default pretrained model is "CompVis/stable-diffusion-v1-4".
- The default prompt is "a photo of an astronaut riding a horse on mars" and the default output name is "astronaut_rides_horse.png".
- The IR directory should include three IRs, for text_encoder, unet and vae_decoder.
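A couple of quick sanity checks (the IR sub-directory layout is an assumption; adjust to whatever your export produced):
ls ./bf16_ir
# expected to contain the text_encoder, unet and vae_decoder IRs
ls ./astronaut_rides_horse.png
# the default output image name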
Input: a photo of an astronaut riding a horse on mars
Batch Size: 1
| Model | FP32 (s) | BF16 (s) |
|---|---|---|
| CompVis/stable-diffusion-v1-4 | 10.33 | 3.02 |
Note: Performance results were tested on 06/09/2023 with Intel(R) Xeon(R) Platinum 8480+. Performance varies by use, configuration and other factors. See the platform configuration below for details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
| Item | Value |
|---|---|
| Manufacturer | Quanta Cloud Technology Inc |
| Product Name | QuantaGrid D54Q-2U |
| OS | CentOS Stream 8 |
| Kernel | 5.16.0-rc1-intel-next-00543-g5867b0a2a125 |
| Microcode | 0x2b000111 |
| IRQ Balance | Enabled |
| CPU Model | Intel(R) Xeon(R) Platinum 8480+ |
| Base Frequency | 2.0GHz |
| Maximum Frequency | 3.8GHz |
| CPU(s) | 224 |
| Thread(s) per Core | 2 |
| Core(s) per Socket | 56 |
| Socket(s) | 2 |
| NUMA Node(s) | 2 |
| Turbo | Enabled |
| Frequency Governor | Performance |