This document describes the end-to-end workflow for text-to-image generative AI models on the Neural Engine backend.
Supported text-to-image generative AI models:
- CompVis/stable-diffusion-v1-4
- runwayml/stable-diffusion-v1-5

The inference and accuracy of the above pretrained models are verified with the default configs.
Create a Python environment, optionally installing autoconf for jemalloc support.
conda create -n <env name> python=3.8 [autoconf]
conda activate <env name>
Note: Make sure the pip version is <= 23.2.2.
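One way to satisfy this constraint is to pin pip inside the environment, for example:
pip install "pip<=23.2.2"
pip --version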
Check that the gcc version is higher than 9.0.
gcc -v
Install Intel® Extension for Transformers; for details, please refer to the installation guide.
# Install from PyPI
pip install intel-extension-for-transformers
# Or, install from source code
cd <intel_extension_for_transformers_folder>
pip install -v .
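As a quick optional sanity check, you can verify that the package imports cleanly in the new environment:
python -c "import intel_extension_for_transformers"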
Install the required dependencies for this example:
cd <intel_extension_for_transformers_folder>/examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion
pip install -r requirements.txt
pip install transformers==4.28.1
# Preloading libjemalloc.so may improve performance when running inference with multiple instances.
conda install jemalloc==5.2.1 -c conda-forge -y
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libjemalloc.so
# Weight sharing can save memory and may improve performance when running multiple instances.
export WEIGHT_SHARING=1
export INST_NUM=<inst num>
Note: This step is optional.
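As an illustration only, multiple instances could be pinned to separate NUMA nodes with numactl. The core and memory bindings below are assumptions for a 2-socket, 56-cores-per-socket machine; adjust them to your topology. run_executor.py and its flags are described later in this document.
# Hypothetical two-instance launch with weight sharing enabled
export WEIGHT_SHARING=1
export INST_NUM=2
numactl -m 0 -C 0-55 python run_executor.py --ir_path=./bf16_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4 &
numactl -m 1 -C 56-111 python run_executor.py --ir_path=./bf16_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4 &
wait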
Stable Diffusion mainly consists of three sub-models:
- Text Encoder
- UNet
- VAE Decoder
Here we take CompVis/stable-diffusion-v1-4 as an example.
Export FP32 ONNX models from the Hugging Face diffusers module with the following command:
python prepare_model.py --input_model=CompVis/stable-diffusion-v1-4 --output_path=./model
Set --bf16 to export both FP32 and BF16 models:
python prepare_model.py --input_model=CompVis/stable-diffusion-v1-4 --output_path=./model --bf16
For the INT8 quantized mode, we only support runwayml/stable-diffusion-v1-5 for now. You need to obtain a quantized INT8 model first through QAT; please refer to the link. Then set --qat_int8 to export the INT8 model:
python prepare_model.py --input_model=runwayml/stable-diffusion-v1-5 --output_path=./model --qat_int8
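To confirm the export, you can list the ONNX files under the output path. The exact directory layout is an assumption and may differ between versions:
find ./model -name "*.onnx"
# expect ONNX files for the text encoder, unet and vae decoder sub-models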
Export the three FP32 ONNX sub-models of Stable Diffusion to Neural Engine IR.
# Run the following bash command to get all IRs.
bash export_model.sh --input_model=model --precision=fp32
Export the three BF16 ONNX sub-models of Stable Diffusion to Neural Engine IR.
# Run the following bash command to get all IRs.
bash export_model.sh --input_model=model --precision=bf16
Export mixed FP32 & dynamically quantized INT8 IR.
bash export_model.sh --input_model=model --precision=fp32 --cast_type=dynamic_int8
Export mixed BF16 & QAT-quantized INT8 IR.
bash export_model.sh --input_model=model --precision=qat_int8
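If the export succeeds, the IR directories referenced by the commands in the following sections should exist; a quick check (which directories appear depends on which exports you ran):
ls -d ./*_ir
# e.g. fp32_ir, bf16_ir, fp32_dynamic_int8_ir or qat_int8_ir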
Run the latency benchmark with the Python API as follows:
# FP32 IR
python run_executor.py --ir_path=./fp32_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4
# Mixed FP32 & dynamic quantized INT8 IR
python run_executor.py --ir_path=./fp32_dynamic_int8_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4
# BF16 IR
python run_executor.py --ir_path=./bf16_ir --mode=latency --input_model=CompVis/stable-diffusion-v1-4
# QAT INT8 IR
python run_executor.py --ir_path=./qat_int8_ir --mode=latency --input_model=runwayml/stable-diffusion-v1-5
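Optionally, if you have exported several IRs, they can be benchmarked back to back with a small shell loop (the IR directory names below assume the export commands above were run):
for ir in fp32_ir fp32_dynamic_int8_ir bf16_ir; do
    python run_executor.py --ir_path=./${ir} --mode=latency --input_model=CompVis/stable-diffusion-v1-4
done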
The Fréchet Inception Distance (FID) metric is used to evaluate accuracy. In this case we compare the FID scores between images generated by PyTorch and by the Neural Engine.
Set --accuracy to check the FID score. Python API commands as follows:
# FP32 IR
python run_executor.py --ir_path=./fp32_ir --mode=accuracy --input_model=CompVis/stable-diffusion-v1-4
# Mixed FP32 & dynamic quantized INT8 IR
python run_executor.py --ir_path=./fp32_dynamic_int8_ir --mode=accuracy --input_model=CompVis/stable-diffusion-v1-4
# BF16 IR
python run_executor.py --ir_path=./bf16_ir --mode=accuracy --input_model=CompVis/stable-diffusion-v1-4
# QAT INT8 IR
python run_executor.py --ir_path=./qat_int8_ir --mode=accuracy --input_model=runwayml/stable-diffusion-v1-5
Try using one sentence to create a picture!
# To run FP32 or BF16 models, just pass a different IR path.
# FP32 models
python run_executor.py --ir_path=./fp32_ir --input_model=CompVis/stable-diffusion-v1-4
# BF16 models
python run_executor.py --ir_path=./bf16_ir --input_model=CompVis/stable-diffusion-v1-4
Note:
- The default pretrained model is "CompVis/stable-diffusion-v1-4".
- The default prompt is "a photo of an astronaut riding a horse on mars" and the default output name is "astronaut_rides_horse.png".
- The IR directory should include three IRs, for text_encoder, unet and vae_decoder.
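A couple of quick sanity checks (the IR sub-directory layout is an assumption; adjust to whatever your export produced):
ls ./bf16_ir
# expected to contain the text_encoder, unet and vae_decoder IRs
ls ./astronaut_rides_horse.png
# the default output image name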
Input: a photo of an astronaut riding a horse on mars
Batch Size: 1
| Model | FP32 (s) | BF16 (s) |
|---|---|---|
| CompVis/stable-diffusion-v1-4 | 10.33 | 3.02 |
Note: Performance results were tested on 06/09/2023 with Intel(R) Xeon(R) Platinum 8480+. Performance varies by use, configuration and other factors. See the platform configuration below for details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
| Item | Value |
|---|---|
| Manufacturer | Quanta Cloud Technology Inc |
| Product Name | QuantaGrid D54Q-2U |
| OS | CentOS Stream 8 |
| Kernel | 5.16.0-rc1-intel-next-00543-g5867b0a2a125 |
| Microcode | 0x2b000111 |
| IRQ Balance | Enabled |
| CPU Model | Intel(R) Xeon(R) Platinum 8480+ |
| Base Frequency | 2.0GHz |
| Maximum Frequency | 3.8GHz |
| CPU(s) | 224 |
| Thread(s) per Core | 2 |
| Core(s) per Socket | 56 |
| Socket(s) | 2 |
| NUMA Node(s) | 2 |
| Turbo | Enabled |
| Frequency Governor | Performance |