This page lists the examples validated with Intel® Neural Compressor across multiple compression techniques. Links to typical examples can be found in the example tables, and the performance/accuracy results are available here.
- Validated Quantization Examples
  1.1. TensorFlow Models with Intel TensorFlow 2.12.0
  1.2. TensorFlow Models with Intel® Extension for TensorFlow* 1.2.0
  1.3. PyTorch Models with Torch 2.0.1+cpu in PTQ Mode
  1.4. PyTorch Models with Torch 2.0.1+cpu in QAT Mode
  1.5. PyTorch Models with Intel® Extension for PyTorch* 2.0.1+cpu
- Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime
System summary: Tested by Intel on 06/19/2023. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 256GB (16x16GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3A14.TEL2P1, microcode 0x2b0001b0,
CentOS Stream 8, gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
Most models are benchmarked using 1 socket, 4 cores/instance, 14 instances and batch size 1.
Some large models are benchmarked using 1 socket, 56 cores/instance, 1 instance and batch size 1.
Performance varies by use, configuration and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
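The two instance layouts in the system summary follow from the socket's core count: 56 cores split into 4-core instances gives 14 instances, while a single 56-core instance occupies the whole socket. A minimal Python sketch of that partitioning is given here. The helper name `core_ranges` and the assumption that instances take contiguous core IDs starting at 0 are illustrative and are not taken from the benchmark scripts.

```python
from typing import List, Tuple


def core_ranges(cores_per_socket: int, cores_per_instance: int) -> List[Tuple[int, int]]:
    """Split one socket into contiguous (first_core, last_core) ranges, one per instance.

    Hypothetical helper: assumes core IDs start at 0 and instances do not overlap.
    """
    if cores_per_instance <= 0 or cores_per_socket % cores_per_instance != 0:
        raise ValueError("cores_per_instance must evenly divide cores_per_socket")
    return [
        (start, start + cores_per_instance - 1)
        for start in range(0, cores_per_socket, cores_per_instance)
    ]


# 1s4c14ins: 1 socket, 4 cores/instance -> 14 instances
small_model_layout = core_ranges(56, 4)
# 1s56c1ins: 1 socket, 56 cores/instance -> 1 instance
large_model_layout = core_ranges(56, 56)
```

Under these assumptions the first layout yields 14 ranges from cores 0-3 up to cores 52-55, and the second yields a single range covering cores 0-55.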
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet50 v1.0 | pb | 74.12% | 74.27% | -0.21% | 2721.21 | 638.25 | 4.26x |
ResNet50 v1.5 | pb | 76.23% | 76.46% | -0.31% | 2123.70 | 552.94 | 3.84x |
ResNet101 | pb | 77.50% | 76.45% | 1.37% | 1477.29 | 432.29 | 3.42x |
Inception V1 | pb | 70.44% | 69.74% | 1.01% | 3267.92 | 1266.03 | 2.58x |
Inception V2 | pb | 74.38% | 73.97% | 0.57% | 2399.76 | 1098.67 | 2.18x |
Inception V3 | pb | 76.71% | 76.75% | -0.05% | 1593.59 | 508.58 | 3.13x |
Inception V4 | pb | 80.18% | 80.27% | -0.11% | 1032.10 | 249.39 | 4.14x |
Inception ResNet V2 | pb | 80.34% | 80.40% | -0.07% | 427.28 | 185.60 | 2.30x |
MobileNet V1 | pb | 71.78% | 70.96% | 1.16% | 5503.87 | 1791.62 | 3.07x |
MobileNet V2 | pb | 72.52% | 71.76% | 1.07% | 3639.83 | 1864.72 | 1.95x |
VGG16 | pb | 72.64% | 70.89% | 2.47% | 1538.21 | 236.22 | 6.51x |
VGG19 | pb | 72.69% | 71.01% | 2.37% | 1368.21 | 196.94 | 6.95x |
ResNetV2 50 | pb | 70.44% | 69.64% | 1.15% | 1105.19 | 657.45 | 1.68x |
ResNetV2 101 | pb | 72.65% | 71.87% | 1.08% | 716.49 | 369.95 | 1.94x |
ResNetV2 152 | pb | 73.07% | 72.37% | 0.97% | 508.60 | 269.31 | 1.89x |
Densenet 121 | pb | 73.59% | 72.89% | 0.97% | 617.94 | 498.43 | 1.24x |
Densenet 161 | pb | 76.35% | 76.29% | 0.08% | 372.04 | 242.05 | 1.54x |
Densenet 169 | pb | 74.34% | 74.65% | -0.41% | 496.41 | 411.94 | 1.21x |
EfficientNet B0 | ckpt | 76.14% | 76.76% | -0.81% | 748.42 | 709.43 | 1.05x |
SSD ResNet50 V1 | pb | 37.88% | 38.00% | -0.31% | 134.81 | 31.06 | 4.34x |
SSD MobileNet V1 | pb | 22.98% | 23.13% | -0.64% | 1273.79 | 671.84 | 1.90x |
SSD ResNet50 v1 | ckpt | 37.89% | 38.00% | -0.30% | 136.53 | 27.88 | 4.90x |
SSD MobileNet v1 | ckpt | 22.96% | 23.13% | -0.72% | 1235.03 | 477.83 | 2.58x |
SSD ResNet34 | pb | 21.70% | 22.09% | -1.76% | 179.37 | 13.96 | 12.85x |
Faster R-CNN Inception ResNet V2 | pb | 37.47% | 38.31% | -2.18% | 5.39 | 3.01 | 1.79x |
Faster R-CNN Inception ResNet V2 | SavedModel | 37.79% | 38.31% | -1.34% | 5.35 | 1.89 | 2.83x |
Faster R-CNN ResNet101 | pb | 30.32% | 30.39% | -0.23% | 156.71 | 23.50 | 6.67x |
Faster R-CNN ResNet101 | SavedModel | 30.33% | 30.39% | -0.20% | 152.21 | 18.50 | 8.23x |
Faster R-CNN ResNet50 | pb | 26.64% | 26.59% | 0.21% | 173.07 | 28.83 | 6.00x |
YOLOv3 | pb | 82.13% | 82.35% | -0.28% | 211.67 | 87.89 | 2.41x |
BERT large SQuAD | pb | 92.47 | 92.99 | -0.56% | 46.87 | 16.65 | 2.82x |
BERT large SQuAD (ONNX Model Zoo) | pb | 92.42 | 92.98 | -0.61% | 42.35 | 17.03 | 2.49x |
BERT base MRPC | ckpt | 86.03% | 86.52% | -0.57% | 424.94 | 174.10 | 2.44x |
Transformer LT | pb | 25.77 | 25.86 | -0.34% | 42.11 | 22.11 | 1.90x |
Transformer lt MLPerf | pb | 27.10 | 27.17 | -0.25% | 9.82 | 4.29 | 2.29x |
Wide Deep large DS | pb | 77.75% | 77.67% | 0.10% | 55612.97 | 43479.53 | 1.28x |
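The two ratio columns in these tables are simple arithmetic on the INT8 and FP32 figures: the accuracy ratio is the relative change `(INT8-FP32)/FP32`, and the performance ratio is the speedup `INT8/FP32`. A small Python sketch using the ResNet101 row above makes this concrete; the function names are illustrative.

```python
def accuracy_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change in percent: (INT8 - FP32) / FP32 * 100."""
    return (int8_acc - fp32_acc) / fp32_acc * 100.0


def performance_ratio(int8_throughput: float, fp32_throughput: float) -> float:
    """Speedup of INT8 over FP32 throughput: INT8 / FP32."""
    return int8_throughput / fp32_throughput


# Values copied from the ResNet101 row: 77.50% vs 76.45%, 1477.29 vs 432.29 samples/sec.
acc = accuracy_ratio(77.50, 76.45)
perf = performance_ratio(1477.29, 432.29)
print(f"{acc:.2f}% {perf:.2f}x")
```

For this row the sketch reproduces the listed 1.37% and 3.42x. The published ratios appear to be computed from unrounded measurements, so recomputing from the rounded table values can differ in the last digit for some rows (ResNet50 v1.0, for example).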
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s56c1ins1bs | FP32 Throughput (samples/sec), 1s56c1ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
Mask R-CNN Inception V2 | pb | 28.60% | 28.73% | -0.44% | 39.35 | 23.84 | 1.65x |
Mask R-CNN Inception V2 | ckpt | 28.60% | 28.73% | -0.44% | 40.21 | 23.90 | 1.68x |
GPT2 | pb | 66.89% | 67.57% | -1.00% | 9.67 | 7.22 | 1.34x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet50 v1.0 | pb | 74.16% | 74.27% | -0.15% | 2716.04 | 569.18 | 4.77x |
ResNet50 v1.5 | pb | 76.27% | 76.46% | -0.26% | 2683.90 | 476.14 | 5.64x |
Inception V1 | pb | 69.59% | 69.74% | -0.22% | 2349.32 | 1035.63 | 2.27x |
Inception V2 | pb | 73.75% | 73.97% | -0.30% | 2399.93 | 930.62 | 2.58x |
Inception V4 | pb | 80.03% | 80.27% | -0.31% | 763.85 | 262.22 | 2.91x |
MobileNet V1 | pb | 70.61% | 70.96% | -0.48% | 4003.12 | 1677.22 | 2.39x |
MobileNet V2 | pb | 71.15% | 71.76% | -0.85% | 2766.36 | 2643.21 | 1.05x |
VGG16 | pb | 70.84% | 70.89% | -0.07% | 1495.88 | 238.52 | 6.27x |
VGG19 | pb | 71.03% | 71.01% | 0.03% | 1372.91 | 199.52 | 6.88x |
ResNetV2 50 | pb | 69.43% | 69.64% | -0.30% | 1457.53 | 630.41 | 2.31x |
ResNetV2 101 | pb | 71.84% | 71.87% | -0.05% | 842.53 | 338.44 | 2.49x |
ResNetV2 152 | pb | 72.26% | 72.37% | -0.15% | 645.86 | 231.63 | 2.79x |
EfficientNet B0 | ckpt | 76.76% | 76.76% | 0.00% | 938.82 | 707.22 | 1.33x |
EfficientNet V2 B0 | SavedModel | 78.63% | 78.62% | 0.01% | 1533.95 | 1258.45 | 1.22x |
SSD MobileNet V1 | pb | 22.90% | 23.13% | -0.99% | 981.29 | 647.07 | 1.52x |
SSD MobileNet v1 | ckpt | 22.92% | 23.13% | -0.89% | 850.31 | 444.12 | 1.91x |
Faster R-CNN Inception ResNet V2 | pb | 38.02% | 38.31% | -0.74% | 7.08 | 2.93 | 2.42x |
Faster R-CNN Inception ResNet V2 | SavedModel | 38.18% | 38.31% | -0.32% | 6.61 | 2.79 | 2.37x |
YOLOv3 | pb | 80.27% | 82.35% | -2.53% | 543.50 | 80.59 | 6.74x |
BERT large SQuAD | pb | 92.67 | 92.97 | -0.33% | 72.27 | 18.39 | 3.93x |
BERT base MRPC | ckpt | 86.28% | 86.28% | 0.00% | 947.96 | 233.07 | 4.07x |
DistilBERT base | pb | 90.48% | 91.06% | -0.64% | 788.64 | 462.35 | 1.71x |
Transformer LT | pb | 25.73 | 25.86 | -0.47% | 42.07 | 29.21 | 1.44x |
Transformer lt MLPerf | pb | 27.13 | 27.17 | -0.14% | 10.43 | 4.84 | 2.15x |
Wide Deep large DS | pb | 77.66% | 77.67% | -0.02% | 51958.00 | 39974.56 | 1.30x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet18 | static | 69.61% | 69.76% | -0.22% | 1631.83 | 662.13 | 2.46x |
ResNet50 | static | 75.92% | 76.15% | -0.30% | 1162.83 | 330.92 | 3.51x |
Inception V3 | static | 69.47% | 69.52% | -0.07% | 968.67 | 334.53 | 2.90x |
ResNeSt50 | static | 80.80% | 81.04% | -0.30% | 394.38 | 40.76 | 9.67x |
ResNeXt101_32x8d | static | 78.94% | 79.31% | -0.46% | 558.59 | 108.42 | 5.15x |
Efficientnet_b0 | static | 76.89% | 77.67% | -1.01% | 703.73 | 656.12 | 1.07x |
Efficientnet_b3 | static | 77.82% | 78.54% | -0.93% | 510.58 | 391.05 | 1.31x |
Efficientnet_b7 | static | 73.55% | 73.92% | -0.50% | 233.29 | 150.09 | 1.55x |
Peleenet | static | 71.85% | 72.10% | -0.35% | 857.72 | 585.60 | 1.46x |
YOLO V3 | static | 55.09% | 54.93% | 0.31% | 160.97 | 60.60 | 2.66x |
SSD ResNet34 | static | 19.52 | 19.63 | -0.58% | 141.67 | 11.75 | 12.05x |
Roberta base MRPC | static | 92.69% | 93.59% | -0.96% | 407.78 | 174.53 | 2.34x |
CamemBERT base MRPC | static | 88.93% | 89.28% | -0.39% | 402.78 | 173.56 | 2.32x |
DistilBERT base MRPC | dynamic | 90.20% | 90.27% | -0.07% | 748.28 | 343.54 | 2.18x |
DistilBERT base MRPC | static | 89.53% | 90.27% | -0.82% | 804.57 | 343.24 | 2.34x |
ALBERT base MRPC | static | 92.63% | 92.63% | 0.00% | 352.44 | 162.26 | 2.17x |
 | | 91.60% | 92.25% | -0.71% | 302.57 | 183.57 | 1.65x |
Xlm Roberta MRPC | static | 88.36% | 88.62% | -0.29% | 404.61 | 173.71 | 2.33x |
Xlm Roberta MRPC | dynamic | 88.24% | 88.24% | 0.00% | 382.72 | 174.63 | 2.19x |
BERT base MRPC | static | 89.63% | 90.42% | -0.87% | 407.58 | 173.66 | 2.35x |
BERT base COLA | static | 54.51% | 53.39% | 2.10% | 414.72 | 173.86 | 2.39x |
BERT base STSB | static | 87.55% | 88.05% | -0.57% | 413.76 | 173.34 | 2.39x |
BERT base SST-2 | static | 91.51% | 92.32% | -0.87% | 410.87 | 173.63 | 2.37x |
BERT large COLA | static | 62.84% | 63.35% | -0.80% | 138.89 | 51.65 | 2.69x |
BERT base RTE | static | 72.56% | 72.56% | 0.00% | 385.23 | 173.32 | 2.22x |
BERT large MRPC | static | 90.22% | 90.38% | -0.17% | 141.61 | 51.67 | 2.74x |
BERT large QNLI | static | 90.87% | 91.54% | -0.74% | 407.84 | 173.52 | 2.35x |
BERT large RTE | static | 73.29% | 74.01% | -0.98% | 141.64 | 51.33 | 2.76x |
BERT large RTE | dynamic | 71.48% | 74.01% | -3.41% | 126.49 | 51.34 | 2.46x |
BERT large SQuAD | static | 92.27 | 93.16 | -0.95% | 37.61 | 16.57 | 2.27x |
GPT J WikiText | static | 3.36 | 2.34 | NA | 0.87 | 0.28 | 3.15x |
Reformer Crime and Punishment | static | 1.88 | 1.87 | 0.23% | 449.73 | 364.78 | 1.23x |
lvwerra/pegasus-samsum | static | 42.50 | 42.67 | -0.39% | 101.32 | 37.80 | 2.68x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s56c1ins1bs | FP32 Throughput (samples/sec), 1s56c1ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
openai/whisper-large | dynamic | 97.07% | 96.96% | 0.12% | 0.60 | 0.47 | 1.28x |
abeja/gpt-neox-japanese-2.7b | static | 4.30 | 3.52 | 22.06% | 1.03 | 0.56 | 1.84x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet18 | static | 69.74% | 69.76% | -0.03% | 1723.70 | 654.17 | 2.63x |
ResNet50 | static | 76.05% | 76.15% | -0.12% | 1141.22 | 306.04 | 3.73x |
ResNeXt101_32x8d | static | 79.28% | 79.31% | -0.04% | 558.92 | 106.82 | 5.23x |
MobileNet V2 | static | 69.73% | 71.84% | -2.93% | 1379.34 | 729.22 | 1.89x |
BERT base MRPC | static | 89.70% | 90.40% | -0.77% | 389.77 | 173.54 | 2.25x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet18 | static | 75.98% | 76.15% | -0.22% | 1980.94 | 672.93 | 2.94x |
ResNet50 | static | 69.56% | 69.76% | -0.29% | 5032.32 | 1500.16 | 3.35x |
ResNeXt101_32x16d_wsl | static | 84.04% | 84.17% | -0.15% | 533.60 | 78.84 | 6.77x |
SSD ResNet34 | static | 19.93% | 20.00% | -0.38% | 84.02 | 15.68 | 5.36x |
bert-large-uncased-whole-word-masking-finetuned-squad | static | 92.93 | 93.16 | -0.25% | 161.44 | 22.19 | 7.27x |
distilbert-base-uncased-distilled-squad | static | 86.09 | 86.84 | -0.86% | 556.19 | 149.79 | 3.71x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s56c1ins1bs | FP32 Throughput (samples/sec), 1s56c1ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
EleutherAI/gpt-j-6B | static | 78.60% | 79.20% | -0.76% | 4.87 | 1.55 | 3.14x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet50 V1.5 | qlinearops | 72.16% | 72.29% | -0.19% | 1412.05 | 710.02 | 1.99x |
ResNet50 V1.5 | qdq | 72.14% | 72.29% | -0.22% | 1564.39 | 712.38 | 2.20x |
ResNet50 V1.5 MLPerf | qlinearops | 76.11% | 76.46% | -0.46% | 1377.47 | 719.66 | 1.91x |
ResNet50 V1.5 MLPerf | qdq | 76.13% | 76.46% | -0.44% | 1446.69 | 703.40 | 2.06x |
ResNet50 V1.5 (ONNX Model Zoo) | qlinearops | 74.82% | 74.99% | -0.22% | 1579.31 | 747.73 | 2.11x |
ResNet50 V1.5 (ONNX Model Zoo) | qdq | 74.82% | 74.99% | -0.23% | 1508.21 | 749.43 | 2.01x |
MobileNet V2 | qlinearops | 65.49% | 66.89% | -2.09% | 6950.77 | 4214.56 | 1.65x |
MobileNet V2 | qdq | 65.49% | 66.89% | -2.10% | 6881.60 | 4192.78 | 1.64x |
MobileNet V2 (ONNX Model Zoo) | qlinearops | 68.38% | 69.48% | -1.59% | 6563.24 | 3804.18 | 1.73x |
MobileNet V2 (ONNX Model Zoo) | qdq | 68.38% | 69.48% | -1.59% | 6631.12 | 3922.70 | 1.69x |
VGG16 | qlinearops | 66.56% | 66.69% | -0.19% | 423.44 | 158.01 | 2.68x |
VGG16 | qdq | 66.59% | 66.69% | -0.15% | 571.02 | 161.69 | 3.53x |
VGG16 (ONNX Model Zoo) | qlinearops | 72.33% | 72.40% | -0.09% | 598.92 | 163.53 | 3.66x |
VGG16 (ONNX Model Zoo) | qdq | 72.33% | 72.40% | -0.09% | 594.66 | 164.39 | 3.62x |
MobileNet V3 MLPerf | qlinearops | 75.56% | 75.74% | -0.24% | 5473.90 | 2567.96 | 2.13x |
MobileNet V3 MLPerf | qdq | 75.56% | 75.74% | -0.24% | 5455.36 | 2563.80 | 2.13x |
ShuffleNet V2 (ONNX Model Zoo) | qlinearops | 66.09% | 66.36% | -0.41% | 6818.46 | 3839.67 | 1.78x |
ShuffleNet V2 (ONNX Model Zoo) | qdq | 66.09% | 66.36% | -0.41% | 5750.72 | 3861.83 | 1.49x |
GoogleNet (ONNX Model Zoo) | qlinearops | 67.71% | 67.79% | -0.12% | 1783.63 | 1095.06 | 1.63x |
GoogleNet (ONNX Model Zoo) | qdq | 67.73% | 67.79% | -0.09% | 1755.03 | 1071.04 | 1.64x |
SqueezeNet (ONNX Model Zoo) | qlinearops | 56.54% | 56.87% | -0.57% | 9918.09 | 5639.89 | 1.76x |
SqueezeNet (ONNX Model Zoo) | qdq | 56.54% | 56.87% | -0.57% | 9423.22 | 5501.30 | 1.71x |
CaffeNet (ONNX Model Zoo) | qlinearops | 56.21% | 56.30% | -0.16% | 3363.62 | 1015.06 | 3.31x |
CaffeNet (ONNX Model Zoo) | qdq | 56.25% | 56.30% | -0.09% | 3276.82 | 798.28 | 4.10x |
AlexNet (ONNX Model Zoo) | qlinearops | 54.73% | 54.79% | -0.10% | 2104.66 | 985.33 | 2.14x |
AlexNet (ONNX Model Zoo) | qdq | 54.71% | 54.79% | -0.14% | 2054.60 | 745.36 | 2.76x |
ZFNet (ONNX Model Zoo) | qlinearops | 55.84% | 55.96% | -0.21% | 864.73 | 456.41 | 1.89x |
ZFNet (ONNX Model Zoo) | qdq | 55.86% | 55.96% | -0.18% | 866.80 | 455.75 | 1.90x |
Inception V1 (ONNX Model Zoo) | qlinearops | 67.21% | 67.24% | -0.05% | 1802.03 | 1170.74 | 1.54x |
Inception V1 (ONNX Model Zoo) | qdq | 67.21% | 67.24% | -0.05% | 1813.29 | 1164.87 | 1.56x |
EfficientNet (ONNX Model Zoo) | qlinearops | 76.98% | 77.11% | -0.17% | 2615.12 | 1349.97 | 1.94x |
EfficientNet (ONNX Model Zoo) | qdq | 76.99% | 77.11% | -0.16% | 2343.94 | 1322.86 | 1.77x |
DenseNet (ONNX Model Zoo) | qlinearops | 60.53% | 60.96% | -0.70% | 630.80 | 499.98 | 1.26x |
SSD (ONNX Model Zoo) | qlinearops | 18.83% | 18.98% | -0.77% | 56.69 | 14.56 | 3.89x |
SSD (ONNX Model Zoo) | qdq | 18.62% | 18.98% | -1.89% | 57.54 | 14.55 | 3.95x |
SSD MobileNet V1 | qlinearops | 22.44% | 23.10% | -2.86% | 1288.14 | 878.69 | 1.47x |
SSD MobileNet V1 | qdq | 22.44% | 23.10% | -2.86% | 1173.88 | 851.00 | 1.38x |
SSD MobileNet V1 (ONNX Model Zoo) | qlinearops | 22.96% | 23.02% | -0.27% | 1114.65 | 825.47 | 1.35x |
SSD MobileNet V1 (ONNX Model Zoo) | qdq | 22.96% | 23.02% | -0.27% | 1056.30 | 792.66 | 1.33x |
SSD MobileNet V2 | qlinearops | 23.87% | 24.67% | -3.25% | 788.51 | 669.72 | 1.18x |
YOLOv3 (ONNX Model Zoo) | qlinearops | 27.01% | 28.73% | -5.99% | 140.21 | 110.43 | 1.27x |
YOLOv4 (ONNX Model Zoo) | qlinearops | 32.30% | 33.71% | -4.19% | 72.95 | 64.95 | 1.12x |
DUC (ONNX Model Zoo) | qlinearops | 81.63% | 81.92% | -0.36% | 9.12 | 4.96 | 1.84x |
Tiny YOLOv3 (ONNX Model Zoo) | qlinearops | 11.83% | 12.42% | -4.73% | 1163.39 | 993.96 | 1.17x |
Ultra Face (ONNX Model Zoo) | qlinearops | 83.23% | 83.65% | -0.49% | 8501.08 | 1922.19 | 4.42x |
Emotion FERPlus (ONNX Model Zoo) | qlinearops | 7.97% | 8.00% | -0.35% | 3552.60 | 3114.19 | 1.14x |
ArcFace (ONNX Model Zoo) | qlinearops | 99.80% | 99.80% | 0.00% | 558.78 | 246.87 | 2.26x |
BERT base MRPC | qlinearops | 85.54% | 86.03% | -0.57% | 399.04 | 226.03 | 1.77x |
BERT base MRPC | qdq | 85.54% | 86.03% | -0.57% | 392.26 | 223.21 | 1.76x |
BERT base MRPC | integerops | 85.29% | 86.03% | -0.85% | 474.99 | 222.71 | 2.13x |
DistilBERT base MRPC | qdq | 84.56% | 84.56% | 0.00% | 557.05 | 399.46 | 1.39x |
DistilBERT base MRPC | integerops | 85.54% | 84.56% | 1.16% | 963.92 | 399.36 | 2.41x |
Mobile bert MRPC | qdq | 85.54% | 86.28% | -0.85% | 529.98 | 394.46 | 1.34x |
Mobile bert MRPC | integerops | 85.54% | 86.28% | -0.85% | 603.66 | 398.15 | 1.52x |
Roberta base MRPC | integerops | 90.93% | 89.95% | 1.09% | 485.74 | 223.54 | 2.17x |
BERT SQuAD (ONNX Model Zoo) | integerops | 80.29 | 80.67 | -0.47% | 187.63 | 95.88 | 1.96x |
MobileBERT SQuAD MLPerf (ONNX Model Zoo) | integerops | 89.87 | 90.03 | -0.17% | 144.88 | 124.08 | 1.17x |
BiDAF (ONNX Model Zoo) | integerops | 65.93% | 66.08% | -0.23% | 2757.83 | 2279.38 | 1.21x |
GPT2 lm head WikiText (ONNX Model Zoo) | integerops | 31.98 | 29.00 | 10.31% | 15.35 | 9.73 | 1.58x |
BERT base cased MRPC (HuggingFace) | qlinearops | 90.21% | 90.42% | -0.23% | 357.89 | 211.81 | 1.69x |
BERT base uncased MRPC (HuggingFace) | integerops | 89.58% | 90.42% | -0.93% | 472.44 | 211.65 | 2.23x |
Roberta base MRPC (HuggingFace) | qlinearops | 91.00% | 91.38% | -0.41% | 365.03 | 214.66 | 1.70x |
Roberta base MRPC (HuggingFace) | integerops | 90.85% | 91.38% | -0.58% | 489.85 | 212.20 | 2.31x |
XLM Roberta base MRPC (HuggingFace) | qlinearops | 89.37% | 90.10% | -0.81% | 302.49 | 212.76 | 1.42x |
XLM Roberta base MRPC (HuggingFace) | integerops | 89.66% | 90.10% | -0.50% | 343.75 | 213.09 | 1.61x |
Camembert base MRPC (HuggingFace) | qlinearops | 89.28% | 89.28% | 0.00% | 270.01 | 215.48 | 1.25x |
Camembert base MRPC (HuggingFace) | integerops | 89.19% | 89.28% | -0.10% | 491.01 | 212.92 | 2.31x |
MiniLM L12 H384 uncased MRPC (HuggingFace) | qlinearops | 90.13% | 90.97% | -0.93% | 1051.67 | 583.85 | 1.80x |
MiniLM L12 H384 uncased MRPC (HuggingFace) | integerops | 91.07% | 90.97% | 0.10% | 1076.27 | 589.80 | 1.82x |
DistilBERT base uncased SST-2 (HuggingFace) | qlinearops | 90.71% | 91.06% | -0.38% | 896.69 | 396.85 | 2.26x |
DistilBERT base uncased SST-2 (HuggingFace) | integerops | 90.25% | 91.06% | -0.88% | 753.88 | 396.59 | 1.90x |
Albert base v2 SST-2 (HuggingFace) | qlinearops | 91.40% | 92.32% | -0.99% | 274.17 | 210.87 | 1.30x |
Albert base v2 SST-2 (HuggingFace) | integerops | 91.86% | 92.32% | -0.50% | 271.85 | 211.18 | 1.29x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | qlinearops | 89.45% | 90.14% | -0.76% | 2022.40 | 1124.12 | 1.80x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | integerops | 89.91% | 90.14% | -0.26% | 2010.50 | 1127.41 | 1.78x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | qlinearops | 87.70% | 88.29% | -0.67% | 401.24 | 211.92 | 1.89x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | integerops | 88.19% | 88.29% | -0.12% | 494.84 | 212.01 | 2.33x |
Electra small discriminator MRPC (HuggingFace) | qlinearops | 89.57% | 89.83% | -0.29% | 1804.17 | 1154.99 | 1.56x |
Electra small discriminator MRPC (HuggingFace) | integerops | 89.27% | 89.83% | -0.63% | 1961.57 | 1158.86 | 1.69x |
BERT mini MRPC (HuggingFace) | qlinearops | 86.70% | 86.52% | 0.21% | 4986.29 | 3444.92 | 1.45x |
BERT mini MRPC (HuggingFace) | integerops | 86.16% | 86.52% | -0.41% | 5603.86 | 3320.38 | 1.69x |
Xlnet base cased MRPC (HuggingFace) | qlinearops | 89.74% | 89.86% | -0.13% | 108.36 | 91.63 | 1.18x |
Xlnet base cased MRPC (HuggingFace) | integerops | 89.58% | 89.86% | -0.31% | 108.27 | 92.24 | 1.17x |
BART large MRPC (HuggingFace) | qlinearops | 91.77% | 91.20% | 0.63% | 58.98 | 51.23 | 1.15x |
BART large MRPC (HuggingFace) | integerops | 92.36% | 91.20% | 1.28% | 96.02 | 51.12 | 1.88x |
DeBERTa v3 base MRPC (HuggingFace) | qlinearops | 91.85% | 92.23% | -0.40% | 161.42 | 147.11 | 1.10x |
DeBERTa v3 base MRPC (HuggingFace) | integerops | 92.39% | 92.23% | 0.17% | 170.50 | 147.28 | 1.16x |
Spanbert SQuAD (HuggingFace) | qlinearops | 91.14 | 91.98 | -0.91% | 69.94 | 42.36 | 1.65x |
Spanbert SQuAD (HuggingFace) | integerops | 91.40 | 91.98 | -0.63% | 80.06 | 42.62 | 1.88x |
Bert base multilingual cased SQuAD (HuggingFace) | qlinearops | 88.42 | 89.13 | -0.79% | 71.67 | 42.36 | 1.69x |
Bert base multilingual cased SQuAD (HuggingFace) | integerops | 88.70 | 89.13 | -0.48% | 79.42 | 42.32 | 1.88x |
DistilBert base uncased SQuAD (HuggingFace) | qlinearops | 86.33 | 86.86 | -0.62% | 112.14 | 67.59 | 1.66x |
DistilBert base uncased SQuAD (HuggingFace) | integerops | 86.05 | 86.86 | -0.94% | 159.29 | 67.70 | 2.35x |
BERT large uncased whole word masking SQuAD (HuggingFace) | qlinearops | 92.34 | 93.16 | -0.88% | 24.56 | 12.71 | 1.93x |
BERT large uncased whole word masking SQuAD (HuggingFace) | integerops | 92.99 | 93.16 | -0.18% | 26.76 | 12.72 | 2.10x |
Roberta large SQuAD v2 (HuggingFace) | qlinearops | 89.03 | 89.02 | 0.02% | 16.85 | 12.95 | 1.30x |
Roberta large SQuAD v2 (HuggingFace) | integerops | 89.04 | 89.02 | 0.02% | 26.85 | 12.95 | 2.07x |
GPT2 WikiText (HuggingFace) | qlinearops | 30.25 | 29.00 | 4.33% | 12.63 | 9.76 | 1.29x |
GPT2 WikiText (HuggingFace) | integerops | 29.68 | 29.00 | 2.36% | 13.54 | 9.72 | 1.39x |
DistilGPT2 WikiText (HuggingFace) | qlinearops | 44.93 | 43.43 | 3.46% | 20.45 | 16.72 | 1.22x |
DistilGPT2 WikiText (HuggingFace) | integerops | 44.62 | 43.43 | 2.74% | 21.91 | 16.73 | 1.31x |
LayoutLM FUNSD (HuggingFace) | qlinearops | 78.15% | 78.35% | -0.25% | 60.41 | 43.95 | 1.37x |
LayoutLM FUNSD (HuggingFace) | integerops | 77.58% | 78.35% | -0.98% | 65.82 | 43.83 | 1.50x |
LayoutLMv3 FUNSD (HuggingFace) | qlinearops | 89.85% | 90.49% | -0.71% | 31.12 | 29.13 | 1.07x |
LayoutLMv3 FUNSD (HuggingFace) | integerops | 90.07% | 90.49% | -0.46% | 35.01 | 27.92 | 1.25x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s56c1ins1bs | FP32 Throughput (samples/sec), 1s56c1ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
Faster R-CNN (ONNX Model Zoo) | qlinearops | 34.06% | 34.37% | -0.88% | 3.99 | 3.28 | 1.21x |
Faster R-CNN (ONNX Model Zoo) | qdq | 33.98% | 34.37% | -1.12% | 4.00 | 3.37 | 1.19x |
Mask R-CNN (ONNX Model Zoo) | qlinearops | 33.13% | 33.72% | -1.74% | 3.36 | 2.95 | 1.14x |
Mask R-CNN (ONNX Model Zoo) | qdq | 33.29% | 33.72% | -1.28% | 3.38 | 2.98 | 1.14x |
FCN (ONNX Model Zoo) | qlinearops | 64.54% | 64.98% | -0.67% | 28.19 | 12.60 | 2.24x |
FCN (ONNX Model Zoo) | qdq | 64.54% | 64.98% | -0.67% | 28.22 | 12.56 | 2.25x |
Model | Task | Dataset | Dense Accuracy | Sparse Accuracy | Relative Drop | Sparsity Ratio | Sparsity Pattern | Comments | Balanced or Unbalanced Ratio |
---|---|---|---|---|---|---|---|---|---|
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=77.62 | +0.98% | 50% | structured 2:4 | snip momentum | balanced |
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=86.15 | -0.86% | 80% | structured 4x1 | snip momentum | unbalanced |
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=87.50 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=87.78 | -0.92% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=89.40 | +0.91% | 50% | structured 2:4 | snip momentum | balanced |
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=90.91 | -0.35% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=91.67 | +0.48% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.22 | -0.34% | 90% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.33 | -0.22% | 90% | structured 4x1 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.89 | -0.72% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.8 | -0.83% | 60% | structured per channel | snip momentum | unbalanced |
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=89.85 | -0.46% | 90% | structured 4x1 | snip momentum | unbalanced |
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=90.88 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 90% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=87.73 | +0.14% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 50% | structured per channel | snip momentum | unbalanced |
ResNet50 | image recognition | ImageNet | top1 acc = 80.10 | top1 acc = 78.95 | -1.43% | 75% | structured 2x1 | snip momentum | unbalanced |
YOLO-v5s6 | object detection | COCO | AP0.50:0.95/AP0.50=0.404/0.6 | AP0.50:0.95/AP0.50=0.393/0.584 | -2.72% | 80% | unstructured | snip momentum | unbalanced |
Bert-Large | question answering | SQuAD-v1.1 | f1=91.34 | f1=90.7 | -0.70% | 80% | structured 2x1 | group lasso | unbalanced |
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.51 | -0.88% | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 92.20 | -0.13% | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.97 | -0.38% | 20% | unstructured | gradient sensitivity | balanced |
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.39 | -1.26% | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.87 | -0.73% | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% | structured 1:2 | Prune once for all | balanced |
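The "2:4" and "1:2" sparsity patterns in the pruning results are N:M patterns: for 2:4, two of every four consecutive weights are zero, and for 1:2, one of every two, which is why both correspond to the 50% sparsity ratio listed. The Python sketch below illustrates the pattern only. It picks the weights to zero by magnitude, which is an assumption made for simplicity and not the snip momentum or other criteria named in the results, and `prune_n_of_m` is a hypothetical helper.

```python
from typing import List


def prune_n_of_m(weights: List[float], n_zero: int = 2, m: int = 4) -> List[float]:
    """Zero the n_zero smallest-magnitude weights in each group of m consecutive weights.

    Magnitude-based selection is an illustrative assumption, not the pruning
    criterion used for the published results.
    """
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    pruned = list(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        # Indices of the n_zero weights with the smallest absolute value in this group.
        drop = sorted(group, key=lambda i: abs(weights[i]))[:n_zero]
        for i in drop:
            pruned[i] = 0.0
    return pruned


def sparsity_ratio(weights: List[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
pruned_row = prune_n_of_m(row)  # 2:4 pattern
```

Calling `prune_n_of_m(row, n_zero=1, m=2)` gives the 1:2 pattern instead; in both cases `sparsity_ratio` returns 0.5 for this input.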
Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement) |
---|---|---|---|---|---|
MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC) |
CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC) |
VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | NA |
ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | NA |
BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | NA |
BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | NA |
DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | NA |
TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | NA |
BERT-3 example | QQP | BERT-3 (0.8626/0.8213 EM/F1) | BERT-Base (0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1) | NA |
DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | NA |
Model (ONNX QDQ) | AWS c6i.2xlarge (Intel) CPU Execution Provider | AWS c6a.2xlarge (AMD) CPU Execution Provider | AWS c6g.2xlarge (ARM) CPU Execution Provider | NVIDIA A100 CUDA Execution Provider |
---|---|---|---|---|
ResNet50 | 74.76% | 68.95% | 74.76% | 74.75% |
BERT-base | 85.54% | 84.56% | 85.54% | 84.31% |
ResNet50 V1.5 | 72.20% | 67.70% | 72.20% | 72.29% |
MobileNet V2 | 65.82% | 58.56% | 65.83% | 65.63% |
SSD MobileNet V1 | 22.45% | 16.53% | 22.45% | 22.35% |
DistilBERT base MRPC | 84.56% | 83.82% | 84.56% | 84.56% |
SqueezeNet | 56.54% | 53.52% | 56.54% | 56.55% |
SSD | 18.63% | 18.54% | 18.63% | 18.61% |
AlexNet | 54.71% | 47.06% | 54.71% | 54.79% |
CaffeNet | 56.25% | 52.35% | 56.27% | 56.24% |
GoogleNet | 67.73% | 63.56% | 67.72% | 67.76% |
ZFNet | 55.86% | 45.09% | 55.86% | 55.89% |
Inception V1 | 67.21% | 63.03% | 67.20% | 67.21% |
SSD MobileNet V1 (ONNX Model Zoo) | 22.86% | 16.94% | 22.80% | 22.87% |
Mobile bert MRPC | 85.54% | 84.56% | 85.54% | 85.54% |
Roberta base MRPC | 89.46% | 90.44% | 89.71% | 89.71% |
ResNet50 V1.5 MLPerf | 76.14% | 72.80% | 76.14% | 76.17% |
VGG16 | 66.69% | 64.25% | 66.69% | 66.64% |
VGG16 (ONNX Model Zoo) | 72.31% | 69.35% | 72.32% | 72.34% |
MobileNet V3 MLPerf | 75.57% | 70.78% | 75.56% | 75.52% |
EfficientNet | 77.61% | 76.52% | 77.56% | 77.60% |
MobileNet V2 (ONNX Model Zoo) | 68.51% | 62.48% | 68.58% | 68.48% |
ShuffleNet V2 | 66.12% | 58.41% | 66.11% | 66.11% |