Validated Models

Intel® Neural Compressor has validated examples covering multiple compression techniques. Links to typical examples can be found in the example tables, and the performance/accuracy results are available here.

  1. Validated Quantization Examples

    1.1. TensorFlow Models with Intel TensorFlow 2.12.0

    1.2. TensorFlow Models with Intel® Extension for TensorFlow* 1.2.0

    1.3. PyTorch Models with Torch 2.0.1+cpu in PTQ Mode

    1.4. PyTorch Models with Torch 2.0.1+cpu in QAT Mode

    1.5. PyTorch Models with Intel® Extension for PyTorch* 2.0.1+cpu

    1.6. ONNX Models with ONNX Runtime 1.15.0

  2. Validated Pruning Examples

  3. Validated Knowledge Distillation Examples

  4. Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Validated Quantization Examples

System summary: Test by Intel on 06/19/2023. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 256GB (16x16GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3A14.TEL2P1, microcode 0x2b0001b0, CentOS Stream 8, gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
Most models are benchmarked using 1 socket, 4 cores/instance, 14 instances and batch size 1.
Performance of some large models is measured using 1 socket, 56 cores/instance, 1 instance and batch size 1.
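The configuration labels used in the tables below, such as 1s4c14ins1bs, appear to encode this setup as sockets, cores per instance, instance count and batch size. A minimal sketch of that reading follows; the `BenchConfig` class and `parse_config` helper are hypothetical names used for illustration and are not part of Intel® Neural Compressor.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchConfig:
    """Benchmark setup decoded from a label such as '1s4c14ins1bs'."""
    sockets: int
    cores_per_instance: int
    instances: int
    batch_size: int

    @property
    def cores_used(self) -> int:
        # Total cores occupied when all instances run at once.
        return self.cores_per_instance * self.instances


# Assumed label layout: <sockets>s<cores>c<instances>ins<batch>bs
_LABEL = re.compile(r"(\d+)s(\d+)c(\d+)ins(\d+)bs")


def parse_config(label: str) -> BenchConfig:
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized configuration label: {label!r}")
    return BenchConfig(*(int(field) for field in match.groups()))


small = parse_config("1s4c14ins1bs")   # most models
large = parse_config("1s56c1ins1bs")   # some large models
print(small.cores_used, large.cores_used)  # 56 56
```

Under this reading, both configurations occupy all 56 cores of the single socket listed in the system summary; they differ only in how the cores are split across instances.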

Performance varies by use, configuration and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks

TensorFlow Models with Intel TensorFlow 2.12.0

Performance measured with 1s4c14ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
ResNet50 v1.0 pb 74.12% 74.27% -0.21% 2721.21 638.25 4.26x
ResNet50 v1.5 pb 76.23% 76.46% -0.31% 2123.70 552.94 3.84x
ResNet101 pb 77.50% 76.45% 1.37% 1477.29 432.29 3.42x
Inception V1 pb 70.44% 69.74% 1.01% 3267.92 1266.03 2.58x
Inception V2 pb 74.38% 73.97% 0.57% 2399.76 1098.67 2.18x
Inception V3 pb 76.71% 76.75% -0.05% 1593.59 508.58 3.13x
Inception V4 pb 80.18% 80.27% -0.11% 1032.10 249.39 4.14x
Inception ResNet V2 pb 80.34% 80.40% -0.07% 427.28 185.60 2.30x
MobileNet V1 pb 71.78% 70.96% 1.16% 5503.87 1791.62 3.07x
MobileNet V2 pb 72.52% 71.76% 1.07% 3639.83 1864.72 1.95x
VGG16 pb 72.64% 70.89% 2.47% 1538.21 236.22 6.51x
VGG19 pb 72.69% 71.01% 2.37% 1368.21 196.94 6.95x
ResNetV2 50 pb 70.44% 69.64% 1.15% 1105.19 657.45 1.68x
ResNetV2 101 pb 72.65% 71.87% 1.08% 716.49 369.95 1.94x
ResNetV2 152 pb 73.07% 72.37% 0.97% 508.60 269.31 1.89x
Densenet 121 pb 73.59% 72.89% 0.97% 617.94 498.43 1.24x
Densenet 161 pb 76.35% 76.29% 0.08% 372.04 242.05 1.54x
Densenet 169 pb 74.34% 74.65% -0.41% 496.41 411.94 1.21x
EfficientNet B0 ckpt 76.14% 76.76% -0.81% 748.42 709.43 1.05x
SSD ResNet50 V1 pb 37.88% 38.00% -0.31% 134.81 31.06 4.34x
SSD MobileNet V1 pb 22.98% 23.13% -0.64% 1273.79 671.84 1.90x
SSD ResNet50 v1 ckpt 37.89% 38.00% -0.30% 136.53 27.88 4.90x
SSD MobileNet v1 ckpt 22.96% 23.13% -0.72% 1235.03 477.83 2.58x
SSD ResNet34 pb 21.70% 22.09% -1.76% 179.37 13.96 12.85x
Faster R-CNN Inception ResNet V2 pb 37.47% 38.31% -2.18% 5.39 3.01 1.79x
Faster R-CNN Inception ResNet V2 SavedModel 37.79% 38.31% -1.34% 5.35 1.89 2.83x
Faster R-CNN ResNet101 pb 30.32% 30.39% -0.23% 156.71 23.50 6.67x
Faster R-CNN ResNet101 SavedModel 30.33% 30.39% -0.20% 152.21 18.50 8.23x
Faster R-CNN ResNet50 pb 26.64% 26.59% 0.21% 173.07 28.83 6.00x
YOLOv3 pb 82.13% 82.35% -0.28% 211.67 87.89 2.41x
BERT large SQuAD pb 92.47 92.99 -0.56% 46.87 16.65 2.82x
BERT large SQuAD (ONNX Model Zoo) pb 92.42 92.98 -0.61% 42.35 17.03 2.49x
BERT base MRPC ckpt 86.03% 86.52% -0.57% 424.94 174.10 2.44x
Transformer LT pb 25.77 25.86 -0.34% 42.11 22.11 1.90x
Transformer lt MLPerf pb 27.10 27.17 -0.25% 9.82 4.29 2.29x
Wide Deep large DS pb 77.75% 77.67% 0.10% 55612.97 43479.53 1.28x

Performance measured with 1s56c1ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
Mask R-CNN Inception V2 pb 28.60% 28.73% -0.44% 39.35 23.84 1.65x
Mask R-CNN Inception V2 ckpt 28.60% 28.73% -0.44% 40.21 23.90 1.68x
GPT2 pb 66.89% 67.57% -1.00% 9.67 7.22 1.34x
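
The two ratio columns follow the bracketed formulas in the table headers: Accuracy Ratio is (INT8-FP32)/FP32 and Performance Ratio is INT8/FP32. As a small worked example, the sketch below recomputes both for the Inception V3 row of the TensorFlow 2.12.0 table. The function names are illustrative only.

```python
def accuracy_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change (INT8 - FP32) / FP32, in percent."""
    return (int8_acc - fp32_acc) / fp32_acc * 100.0


def performance_ratio(int8_throughput: float, fp32_throughput: float) -> float:
    """Throughput speedup of INT8 over FP32."""
    return int8_throughput / fp32_throughput


# Inception V3 (pb): 76.71% vs 76.75% accuracy, 1593.59 vs 508.58 samples/sec
acc = accuracy_ratio(76.71, 76.75)
perf = performance_ratio(1593.59, 508.58)
print(f"{acc:.2f}% {perf:.2f}x")  # -0.05% 3.13x
```

Ratios recomputed from the rounded table entries can differ from the published value in the last digit (ResNet50 v1.0 gives -0.20% here against a published -0.21%), presumably because the published ratios were derived from unrounded measurements.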

TensorFlow Models with Intel® Extension for TensorFlow* 1.2.0

Performance measured with 1s4c14ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
ResNet50 v1.0 pb 74.16% 74.27% -0.15% 2716.04 569.18 4.77x
ResNet50 v1.5 pb 76.27% 76.46% -0.26% 2683.90 476.14 5.64x
Inception V1 pb 69.59% 69.74% -0.22% 2349.32 1035.63 2.27x
Inception V2 pb 73.75% 73.97% -0.30% 2399.93 930.62 2.58x
Inception V4 pb 80.03% 80.27% -0.31% 763.85 262.22 2.91x
MobileNet V1 pb 70.61% 70.96% -0.48% 4003.12 1677.22 2.39x
MobileNet V2 pb 71.15% 71.76% -0.85% 2766.36 2643.21 1.05x
VGG16 pb 70.84% 70.89% -0.07% 1495.88 238.52 6.27x
VGG19 pb 71.03% 71.01% 0.03% 1372.91 199.52 6.88x
ResNetV2 50 pb 69.43% 69.64% -0.30% 1457.53 630.41 2.31x
ResNetV2 101 pb 71.84% 71.87% -0.05% 842.53 338.44 2.49x
ResNetV2 152 pb 72.26% 72.37% -0.15% 645.86 231.63 2.79x
EfficientNet B0 ckpt 76.76% 76.76% 0.00% 938.82 707.22 1.33x
EfficientNet V2 B0 SavedModel 78.63% 78.62% 0.01% 1533.95 1258.45 1.22x
SSD MobileNet V1 pb 22.90% 23.13% -0.99% 981.29 647.07 1.52x
SSD MobileNet v1 ckpt 22.92% 23.13% -0.89% 850.31 444.12 1.91x
Faster R-CNN Inception ResNet V2 pb 38.02% 38.31% -0.74% 7.08 2.93 2.42x
Faster R-CNN Inception ResNet V2 SavedModel 38.18% 38.31% -0.32% 6.61 2.79 2.37x
YOLOv3 pb 80.27% 82.35% -2.53% 543.50 80.59 6.74x
BERT large SQuAD pb 92.67 92.97 -0.33% 72.27 18.39 3.93x
BERT base MRPC ckpt 86.28% 86.28% 0.00% 947.96 233.07 4.07x
DistilBERT base pb 90.48% 91.06% -0.64% 788.64 462.35 1.71x
Transformer LT pb 25.73 25.86 -0.47% 42.07 29.21 1.44x
Transformer lt MLPerf pb 27.13 27.17 -0.14% 10.43 4.84 2.15x
Wide Deep large DS pb 77.66% 77.67% -0.02% 51958.00 39974.56 1.30x

PyTorch Models with Torch 2.0.1+cpu in PTQ Mode

Performance measured with 1s4c14ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
ResNet18 static 69.61% 69.76% -0.22% 1631.83 662.13 2.46x
ResNet50 static 75.92% 76.15% -0.30% 1162.83 330.92 3.51x
Inception V3 static 69.47% 69.52% -0.07% 968.67 334.53 2.90x
ResNeSt50 static 80.80% 81.04% -0.30% 394.38 40.76 9.67x
ResNeXt101_32x8d static 78.94% 79.31% -0.46% 558.59 108.42 5.15x
Efficientnet_b0 static 76.89% 77.67% -1.01% 703.73 656.12 1.07x
Efficientnet_b3 static 77.82% 78.54% -0.93% 510.58 391.05 1.31x
Efficientnet_b7 static 73.55% 73.92% -0.50% 233.29 150.09 1.55x
Peleenet static 71.85% 72.10% -0.35% 857.72 585.60 1.46x
YOLO V3 static 55.09% 54.93% 0.31% 160.97 60.60 2.66x
SSD ResNet34 static 19.52 19.63 -0.58% 141.67 11.75 12.05x
Roberta base MRPC static 92.69% 93.59% -0.96% 407.78 174.53 2.34x
CamemBERT base MRPC static 88.93% 89.28% -0.39% 402.78 173.56 2.32x
DistilBERT base MRPC dynamic 90.20% 90.27% -0.07% 748.28 343.54 2.18x
DistilBERT base MRPC static 89.53% 90.27% -0.82% 804.57 343.24 2.34x
ALBERT base MRPC static 92.63% 92.63% 0.00% 352.44 162.26 2.17x
91.60% 92.25% -0.71% 302.57 183.57 1.65x
Xlm Roberta MRPC static 88.36% 88.62% -0.29% 404.61 173.71 2.33x
Xlm Roberta MRPC dynamic 88.24% 88.24% 0.00% 382.72 174.63 2.19x
BERT base MRPC static 89.63% 90.42% -0.87% 407.58 173.66 2.35x
BERT base COLA static 54.51% 53.39% 2.10% 414.72 173.86 2.39x
BERT base STSB static 87.55% 88.05% -0.57% 413.76 173.34 2.39x
BERT base SST-2 static 91.51% 92.32% -0.87% 410.87 173.63 2.37x
BERT large COLA static 62.84% 63.35% -0.80% 138.89 51.65 2.69x
BERT base RTE static 72.56% 72.56% 0.00% 385.23 173.32 2.22x
BERT large MRPC static 90.22% 90.38% -0.17% 141.61 51.67 2.74x
BERT large QNLI static 90.87% 91.54% -0.74% 407.84 173.52 2.35x
BERT large RTE static 73.29% 74.01% -0.98% 141.64 51.33 2.76x
BERT large RTE dynamic 71.48% 74.01% -3.41% 126.49 51.34 2.46x
BERT large SQuAD static 92.27 93.16 -0.95% 37.61 16.57 2.27x
GPT J WikiText static 3.36 2.34 NA 0.87 0.28 3.15x
Reformer Crime and Punishment static 1.88 1.87 0.23% 449.73 364.78 1.23x
lvwerra/pegasus-samsum static 42.50 42.67 -0.39% 101.32 37.80 2.68x

Performance measured with 1s56c1ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
openai/whisper-large dynamic 97.07% 96.96% 0.12% 0.60 0.47 1.28x
abeja/gpt-neox-japanese-2.7b static 4.30 3.52 22.06% 1.03 0.56 1.84x

PyTorch Models with Torch 2.0.1+cpu in QAT Mode

Performance measured with 1s4c14ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
ResNet18 static 69.74% 69.76% -0.03% 1723.70 654.17 2.63x
ResNet50 static 76.05% 76.15% -0.12% 1141.22 306.04 3.73x
ResNeXt101_32x8d static 79.28% 79.31% -0.04% 558.92 106.82 5.23x
MobileNet V2 static 69.73% 71.84% -2.93% 1379.34 729.22 1.89x
BERT base MRPC static 89.70% 90.40% -0.77% 389.77 173.54 2.25x

PyTorch Models with Intel® Extension for PyTorch* 2.0.1+cpu

Performance measured with 1s4c14ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
ResNet18 static 75.98% 76.15% -0.22% 1980.94 672.93 2.94x
ResNet50 static 69.56% 69.76% -0.29% 5032.32 1500.16 3.35x
ResNeXt101_32x16d_wsl static 84.04% 84.17% -0.15% 533.60 78.84 6.77x
SSD ResNet34 static 19.93% 20.00% -0.38% 84.02 15.68 5.36x
bert-large-uncased-whole-word-masking-finetuned-squad static 92.93 93.16 -0.25% 161.44 22.19 7.27x
distilbert-base-uncased-distilled-squad static 86.09 86.84 -0.86% 556.19 149.79 3.71x

Performance measured with 1s56c1ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
EleutherAI/gpt-j-6B static 78.60% 79.20% -0.76% 4.87 1.55 3.14x

ONNX Models with ONNX Runtime 1.15.0

Performance measured with 1s4c14ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
ResNet50 V1.5 qlinearops 72.16% 72.29% -0.19% 1412.05 710.02 1.99x
ResNet50 V1.5 qdq 72.14% 72.29% -0.22% 1564.39 712.38 2.20x
ResNet50 V1.5 MLPerf qlinearops 76.11% 76.46% -0.46% 1377.47 719.66 1.91x
ResNet50 V1.5 MLPerf qdq 76.13% 76.46% -0.44% 1446.69 703.40 2.06x
ResNet50 V1.5 (ONNX Model Zoo) qlinearops 74.82% 74.99% -0.22% 1579.31 747.73 2.11x
ResNet50 V1.5 (ONNX Model Zoo) qdq 74.82% 74.99% -0.23% 1508.21 749.43 2.01x
MobileNet V2 qlinearops 65.49% 66.89% -2.09% 6950.77 4214.56 1.65x
MobileNet V2 qdq 65.49% 66.89% -2.10% 6881.60 4192.78 1.64x
MobileNet V2 (ONNX Model Zoo) qlinearops 68.38% 69.48% -1.59% 6563.24 3804.18 1.73x
MobileNet V2 (ONNX Model Zoo) qdq 68.38% 69.48% -1.59% 6631.12 3922.70 1.69x
VGG16 qlinearops 66.56% 66.69% -0.19% 423.44 158.01 2.68x
VGG16 qdq 66.59% 66.69% -0.15% 571.02 161.69 3.53x
VGG16 (ONNX Model Zoo) qlinearops 72.33% 72.40% -0.09% 598.92 163.53 3.66x
VGG16 (ONNX Model Zoo) qdq 72.33% 72.40% -0.09% 594.66 164.39 3.62x
MobileNet V3 MLPerf qlinearops 75.56% 75.74% -0.24% 5473.90 2567.96 2.13x
MobileNet V3 MLPerf qdq 75.56% 75.74% -0.24% 5455.36 2563.80 2.13x
ShuffleNet V2 (ONNX Model Zoo) qlinearops 66.09% 66.36% -0.41% 6818.46 3839.67 1.78x
ShuffleNet V2 (ONNX Model Zoo) qdq 66.09% 66.36% -0.41% 5750.72 3861.83 1.49x
GoogleNet (ONNX Model Zoo) qlinearops 67.71% 67.79% -0.12% 1783.63 1095.06 1.63x
GoogleNet (ONNX Model Zoo) qdq 67.73% 67.79% -0.09% 1755.03 1071.04 1.64x
SqueezeNet (ONNX Model Zoo) qlinearops 56.54% 56.87% -0.57% 9918.09 5639.89 1.76x
SqueezeNet (ONNX Model Zoo) qdq 56.54% 56.87% -0.57% 9423.22 5501.30 1.71x
CaffeNet (ONNX Model Zoo) qlinearops 56.21% 56.30% -0.16% 3363.62 1015.06 3.31x
CaffeNet (ONNX Model Zoo) qdq 56.25% 56.30% -0.09% 3276.82 798.28 4.10x
AlexNet (ONNX Model Zoo) qlinearops 54.73% 54.79% -0.10% 2104.66 985.33 2.14x
AlexNet (ONNX Model Zoo) qdq 54.71% 54.79% -0.14% 2054.60 745.36 2.76x
ZFNet (ONNX Model Zoo) qlinearops 55.84% 55.96% -0.21% 864.73 456.41 1.89x
ZFNet (ONNX Model Zoo) qdq 55.86% 55.96% -0.18% 866.80 455.75 1.90x
Inception V1 (ONNX Model Zoo) qlinearops 67.21% 67.24% -0.05% 1802.03 1170.74 1.54x
Inception V1 (ONNX Model Zoo) qdq 67.21% 67.24% -0.05% 1813.29 1164.87 1.56x
EfficientNet (ONNX Model Zoo) qlinearops 76.98% 77.11% -0.17% 2615.12 1349.97 1.94x
EfficientNet (ONNX Model Zoo) qdq 76.99% 77.11% -0.16% 2343.94 1322.86 1.77x
DenseNet (ONNX Model Zoo) qlinearops 60.53% 60.96% -0.70% 630.80 499.98 1.26x
SSD (ONNX Model Zoo) qlinearops 18.83% 18.98% -0.77% 56.69 14.56 3.89x
SSD (ONNX Model Zoo) qdq 18.62% 18.98% -1.89% 57.54 14.55 3.95x
SSD MobileNet V1 qlinearops 22.44% 23.10% -2.86% 1288.14 878.69 1.47x
SSD MobileNet V1 qdq 22.44% 23.10% -2.86% 1173.88 851.00 1.38x
SSD MobileNet V1 (ONNX Model Zoo) qlinearops 22.96% 23.02% -0.27% 1114.65 825.47 1.35x
SSD MobileNet V1 (ONNX Model Zoo) qdq 22.96% 23.02% -0.27% 1056.30 792.66 1.33x
SSD MobileNet V2 qlinearops 23.87% 24.67% -3.25% 788.51 669.72 1.18x
YOLOv3 (ONNX Model Zoo) qlinearops 27.01% 28.73% -5.99% 140.21 110.43 1.27x
YOLOv4 (ONNX Model Zoo) qlinearops 32.30% 33.71% -4.19% 72.95 64.95 1.12x
DUC (ONNX Model Zoo) qlinearops 81.63% 81.92% -0.36% 9.12 4.96 1.84x
Tiny YOLOv3 (ONNX Model Zoo) qlinearops 11.83% 12.42% -4.73% 1163.39 993.96 1.17x
Ultra Face (ONNX Model Zoo) qlinearops 83.23% 83.65% -0.49% 8501.08 1922.19 4.42x
Emotion FERPlus (ONNX Model Zoo) qlinearops 7.97% 8.00% -0.35% 3552.60 3114.19 1.14x
ArcFace (ONNX Model Zoo) qlinearops 99.80% 99.80% 0.00% 558.78 246.87 2.26x
BERT base MRPC qlinearops 85.54% 86.03% -0.57% 399.04 226.03 1.77x
BERT base MRPC qdq 85.54% 86.03% -0.57% 392.26 223.21 1.76x
BERT base MRPC integerops 85.29% 86.03% -0.85% 474.99 222.71 2.13x
DistilBERT base MRPC qdq 84.56% 84.56% 0.00% 557.05 399.46 1.39x
DistilBERT base MRPC integerops 85.54% 84.56% 1.16% 963.92 399.36 2.41x
Mobile bert MRPC qdq 85.54% 86.28% -0.85% 529.98 394.46 1.34x
Mobile bert MRPC integerops 85.54% 86.28% -0.85% 603.66 398.15 1.52x
Roberta base MRPC integerops 90.93% 89.95% 1.09% 485.74 223.54 2.17x
BERT SQuAD (ONNX Model Zoo) integerops 80.29 80.67 -0.47% 187.63 95.88 1.96x
MobileBERT SQuAD MLPerf (ONNX Model Zoo) integerops 89.87 90.03 -0.17% 144.88 124.08 1.17x
BiDAF (ONNX Model Zoo) integerops 65.93% 66.08% -0.23% 2757.83 2279.38 1.21x
GPT2 lm head WikiText (ONNX Model Zoo) integerops 31.98 29.00 10.31% 15.35 9.73 1.58x
BERT base cased MRPC (HuggingFace) qlinearops 90.21% 90.42% -0.23% 357.89 211.81 1.69x
BERT base uncased MRPC (HuggingFace) integerops 89.58% 90.42% -0.93% 472.44 211.65 2.23x
Roberta base MRPC (HuggingFace) qlinearops 91.00% 91.38% -0.41% 365.03 214.66 1.70x
Roberta base MRPC (HuggingFace) integerops 90.85% 91.38% -0.58% 489.85 212.20 2.31x
XLM Roberta base MRPC (HuggingFace) qlinearops 89.37% 90.10% -0.81% 302.49 212.76 1.42x
XLM Roberta base MRPC (HuggingFace) integerops 89.66% 90.10% -0.50% 343.75 213.09 1.61x
Camembert base MRPC (HuggingFace) qlinearops 89.28% 89.28% 0.00% 270.01 215.48 1.25x
Camembert base MRPC (HuggingFace) integerops 89.19% 89.28% -0.10% 491.01 212.92 2.31x
MiniLM L12 H384 uncased MRPC (HuggingFace) qlinearops 90.13% 90.97% -0.93% 1051.67 583.85 1.80x
MiniLM L12 H384 uncased MRPC (HuggingFace) integerops 91.07% 90.97% 0.10% 1076.27 589.80 1.82x
DistilBERT base uncased SST-2 (HuggingFace) qlinearops 90.71% 91.06% -0.38% 896.69 396.85 2.26x
DistilBERT base uncased SST-2 (HuggingFace) integerops 90.25% 91.06% -0.88% 753.88 396.59 1.90x
Albert base v2 SST-2 (HuggingFace) qlinearops 91.40% 92.32% -0.99% 274.17 210.87 1.30x
Albert base v2 SST-2 (HuggingFace) integerops 91.86% 92.32% -0.50% 271.85 211.18 1.29x
MiniLM L6 H384 uncased SST-2 (HuggingFace) qlinearops 89.45% 90.14% -0.76% 2022.40 1124.12 1.80x
MiniLM L6 H384 uncased SST-2 (HuggingFace) integerops 89.91% 90.14% -0.26% 2010.50 1127.41 1.78x
MiniLM L6 H384 uncased SST-2 (HuggingFace) qlinearops 87.70% 88.29% -0.67% 401.24 211.92 1.89x
MiniLM L6 H384 uncased SST-2 (HuggingFace) integerops 88.19% 88.29% -0.12% 494.84 212.01 2.33x
Electra small discriminator MRPC (HuggingFace) qlinearops 89.57% 89.83% -0.29% 1804.17 1154.99 1.56x
Electra small discriminator MRPC (HuggingFace) integerops 89.27% 89.83% -0.63% 1961.57 1158.86 1.69x
BERT mini MRPC (HuggingFace) qlinearops 86.70% 86.52% 0.21% 4986.29 3444.92 1.45x
BERT mini MRPC (HuggingFace) integerops 86.16% 86.52% -0.41% 5603.86 3320.38 1.69x
Xlnet base cased MRPC (HuggingFace) qlinearops 89.74% 89.86% -0.13% 108.36 91.63 1.18x
Xlnet base cased MRPC (HuggingFace) integerops 89.58% 89.86% -0.31% 108.27 92.24 1.17x
BART large MRPC (HuggingFace) qlinearops 91.77% 91.20% 0.63% 58.98 51.23 1.15x
BART large MRPC (HuggingFace) integerops 92.36% 91.20% 1.28% 96.02 51.12 1.88x
DeBERTa v3 base MRPC (HuggingFace) qlinearops 91.85% 92.23% -0.40% 161.42 147.11 1.10x
DeBERTa v3 base MRPC (HuggingFace) integerops 92.39% 92.23% 0.17% 170.50 147.28 1.16x
Spanbert SQuAD (HuggingFace) qlinearops 91.14 91.98 -0.91% 69.94 42.36 1.65x
Spanbert SQuAD (HuggingFace) integerops 91.40 91.98 -0.63% 80.06 42.62 1.88x
Bert base multilingual cased SQuAD (HuggingFace) qlinearops 88.42 89.13 -0.79% 71.67 42.36 1.69x
Bert base multilingual cased SQuAD (HuggingFace) integerops 88.70 89.13 -0.48% 79.42 42.32 1.88x
DistilBert base uncased SQuAD (HuggingFace) qlinearops 86.33 86.86 -0.62% 112.14 67.59 1.66x
DistilBert base uncased SQuAD (HuggingFace) integerops 86.05 86.86 -0.94% 159.29 67.70 2.35x
BERT large uncased whole word masking SQuAD (HuggingFace) qlinearops 92.34 93.16 -0.88% 24.56 12.71 1.93x
BERT large uncased whole word masking SQuAD (HuggingFace) integerops 92.99 93.16 -0.18% 26.76 12.72 2.10x
Roberta large SQuAD v2 (HuggingFace) qlinearops 89.03 89.02 0.02% 16.85 12.95 1.30x
Roberta large SQuAD v2 (HuggingFace) integerops 89.04 89.02 0.02% 26.85 12.95 2.07x
GPT2 WikiText (HuggingFace) qlinearops 30.25 29.00 4.33% 12.63 9.76 1.29x
GPT2 WikiText (HuggingFace) integerops 29.68 29.00 2.36% 13.54 9.72 1.39x
DistilGPT2 WikiText (HuggingFace) qlinearops 44.93 43.43 3.46% 20.45 16.72 1.22x
DistilGPT2 WikiText (HuggingFace) integerops 44.62 43.43 2.74% 21.91 16.73 1.31x
LayoutLM FUNSD (HuggingFace) qlinearops 78.15% 78.35% -0.25% 60.41 43.95 1.37x
LayoutLM FUNSD (HuggingFace) integerops 77.58% 78.35% -0.98% 65.82 43.83 1.50x
LayoutLMv3 FUNSD (HuggingFace) qlinearops 89.85% 90.49% -0.71% 31.12 29.13 1.07x
LayoutLMv3 FUNSD (HuggingFace) integerops 90.07% 90.49% -0.46% 35.01 27.92 1.25x

Performance measured with 1s56c1ins1bs; throughput in samples/sec.
Columns: Model, Example, INT8 Accuracy, FP32 Accuracy, Accuracy Ratio [(INT8-FP32)/FP32], INT8 Throughput, FP32 Throughput, Performance Ratio [INT8/FP32]
Faster R-CNN (ONNX Model Zoo) qlinearops 34.06% 34.37% -0.88% 3.99 3.28 1.21x
Faster R-CNN (ONNX Model Zoo) qdq 33.98% 34.37% -1.12% 4.00 3.37 1.19x
Mask R-CNN (ONNX Model Zoo) qlinearops 33.13% 33.72% -1.74% 3.36 2.95 1.14x
Mask R-CNN (ONNX Model Zoo) qdq 33.29% 33.72% -1.28% 3.38 2.98 1.14x
FCN (ONNX Model Zoo) qlinearops 64.54% 64.98% -0.67% 28.19 12.60 2.24x
FCN (ONNX Model Zoo) qdq 64.54% 64.98% -0.67% 28.22 12.56 2.25x

Validated Pruning Examples

| Model | Task | Dataset | Dense Accuracy | Sparse Accuracy | Relative Drop | Sparsity Ratio | Sparsity Pattern | Comments | Balanced or Unbalanced Ratio |
|---|---|---|---|---|---|---|---|---|---|
| Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=77.62 | +0.98% | 50% | structured 2:4 | snip momentum | balanced |
| Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=86.15 | -0.86% | 80% | structured 4x1 | snip momentum | unbalanced |
| Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=87.50 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=87.78 | -0.92% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=89.40 | +0.91% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=90.91 | -0.35% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=91.67 | +0.48% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.22 | -0.34% | 90% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.33 | -0.22% | 90% | structured 4x1 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.89 | -0.72% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.8 | -0.83% | 60% | structured per channel | snip momentum | unbalanced |
| Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=89.85 | -0.46% | 90% | structured 4x1 | snip momentum | unbalanced |
| Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=90.88 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 90% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=87.73 | +0.14% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 50% | structured per channel | snip momentum | unbalanced |
| ResNet50 | image recognition | ImageNet | top1 acc = 78.95 | top1 acc = 80.10 | -1.43% | 75% | structured 2x1 | snip momentum | unbalanced |
| YOLO-v5s6 | object detection | COCO | AP0.50:0.95/AP0.50=0.404/0.6 | AP0.50:0.95/AP0.50=0.393/0.584 | -2.72% | 80% | unstructured | snip momentum | unbalanced |
| Bert-Large | question answering | SQuAD-v1.1 | f1=91.34 | f1=90.7 | -0.07% | 80% | structured 2x1 | group lasso | unbalanced |
| Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.51 | -0.88% | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 92.20 | -0.13% | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.97 | -0.38% | 20% | unstructured | gradient sensitivity | balanced |
| Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.39 | -1.26% | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.87 | -0.73% | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% | structured 1:2 | Prune once for all | balanced |
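
To make two of the columns concrete, the sketch below recomputes a Relative Drop value and shows how a structured 2:4 pattern yields a 50% sparsity ratio. It rests on two assumptions not stated in the table: that Relative Drop is (sparse - dense) / dense, which matches the signs and magnitudes listed, and that 2:4 means two of every four consecutive weights are zeroed. Weight magnitude is used here as a simple stand-in selection rule; it is not the snip momentum criterion named in the table, and the function names are illustrative.

```python
def relative_drop(dense: float, sparse: float) -> float:
    """Assumed definition: (sparse - dense) / dense, in percent."""
    return (sparse - dense) / dense * 100.0


def prune_n_of_m(weights, n_zero=2, m=4):
    """Zero the n_zero smallest-magnitude weights in each group of m.

    Magnitude is a stand-in criterion, not snip momentum.
    """
    if len(weights) % m != 0:
        raise ValueError("weight count must be a multiple of the group size")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n_zero smallest magnitudes within this group.
        by_magnitude = sorted(range(m), key=lambda i: abs(group[i]))
        drop = set(by_magnitude[:n_zero])
        pruned.extend(0.0 if i in drop else w for i, w in enumerate(group))
    return pruned


def sparsity(weights) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
pruned = prune_n_of_m(weights)
print(pruned)            # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(sparsity(pruned))  # 0.5

# Distilbert-base-uncased on SQuAD-v1.1, 80% sparsity: f1 86.90 -> 86.15
print(f"{relative_drop(86.90, 86.15):.2f}%")  # -0.86%
```

Because every group of four keeps exactly two weights, the 2:4 pattern always gives 50% sparsity, which is consistent with the 2:4 rows in the table.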

Validated Knowledge Distillation Examples

| Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement) |
|---|---|---|---|---|---|
| MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC) |
| CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC) |
| VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | NA |
| ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | NA |
| BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | NA |
| BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | NA |
| DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | NA |
| TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | NA |
| BERT-3 example | QQP | BERT-3 (0.8626/0.8213 EM/F1) | BERT-Base (0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1) | NA |
| DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | NA |
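
The Metrics Improvement figures in parentheses appear to be the absolute difference between the distilled student's metric and the baseline student's metric. A minimal sketch of that arithmetic, assuming this reading and using an illustrative function name:

```python
def improvement(student: float, distilled: float, digits: int = 4) -> float:
    """Absolute gain of the distilled student over the baseline student."""
    return round(distilled - student, digits)


# MobileNet example on CIFAR-10: 0.7965 ACC -> 0.8178 ACC
print(improvement(0.7965, 0.8178))  # 0.0213

# DistilBERT example on SQuAD, applied per metric (EM, F1)
baseline = (0.7323, 0.8256)
distilled = (0.7442, 0.8371)
print(tuple(improvement(b, d) for b, d in zip(baseline, distilled)))  # (0.0119, 0.0115)
```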

Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Columns: Model (ONNX QDQ), AWS c6i.2xlarge (Intel) CPU Execution Provider, AWS c6a.2xlarge (AMD) CPU Execution Provider, AWS c6g.2xlarge (ARM) CPU Execution Provider, NVIDIA A100 CUDA Execution Provider
ResNet50 74.76% 68.95% 74.76% 74.75%
BERT-base 85.54% 84.56% 85.54% 84.31%
ResNet50 V1.5 72.20% 67.70% 72.20% 72.29%
MobileNet V2 65.82% 58.56% 65.83% 65.63%
SSD MobileNet V1 22.45% 16.53% 22.45% 22.35%
DistilBERT base MRPC 84.56% 83.82% 84.56% 84.56%
SqueezeNet 56.54% 53.52% 56.54% 56.55%
SSD 18.63% 18.54% 18.63% 18.61%
AlexNet 54.71% 47.06% 54.71% 54.79%
CaffeNet 56.25% 52.35% 56.27% 56.24%
GoogleNet 67.73% 63.56% 67.72% 67.76%
ZFNet 55.86% 45.09% 55.86% 55.89%
Inception V1 67.21% 63.03% 67.20% 67.21%
SSD MobileNet V1 (ONNX Model Zoo) 22.86% 16.94% 22.80% 22.87%
Mobile bert MRPC 85.54% 84.56% 85.54% 85.54%
Roberta base MRPC 89.46% 90.44% 89.71% 89.71%
ResNet50 V1.5 MLPerf 76.14% 72.80% 76.14% 76.17%
VGG16 66.69% 64.25% 66.69% 66.64%
VGG16 (ONNX Model Zoo) 72.31% 69.35% 72.32% 72.34%
MobileNet V3 MLPerf 75.57% 70.78% 75.56% 75.52%
EfficientNet 77.61% 76.52% 77.56% 77.60%
MobileNet V2 (ONNX Model Zoo) 68.51% 62.48% 68.58% 68.48%
ShuffleNet V2 66.12% 58.41% 66.11% 66.11%