R2.10.1 Fixes #126

claynerobison · 2023-03-13T21:13:37Z


* revert bf16 changes (#488)
* Add partials and spec yml for the end2end DLSA pipeline (#460)
* Add partials and specs for the end2end DLSA pipeline
* Add missing end line
* Update name to include ipex
* update specs to have use the public image as a base on one and SPR for the other
* Dockerfile updates for the updated DLSA repo
* Update pip install list
* Rename to public
* Removing partials that aren't used anymore
* Fixes for 'kmp-blocktime' env var (#493)
* Fixes for 'kmp-blocktime' env var Signed-off-by: Abolfazl Shahbazi <[email protected]>
* update per review feedback Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add 'kmp-blocktime' for mlperf-gnmt (#494)
* Add 'kmp-blocktime' for mlperf-gnmt Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Remove duplicate parameter definition Signed-off-by: Abolfazl Shahbazi <[email protected]>
* add sample_input for resnet50 training (#495)
* remove the case when fragment_size not equal args.batch_size (#500)
* Changed the transformer_mlperf fp32 model so that we can fuse the ops… (#389)
* Changed the transformer_mlperf fp32 model so that we can fuse the ops in the model, and also minor changes for python3
* Changed the transformer_mlperf int8 model so that we can fuse the ops in the model, and also minor changes for python3
* SPR updates for WW12, 2022 (#492)
* SPR updates for WW12, 2022 Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update for PyTorch SPR WW2022-12 Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update pytorch base for SPR too Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Stick with specific 'keras-nightly' version Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Updates per code review Signed-off-by: Abolfazl Shahbazi <[email protected]>
* update maskrcnn training_multinode.sh (#502)
* Fixed a bug in the transformer_mlperf model threads setting (#482)
* Fixed a bug in the transformer_mlperf model threads setting
* Fix failing tests Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]>
* Added the default threads setting for transformer_mlperf inference in… (#504)
* Added the default threads setting for transformer_mlperf inference in case there is no command line input
* Fix unit tests Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]>
* PyTorch Image Classification TL notebook (#490)
* Adds new TL notebook with documentation
* Added newline
* Added to main TL README
* Small fixes
* Updated for review feedback
* Added more models and a download limit arg
* Removed py3.9 requirement and changed default model
* Adds Kitti torchvision dataset to TL notebook (#512)
* Adds Kitti torchvision dataset to TL notebook
* Fixed citations formatting
* update maskrcnn model (#515)
* minor update. (#465)
* Create unit-test github action workflow (#518)
* Create unit-test github action workflow Tested here: https://github.com/sriester/frameworks.ai.models.intel-models/runs/6089350443?check_suite_focus=true Runs tox py.test on push.
* Containerize job
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Added login credentials to docker Trying to fix pull rate issue
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml Changed pip install command.
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml Changed docker credentials to imzbot
* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* update distilbert model to 4.18 transformers and enable int8 path (#521)
* rnnt: use launcher to set output file path and name (#524)
* Update BareMetalSetup.md (#526) Always use the latest torchvision
* Reduce memory usage for dlrm acc test (#527)
* updatedistilbert with text_classification (#529)
* add patch for distilbert (#530)
* Update the model-builder dockerfile to use ubuntu 20.04 (#532)
* Add script for coco training dataset processing (#525)
* and update tensorflow ssd-resnet34 training dataset instructions
* update patch (#533) Co-authored-by: Wang, Chuanqi <[email protected]>
* [RNN-T training] Enable FP32 gemm using oneDNN (#531)
* Update the Readme guide for distilbert (#534)
* Update the Readme guide for distilbert
* Fix accuracy grep bug, and grep accuracy for distilbert Co-authored-by: Weizhuo Zhang <[email protected]>
* Update end2end public dockerfile to look for IPEX in the conda directory (#535)
* Notebook to script conversion example (#516)
* Add notebook script conversion example
* Fixed doc
* Replaces custom preprocessor with built-in one
* Changed tag to remove_for_custom_dataset
* Add URL check prior to calling urlretrieve (#538)
* Add URL check prior to calling urlretrieve Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a typo Signed-off-by: Abolfazl Shahbazi <[email protected]>
* disable for ssd since fused cat cat kernel is slow (#537)
* fix bug when adding steps in rnnt inference (#528)
* Fix and updates for TensorFlow WW18-2022 SPR (#542)
* Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Remove some repositories Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]>
* switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix and updates for PyTorch WW14-2022 SPR (#543)
* Fix and updates for PyTorch WW14-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Updates required to the base image Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Remove some repositories Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]>
* switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix PYT resnet50 quickstart scripts for both Linux and Windows (#547)
* fix quickstart scripts, detect platform type, update to run with pytorch only
* Fix SPR PyTorch MaskRCNN inference documentation for CHECKPOINT_DIR (#548)
* Enable bert large multi stream inference (#554)
* test bert multi stream module
* enable input split and output concat for accuracy run
* change the default num_streams batchsize cores to 56
* change ssd multi stream throughput to 1 core 1 batch
* change the default parameter for rn50 ssd multi stream module
* modify enable_ipex_for_squad.diff to align new multistream hint implementation
* enable warmup and multi socket support
* change default parameter for rn50 ssd multi stream inference
* Add train-no-eval for rn50 pytorch (#555)
* PyTorch SPR BERT large training updates (h5py and dataset instructions) and update LD_PRELOAD for SPR entrypoints (#550)
* Add h5py install to bert training dockerfile
* documentation updates
* update docs, and add input_preprocessing to the wrapper package
* Update LD_PRELOAD trailing :
* Fix syntax
* removing unnecessary change
* Update DLRM entrypoint
* Update docs to note that phase2 has bert_config.json in the CHECKPOINT_DIR
* Fix syntax
* increase shm-size to 10g
* [RNN-T training] Update scripts -- run on 1S (#561)
* Update maskrcnn training script to run on 1s (#562)
* use single node to do ssd-rn34 training (#563)
* Update training.sh (#564)
* Update training.sh (#565) Use tcmalloc instead of jemalloc
* use single node to do resnet50 training (#568)
* add numactl -C and remove jit warm in main thread (#569)
* Update unit-test.yml (#546)
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Fixed make command, updated pip install. Fixed make command to run from the root directory. Replaced pip install tox with a pip install -r requirements-tests.txt to install all dependencies for the tests.
* Add tox to test dependencies. Added tox to the dependencies so that the Workflow and others may install it with pip install -r requirements-test.txt and be covered for running make lint and make unit-test.
* Update unit-test.yml Changed 'make unit-test' to 'make unit_test' as that is the actual target defined in the Makefile.
* Update unit-test.yml Changed apt-get install command.
* re-enable int8 for api change (#579)
* saperate fully convergency test from training test (#581) Co-authored-by: jianan-gu <[email protected]>
* ssd enable new int8 (#580)
* v1
* enable new int8 method
* Revert "ssd enable new int8 (#580)" (#584) This reverts commit 9eb3211.
* Revert "re-enable int8 for api change (#579)" (#583) This reverts commit 0bded92.
* Update training script using 1s (#560)
* Enable checkpoint during training for bert-large (#573)
* minor fix
* Add readme for enabling checkpoint
* update phase1 to enable checkpoint by default
* Update README.md
* Enable ssd bf32 inference training (#589)
* enable ssd bf32 inference
* enable ssd bf32 train
* enable RNN-T bf32 inference (#591)
* Enable bf32 for bert and distilbert for inference (#593)
* enable bf32 distilbert
* enable bert bf32
* Enable RNN-T bf32 training (#594)
* enable maskrcnn bf32 inference and training (#595)
* enable resnet50 and resnext101 bf16 path (#596)
* enable bert bf32 train (#600)
* update resnet int8 path using new int8 api (#603)
* re-enable int8 for api change (#604) Co-authored-by: jianan-gu <[email protected]>
* Leslie/ssd enable new int8 (#605)
* v1
* enable new int8 method
* update json file
* add rn50 int8 weight sharing Co-authored-by: Jiang, Xiaofei <[email protected]>
* update ssd training bs to the multily of core numbers (#606)
* enable bf32 for dlrm (#607) Co-authored-by: jianan-gu <[email protected]>
* Update IPEX new int8 API enabling for distilbert/bert-large (#608)
* enable distilbert
* enable bert
* fix max-ind-range and add memory info (#609) Co-authored-by: jianan-gu <[email protected]>
* Remove debug code (#610)
* update training steps (#611)
* fix bandit scan fails (#612)
* PYT Image recognition models support on Windows (#549)
* fix all image recognition scripts to run on windows and linux with PYT, and only linux with IPEX
* [RNN-T training] fix bandit scan fails (#614)
* RNN-T inference: fix IMZ Bandit scan fails (#615)
* Update unit-test.yml (#570) Changed the docker user credential to utilize GitHub Secret.
* MaskRCNN: fix IMZ Bandit scan fails (#623)
* Fix for horovod-related failures in TF nightly runs (#613)
* cpp17 horovod failure fix
* minor debugging changes
* minor fixes - directory name
* cleanup
* addressing reviewer comments
* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 (#624)
* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Set 'HOROVOD_WITH_MPI=1' explicitly Signed-off-by: Abolfazl Shahbazi <[email protected]>
* update GCC version to GCC 9 Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add 'horovodrun --check-build' for sanity check Signed-off-by: Abolfazl Shahbazi <[email protected]>
* removo force install inside Docker Signed-off-by: Abolfazl Shahbazi <[email protected]>
* [RNN-T training] Fix ddp sample number issue (#625)
* update BF32 usage (#627)
* resnet50 training: add warm up before collecting time (#628)
* image to bf16 (#629)
* Update end2end DLSA dockerfile due to SPR wheel path update and removing int8 patch (#631)
* Update mlpc path for SPR wheels
* remove patch
* Update Horovod commit id for BareMetal, Docker will be updated next (#630) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* fix dlrm convergence and change training performance BS to 32K (#633) Co-authored-by: jianan-gu <[email protected]>
* [RNN-T training] Merge sh files to one (#635)
* update torch-ccl into 1.12 (#636)
* Liangan1/update torch ccl version (#637)
* Update torch_ccl version
* resnet50_distributed_training: don't set MASTER_ADDR by user (#638)
* Update torch_ccl in script (#639)
* Enable offline download distilbert (#632)
* enable offline download distilbert
* add convert
* Update README.md
* add accuracy.py
* add file
* refine download
* refine path
* refine path
* add license
* Update dlrm_s_pytorch.py (#643)
* Update README.md (#649)
* init pytorch T5 language model (#648)
* init pytorch T5 language model
* update README.md
* update doc
* update fpn models (#650)
* pytorch resnet50: directly call ipex.quantization (#653)
* fix int8 accuracy (#655) Co-authored-by: Zhang, Weizhuo <[email protected]>
* Made fixes to the broken links (#652)
* Made fixes to the broken links
* Changed the ResNet50v1_5 version back to v2_7_0
* Modified the setup AI kit instructions Co-authored-by: msalopan <[email protected]>
* Update Security Center URL (#657) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Weizhuoz/fix for pt 1.12 (#656)
* fix vgg11_bn accuracy syntax error
* remove exact_match from roberta-base
* modify maskrcnn BS to 2*num_cores
* Update dlrm_s_pytorch.py (#660)
* Update dlrm_s_pytorch.py Reduce int8 memory usage.
* Update dlrm_s_pytorch.py
* Update dlrm_s_pytorch.py
* Update dlrm_s_pytorch.py
* Update dlrm_s_pytorch.py
* Add BF32 DDP for bert-large (#663)
* Update run_ddp_bert_pretrain_phase1.sh
* Update run_ddp_bert_pretrain_phase2.sh
* Update README.md
* move OMP_NUM_THREADS=1 into dlrm_s_pytorch.py (#664) minor changes
* remove rn50 ao (#665)
* Re-organize models list to be grouped by framework (#654)
* re-organize models list to be grouped by framework
* update tensorflow ssd-resnet34 training dataset
* add T5 in benchmark/README.md
* mannuel set torch num threads only for int8 (#666)
* Update inference_performance.sh (#669)
* improve ssdrn34 perf. (#671)
* improve ssdrn34 perf.
* minor update.
* Fix linting Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix unit tests too Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]>
* update py version in base spec (#678)
* TF addons upgrade to 0.17.1 (#689)
* updated tf adons version
* remove comment
* Sriniva2/ssd rn34 (#682)
* improve ssdrn34 perf.
* minor update.
* enabling synthetic data.
* Update base_benchmark_util.py
* Fix linting error Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]>
* Update Dockerfiles prior to IMZ 2.8 release (#693) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update Documents prior to IMZ 2.8 release (#694) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* add support for open SUSE leap operating system (#708) (#715)
* updated tpps (#725)
* remove tf bert int8 from main readmes, model is not supported in this release. (#743)
* Adding Scipy for TensorFlow serving SSD-MobileNet model (#764) (#766) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]>
* remove .github Signed-off-by: Abolfazl Shahbazi <[email protected]>
Co-authored-by: leslie-fang-intel <[email protected]>
Co-authored-by: Dina Suehiro Jones <[email protected]>
Co-authored-by: Abolfazl Shahbazi <[email protected]>
Co-authored-by: XiaobingZhang <[email protected]>
Co-authored-by: Xiaoming (Jason) Cui <[email protected]>
Co-authored-by: jiayisunx <[email protected]>
Co-authored-by: Melanie Buehler <[email protected]>
Co-authored-by: Srini511 <[email protected]>
Co-authored-by: Sean-Michael Riesterer <[email protected]>
Co-authored-by: jianan-gu <[email protected]>
Co-authored-by: Chunyuan WU <[email protected]>
Co-authored-by: zhuhaozhe <[email protected]>
Co-authored-by: Wang, Chuanqi <[email protected]>
Co-authored-by: YanbingJiang <[email protected]>
Co-authored-by: Weizhuo Zhang <[email protected]>
Co-authored-by: xiaofeij <[email protected]>
Co-authored-by: liangan1 <[email protected]>
Co-authored-by: blzheng <[email protected]>
Co-authored-by: Om Thakkar <[email protected]>
Co-authored-by: mahathis <[email protected]>
Co-authored-by: msalopan <[email protected]>
Co-authored-by: Jitendra Patil <[email protected]>

* revert bf16 changes (#488)
* Add partials and spec yml for the end2end DLSA pipeline (#460)
* Add partials and specs for the end2end DLSA pipeline
* Add missing end line
* Update name to include ipex
* update specs to have use the public image as a base on one and SPR for the other
* Dockerfile updates for the updated DLSA repo
* Update pip install list
* Rename to public
* Removing partials that aren't used anymore
* Fixes for 'kmp-blocktime' env var (#493)
* Fixes for 'kmp-blocktime' env var Signed-off-by: Abolfazl Shahbazi <[email protected]>
* update per review feedback Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add 'kmp-blocktime' for mlperf-gnmt (#494)
* Add 'kmp-blocktime' for mlperf-gnmt Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Remove duplicate parameter definition Signed-off-by: Abolfazl Shahbazi <[email protected]>
* add sample_input for resnet50 training (#495)
* remove the case when fragment_size not equal args.batch_size (#500)
* Changed the transformer_mlperf fp32 model so that we can fuse the ops… (#389)
* Changed the transformer_mlperf fp32 model so that we can fuse the ops in the model, and also minor changes for python3
* Changed the transformer_mlperf int8 model so that we can fuse the ops in the model, and also minor changes for python3
* SPR updates for WW12, 2022 (#492)
* SPR updates for WW12, 2022 Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update for PyTorch SPR WW2022-12 Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update pytorch base for SPR too Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Stick with specific 'keras-nightly' version Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Updates per code review Signed-off-by: Abolfazl Shahbazi <[email protected]>
* update maskrcnn training_multinode.sh (#502)
* Fixed a bug in the transformer_mlperf model threads setting (#482)
* Fixed a bug in the transformer_mlperf model threads setting
* Fix failing tests Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]>
* Added the default threads setting for transformer_mlperf inference in… (#504)
* Added the default threads setting for transformer_mlperf inference in case there is no command line input
* Fix unit tests Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]>
* PyTorch Image Classification TL notebook (#490)
* Adds new TL notebook with documentation
* Added newline
* Added to main TL README
* Small fixes
* Updated for review feedback
* Added more models and a download limit arg
* Removed py3.9 requirement and changed default model
* Adds Kitti torchvision dataset to TL notebook (#512)
* Adds Kitti torchvision dataset to TL notebook
* Fixed citations formatting
* update maskrcnn model (#515)
* minor update. (#465)
* Create unit-test github action workflow (#518)
* Create unit-test github action workflow
* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* update distilbert model to 4.18 transformers and enable int8 path (#521)
* rnnt: use launcher to set output file path and name (#524)
* Update BareMetalSetup.md (#526) Always use the latest torchvision
* Reduce memory usage for dlrm acc test (#527)
* updatedistilbert with text_classification (#529)
* add patch for distilbert (#530)
* Update the model-builder dockerfile to use ubuntu 20.04 (#532)
* Add script for coco training dataset processing (#525)
* and update tensorflow ssd-resnet34 training dataset instructions
* update patch (#533) Co-authored-by: Wang, Chuanqi <[email protected]>
* [RNN-T training] Enable FP32 gemm using oneDNN (#531)
* Update the Readme guide for distilbert (#534)
* Update the Readme guide for distilbert
* Fix accuracy grep bug, and grep accuracy for distilbert Co-authored-by: Weizhuo Zhang <[email protected]>
* Update end2end public dockerfile to look for IPEX in the conda directory (#535)
* Notebook to script conversion example (#516)
* Add notebook script conversion example
* Fixed doc
* Replaces custom preprocessor with built-in one
* Changed tag to remove_for_custom_dataset
* Add URL check prior to calling urlretrieve (#538)
* Add URL check prior to calling urlretrieve Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a typo Signed-off-by: Abolfazl Shahbazi <[email protected]>
* disable for ssd since fused cat cat kernel is slow (#537)
* fix bug when adding steps in rnnt inference (#528)
* Fix and updates for TensorFlow WW18-2022 SPR (#542)
* Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Remove some repositories Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]>
* switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix and updates for PyTorch WW14-2022 SPR (#543)
* Fix and updates for PyTorch WW14-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Updates required to the base image Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Remove some repositories Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]>
* switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix PYT resnet50 quickstart scripts for both Linux and Windows (#547)
* fix quickstart scripts, detect platform type, update to run with pytorch only
* Fix SPR PyTorch MaskRCNN inference documentation for CHECKPOINT_DIR (#548)
* Enable bert large multi stream inference (#554)
* test bert multi stream module
* enable input split and output concat for accuracy run
* change the default num_streams batchsize cores to 56
* change ssd multi stream throughput to 1 core 1 batch
* change the default parameter for rn50 ssd multi stream module
* modify enable_ipex_for_squad.diff to align new multistream hint implementation
* enable warmup and multi socket support
* change default parameter for rn50 ssd multi stream inference
* Add train-no-eval for rn50 pytorch (#555)
* PyTorch SPR BERT large training updates (h5py and dataset instructions) and update LD_PRELOAD for SPR entrypoints (#550)
* Add h5py install to bert training dockerfile
* documentation updates
* update docs, and add input_preprocessing to the wrapper package
* Update LD_PRELOAD trailing :
* Fix syntax
* removing unnecessary change
* Update DLRM entrypoint
* Update docs to note that phase2 has bert_config.json in the CHECKPOINT_DIR
* Fix syntax
* increase shm-size to 10g
* [RNN-T training] Update scripts -- run on 1S (#561)
* Update maskrcnn training script to run on 1s (#562)
* use single node to do ssd-rn34 training (#563)
* Update training.sh (#564)
* Update training.sh (#565) Use tcmalloc instead of jemalloc
* use single node to do resnet50 training (#568)
* add numactl -C and remove jit warm in main thread (#569)
* Update unit-test.yml (#546)
* re-enable int8 for api change (#579)
* saperate fully convergency test from training test (#581) Co-authored-by: jianan-gu <[email protected]>
* ssd enable new int8 (#580)
* v1
* enable new int8 method
* Revert "ssd enable new int8 (#580)" (#584) This reverts commit 9eb3211.
* Revert "re-enable int8 for api change (#579)" (#583) This reverts commit 0bded92.
* Update training script using 1s (#560)
* Enable checkpoint during training for bert-large (#573)
* minor fix
* Add readme for enabling checkpoint
* update phase1 to enable checkpoint by default
* Update README.md
* Enable ssd bf32 inference training (#589)
* enable ssd bf32 inference
* enable ssd bf32 train
* enable RNN-T bf32 inference (#591)
* Enable bf32 for bert and distilbert for inference (#593)
* enable bf32 distilbert
* enable bert bf32
* Enable RNN-T bf32 training (#594)
* enable maskrcnn bf32 inference and training (#595)
* enable resnet50 and resnext101 bf16 path (#596)
* enable bert bf32 train (#600)
* update resnet int8 path using new int8 api (#603)
* re-enable int8 for api change (#604) Co-authored-by: jianan-gu <[email protected]>
* Leslie/ssd enable new int8 (#605)
* v1
* enable new int8 method
* update json file
* add rn50 int8 weight sharing Co-authored-by: Jiang, Xiaofei <[email protected]>
* update ssd training bs to the multily of core numbers (#606)
* enable bf32 for dlrm (#607) Co-authored-by: jianan-gu <[email protected]>
* Update IPEX new int8 API enabling for distilbert/bert-large (#608)
* enable distilbert
* enable bert
* fix max-ind-range and add memory info (#609) Co-authored-by: jianan-gu <[email protected]>
* Remove debug code (#610)
* update training steps (#611)
* fix bandit scan fails (#612)
* PYT Image recognition models support on Windows (#549)
* fix all image recognition scripts to run on windows and linux with PYT, and only linux with IPEX
* [RNN-T training] fix bandit scan fails (#614)
* RNN-T inference: fix IMZ Bandit scan fails (#615)
* Update unit-test.yml (#570)
* MaskRCNN: fix IMZ Bandit scan fails (#623)
* Fix for horovod-related failures in TF nightly runs (#613)
* cpp17 horovod failure fix
* minor debugging changes
* minor fixes - directory name
* cleanup
* addressing reviewer comments
* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 (#624)
* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Set 'HOROVOD_WITH_MPI=1' explicitly Signed-off-by: Abolfazl Shahbazi <[email protected]>
* update GCC version to GCC 9 Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add 'horovodrun --check-build' for sanity check Signed-off-by: Abolfazl Shahbazi <[email protected]>
* removo force install inside Docker Signed-off-by: Abolfazl Shahbazi <[email protected]>
* [RNN-T training] Fix ddp sample number issue (#625)
* update BF32 usage (#627)
* resnet50 training: add warm up before collecting time (#628)
* image to bf16 (#629)
* Update end2end DLSA dockerfile due to SPR wheel path update and removing int8 patch (#631)
* Update mlpc path for SPR wheels
* remove patch
* Update Horovod commit id for BareMetal, Docker will be updated next (#630) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* fix dlrm convergence and change training performance BS to 32K (#633) Co-authored-by: jianan-gu <[email protected]>
* [RNN-T training] Merge sh files to one (#635)
* update torch-ccl into 1.12 (#636)
* Liangan1/update torch ccl version (#637)
* Update torch_ccl version
* resnet50_distributed_training: don't set MASTER_ADDR by user (#638)
* Update torch_ccl in script (#639)
* Enable offline download distilbert (#632)
* enable offline download distilbert
* add convert
* Update README.md
* add accuracy.py
* add file
* refine download
* refine path
* refine path
* add license
* Update dlrm_s_pytorch.py (#643)
* Update README.md (#649)
* init pytorch T5 language model (#648)
* init pytorch T5 language model
* update README.md
* update doc
* update fpn models (#650)
* pytorch resnet50: directly call ipex.quantization (#653)
* fix int8 accuracy (#655) Co-authored-by: Zhang, Weizhuo <[email protected]>
* Made fixes to the broken links (#652)
* Made fixes to the broken links
* Changed the ResNet50v1_5 version back to v2_7_0
* Modified the setup AI kit instructions Co-authored-by: msalopan <[email protected]>
* Update Security Center URL (#657) Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Weizhuoz/fix for pt 1.12 (#656)
* fix vgg11_bn accuracy syntax error
* remove exact_match from roberta-base
* modify maskrcnn BS to 2*num_cores
* Update dlrm_s_pytorch.py (#660)
* Update dlrm_s_pytorch.py Reduce int8 memory usage.
* Update dlrm_s_pytorch.py
* Update dlrm_s_pytorch.py
* Update dlrm_s_pytorch.py
* Update dlrm_s_pytorch.py
* Add BF32 DDP for bert-large (#663)
* Update run_ddp_bert_pretrain_phase1.sh
* Update run_ddp_bert_pretrain_phase2.sh
* Update README.md
* move OMP_NUM_THREADS=1 into dlrm_s_pytorch.py (#664) minor changes
* remove rn50 ao (#665)
* Re-organize models list to be grouped by framework (#654)
* re-organize models list to be grouped by framework
* update tensorflow ssd-resnet34 training dataset
* add T5 in benchmark/README.md
* mannuel set torch num threads only for int8 (#666)
* Update inference_performance.sh (#669)
* improve ssdrn34 perf. (#671)
* improve ssdrn34 perf.
* minor update.
* Fix linting Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Fix unit tests too Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]>
* Use IPEX Pytorch whls instead of building IPEX from source (#674) Co-authored-by: Clayne Robison <[email protected]>
* Lpot2inc (#446) Co-authored-by: ltsai1 <[email protected]>
* Sriniva2/ssd rn34 (#682)
* improve ssdrn34 perf.
* minor update.
* enabling synthetic data.
* Update base_benchmark_util.py * Fix linting error Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> * Add doc updates for '--synthetic-data' option (#683) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Change checkpoint setting for Bert train phase 1 (#602) * Change checkpoint setting for Bert train phase 1 * fix model and config saving * fix error when runing gpu path (#686) * fix load pretrained model error when using torch_ccl (#688) * update py version in base spec (#678) (#690) * TF addons upgrade to 0.17.1 (#689) (#691) * updated tf adons version * remove comment * Update Dockerfiles prior to IMZ 2.8 release (#693) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update Documents prior to IMZ 2.8 release (#694) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update README.md (#697) * change numpy version requirement (#703) * Remove MiniGo training from IMZ (#644) * remove MiniGo training scripts and unit test * [RNN-T] [Inference] optimize the batch decoder (#711) * reduce fill_ OP in rnnt embedding kernel * optimize add between int and log to reduce dtype conversion * rnnt: support dump tracing file and print profile table (#712) * add support for open SUSE leap operating system (#708) * rnnt inference: pre convert data to bf16 (#713) * remove squeeze/slice/transpose (#714) * update resnet50 training code (#710) * update resnet50 training code * not using ipex optimize for resnet50 training * use ipex.optimize() on the whole model (#718) * resnet50 bf32: calling ipex.optimize to enable bf32 path (#719) * Added batch size as an env variable to the quickstart scripts (#676) Co-authored-by: Clayne Robison <[email protected]> * Added batchsize as an env variable to quickstart scripts (#680) * updated readme: nit fix (#723) Co-authored-by: Rahul Nair <[email protected]> * compute throughput by test_mini_batch_size (#740) * pytorch resnet50: fix bf32 training path error (#739) * Fix a 
subtle 'E275' style issue that causes unknown behavior (#742) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * rearrange the paragraphs and fix Markdown headers (#744) * Align Transformers version for BERT models (#738) * align transformer version(4.18) for bert models * change scripts to legacy * redo calibration * patch fix * Update README.md (#746) * Add support for stock PYT- object detection models (#732) * stock PYT and windows support for object detection models * Weizhuoz/reduce model zoo steps (#762) * reduce steps for bert-base, roberta, fpn models * modify max_iter for fpn models * reduce all img classification models steps * update new config for bert models (#763) * Addin Scipy for TensorFlow serving SSD-MobileNet model (#764) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update TF ResNet50v1.5 inference for SPR (baremetal) (#749) * Added matplotlib dependency to image_segmentation requirements (#768) * Update readmes for the path to output directory (#769) * update wide & deep readme for the path to pretrained model directory (#771) * add a check for ubuntu 22.04 support (#721) * Changes to add bfloat16 support for DIEN training (#679) * Changes to add bfloat16 support for DIEN training * Some for for reporting performance * Fixes for dien training and unit tests * updated tpp file withr2.8 approvals (#773) * Add Windows stock PyTorch support for TransNet v2 (#779) * update TransNet v2 to work with stock pytorch * update Windows.md path in all relevant docs * add P99 metric for LZ models (#780) Co-authored-by: Weizhuo Zhang <[email protected]> * Rn50 training multiple epoches output 1 KPI and add training_steps argument. 
(#775) * enable --training_steps and 1 training KPI output with multiple epoches * add prefix * update print freq * fix display bug * enable PyTorch resnet50 fp16 path (#783) * enable PyTorch resnet50 fp16 path * fix conflict * Extract p99 metric from log to summary (#784) * enable fp16 bert train and inference (#782) * Vruddarr/pt update windows readmes (#778) * remove bfloat16 experimental support note (#786) * Update IPEX installation path (#788) * Clean up _pycache_ files, remove symlinks, and add license headers for dien training bf16 (#787) * update readme for jemalloc and iomp path (#789) * update readme for jemalloc and iomp path * Updated IOMP path as path to the intel-openmp directory * PyTorch: fix resnext101 running script (#795) * Update 3dunet mlperf bash scripts and README (#797) * update 3dunet mlperf doc to use quickstart scripts, rename quickstart scripts for multi-instance * fix tests job (#803) * rnnt inference: align replace lstm API due to IPEX change (#802) * Adding quick start scripts to MobileNetV1 bfloat16 precision (#793) * Adding quick start scripts to ssd-mobilenet bfloat16 precision (#798) * Update T5 model with windows quick start scripts (#790) * Update T5 model with windows quick start scripts * Updated Readme by specifying values to environment variables * Update inference int8 readme and script of 4 CV models using INC (#698) * update docs to add INC int8 models as an option * add instructions for how to quantize a fp32 model using INC * rnnt: fix stft due to PyTorch API change (#811) * rnnt training: fix stft due to PyTorch API change (#813) * Update BareMetalSetup.md (#817) * Gerardod/build container (#807) First phase of GHA WF to build the image of a Model Zoo workload container and push it to CAAS. * Sharvils/tf workload (#808) * TFv2.10 support added. Horovod version updated. 
* Vruddarr/tf add language translation bert fp32 quick start scripts (#804) * Adding quick start scripts to language translation BERT FP32 model * Updated TL notebooks for SPR Launch (#810) * Updates for TL PyTorch notebook * Edits for two more TL notebooks * Reverting previous change for virtualenv * Removed --no-deps and some nonexistent links * Added TFHub cache dir * Updated TL notebook README for legal/branding * Update typo in Readme (#821) Co-authored-by: veena.mounika.ruddarraju <[email protected]> * PyTorch: using ipex.optimize for bf16 training (#824) * Fix CVEs for Pillow and notebook packages (#831) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * add intel-alphafold2 optimized w/ IPEX from realm of AIDD (#737) * add alphafold2 from AIDD realm * Remove unused variable in mlperf 3DUnet performance run (#832) * Update Model Zoo name, Python version and message for IPEX (#833) * Update instruction for Miniconda, Jemalloc, PyTorch and IPEX and updt… (#830) * Update models main tables (#836) *update main readmes * Adding jemalloc instructions and environment variables (#838) * Add support for dGPU models (#840) * add support for dGPU support * remove spr dockerfiles and spec files (#842) * delete links to 3dunet mlperf and bert large int8 (#841) * update tbb files (#843) * fix vulnerability issues reported by snyk scans (#848) * update for new precision (#849) * upgrade for ipex 1.13 * delete workflows Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: leslie-fang-intel <[email protected]> Co-authored-by: Dina Suehiro Jones <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: XiaobingZhang <[email protected]> Co-authored-by: Xiaoming (Jason) Cui <[email protected]> Co-authored-by: jiayisunx <[email protected]> Co-authored-by: Melanie Buehler <[email protected]> Co-authored-by: Srini511 <[email protected]> Co-authored-by: Sean-Michael Riesterer 
<[email protected]> Co-authored-by: jianan-gu <[email protected]> Co-authored-by: Chunyuan WU <[email protected]> Co-authored-by: zhuhaozhe <[email protected]> Co-authored-by: Wang, Chuanqi <[email protected]> Co-authored-by: YanbingJiang <[email protected]> Co-authored-by: Weizhuo Zhang <[email protected]> Co-authored-by: xiaofeij <[email protected]> Co-authored-by: liangan1 <[email protected]> Co-authored-by: blzheng <[email protected]> Co-authored-by: Om Thakkar <[email protected]> Co-authored-by: mahathis <[email protected]> Co-authored-by: Clayne Robison <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: Neo Zhang Jianyu <[email protected]> Co-authored-by: ltsai1 <[email protected]> Co-authored-by: Jitendra Patil <[email protected]> Co-authored-by: Kanvi Khanna <[email protected]> Co-authored-by: Rahul Nair <[email protected]> Co-authored-by: Veena2207 <[email protected]> Co-authored-by: jojivk-intel-nervana <[email protected]> Co-authored-by: xiangdong <[email protected]> Co-authored-by: Huang, Zhiwei <[email protected]> Co-authored-by: gera-aldama <[email protected]> Co-authored-by: Sharvil Shah <[email protected]> Co-authored-by: wyang2 <[email protected]> Co-authored-by: Yimei Sun <[email protected]>

Signed-off-by: Abolfazl Shahbazi <[email protected]>

* Update Pillow to '>=9.3.0' (#884)
  Signed-off-by: Abolfazl Shahbazi <[email protected]>
* remove supported OS checks (#926)
* Remove Linux/windows OS platform support checks (#927)
* upgrade Pillow version for Yolov4
  Signed-off-by: Abolfazl Shahbazi <[email protected]>

Co-authored-by: Abolfazl Shahbazi <[email protected]>

…odels.intel-models

* rnnt: use launcher to set output file path and name (#524) * Update BareMetalSetup.md (#526) Always use the latest torchvision * Reduce memory usage for dlrm acc test (#527) * updatedistilbert with text_classification (#529) * add patch for distilbert (#530) * Update the model-builder dockerfile to use ubuntu 20.04 (#532) * Add script for coco training dataset processing (#525) * and update tensorflow ssd-resnet34 training dataset instructions * update patch (#533) Co-authored-by: Wang, Chuanqi <[email protected]> * [RNN-T training] Enable FP32 gemm using oneDNN (#531) * Update the Readme guide for distilbert (#534) * Update the Readme guide for distilbert * Fix accuracy grep bug, and grep accuracy for distilbert Co-authored-by: Weizhuo Zhang <[email protected]> * Update end2end public dockerfile to look for IPEX in the conda directory (#535) * Notebook to script conversion example (#516) * Add notebook script conversion example * Fixed doc * Replaces custom preprocessor with built-in one * Changed tag to remove_for_custom_dataset * Add URL check prior to calling urlretrieve (#538) * Add URL check prior to calling urlretrieve Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a typo Signed-off-by: Abolfazl Shahbazi <[email protected]> * disable for ssd since fused cat cat kernel is slow (#537) * fix bug when adding steps in rnnt inference (#528) * Fix and updates for TensorFlow WW18-2022 SPR (#542) * Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]> * Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: 
Abolfazl Shahbazi <[email protected]> * Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]> * Remove some repositories Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]> * switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix PYTORCH_WHEEL and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix and updates for PyTorch WW14-2022 SPR (#543) * Fix and updates for PyTorch WW14-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]> * Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email 
protected]> * Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]> * Updates required to the base image Signed-off-by: Abolfazl Shahbazi <[email protected]> * Remove some repositories Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]> * switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix PYTORCH_WHEEL and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix PYT resnet50 quickstart scripts for both Linux and Windows (#547) * fix quickstart scripts, detect platform type, update to run with pytorch only * Fix SPR PyTorch MaskRCNN inference documentation for CHECKPOINT_DIR (#548) * Enable bert large multi stream inference (#554) * test bert multi stream module * enable input split and output concat for accuracy run * change the default num_streams batchsize cores to 56 * change ssd multi stream throughput to 1 core 1 batch * change the default parameter for rn50 ssd multi stream module * modify enable_ipex_for_squad.diff to align new multistream hint implementation * enable warmup and multi socket support * change default parameter for rn50 ssd multi stream inference * Add train-no-eval for rn50 pytorch (#555) * PyTorch SPR BERT large training updates (h5py and dataset instructions) and update LD_PRELOAD for SPR entrypoints (#550) * Add h5py install to bert training dockerfile * documentation updates * update docs, and add input_preprocessing to the wrapper package * Update LD_PRELOAD trailing : * Fix syntax * removing unnecessary change * Update DLRM entrypoint * Update docs to note that phase2 has bert_config.json in the CHECKPOINT_DIR * Fix syntax * increase shm-size to 10g * [RNN-T training] Update scripts -- run on 1S (#561) * Update maskrcnn training script to run on 1s (#562) * use single node to do ssd-rn34 
training (#563) * Update training.sh (#564) * Update training.sh (#565) Use tcmalloc instead of jemalloc * use single node to do resnet50 training (#568) * add numactl -C and remove jit warm in main thread (#569) * Update unit-test.yml (#546) * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Fixed make command, updated pip install. Fixed make command to run from the root directory. Replaced pip install tox with a pip install -r requirements-tests.txt to install all dependencies for the tests. * Add tox to test dependencies. Added tox to the dependencies so that the Workflow and others may install it with pip install -r requirements-test.txt and be covered for running make lint and make unit-test. * Update unit-test.yml Changed 'make unit-test' to 'make unit_test' as that is the actual target defined in the Makefile. * Update unit-test.yml Changed apt-get install command. * re-enable int8 for api change (#579) * saperate fully convergency test from training test (#581) Co-authored-by: jianan-gu <[email protected]> * ssd enable new int8 (#580) * v1 * enable new int8 method * Revert "ssd enable new int8 (#580)" (#584) This reverts commit 9eb3211. * Revert "re-enable int8 for api change (#579)" (#583) This reverts commit 0bded92. 
* Update training script using 1s (#560) * Enable checkpoint during training for bert-large (#573) * minor fix * Add readme for enabling checkpoint * update phase1 to enable checkpoint by default * Update README.md * Enable ssd bf32 inference training (#589) * enable ssd bf32 inference * enable ssd bf32 train * enable RNN-T bf32 inference (#591) * Enable bf32 for bert and distilbert for inference (#593) * enable bf32 distilbert * enable bert bf32 * Enable RNN-T bf32 training (#594) * enable maskrcnn bf32 inference and training (#595) * enable resnet50 and resnext101 bf16 path (#596) * enable bert bf32 train (#600) * update resnet int8 path using new int8 api (#603) * re-enable int8 for api change (#604) Co-authored-by: jianan-gu <[email protected]> * Leslie/ssd enable new int8 (#605) * v1 * enable new int8 method * update json file * add rn50 int8 weight sharing Co-authored-by: Jiang, Xiaofei <[email protected]> * update ssd training bs to the multily of core numbers (#606) * enable bf32 for dlrm (#607) Co-authored-by: jianan-gu <[email protected]> * Update IPEX new int8 API enabling for distilbert/bert-large (#608) * enable distilbert * enable bert * fix max-ind-range and add memory info (#609) Co-authored-by: jianan-gu <[email protected]> * Remove debug code (#610) * update training steps (#611) * fix bandit scan fails (#612) * PYT Image recognition models support on Windows (#549) * fix all image recognition scripts to run on windows and linux with PYT, and only linux with IPEX * [RNN-T training] fix bandit scan fails (#614) * RNN-T inference: fix IMZ Bandit scan fails (#615) * Update unit-test.yml (#570) Changed the docker user credential to utilize GitHub Secret. 
* MaskRCNN: fix IMZ Bandit scan fails (#623) * Fix for horovod-related failures in TF nightly runs (#613) * cpp17 horovod failure fix * minor debugging changes * minor fixes - directory name * cleanup * addressing reviewer comments * Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 (#624) * Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 Signed-off-by: Abolfazl Shahbazi <[email protected]> * Set 'HOROVOD_WITH_MPI=1' explicitly Signed-off-by: Abolfazl Shahbazi <[email protected]> * update GCC version to GCC 9 Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add 'horovodrun --check-build' for sanity check Signed-off-by: Abolfazl Shahbazi <[email protected]> * removo force install inside Docker Signed-off-by: Abolfazl Shahbazi <[email protected]> * [RNN-T training] Fix ddp sample number issue (#625) * update BF32 usage (#627) * resnet50 training: add warm up before collecting time (#628) * image to bf16 (#629) * Update end2end DLSA dockerfile due to SPR wheel path update and removing int8 patch (#631) * Update mlpc path for SPR wheels * remove patch * Update Horovod commit id for BareMetal, Docker will be updated next (#630) Signed-off-by: Abolfazl Shahbazi <[email protected]> * fix dlrm convergence and change training performance BS to 32K (#633) Co-authored-by: jianan-gu <[email protected]> * [RNN-T training] Merge sh files to one (#635) * update torch-ccl into 1.12 (#636) * Liangan1/update torch ccl version (#637) * Update torch_ccl version * resnet50_distributed_training: don't set MASTER_ADDR by user (#638) * Update torch_ccl in script (#639) * Enable offline download distilbert (#632) * enable offline download distilbert * add convert * Update README.md * add accuracy.py * add file * refine download * refine path * refine path * add license * Update dlrm_s_pytorch.py (#643) * Update README.md (#649) * init pytorch T5 language model (#648) * init pytorch T5 language model * update README.md * update doc * update fpn 
models (#650) * pytorch resnet50: directly call ipex.quantization (#653) * fix int8 accuracy (#655) Co-authored-by: Zhang, Weizhuo <[email protected]> * Made fixes to the broken links (#652) * Changed the ResNet50v1_5 version back to v2_7_0 * Update Security Center URL (#657) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Weizhuoz/fix for pt 1.12 (#656) * fix vgg11_bn accuracy syntax error * remove exact_match from roberta-base * modify maskrcnn BS to 2*num_cores * Update dlrm_s_pytorch.py (#660) * Update dlrm_s_pytorch.py Reduce int8 memory usage. * Update dlrm_s_pytorch.py * Update dlrm_s_pytorch.py * Update dlrm_s_pytorch.py * Update dlrm_s_pytorch.py * Add BF32 DDP for bert-large (#663) * Update run_ddp_bert_pretrain_phase1.sh * Update run_ddp_bert_pretrain_phase2.sh * Update README.md * move OMP_NUM_THREADS=1 into dlrm_s_pytorch.py (#664) minor changes * remove rn50 ao (#665) * Re-organize models list to be grouped by framework (#654) * re-organize models list to be grouped by framework * update tensorflow ssd-resnet34 training dataset * add T5 in benchmark/README.md * mannuel set torch num threads only for int8 (#666) * Update inference_performance.sh (#669) * improve ssdrn34 perf. (#671) * improve ssdrn34 perf. * minor update. 
* Fix linting Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix unit tests too Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> * Use IPEX Pytorch whls instead of building IPEX from source (#674) * Use IPEX Pytorch whls instead of building IPEX from source * Corrected the link to install pytorch/IPEX * Corrected the link to install pytorch/IPEX * Updated the link with latest tutorial to install pytorch/IPEX * Update docs/general/pytorch/BareMetalSetup.md Co-authored-by: Clayne Robison <[email protected]> * Update docs/general/pytorch/BareMetalSetup.md Co-authored-by: Clayne Robison <[email protected]> * Made the suggested tweaks in the names * Adding condition to install jemalloc and tcmalloc Co-authored-by: Clayne Robison <[email protected]> * Added condition to install jemalloc, tcmalloc, vision and torch-ccl * Added some tweaks Co-authored-by: Clayne Robison <[email protected]> Co-authored-by: root <[email protected]> * Lpot2inc (#446) * draft for lpot quantization and perf analysis jupyter notebook * update with formal name of model zoo, correct wrong words, add license in python file * rm empty line * renmae LPOT to INC in text and code, and use new api * Update README.md * Update set_env.sh * Update README.md * Update ut.sh * Update local_banchmark.sh * Create local_benchmark.sh * Update README.md * Update inc_for_tensorflow.ipynb * Update ut.sh * Update README.md * rename to local_benchmark.sh * Update ut.sh * Update ut.sh * Update run_jupyter.sh * Delete lpot_for_tensorflow.ipynb * Delete lpot_quantize_model.py * Update README.md * Update README.md * Update README.md * Update inc_for_tensorflow.ipynb * Update README.md * Update README.md * Update inc_for_tensorflow.ipynb * Update requirements.txt Co-authored-by: ltsai1 <[email protected]> * Sriniva2/ssd rn34 (#682) * improve ssdrn34 perf. * minor update. * enabling synthetic data. 
* Update base_benchmark_util.py * Fix linting error Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> * Add doc updates for '--synthetic-data' option (#683) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Change checkpoint setting for Bert train phase 1 (#602) * Change checkpoint setting for Bert train phase 1 * fix model and config saving * fix error when runing gpu path (#686) * fix load pretrained model error when using torch_ccl (#688) * update py version in base spec (#678) (#690) * TF addons upgrade to 0.17.1 (#689) (#691) * updated tf adons version * remove comment * Update Dockerfiles prior to IMZ 2.8 release (#693) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update Documents prior to IMZ 2.8 release (#694) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update README.md (#697) * change numpy version requirement (#703) * Remove MiniGo training from IMZ (#644) * remove MiniGo training scripts and unit test * [RNN-T] [Inference] optimize the batch decoder (#711) * reduce fill_ OP in rnnt embedding kernel * optimize add between int and log to reduce dtype conversion * rnnt: support dump tracing file and print profile table (#712) * add support for open SUSE leap operating system (#708) * rnnt inference: pre convert data to bf16 (#713) * remove squeeze/slice/transpose (#714) * update resnet50 training code (#710) * update resnet50 training code * not using ipex optimize for resnet50 training * use ipex.optimize() on the whole model (#718) * resnet50 bf32: calling ipex.optimize to enable bf32 path (#719) * Added batch size as an env variable to the quickstart scripts (#676) * WIP: Adding batch size as an environment variable to the quickstart scripts * Added instructions in README.md for all workloads * Update README.md * Corrected typo in launch_benchmark * Made corrections to .docs and ran model-builder * Delete .README.md.swp * Delete .fp32_accuracy.sh.swp * Update 
quickstart/image_segmentation/tensorflow/3d_unet_mlperf/inference/cpu/inference_throughput.sh Co-authored-by: Clayne Robison <[email protected]> * Update quickstart/language_translation/tensorflow/transformer_mlperf/inference/cpu/inference_realtime.sh Co-authored-by: Clayne Robison <[email protected]> * Update benchmarks/launch_benchmark.py Co-authored-by: Clayne Robison <[email protected]> * Made corrections to batch-size parameter * Made changes in launch_benchmark for batch-size arg * Made modifications to the README's * Resolved merge conflict by keeping README.md file. * Modified readme for windows * Resolved merge conflict by keeping README.md file. * Corrected SPR run.sh scripts * Removed echo from run.sh Co-authored-by: Clayne Robison <[email protected]> * Added batchsize as an env variable to quickstart scripts (#680) * Added batchsize as an env variable to quickstart scripts * Made modifications to .docs and scripts * Made modifications to README * Resolved merge conflict by incorporating both suggestions. 
* Made corrections in README.md * Made corrections in README.md * Undo changes in training.sh file * updated readme: nit fix (#723) Co-authored-by: Rahul Nair <[email protected]> * compute throughput by test_mini_batch_size (#740) * pytorch resnet50: fix bf32 training path error (#739) * Fix a subtle 'E275' style issue that causes unknown behavior (#742) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * rearrange the paragraphs and fix Markdown headers (#744) * Align Transformers version for BERT models (#738) * align transformer version(4.18) for bert models * change scripts to legacy * redo calibration * patch fix * Update README.md (#746) * Add support for stock PYT- object detection models (#732) * stock PYT and windows support for object detection models * Weizhuoz/reduce model zoo steps (#762) * reduce steps for bert-base, roberta, fpn models * modify max_iter for fpn models * reduce all img classification models steps * update new config for bert models (#763) * Addin Scipy for TensorFlow serving SSD-MobileNet model (#764) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update TF ResNet50v1.5 inference for SPR (baremetal) (#749) * Added matplotlib dependency to image_segmentation requirements (#768) * Update readmes for the path to output directory (#769) * update wide & deep readme for the path to pretrained model directory (#771) * add a check for ubuntu 22.04 support (#721) * Changes to add bfloat16 support for DIEN training (#679) * Changes to add bfloat16 support for DIEN training * Some for for reporting performance * Fixes for dien training and unit tests * updated tpp file withr2.8 approvals (#773) * Add Windows stock PyTorch support for TransNet v2 (#779) * update TransNet v2 to work with stock pytorch * update Windows.md path in all relevant docs * add P99 metric for LZ models (#780) Co-authored-by: Weizhuo Zhang <[email protected]> 
* Rn50 training multiple epoches output 1 KPI and add training_steps argument. (#775) * enable --training_steps and 1 training KPI output with multiple epoches * add prefix * update print freq * fix display bug * enable PyTorch resnet50 fp16 path (#783) * enable PyTorch resnet50 fp16 path * fix conflict * Extract p99 metric from log to summary (#784) * enable fp16 bert train and inference (#782) * Vruddarr/pt update windows readmes (#778) * remove bfloat16 experimental support note (#786) * Update IPEX installation path (#788) * Clean up _pycache_ files, remove symlinks, and add license headers for dien training bf16 (#787) * update readme for jemalloc and iomp path (#789) * update readme for jemalloc and iomp path * Updated IOMP path as path to the intel-openmp directory * PyTorch: fix resnext101 running script (#795) * Update 3dunet mlperf bash scripts and README (#797) * update 3dunet mlperf doc to use quickstart scripts, rename quickstart scripts for multi-instance * fix tests job (#803) * rnnt inference: align replace lstm API due to IPEX change (#802) * Adding quick start scripts to MobileNetV1 bfloat16 precision (#793) * Adding quick start scripts to ssd-mobilenet bfloat16 precision (#798) * Update T5 model with windows quick start scripts (#790) * Update T5 model with windows quick start scripts * Updated Readme by specifying values to environment variables * Update inference int8 readme and script of 4 CV models using INC (#698) * update docs to add INC int8 models as an option * add instructions for how to quantize a fp32 model using INC * rnnt: fix stft due to PyTorch API change (#811) * rnnt training: fix stft due to PyTorch API change (#813) * Update BareMetalSetup.md (#817) * Gerardod/build container (#807) First phase of GHA WF to build the image of a Model Zoo workload container and push it to CAAS. * Sharvils/tf workload (#808) * TFv2.10 support added. Horovod version updated. 
* Vruddarr/tf add language translation bert fp32 quick start scripts (#804) * Adding quick start scripts to language translation BERT FP32 model * Changed path to the Readme * Adding spec file <bert-fp32-inference_spec.yml> * Update spec file and model link in Readme tables * Update Readme path in windows.md * Updated TL notebooks for SPR Launch (#810) * Updates for TL PyTorch notebook * Edits for two more TL notebooks * Reverting previous change for virtualenv * Removed --no-deps and some nonexistent links * Added TFHub cache dir * Updated TL notebook README for legal/branding * Update typo in Readme (#821) * PyTorch: using ipex.optimize for bf16 training (#824) * Fix CVEs for Pillow and notebook packages (#831) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * add intel-alphafold2 optimized w/ IPEX from realm of AIDD (#737) * add alphafold2 from AIDD realm * Remove unused variable in mlperf 3DUnet performance run (#832) * Update Model Zoo name, Python version and message for IPEX (#833) * Update instruction for Miniconda, Jemalloc, PyTorch and IPEX and updt… (#830) * Update instruction for Miniconda, Jemalloc, PyTorch and IPEX and updting the readme by replacing conda with Miniconda. 
* Adding comment to install torch in BareMetalSetup.md * Adding IPEX version and removing *s * Update models main tables (#836) *update main readmes * Adding jemalloc instructions and environment variables (#838) * DLRM hybrid gradient product (#814) * enable hybrid mergedembedding * Hybrid Merge embedding * refine code * Update model file * Fix data loader issue for distributed trianing * Update the print info * Fix lr issue for sparse table both 2/8 ranks get convergenced with 0.75 epochs Co-authored-by: root <[email protected]> * update the TTT evaluation method by excluding dataloader & metric evaluation (#844) Co-authored-by: Zhang, Liangang <[email protected]> * PyTorch: resnet50 distributed training using lars optimizer (#826) * modify dlrm's sklearn metric eval func to ipex's multi-thread version (#850) * modify recall/precision/f1/ap 's eval as optional (#856) * Port dataloader optimization for distributed training of dlrm (#847) * update the TTT evaluation method by excluding dataloader & metric evaluation * port dataloader optimization for distributed training of dlrm * modify dlrm's sklearn metric eval func to ipex's multi-thread version (#850) * modify recall/precision/f1/ap 's eval as optional (#856) * port dataloader optimization for distributed training of dlrm * delete local bs computation in evaluation stage * modify the TTT output name Co-authored-by: Zhang, Liangang <[email protected]> * Update horovod version to fix run time failure due to Status call (#859) * fix regression for dlrm single node training (#864) Co-authored-by: Weizhuo Zhang <[email protected]> * Update pytorch model zoo table of BF32 with landing zoo models (#865) * Added SNYK scan (#855) * Update SSD-ResNet34 code in start.sh(#862) * Add Distilbert base model for inference (Tensorflow) to model zoo (#815) * Add fp32 inference for distilbert base model * Fix Bert spec file (#873) * 1) Add torch.profiler (#871) 2) change the distributed_training.sh for dlrm to diamond cluster * 
Update Wide & Deep docs (#875) * The copy of #867(Porting evaluation iteration overlapping) (#876) * port evaluation overlapping * remove debug code * remove debug code * remove unused code * remove unused code * add resnet50 distributed training script (#879) * add resnet50 distributed training script * collect TTT Co-authored-by: XiaobingSuper <[email protected]> * reduce redundant bus traffic (#880) * Port all_to_all index overlapping with interaction and top mlp. (#878) * port all_to_all index overlapping with interaction and top mlp * fix seg fault * Add int8 support for distilbert (#823) * Add fp32 inference for distilbert base model Co-authored-by: syedshahbaaz <[email protected]> * Update DIEN inference docs & quickstart scripts (#869) * Update DIEN docs * update for spr ww42 Co-authored-by: WafaaT <[email protected]> * Update ResNet50v1.5 docs (#820) * Update and Validate ResNet50v1.5 Inference and training model for TF SPR * Update and validate docs for TF SPR Co-authored-by: WafaaT <[email protected]> * Update Wide & Deep using Large Dataset docs (#877) * Vruddarr/tf bfloat32 precision check (#893) * Update Wide and Deep Large Dataset Training Model docs (#881) * Vruddarr/tf update image recognition models docs (#816) * Update Inceptionv3,DenseNet 169, Inceptionv4, ResNet50, ResNet101, MobileNet V1 quickstart scripts and docs * Update and validate MobileNet v1 for TF SPR Co-authored-by: WafaaT <[email protected]> * Fix BFloat32 precision check code for Resnet50v1.5 training model (#894) * Update 3DUNet MLperf for SPR (#889) * Updated Bert Large SPR READMEs (#887) * Included tensorflow and keras versions * updated to downloaded bert checkpoints * Fix typos in MobilenetV1 scripts (#899) * modify time function to solve int8 benchmark issue on windows (#898) * modify time function to solve int8 benchmark issue on windows * Replace the time.time function calls to time.perf_counter to improve the time statistic resolution. 
Updated for the additional 5 models Co-authored-by: Ying <[email protected]> * Update DIEN Training docs (#882) * Adding permissions to scripts in DIEN and correcting pb file paths in README_SPR_baremetal (#901) * Adding SPR_baremetal_readme and fixing model paths in the tables (#904) * fix acc test for single node (#903) * fix acc test for single node * Update dlrm_s_pytorch.py Co-authored-by: Weizhuo Zhang <[email protected]> * commit cherry-picks from r2.9 (#900) * update tbb files (#843) * fix vulnerability issues reported by snyk scans (#848) * upgrade for ipex 1.13 * Update Pillow to '>=9.3.0' (#884) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * fix some bugs for p99 (#909) * Update tensorflow benchmarks to use latest horovod commit (#908) * Update start.sh * Update start.sh * Update to use shortened commit hash * do not convert data to bf16 while using fp32 and bf32 (#911) Co-authored-by: Weizhuo Zhang <[email protected]> * Update SSD-Resnet34 training docs for SPR task (#914) * Update SSD-Resnet34 training & docs for SPR * Vruddarr/tf update ssd mobilenet docs (#846) * Update quick start scripts and spec file to run for all precisions * Update and validate SSD-Mobilenet docs for TF SPR Co-authored-by: WafaaT <[email protected]> * fix print issue (#915) Co-authored-by: Weizhuo Zhang <[email protected]> * Update rfcn docs to use same quick start scripts (#897) * Update rfcn docs to use same quick start scripts Co-authored-by: WafaaT <[email protected]> * Sharvils/spr ssd training (#917) * Dockerfile updated * Update SSD-ResNet34 Inference docs (#866) * Update ResNet34 Inference to use same scripts & docs for all precisions * Update for SPR WW42 Co-authored-by: WafaaT <[email protected]> * Update transformer_mlperf scripts and README fro SPR WW42 (#891) Co-authored-by: Wafaa Taie <[email protected]> * Update TF models spec files for SPR WW42 (#919) * update TF models spec files for spr ww42 * update 
docker partial for tf addons version * workaround rdma config for spr (#925) * remove supported OS checks (#926) * Update Model paths in main readme (#928) * Remove Linux/windows OS platform support checks (#927) * update resnet50 distributed training script (#923) * resnet50 distributed training: use logical core for ccl (#930) * Update bert scripts to add same quick start scripts to all precisions (#910) * Update MobilenetV1 SPR docs (#931) * Update Resnet50v1_5_SPR_docs (#934) * Update SSD-Mobilenet SPR docs (#935) * Update Resenet50v1.5 inference SPR docs (#933) * Fix DIEN inference.sh script and add pretrained model env var in mobilenetv1 SPR baremetal readme (#939) * Update DIEN Inference and Training SPR docs (#937) * Update SSD-Resnet34 training SPR docs (#936) * Update SSD-Resnet34 Inference SPR docs (#938) * Update README_SPR_baremetal.md remove steps and warm_up steps env vars Co-authored-by: Wafaa Taie <[email protected]> * BERT training dockerfile fixed (#921) * BERT repo version fixed for SPR container (#920) * Update spr baremetal instructions for 3dunet, bert large and transformer mlperf (#932) * Update Transformer MLPerf inference docs for pre-trained models (#940) * Fix Language Translation BERT quickstart scripts (#941) * fix scripts to detect the number of cores * Update mlperf_gnmt docs (#945) * Updating Transformer_LT_official scripts (#913) * Add support for dGPU models (#840) (#948) * Add support for dGPU models (#840) * upgrade Pillow version for Yolov4 * Update main README.md (#947) * update main readme * edit transformer_mlperf and bert SPR docs * remove workflows * Fix CVEs based on Snyk scans in TL notebooks (#951) * fix snyk critical issues in TL jupyter notebooks * Remove INC dependency for Snyk issues (#953) * removed neuralcompressorfor to avoid vulnerability in Snyk scans * Remove pointers to BERT Large int8 docs (#952) * fix int8 model link (#958) * Fixed num_intra_threads for bfloat16 (#959) (#960) * Fixed num_intra_threads for 
bfloat16 * Modified open mpi instructions * Added kmp_blocktime for bfloat16 Co-authored-by: mahathis <[email protected]> * Fix syntax error and pythonpath in ssd-resnet34 training (#962) (#965) Co-authored-by: Veena2207 <[email protected]> * fix training bkms (#967) (#968) * fix T5 inference script (#969) --------- Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Chunyuan WU <[email protected]> Co-authored-by: XiaobingZhang <[email protected]> Co-authored-by: zhuhaozhe <[email protected]> Co-authored-by: jianan-gu <[email protected]> Co-authored-by: Dina Suehiro Jones <[email protected]> Co-authored-by: Wang, Chuanqi <[email protected]> Co-authored-by: YanbingJiang <[email protected]> Co-authored-by: Weizhuo Zhang <[email protected]> Co-authored-by: Melanie Buehler <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: leslie-fang-intel <[email protected]> Co-authored-by: xiaofeij <[email protected]> Co-authored-by: jiayisunx <[email protected]> Co-authored-by: Sean-Michael Riesterer <[email protected]> Co-authored-by: liangan1 <[email protected]> Co-authored-by: blzheng <[email protected]> Co-authored-by: Om Thakkar <[email protected]> Co-authored-by: mahathis <[email protected]> Co-authored-by: Srini511 <[email protected]> Co-authored-by: Clayne Robison <[email protected]> Co-authored-by: Neo Zhang Jianyu <[email protected]> Co-authored-by: ltsai1 <[email protected]> Co-authored-by: Jitendra Patil <[email protected]> Co-authored-by: Kanvi Khanna <[email protected]> Co-authored-by: Rahul Nair <[email protected]> Co-authored-by: Veena2207 <[email protected]> Co-authored-by: jojivk-intel-nervana <[email protected]> Co-authored-by: xiangdong <[email protected]> Co-authored-by: Huang, Zhiwei <[email protected]> Co-authored-by: gera-aldama <[email protected]> Co-authored-by: Sharvil Shah <[email protected]> Co-authored-by: wyang2 <[email protected]> Co-authored-by: Yimei Sun <[email protected]> Co-authored-by: 
root <[email protected]> Co-authored-by: tangleintel <[email protected]> Co-authored-by: Syed Shahbaaz Ahmed <[email protected]> Co-authored-by: Er-Xin (Edwin) Shang <[email protected]> Co-authored-by: Ying <[email protected]> Co-authored-by: sevdeawesome <[email protected]> Co-authored-by: DiweiSun <[email protected]>

* [RNN-T training] Enable FP32 gemm using oneDNN (#531) * Update the Readme guide for distilbert (#534) * Update the Readme guide for distilbert * Fix accuracy grep bug, and grep accuracy for distilbert Co-authored-by: Weizhuo Zhang <[email protected]> * Update end2end public dockerfile to look for IPEX in the conda directory (#535) * Notebook to script conversion example (#516) * Add notebook script conversion example * Fixed doc * Replaces custom preprocessor with built-in one * Changed tag to remove_for_custom_dataset * Add URL check prior to calling urlretrieve (#538) * Add URL check prior to calling urlretrieve Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a typo Signed-off-by: Abolfazl Shahbazi <[email protected]> * disable for ssd since fused cat cat kernel is slow (#537) * fix bug when adding steps in rnnt inference (#528) * Fix and updates for TensorFlow WW18-2022 SPR (#542) * Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]> * Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]> * Remove some 
repositories Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]> * switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix PYTORCH_WHEEL and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix and updates for PyTorch WW14-2022 SPR (#543) * Fix and updates for PyTorch WW14-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]> * Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]> * Updates required to the base image Signed-off-by: Abolfazl Shahbazi <[email protected]> * Remove some repositories Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]> * switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix PYTORCH_WHEEL 
and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix PYT resnet50 quickstart scripts for both Linux and Windows (#547) * fix quickstart scripts, detect platform type, update to run with pytorch only * Fix SPR PyTorch MaskRCNN inference documentation for CHECKPOINT_DIR (#548) * Enable bert large multi stream inference (#554) * test bert multi stream module * enable input split and output concat for accuracy run * change the default num_streams batchsize cores to 56 * change ssd multi stream throughput to 1 core 1 batch * change the default parameter for rn50 ssd multi stream module * modify enable_ipex_for_squad.diff to align new multistream hint implementation * enable warmup and multi socket support * change default parameter for rn50 ssd multi stream inference * Add train-no-eval for rn50 pytorch (#555) * PyTorch SPR BERT large training updates (h5py and dataset instructions) and update LD_PRELOAD for SPR entrypoints (#550) * Add h5py install to bert training dockerfile * documentation updates * update docs, and add input_preprocessing to the wrapper package * Update LD_PRELOAD trailing : * Fix syntax * removing unnecessary change * Update DLRM entrypoint * Update docs to note that phase2 has bert_config.json in the CHECKPOINT_DIR * Fix syntax * increase shm-size to 10g * [RNN-T training] Update scripts -- run on 1S (#561) * Update maskrcnn training script to run on 1s (#562) * use single node to do ssd-rn34 training (#563) * Update training.sh (#564) * Update training.sh (#565) Use tcmalloc instead of jemalloc * use single node to do resnet50 training (#568) * add numactl -C and remove jit warm in main thread (#569) * Update unit-test.yml (#546) * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Fixed make command, updated pip install. 
Fixed make command to run from the root directory. Replaced pip install tox with a pip install -r requirements-tests.txt to install all dependencies for the tests. * Add tox to test dependencies. Added tox to the dependencies so that the Workflow and others may install it with pip install -r requirements-test.txt and be covered for running make lint and make unit-test. * Update unit-test.yml Changed 'make unit-test' to 'make unit_test' as that is the actual target defined in the Makefile. * Update unit-test.yml Changed apt-get install command. * re-enable int8 for api change (#579) * separate full convergence test from training test (#581) Co-authored-by: jianan-gu <[email protected]> * ssd enable new int8 (#580) * v1 * enable new int8 method * Revert "ssd enable new int8 (#580)" (#584) This reverts commit 9eb3211. * Revert "re-enable int8 for api change (#579)" (#583) This reverts commit 0bded92. * Update training script using 1s (#560) * Enable checkpoint during training for bert-large (#573) * minor fix * Add readme for enabling checkpoint * update phase1 to enable checkpoint by default * Update README.md * Enable ssd bf32 inference training (#589) * enable ssd bf32 inference * enable ssd bf32 train * enable RNN-T bf32 inference (#591) * Enable bf32 for bert and distilbert for inference (#593) * enable bf32 distilbert * enable bert bf32 * Enable RNN-T bf32 training (#594) * enable maskrcnn bf32 inference and training (#595) * enable resnet50 and resnext101 bf16 path (#596) * enable bert bf32 train (#600) * update resnet int8 path using new int8 api (#603) * re-enable int8 for api change (#604) Co-authored-by: jianan-gu <[email protected]> * Leslie/ssd enable new int8 (#605) * v1 * enable new int8 method * update json file * add rn50 int8 weight sharing Co-authored-by: Jiang, Xiaofei <[email protected]> * update ssd training bs to the multiple of core numbers (#606) * enable bf32 for dlrm (#607) Co-authored-by: jianan-gu <[email protected]> * Update IPEX new int8 
API enabling for distilbert/bert-large (#608) * enable distilbert * enable bert * fix max-ind-range and add memory info (#609) Co-authored-by: jianan-gu <[email protected]> * Remove debug code (#610) * update training steps (#611) * fix bandit scan fails (#612) * PYT Image recognition models support on Windows (#549) * fix all image recognition scripts to run on windows and linux with PYT, and only linux with IPEX * [RNN-T training] fix bandit scan fails (#614) * RNN-T inference: fix IMZ Bandit scan fails (#615) * Update unit-test.yml (#570) Changed the docker user credential to utilize GitHub Secret. * MaskRCNN: fix IMZ Bandit scan fails (#623) * Fix for horovod-related failures in TF nightly runs (#613) * cpp17 horovod failure fix * minor debugging changes * minor fixes - directory name * cleanup * addressing reviewer comments * Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 (#624) * Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 Signed-off-by: Abolfazl Shahbazi <[email protected]> * Set 'HOROVOD_WITH_MPI=1' explicitly Signed-off-by: Abolfazl Shahbazi <[email protected]> * update GCC version to GCC 9 Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add 'horovodrun --check-build' for sanity check Signed-off-by: Abolfazl Shahbazi <[email protected]> * removo force install inside Docker Signed-off-by: Abolfazl Shahbazi <[email protected]> * [RNN-T training] Fix ddp sample number issue (#625) * update BF32 usage (#627) * resnet50 training: add warm up before collecting time (#628) * image to bf16 (#629) * Update end2end DLSA dockerfile due to SPR wheel path update and removing int8 patch (#631) * Update mlpc path for SPR wheels * remove patch * Update Horovod commit id for BareMetal, Docker will be updated next (#630) Signed-off-by: Abolfazl Shahbazi <[email protected]> * fix dlrm convergence and change training performance BS to 32K (#633) Co-authored-by: jianan-gu <[email protected]> * [RNN-T training] Merge sh files 
to one (#635) * update torch-ccl to 1.12 (#636) * Liangan1/update torch ccl version (#637) * Update torch_ccl version * resnet50_distributed_training: don't set MASTER_ADDR by user (#638) * Update torch_ccl in script (#639) * Enable offline download distilbert (#632) * enable offline download distilbert * add convert * Update README.md * add accuracy.py * add file * refine download * refine path * refine path * add license * Update dlrm_s_pytorch.py (#643) * Update README.md (#649) * init pytorch T5 language model (#648) * init pytorch T5 language model * update README.md * update doc * update fpn models (#650) * pytorch resnet50: directly call ipex.quantization (#653) * fix int8 accuracy (#655) Co-authored-by: Zhang, Weizhuo <[email protected]> * Made fixes to the broken links (#652) * Update Security Center URL (#657) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Weizhuoz/fix for pt 1.12 (#656) * fix vgg11_bn accuracy syntax error * remove exact_match from roberta-base * modify maskrcnn BS to 2*num_cores * Update dlrm_s_pytorch.py (#660) * Update dlrm_s_pytorch.py Reduce int8 memory usage. * Update dlrm_s_pytorch.py * Update dlrm_s_pytorch.py * Update dlrm_s_pytorch.py * Update dlrm_s_pytorch.py * Add BF32 DDP for bert-large (#663) * Update run_ddp_bert_pretrain_phase1.sh * Update run_ddp_bert_pretrain_phase2.sh * Update README.md * move OMP_NUM_THREADS=1 into dlrm_s_pytorch.py (#664) minor changes * remove rn50 ao (#665) * Re-organize models list to be grouped by framework (#654) * re-organize models list to be grouped by framework * update tensorflow ssd-resnet34 training dataset * add T5 in benchmark/README.md * manually set torch num threads only for int8 (#666) * Update inference_performance.sh (#669) * improve ssdrn34 perf. (#671) * improve ssdrn34 perf. * minor update. 
* Fix linting Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix unit tests too Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> * Use IPEX Pytorch whls instead of building IPEX from source (#674) * Use IPEX Pytorch whls instead of building IPEX from source * Corrected the link to install pytorch/IPEX * Corrected the link to install pytorch/IPEX * Updated the link with latest tutorial to install pytorch/IPEX * Update docs/general/pytorch/BareMetalSetup.md Co-authored-by: Clayne Robison <[email protected]> * Update docs/general/pytorch/BareMetalSetup.md Co-authored-by: Clayne Robison <[email protected]> * Made the suggested tweaks in the names * Adding condition to install jemalloc and tcmalloc Co-authored-by: Clayne Robison <[email protected]> * Added condition to install jemalloc, tcmalloc, vision and torch-ccl * Added some tweaks Co-authored-by: Clayne Robison <[email protected]> * Lpot2inc (#446) * draft for lpot quantization and perf analysis jupyter notebook * update with formal name of model zoo, correct wrong words, add license in python file * rm empty line * renmae LPOT to INC in text and code, and use new api * Update README.md * Update set_env.sh * Update README.md * Update ut.sh * Update local_banchmark.sh * Create local_benchmark.sh * Update README.md * Update inc_for_tensorflow.ipynb * Update ut.sh * Update README.md * rename to local_benchmark.sh * Update ut.sh * Update ut.sh * Update run_jupyter.sh * Delete lpot_for_tensorflow.ipynb * Delete lpot_quantize_model.py * Update README.md * Update README.md * Update README.md * Update inc_for_tensorflow.ipynb * Update README.md * Update README.md * Update inc_for_tensorflow.ipynb * Update requirements.txt Co-authored-by: ltsai1 <[email protected]> * Sriniva2/ssd rn34 (#682) * improve ssdrn34 perf. * minor update. * enabling synthetic data. 
* Update base_benchmark_util.py * Fix linting error Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> * Add doc updates for '--synthetic-data' option (#683) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Change checkpoint setting for Bert train phase 1 (#602) * Change checkpoint setting for Bert train phase 1 * fix model and config saving * fix error when running gpu path (#686) * fix load pretrained model error when using torch_ccl (#688) * update py version in base spec (#678) (#690) * TF addons upgrade to 0.17.1 (#689) (#691) * updated tf addons version * remove comment * Update Dockerfiles prior to IMZ 2.8 release (#693) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update Documents prior to IMZ 2.8 release (#694) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update README.md (#697) * change numpy version requirement (#703) * Remove MiniGo training from IMZ (#644) * remove MiniGo training scripts and unit test * [RNN-T] [Inference] optimize the batch decoder (#711) * reduce fill_ OP in rnnt embedding kernel * optimize add between int and log to reduce dtype conversion * rnnt: support dump tracing file and print profile table (#712) * add support for openSUSE Leap operating system (#708) * rnnt inference: pre convert data to bf16 (#713) * remove squeeze/slice/transpose (#714) * update resnet50 training code (#710) * update resnet50 training code * not using ipex optimize for resnet50 training * use ipex.optimize() on the whole model (#718) * resnet50 bf32: calling ipex.optimize to enable bf32 path (#719) * Added batch size as an env variable to the quickstart scripts (#676) * WIP: Adding batch size as an environment variable to the quickstart scripts * Added instructions in README.md for all workloads * Update README.md * Corrected typo in launch_benchmark * Made corrections to .docs and ran model-builder * Delete .README.md.swp * Delete .fp32_accuracy.sh.swp * Update 
quickstart/image_segmentation/tensorflow/3d_unet_mlperf/inference/cpu/inference_throughput.sh Co-authored-by: Clayne Robison <[email protected]> * Update quickstart/language_translation/tensorflow/transformer_mlperf/inference/cpu/inference_realtime.sh Co-authored-by: Clayne Robison <[email protected]> * Update benchmarks/launch_benchmark.py Co-authored-by: Clayne Robison <[email protected]> * Made corrections to batch-size parameter * Made changes in launch_benchmark for batch-size arg * Made modifications to the README's * Resolved merge conflict by keeping README.md file. * Modified readme for windows * Resolved merge conflict by keeping README.md file. * Corrected SPR run.sh scripts * Removed echo from run.sh Co-authored-by: Clayne Robison <[email protected]> * Added batchsize as an env variable to quickstart scripts (#680) * Added batchsize as an env variable to quickstart scripts * Made modifications to .docs and scripts * Made modifications to README * Resolved merge conflict by incorporating both suggestions. 
* Made corrections in README.md * Made corrections in README.md * Undo changes in training.sh file * updated readme: nit fix (#723) Co-authored-by: Rahul Nair <[email protected]> * compute throughput by test_mini_batch_size (#740) * pytorch resnet50: fix bf32 training path error (#739) * Fix a subtle 'E275' style issue that causes unknown behavior (#742) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * rearrange the paragraphs and fix Markdown headers (#744) * Align Transformers version for BERT models (#738) * align transformer version(4.18) for bert models * change scripts to legacy * redo calibration * patch fix * Update README.md (#746) * Add support for stock PYT- object detection models (#732) * stock PYT and windows support for object detection models * Weizhuoz/reduce model zoo steps (#762) * reduce steps for bert-base, roberta, fpn models * modify max_iter for fpn models * reduce all img classification models steps * update new config for bert models (#763) * Addin Scipy for TensorFlow serving SSD-MobileNet model (#764) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update TF ResNet50v1.5 inference for SPR (baremetal) (#749) * Added matplotlib dependency to image_segmentation requirements (#768) * Update readmes for the path to output directory (#769) * update wide & deep readme for the path to pretrained model directory (#771) * add a check for ubuntu 22.04 support (#721) * Changes to add bfloat16 support for DIEN training (#679) * Changes to add bfloat16 support for DIEN training * Some for for reporting performance * Fixes for dien training and unit tests * updated tpp file withr2.8 approvals (#773) * Add Windows stock PyTorch support for TransNet v2 (#779) * update TransNet v2 to work with stock pytorch * update Windows.md path in all relevant docs * add P99 metric for LZ models (#780) Co-authored-by: Weizhuo Zhang <[email protected]> 
* Rn50 training multiple epoches output 1 KPI and add training_steps argument. (#775) * enable --training_steps and 1 training KPI output with multiple epoches * add prefix * update print freq * fix display bug * enable PyTorch resnet50 fp16 path (#783) * enable PyTorch resnet50 fp16 path * fix conflict * Extract p99 metric from log to summary (#784) * enable fp16 bert train and inference (#782) * Vruddarr/pt update windows readmes (#778) * remove bfloat16 experimental support note (#786) * Update IPEX installation path (#788) * Clean up _pycache_ files, remove symlinks, and add license headers for dien training bf16 (#787) * update readme for jemalloc and iomp path (#789) * update readme for jemalloc and iomp path * Updated IOMP path as path to the intel-openmp directory * PyTorch: fix resnext101 running script (#795) * Update 3dunet mlperf bash scripts and README (#797) * update 3dunet mlperf doc to use quickstart scripts, rename quickstart scripts for multi-instance * fix tests job (#803) * rnnt inference: align replace lstm API due to IPEX change (#802) * Adding quick start scripts to MobileNetV1 bfloat16 precision (#793) * Adding quick start scripts to MobileNetV1 bfloat16 precision * Adding executable permissions to files * Adding aikit.md to docs file * updated the comments on readme Co-authored-by: veena.mounika.ruddarraju <[email protected]> * Adding quick start scripts to ssd-mobilenet bfloat16 precision (#798) * Adding quick start scripts to ssd-mobilenet bfloat16 precision * changed file permissions * Updated comments on readme file Co-authored-by: veena.mounika.ruddarraju <[email protected]> * Update T5 model with windows quick start scripts (#790) * Update T5 model with windows quick start scripts * Updated Readme by specifying values to environment variables * Update inference int8 readme and script of 4 CV models using INC (#698) * update docs to add INC int8 models as an option * add instructions for how to quantize a fp32 model using INC * rnnt: 
fix stft due to PyTorch API change (#811) * rnnt training: fix stft due to PyTorch API change (#813) * Update BareMetalSetup.md (#817) * Gerardod/build container (#807) First phase of GHA WF to build the image of a Model Zoo workload container and push it to CAAS. * Sharvils/tf workload (#808) * TFv2.10 support added. Horovod version updated. * Vruddarr/tf add language translation bert fp32 quick start scripts (#804) * Adding quick start scripts to language translation BERT FP32 model * Corrected typo errors * Changed path to the Readme * Adding spec file <bert-fp32-inference_spec.yml> * Update spec file and model link in Readme tables * Update Readme path in windows.md * Updated TL notebooks for SPR Launch (#810) * Updates for TL PyTorch notebook * Edits for two more TL notebooks * Reverting previous change for virtualenv * Removed --no-deps and some nonexistent links * Added TFHub cache dir * Updated TL notebook README for legal/branding * Update typo in Readme (#821) * PyTorch: using ipex.optimize for bf16 training (#824) * Fix CVEs for Pillow and notebook packages (#831) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * add intel-alphafold2 optimized w/ IPEX from realm of AIDD (#737) * add alphafold2 from AIDD realm * Remove unused variable in mlperf 3DUnet performance run (#832) * Update Model Zoo name, Python version and message for IPEX (#833) Co-authored-by: veena.mounika.ruddarraju <[email protected]> * Update instruction for Miniconda, Jemalloc, PyTorch and IPEX and updt… (#830) * Update instruction for Miniconda, Jemalloc, PyTorch and IPEX and updting the readme by replacing conda with Miniconda. 
* Adding comment to install torch in BareMetalSetup.md * Update models main tables (#836) *update main readmes * Adding jemalloc instructions and environment variables (#838) * DLRM hybrid gradient product (#814) * enable hybrid mergedembedding * Hybrid Merge embedding * refine code * Update model file * Fix data loader issue for distributed trianing * Update the print info * Fix lr issue for sparse table both 2/8 ranks get convergenced with 0.75 epochs Co-authored-by: root <[email protected]> * update the TTT evaluation method by excluding dataloader & metric evaluation (#844) Co-authored-by: Zhang, Liangang <[email protected]> * PyTorch: resnet50 distributed training using lars optimizer (#826) * modify dlrm's sklearn metric eval func to ipex's multi-thread version (#850) * modify recall/precision/f1/ap 's eval as optional (#856) * Port dataloader optimization for distributed training of dlrm (#847) * update the TTT evaluation method by excluding dataloader & metric evaluation * port dataloader optimization for distributed training of dlrm * modify dlrm's sklearn metric eval func to ipex's multi-thread version (#850) * modify recall/precision/f1/ap 's eval as optional (#856) * port dataloader optimization for distributed training of dlrm * delete local bs computation in evaluation stage * modify the TTT output name Co-authored-by: Zhang, Liangang <[email protected]> * Update horovod version to fix run time failure due to Status call (#859) * fix regression for dlrm single node training (#864) Co-authored-by: Weizhuo Zhang <[email protected]> * Update pytorch model zoo table of BF32 with landing zoo models (#865) * Added SNYK scan (#855) * Update SSD-ResNet34 code in start.sh(#862) * Add Distilbert base model for inference (Tensorflow) to model zoo (#815) * Add fp32 inference for distilbert base model * Fix Bert spec file (#873) * 1) Add torch.profiler (#871) 2) change the distributed_training.sh for dlrm to diamond cluster * Update Wide & Deep docs (#875) * The 
copy of #867(Porting evaluation iteration overlapping) (#876) * port evaluation overlapping * remove debug code * remove debug code * remove unused code * remove unused code * add resnet50 distributed training script (#879) * add resnet50 distributed training script * collect TTT Co-authored-by: XiaobingSuper <[email protected]> * reduce redundant bus traffic (#880) * Port all_to_all index overlapping with interaction and top mlp. (#878) * port all_to_all index overlapping with interaction and top mlp * fix seg fault * Add int8 support for distilbert (#823) * Add fp32 inference for distilbert base model Co-authored-by: syedshahbaaz <[email protected]> * Update DIEN inference docs & quickstart scripts (#869) * Update DIEN docs * update for spr ww42 Co-authored-by: WafaaT <[email protected]> * Update ResNet50v1.5 docs (#820) * Update and Validate ResNet50v1.5 Inference and training model for TF SPR * Update and validate docs for TF SPR Co-authored-by: WafaaT <[email protected]> * Update Wide & Deep using Large Dataset docs (#877) * Vruddarr/tf bfloat32 precision check (#893) * Update Wide and Deep Large Dataset Training Model docs (#881) * Vruddarr/tf update image recognition models docs (#816) * Update Inceptionv3,DenseNet 169, Inceptionv4, ResNet50, ResNet101, MobileNet V1 quickstart scripts and docs * Update and validate MobileNet v1 for TF SPR Co-authored-by: WafaaT <[email protected]> * Fix BFloat32 precision check code for Resnet50v1.5 training model (#894) * Update 3DUNet MLperf for SPR (#889) * Updated Bert Large SPR READMEs (#887) * Updated Bert Large SPR READMEs * Included tensorflow and keras versions * Updated bert large README for spr * Updated scripts and README as per reviews * Update SPR quickstart description * updated to downloaded bert checkpoints * Fix typos in MobilenetV1 scripts (#899) * modify time function to solve int8 benchmark issue on windows (#898) * modify time function to solve int8 benchmark issue on windows * Replace the time.time 
function calls to time.perf_counter to improve the time statistic resolution. Updated for the additional 5 models Co-authored-by: Ying <[email protected]> * Update DIEN Training docs (#882) * Adding permissions to scripts in DIEN and correcting pb file paths in README_SPR_baremetal (#901) * Adding SPR_baremetal_readme and fixing model paths in the tables (#904) * fix acc test for single node (#903) * fix acc test for single node * Update dlrm_s_pytorch.py Co-authored-by: Weizhuo Zhang <[email protected]> * commit cherry-picks from r2.9 (#900) * update tbb files (#843) * fix vulnerability issues reported by snyk scans (#848) * upgrade for ipex 1.13 * Update Pillow to '>=9.3.0' (#884) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * fix some bugs for p99 (#909) * Update tensorflow benchmarks to use latest horovod commit (#908) * Update start.sh * Update start.sh * Update to use shortened commit hash * do not convert data to bf16 while using fp32 and bf32 (#911) Co-authored-by: Weizhuo Zhang <[email protected]> * Update SSD-Resnet34 training docs for SPR task (#914) * Update SSD-Resnet34 training & docs for SPR * Vruddarr/tf update ssd mobilenet docs (#846) * Update quick start scripts and spec file to run for all precisions * Update and validate SSD-Mobilenet docs for TF SPR Co-authored-by: WafaaT <[email protected]> * fix print issue (#915) Co-authored-by: Weizhuo Zhang <[email protected]> * Update rfcn docs to use same quick start scripts (#897) * Update rfcn docs to use same quick start scripts Co-authored-by: WafaaT <[email protected]> * Sharvils/spr ssd training (#917) * Dockerfile updated * Update SSD-ResNet34 Inference docs (#866) * Update ResNet34 Inference to use same scripts & docs for all precisions * Update for SPR WW42 Co-authored-by: WafaaT <[email protected]> * Update transformer_mlperf scripts and README fro SPR WW42 (#891) Co-authored-by: Wafaa Taie <[email protected]> * Update TF models spec 
files for SPR WW42 (#919) * update TF models spec files for spr ww42 * update docker partial for tf addons version * workaround rdma config for spr (#925) * remove supported OS checks (#926) * Update Model paths in main readme (#928) * Remove Linux/windows OS platform support checks (#927) * update resnet50 distributed training script (#923) * resnet50 distributed training: use logical core for ccl (#930) * Update bert scripts to add same quick start scripts to all precisions (#910) * Update MobilenetV1 SPR docs (#931) * Update Resnet50v1_5_SPR_docs (#934) * Update SSD-Mobilenet SPR docs (#935) * Update Resenet50v1.5 inference SPR docs (#933) * Fix DIEN inference.sh script and add pretrained model env var in mobilenetv1 SPR baremetal readme (#939) * Update DIEN Inference and Training SPR docs (#937) * Update SSD-Resnet34 training SPR docs (#936) * Update SSD-Resnet34 Inference SPR docs (#938) * Update README_SPR_baremetal.md remove steps and warm_up steps env vars Co-authored-by: Wafaa Taie <[email protected]> * BERT training dockerfile fixed (#921) * BERT repo version fixed for SPR container (#920) * Update spr baremetal instructions for 3dunet, bert large and transformer mlperf (#932) * Update Transformer MLPerf inference docs for pre-trained models (#940) * Fix Language Translation BERT quickstart scripts (#941) * fix scripts to detect the number of cores * Update mlperf_gnmt docs (#945) * Updating Transformer_LT_official scripts (#913) * Add support for dGPU models (#840) (#948) * Add support for dGPU models (#840) * upgrade Pillow version for Yolov4 * Update main README.md (#947) * update main readme * edit transformer_mlperf and bert SPR docs * remove workflows * Fix CVEs based on Snyk scans in TL notebooks (#951) * fix snyk critical issues in TL jupyter notebooks * Remove INC dependency for Snyk issues (#953) * removed neuralcompressorfor to avoid vulnerability in Snyk scans * Remove pointers to BERT Large int8 docs (#952) * fix int8 model link (#958) * 
Fixed num_intra_threads for bfloat16 (#959) (#960) * Fixed num_intra_threads for bfloat16 * Modified open mpi instructions * Added kmp_blocktime for bfloat16 Co-authored-by: mahathis <[email protected]> * Fix syntax error and pythonpath in ssd-resnet34 training (#962) (#965) Co-authored-by: Veena2207 <[email protected]> * fix training bkms (#967) (#968) * fix T5 inference script (#969) * Fix resnet50v1.5 weightsharing for int8 (#996) * Corrected typo in SPR quickstart scripts (#991) * fix model_init for int8 weightsharing --------- Co-authored-by: mahathis <[email protected]> * TF SPR DevCatalog READMEs (#983) * add image recognition devcats * add tf object detection devcats * add TF language translation devcats * add tf image segmentation devcats * add tf language modeling devcats * add recommendation tf devcats * fix swapped containers and precision in run command * add README_SPR to all getting started links and correct script names * rename files and point getting started to itself * fix last link * fix minor error (#994) * Update TF SPR ww42 containers partials, spec-files and dockerfiles (#998) TF SPR Containers Built and Validated * Sharvils/tf devcats fixes (#995) Minor fixes to SPR TF DevCatalogs --------- Co-authored-by: sharvil.shah * SPR PyTorch DevCatalogs (#993) Added Devcatalog files targeting SPR container launch * Delete SPR containers README_SPR.md (#999) * delete README_SPR.md * remove references in spec-files * fix for auto-merge --------- Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: YanbingJiang <[email protected]> Co-authored-by: jianan-gu <[email protected]> Co-authored-by: Weizhuo Zhang <[email protected]> Co-authored-by: Dina Suehiro Jones <[email protected]> Co-authored-by: Melanie Buehler <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: leslie-fang-intel <[email protected]> Co-authored-by: xiaofeij <[email protected]> Co-authored-by: jiayisunx <[email protected]> 
Co-authored-by: zhuhaozhe <[email protected]> Co-authored-by: XiaobingZhang <[email protected]> Co-authored-by: Sean-Michael Riesterer <[email protected]> Co-authored-by: liangan1 <[email protected]> Co-authored-by: Chunyuan WU <[email protected]> Co-authored-by: blzheng <[email protected]> Co-authored-by: Om Thakkar <[email protected]> Co-authored-by: mahathis <[email protected]> Co-authored-by: Srini511 <[email protected]> Co-authored-by: Clayne Robison <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: Neo Zhang Jianyu <[email protected]> Co-authored-by: ltsai1 <[email protected]> Co-authored-by: Jitendra Patil <[email protected]> Co-authored-by: Kanvi Khanna <[email protected]> Co-authored-by: Rahul Nair <[email protected]> Co-authored-by: Veena2207 <[email protected]> Co-authored-by: jojivk-intel-nervana <[email protected]> Co-authored-by: xiangdong <[email protected]> Co-authored-by: Huang, Zhiwei <[email protected]> Co-authored-by: gera-aldama <[email protected]> Co-authored-by: Sharvil Shah <[email protected]> Co-authored-by: wyang2 <[email protected]> Co-authored-by: Yimei Sun <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: tangleintel <[email protected]> Co-authored-by: Syed Shahbaaz Ahmed <[email protected]> Co-authored-by: Er-Xin (Edwin) Shang <[email protected]> Co-authored-by: Ying <[email protected]> Co-authored-by: sevdeawesome <[email protected]> Co-authored-by: DiweiSun <[email protected]> Co-authored-by: Tyler Titsworth <[email protected]> Co-authored-by: Srikanth Ramakrishna <[email protected]>
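
Note on the timing change listed above (#898, "Replace the time.time function calls to time.perf_counter to improve the time statistic resolution"): the snippet below is a minimal sketch of that idea, not the code from the benchmark scripts. The helper name `time_call` and the `repeats` parameter are hypothetical and used only for illustration. `time.perf_counter` is a monotonic, high-resolution clock, which matters on platforms where `time.time` ticks coarsely and short iterations can appear to take zero seconds.

```python
import time


def time_call(fn, *args, repeats=5):
    """Return the shortest observed duration (seconds) of fn(*args).

    Hypothetical helper for illustration only; it is not part of the
    model zoo scripts.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


# Compare the resolution each clock reports on the current platform.
wall_clock_info = time.get_clock_info("time")
perf_clock_info = time.get_clock_info("perf_counter")
print(f"time.time resolution:         {wall_clock_info.resolution}")
print(f"time.perf_counter resolution: {perf_clock_info.resolution}")

duration = time_call(sum, range(100_000))
print(f"best of 5 runs: {duration:.6f} s")
```

The reported resolutions depend on the operating system, so no specific values are assumed here.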

WafaaT · 2023-05-05T19:18:51Z

@claynerobison this PR still needed?

WafaaT and others added 14 commits September 19, 2022 09:28

Update Pillow to '>=9.3.0' (#884)

59d6cf2

Signed-off-by: Abolfazl Shahbazi <[email protected]>

remove supported OS checks (#926)

d48ffa2

Remove Linux/windows OS platform support checks (#927)

de7dda4

Merge branch 'master' of github.com:intel-innersource/frameworks.ai.models.intel-models

e0b3415

Merge branch r2.10 into 'master'

f6bd1ea

Adding README files for Intel® Data Center Flex Series GPUs

5f79cdf

Adding README files for Intel® Data Center Flex Series GPUs (intel#125)

ff7d9c7

Fix broken links for IPEX RN50 and SSD-Mobilenetv1

5e61b33

Merge error

d512bdd

WafaaT force-pushed the master branch from 92bc604 to accca70 Compare April 27, 2023 18:04

WafaaT requested a review from ashahba as a code owner April 27, 2023 18:04

lerealno requested review from jitendra42, lerealno and Mahathi-Vatsal as code owners August 2, 2024 21:58

tfqaprod force-pushed the main branch from c52b6e6 to 499e601 Compare September 17, 2024 15:36
