Problems with Docker, NUMA, and quickstart script for ResNet50-v1.5 on GPU Max 1100 #177

Open
WarrenSchultz opened this issue Apr 16, 2024 · 0 comments


I'm running the following, with DATASET_DIR pointing to pre-processed data:

docker pull intel/image-recognition:tf-max-gpu-resnet50v1-5-inference

export DATASET_DIR=~/ImageNet/tf_records
export OUTPUT_DIR=~/resnet50-log

export PRECISION=fp32
IMAGE_NAME=intel/image-recognition:tf-max-gpu-resnet50v1-5-inference
export GPU_TYPE=max_series

docker run \
  --device=/dev/dri \
  --ipc=host \
  --privileged \
  --env PRECISION=${PRECISION} \
  --env GPU_TYPE=${GPU_TYPE} \
  --env OUTPUT_DIR=${OUTPUT_DIR} \
  --env DATASET_DIR=${DATASET_DIR} \
  --env http_proxy=${http_proxy} \
  --env https_proxy=${https_proxy} \
  --env no_proxy=${no_proxy} \
  --volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
  --volume ${DATASET_DIR}:${DATASET_DIR} \
  --rm -it \
  $IMAGE_NAME \
  /bin/bash quickstart/batch_inference.sh

I get the following output:

MODEL_DIR=/workspace/tf-max-series-resnet50v1-5-inference
PRECISION=fp32
OUTPUT_DIR=/home/user/nvmepool/resnet50-log
performance
Running with default batch size of 1024
Precision is fp32
resnet50 v1.5 int8 inference
lspci: Unable to load libkmod resources: error -2
2024-04-16 18:51:30.419047: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-16 18:51:30.420511: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-16 18:51:30.450607: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-16 18:51:30.450999: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-16 18:51:30.994173: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-04-16 18:51:31.290250: I itex/core/wrapper/itex_cpu_wrapper.cc:42] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
2024-04-16 18:51:31.763276: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2024-04-16 18:51:31.800082: W itex/core/ops/op_init.cc:58] Op: _QuantizedMaxPool3D is already registered in Tensorflow
2024-04-16 18:51:31.823464: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-04-16 18:51:31.823492: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-04-16 18:51:31.823496: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-04-16 18:51:31.823499: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-04-16 18:51:31.823502: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
using default data type: float32
Run inference
Inference with dummy data.
WARNING:tensorflow:From /workspace/tf-max-series-resnet50v1-5-inference/models/image_recognition/tensorflow/resnet50v1_5/inference/gpu/int8/eval_image_classifier_inference.py:186: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING:tensorflow:From /usr/local/lib/python3.10/dist-packages/tensorflow/python/tools/strip_unused_lib.py:84: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This API was designed for TensorFlow v1. See https://www.tensorflow.org/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
WARNING:tensorflow:From /usr/local/lib/python3.10/dist-packages/tensorflow/python/tools/optimize_for_inference_lib.py:112: remove_training_nodes (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This API was designed for TensorFlow v1. See https://www.tensorflow.org/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
2024-04-16 18:52:12.985296: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-04-16 18:52:12.985330: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform XPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-04-16 18:52:12.985335: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform XPU ID 2, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-04-16 18:52:12.985339: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform XPU ID 3, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-04-16 18:52:12.985364: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
2024-04-16 18:52:12.985909: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:1 with 0 MB memory) -> physical PluggableDevice (device: 1, name: XPU, pci bus id: <undefined>)
2024-04-16 18:52:12.986271: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:2 with 0 MB memory) -> physical PluggableDevice (device: 2, name: XPU, pci bus id: <undefined>)
2024-04-16 18:52:12.986843: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:3 with 0 MB memory) -> physical PluggableDevice (device: 3, name: XPU, pci bus id: <undefined>)
2024-04-16 18:52:12.989942: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-04-16 18:52:12.989963: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform XPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-04-16 18:52:12.989968: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform XPU ID 2, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-04-16 18:52:12.989971: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform XPU ID 3, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-04-16 18:52:12.989982: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
2024-04-16 18:52:12.989992: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:1 with 0 MB memory) -> physical PluggableDevice (device: 1, name: XPU, pci bus id: <undefined>)
2024-04-16 18:52:12.989998: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:2 with 0 MB memory) -> physical PluggableDevice (device: 2, name: XPU, pci bus id: <undefined>)
2024-04-16 18:52:12.990006: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:3 with 0 MB memory) -> physical PluggableDevice (device: 3, name: XPU, pci bus id: <undefined>)
2024-04-16 18:52:12.990775: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
2024-04-16 18:52:12.998438: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
2024-04-16 18:52:15.343963: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
Iteration 1: 43.758534 sec
Iteration 2: 0.000638 sec
...

The drivers all report that everything is functioning correctly. NUMA works properly on the host OS (Ubuntu 22.04, with an updated kernel). I've also tried running the Docker container with the --cap-add SYS_NICE flag, but the logs show the same output. FP32 and FP16 both perform as expected. INT8, oddly, is slower than either.
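
To help narrow down the NUMA messages, here is a minimal sketch of a check that could be run both on the host and inside the container to compare what the kernel reports. It only reads the standard Linux sysfs entries under /sys/bus/pci/devices. The function name list_gpu_numa_nodes and its optional root-directory argument are made up for illustration, and it assumes the GPUs appear under PCI base class 0x03 (display controller), which may not hold on every system.

```shell
# Hypothetical helper: print the NUMA node sysfs reports for each
# display-class PCI device. The optional first argument overrides the
# sysfs directory (an assumption added here so the function can be
# pointed at a copy of the tree).
# The body runs in a subshell so its variables do not leak.
list_gpu_numa_nodes() (
  root="${1:-/sys/bus/pci/devices}"
  for dev in "$root"/*; do
    # Skip entries without a readable class file (also covers an empty glob).
    [ -r "$dev/class" ] || continue
    class="$(cat "$dev/class")"
    # PCI base class 0x03 is "display controller".
    case "$class" in
      0x03*) ;;
      *) continue ;;
    esac
    if [ -r "$dev/numa_node" ]; then
      node="$(cat "$dev/numa_node")"
    else
      node="missing"
    fi
    printf '%s numa_node=%s\n' "$(basename "$dev")" "$node"
  done
)
```

Calling list_gpu_numa_nodes with no argument reads the live sysfs tree. A value of -1 means the kernel has no NUMA affinity recorded for that device. If the host and the container print the same non-negative values, the kernel side is probably fine and the message more likely comes from how the device is exposed to TensorFlow, though that is a guess rather than something the log confirms.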

Any ideas?
Thanks!
