Compiling fails due to other GPU #2504

Open
Gotbread opened this issue Apr 26, 2024 · 1 comment

@Gotbread

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.15

Custom code

No

OS platform and distribution

Linux Ubuntu 22.04 LTS

Mobile device

No response

Python version

3.10

Bazel version

6.1.0

GCC/compiler version

11.4

CUDA/cuDNN version

No response

GPU model and memory

gfx1100 & gfx1036

Current behavior?

I am trying to compile from source, but the build fails because it also tries to compile for gfx1036.
I don't want that; I only want the gfx1100 version, but I am unable to disable it.

I tried to disable the iGPU in the BIOS, but it still shows up in rocminfo and apparently also in the compilation process.

Is there a way to force the build not to compile for other gfx versions? I just want the gfx1100 version.

I also can't use the prebuilt binary, since it contains the "gfx1030gfx1100" bug string, which causes TF to ignore my GPU.

I need a way to disable the iGPU so that ROCm does not see it anymore.
This issue is similar to #2292, but I can't find a way to skip this GPU.

Standalone code to reproduce the issue

Compile the latest version from source on a system that has a gfx1036 GPU.

Relevant log output

INFO: Found applicable config definition build:dynamic_kernels in file /home/user/custom_tf/tensorflow-upstream/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
WARNING: The following configs were expanded more than once: [rocm, rocm_base, no_tfrt, release_cpu_linux_base]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Analyzed target //tensorflow/tools/pip_package:wheel (710 packages loaded, 50788 targets configured).
INFO: Found 1 target...
ERROR: /home/user/.cache/bazel/_bazel_user/8ff3c252cf6943b0e4c6e47a965a8647/external/local_xla/xla/service/gpu/BUILD:1412:23: Compiling xla/service/gpu/cub_sort_kernel.cu.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target @local_xla//xla/service/gpu:cub_sort_kernel_f64) 
  (cd /home/user/.cache/bazel/_bazel_user/8ff3c252cf6943b0e4c6e47a965a8647/execroot/org_tensorflow && \
  exec env - \
    CLANG_COMPILER_PATH=/usr/lib/llvm-17/bin/clang \
    PATH=/home/user/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    PYTHON_LIB_PATH=/usr/lib/python3/dist-packages \
    ROCM_PATH=/opt/rocm-6.0.2 \
    TF2_BEHAVIOR=1 \
    TF_ROCM_CLANG=1 \
  external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++14' -MD -MF bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/_objs/cub_sort_kernel_f64/cub_sort_kernel.cu.pic.d '-frandom-seed=bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/_objs/cub_sort_kernel_f64/cub_sort_kernel.cu.pic.o' -fPIC '-DEIGEN_MAX_ALIGN_BYTES=64' -DEIGEN_ALLOW_UNALIGNED_SCALARS '-DEIGEN_USE_AVX512_GEMM_KERNELS=0' '-DTENSORFLOW_USE_ROCM=1' -DCUB_TYPE_F64 '-DBAZEL_CURRENT_REPOSITORY="local_xla"' -iquote external/local_xla -iquote bazel-out/k8-opt/bin/external/local_xla -iquote external/eigen_archive -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/local_config_cuda -iquote bazel-out/k8-opt/bin/external/local_config_cuda -iquote external/local_tsl -iquote bazel-out/k8-opt/bin/external/local_tsl -iquote external/local_config_rocm -iquote bazel-out/k8-opt/bin/external/local_config_rocm -Ibazel-out/k8-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive -isystem external/eigen_archive/mkl_include -isystem bazel-out/k8-opt/bin/external/eigen_archive/mkl_include -isystem external/local_config_cuda/cuda -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda/cuda/include -isystem external/local_config_rocm/rocm -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include/hipcub -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/hipcub -isystem external/local_config_rocm/rocm/rocm/include/rocprim -isystem 
bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/rocprim -isystem external/local_config_rocm/rocm/rocm/include -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/roctracer -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result '-Werror=unused-result' -Wswitch '-Werror=switch' '-Wno-error=unused-but-set-variable' -DAUTOLOAD_DYNAMIC_KERNELS -Wno-gnu-offsetof-extensions -Wno-unused-result -Wno-sign-compare -Wno-gnu-offsetof-extensions -Wno-unused-result '-std=c++17' -x rocm '--amdgpu-target=gfx1100' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-DTENSORFLOW_USE_ROCM=1' -D__HIP_PLATFORM_AMD__ -DEIGEN_USE_HIP -no-canonical-prefixes -fno-canonical-system-headers -c external/local_xla/xla/service/gpu/cub_sort_kernel.cu.cc -o bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/_objs/cub_sort_kernel_f64/cub_sort_kernel.cu.pic.o)
# Configuration: e4ece56677a12dcf02a4cc8466fa0e1a29e7ca5c7dc9c8d9b2f8ab0324debfef
# Execution platform: @local_execution_config_platform//:platform
clang: warning: argument unused during compilation: '-fgpu-flush-denormals-to-zero' [-Wunused-command-line-argument]
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr43 = V_MOV_B32_dpp undef $vgpr43(tied-def 0), $vgpr4, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr4 = V_MOV_B32_dpp undef $vgpr4(tied-def 0), killed $vgpr3, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr3 = V_MOV_B32_dpp undef $vgpr3(tied-def 0), $vgpr2, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr48 = V_MOV_B32_dpp undef $vgpr48(tied-def 0), $vgpr45, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr45 = V_MOV_B32_dpp undef $vgpr45(tied-def 0), $vgpr44, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr48 = V_MOV_B32_dpp undef $vgpr48(tied-def 0), $vgpr45, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr45 = V_MOV_B32_dpp undef $vgpr45(tied-def 0), $vgpr44, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr43 = V_MOV_B32_dpp undef $vgpr43(tied-def 0), $vgpr4, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr48 = V_MOV_B32_dpp undef $vgpr48(tied-def 0), $vgpr45, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr45 = V_MOV_B32_dpp undef $vgpr45(tied-def 0), $vgpr44, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr48 = V_MOV_B32_dpp undef $vgpr48(tied-def 0), $vgpr45, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr45 = V_MOV_B32_dpp undef $vgpr45(tied-def 0), $vgpr44, 322, 15, 15, 0, implicit $exec
12 errors generated when compiling for gfx1036.
Target //tensorflow/tools/pip_package:wheel failed to build
INFO: Elapsed time: 177.097s, Critical Path: 81.30s
INFO: 6265 processes: 1446 internal, 4819 local.
FAILED: Build did NOT complete successfully
@taylding-amd

taylding-amd commented Nov 22, 2024

Hi @Gotbread, there are two ways to isolate/disable GPUs:

  1. Set environment variables such as HIP_VISIBLE_DEVICES. See the linked documentation for how to set up and use these variables.
  2. Use a Docker image. Passing --device /dev/dri grants access to all GPUs on the system. To limit access to a subset of GPUs, pass each device individually using one or more --device /dev/dri/renderD<node>, where <node> is the card index, starting from 128.
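
The two options above can be sketched in Python as small helpers that prepare the settings for a child process. This is only an illustration under stated assumptions: the function names `env_with_visible_gpus` and `docker_device_args` are hypothetical, which device index or render node belongs to the gfx1100 card has to be checked on the actual machine (for example with rocminfo), and nothing in this thread confirms that the TensorFlow build itself honors HIP_VISIBLE_DEVICES when it selects compile targets.

```python
import os


def env_with_visible_gpus(indices, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set.

    `indices` are HIP device indices to keep visible. Which index maps to
    which GPU is an assumption the caller must verify on their system.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return env


def docker_device_args(render_nodes):
    """Build one `--device` flag per render node (renderD128, renderD129, ...)."""
    args = []
    for node in render_nodes:
        if node < 128:
            raise ValueError("render nodes are numbered from 128")
        args += ["--device", f"/dev/dri/renderD{node}"]
    return args


if __name__ == "__main__":
    # Example: keep only HIP device 0 visible and expose only renderD128.
    print(env_with_visible_gpus([0])["HIP_VISIBLE_DEVICES"])
    print(" ".join(docker_device_args([128])))
```

The returned environment can be passed as `env=` to `subprocess.run`, and the argument list can be spliced into a `docker run` command. ROCm containers typically also need `--device /dev/kfd` in addition to the render nodes.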
