
740 Add generic support for different GPU hardware #3371

Open · jakki-amd wants to merge 4 commits into master from 740-add-generic-support-for-different-GPU-hardware
@jakki-amd jakki-amd commented Dec 2, 2024

Description

This PR decouples the hardware layer from the front- and backend of TorchServe.
Relates to #740

'Add AMD backend support'
Rony Leppänen [email protected]

'Add AMD frontend support'
Anders Smedegaard Pedersen [email protected]

'Add Dockerfile.rocm'
Samu Tamminen [email protected]
Jarkko Lehtiranta [email protected]

'Add AMD documentation'
Anders Smedegaard Pedersen [email protected]
Rony Leppänen [email protected]
Jarkko Lehtiranta [email protected]

Other contributions:

Bipradip Chowdhury [email protected]
Jarkko Vainio [email protected]
Tero Kemppi [email protected]

Requirement Files

Added requirements/torch_rocm62.txt, requirements/torch_rocm61.txt, and requirements/torch_rocm60.txt for easy installation of the dependencies needed for AMD support.
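These can be installed with pip in the usual way, for example for ROCm 6.2:

pip install -r requirements/torch_rocm62.txt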

Backend

The Python backend currently supports NVIDIA GPUs through hardware-specific libraries. There were also a number of functions that could be refactored to use more generalized interfaces.

Changes Made to Backend

  • Use torch.cuda for detecting GPU availability and torch.version for differentiating between GPU vendors (NVIDIA, AMD)
  • Use torch.cuda for collecting GPU metrics (a Python sketch of this approach follows this list)
    • Drop the nvgpu library, a quick-and-dirty solution that calls nvidia-smi and parses its output
    • As a temporary solution for AMD GPUs, metrics are collected with the amdsmi library directly
    • Once the underlying bug in torch.cuda is fixed, the same functions can be used for collecting metrics from both GPU vendors (NVIDIA, AMD)
  • Extend print_env_info for AMD GPUs and reimplement a number of functions
    • Detect versions of HIP runtime, ROCm and MIOpen
    • Collect model names of available GPUs with torch.cuda (NVIDIA, AMD)
    • Use pynvml for detecting the NVIDIA driver and CUDA versions
    • Use torch for detecting the compiled CUDA and cuDNN versions
  • Refactor NVIDIA-specific code in several places
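As an illustration of the approach (a minimal sketch, not the PR's actual code), vendor detection and vendor-neutral metric collection with torch.cuda look roughly like this; torch.cuda and torch.version work on both CUDA and ROCm builds of PyTorch:

import torch

def detect_gpu_vendor() -> str:
    """Differentiate GPU vendors via torch.version, as described above."""
    if not torch.cuda.is_available():
        return "none"
    # ROCm builds of PyTorch set torch.version.hip; CUDA builds set torch.version.cuda.
    if torch.version.hip is not None:
        return "amd"
    if torch.version.cuda is not None:
        return "nvidia"
    return "unknown"

def collect_gpu_memory_metrics() -> list:
    """Collect per-device memory metrics with vendor-neutral torch.cuda calls."""
    metrics = []
    for device_id in range(torch.cuda.device_count()):
        free_bytes, total_bytes = torch.cuda.mem_get_info(device_id)
        metrics.append({
            "id": device_id,
            "model": torch.cuda.get_device_name(device_id),
            "memory_used_mb": (total_bytes - free_bytes) // (1024 * 1024),
            "memory_total_mb": total_bytes // (1024 * 1024),
        })
    return metrics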

Frontend

The Java frontend, which acts as the workload manager, had calls to system management interfaces (SMIs) hard-coded in a few places. This made it difficult for TorchServe to support multiple hardware vendors in a graceful manner.

Changes Made to Frontend

We've introduced a new package org.pytorch.serve.device with the classes SystemInfo and Accelerator. SystemInfo holds an ArrayList of Accelerator objects, each of which holds static information about a specific accelerator on the machine along with the relevant metrics.

Instead of calling the SMIs directly in multiple places in the frontend code, we have abstracted the hardware away by adding an instance of SystemInfo to the pre-existing ConfigManager. Now the frontend can get data from the hardware via the methods on SystemInfo without knowing about the specifics of the hardware and SMIs.

To implement the specifics for each of the vendors that were already partially supported, we have created a number of utility classes that communicate with the hardware via the relevant SMI.

The following steps are taken in the SystemInfo constructor:

  1. Detect the relevant vendor by calling which <relevant smi> for each of the supported vendors (where is used on Windows systems).
    This is how vendor detection was done previously; there might be more robust ways. A Python sketch of this detection logic follows the list.
  2. When the accelerator vendor is detected, an instance of the relevant utility class is created, for example ROCmUtil for AMD.
  3. Accelerators are detected, respecting the relevant environment variable for selecting devices: HIP_VISIBLE_DEVICES for AMD, CUDA_VISIBLE_DEVICES for NVIDIA, and XPU_VISIBLE_DEVICES for Intel. All devices are detected if the relevant environment variable is not set.
  4. Finally, the metrics for the detected devices are updated.
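As a rough illustration of steps 1 and 3 (the PR implements this in Java inside SystemInfo; the SMI binary names in the mapping below are assumptions for this sketch, and the Apple path, which has no *_VISIBLE_DEVICES variable, is omitted):

import os
import shutil

# Assumed SMI-to-vendor mapping for illustration only.
SMI_TO_VENDOR = {
    "nvidia-smi": "NVIDIA",
    "rocm-smi": "AMD",
    "xpu-smi": "INTEL",
}

def detect_vendor_type() -> str:
    # Step 1: probe for each vendor's SMI on the PATH.
    # shutil.which() is the portable equivalent of `which` (or `where` on Windows).
    for smi, vendor in SMI_TO_VENDOR.items():
        if shutil.which(smi) is not None:
            return vendor
    return "UNKNOWN"

def visible_device_ids(env_name: str, detected_ids: list) -> list:
    # Step 3: respect HIP_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES / XPU_VISIBLE_DEVICES;
    # keep all detected devices if the variable is unset.
    raw = os.environ.get(env_name)
    if raw is None:
        return detected_ids
    requested = {int(part) for part in raw.split(",") if part.strip()}
    return [i for i in detected_ids if i in requested]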

The following is a class diagram showing how the new classes relate to the existing code:

classDiagram
    class Accelerator {
        +Integer id
        +AcceleratorVendor vendor
        +String model
        +IAcceleratorUtility acceleratorUtility
        +Float usagePercentage
        +Float memoryUtilizationPercentage
        +Integer memoryAvailableMegabytes
        +Integer memoryUtilizationMegabytes
        +getVendor() AcceleratorVendor
        +getAcceleratorModel() String
        +getAcceleratorId() Integer
        +getMemoryAvailableMegaBytes()
        +getUsagePercentage() Float
        +getMemoryUtilizationPercentage()
        +getMemoryUtilizationMegabytes()
        +setMemoryAvailableMegaBytes()
        +setUsagePercentage(Float)
        +setMemoryUtilizationPercentage(Float)
        +setMemoryUtilizationMegabytes(Integer)
        +utilizationToString()
        +updateDynamicAttributes()
    }

    class SystemInfo {
        -Logger logger
        -AcceleratorVendor acceleratorVendor
        -ArrayList<Accelerator> accelerators
        -IAcceleratorUtility acceleratorUtil
        +SystemInfo()
        -createAcceleratorUtility() IAcceleratorUtility
        -populateAccelerators()
        +hasAccelerators() boolean
        +getNumberOfAccelerators() Integer
        +static detectVendorType() AcceleratorVendor
        -static isCommandAvailable(String) boolean
        +getAccelerators() ArrayList<Accelerator>
        -updateAccelerators(List<Accelerator>)
        +updateAcceleratorMetrics()
        +getAcceleratorVendor() AcceleratorVendor
        +getVisibleDevicesEnvName() String
    }

    class AcceleratorVendor {
        <<enumeration>>
        AMD
        NVIDIA
        INTEL
        APPLE
        UNKNOWN
    }

    class IAcceleratorUtility {
        <<interface>>
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +getUpdatedAcceleratorsUtilization()
    }

    class ICsvSmiParser {
        <<interface>>
        +csvSmiOutputToAccelerators()
    }

    class IJsonSmiParser {
        <<interface>>
        +jsonOutputToAccelerators()
        +extractAcceleratorId()
        +jsonObjectToAccelerator()
        +extractAccelerators()
    }

    class CudaUtil {
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +parseAccelerator()
        +parseUpdatedAccelerator()
    }

    class ROCmUtil {
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +extractAccelerators()
        +extractAcceleratorId()
        +jsonObjectToAccelerator()
    }

    class XpuUtil {
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +parseDiscoveryOutput()
        +parseUtilizationOutput()
    }

    class AppleUtil {
        +getGpuEnvVariableName()
        +getUtilizationSmiCommand()
        +getAvailableAccelerators()
        +smiOutputToUpdatedAccelerators()
        +jsonObjectToAccelerator()
        +extractAcceleratorId()
        +extractAccelerators()
    }

    class ConfigManager {
        -SystemInfo systemInfo
        +init(Arguments args)
    }

    class WorkerThread {
        #ConfigManager configManager
        #int port
        #Model model
        #WorkerState state
        #WorkerLifeCycle lifeCycle
    }

    class AsyncWorkerThread {
        #boolean loadingFinished
        #CountDownLatch latch
        +run()
        #connect()
    }

    class WorkerLifeCycle {
        -ConfigManager configManager
        -ModelManager modelManager
        -Model model
    }

    WorkerLifeCycle --> "1" ConfigManager
    WorkerLifeCycle --> "1" Model
    WorkerLifeCycle --> "1" Connector
    WorkerThread --> "1" WorkerLifeCycle

    ConfigManager "1" --> "1" SystemInfo
    ConfigManager "1" --> "*" Accelerator
    WorkerThread --> "1" ConfigManager

    AsyncWorkerThread --|> WorkerThread

    SystemInfo --> "0..*" Accelerator
    SystemInfo --> "1" IAcceleratorUtility
    SystemInfo --> "1" AcceleratorVendor
    Accelerator --> "1" AcceleratorVendor
    CudaUtil ..|> IAcceleratorUtility
    CudaUtil ..|> ICsvSmiParser
    ROCmUtil ..|> IAcceleratorUtility
    ROCmUtil ..|> IJsonSmiParser
    XpuUtil ..|> IAcceleratorUtility
    XpuUtil ..|> ICsvSmiParser
    AppleUtil ..|> IAcceleratorUtility
    AppleUtil ..|> IJsonSmiParser

Documentation

  • Added the section "Hardware Support" in the table of contents
  • Moved the pages about hardware support to serve/docs/hardware_support/ and added them under "Hardware Support" in the TOC
  • Added the page "AMD Support"


Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

We built a new Docker container for ROCm using Dockerfile.rocm and the build argument USE_ROCM_VERSION. For the other platforms we used the build_image.sh script.

# AMD instance
docker build -f docker/Dockerfile.rocm -t torch-serve-dev-image-rocm --build-arg USE_ROCM_VERSION=rocm62 --build-arg BUILD_FROM_SRC=true .

Run containers

# AMD instance
docker run --rm -it -w /serve --device=/dev/kfd --device=/dev/dri torch-serve-dev-image-rocm bash

Tests

  • Frontend tests, CPU

Logs:

> ./frontend/gradlew -p frontend clean build
...
BUILD SUCCESSFUL in 6m 35s
  • Frontend tests, CUDA

Logs:

> ./frontend/gradlew -p frontend clean build
...
BUILD SUCCESSFUL in 6m 5s
  • Frontend tests, ROCm

Logs:

> ./frontend/gradlew -p frontend clean build
...
BUILD SUCCESSFUL in 6m 43s
  • Backend tests, CPU

Logs:

> python3 -m pytest ts/tests/unit_tests ts/torch_handler/unit_tests
============================================================================ 113 passed, 30 warnings in 38.09s ============================================================================
> cd workflow-archiver && python3 -m pytest workflow_archiver/tests/unit_tests workflow_archiver/tests/integ_tests
=================================================================================== 20 passed in 0.36s ====================================================================================
> cd model-archiver && python3 -m pytest model_archiver/tests/unit_tests model_archiver/tests/integ_tests
=================================================================================== 33 passed in 0.20s ====================================================================================
  • Backend tests, CUDA

Logs:

> python3 -m pytest ts/tests/unit_tests ts/torch_handler/unit_tests
======================================================================= 113 passed, 21 warnings in 83.76s (0:01:23) =======================================================================
> cd workflow-archiver && python3 -m pytest workflow_archiver/tests/unit_tests workflow_archiver/tests/integ_tests
=================================================================================== 20 passed in 0.31s ====================================================================================
> cd model-archiver && python3 -m pytest model_archiver/tests/unit_tests model_archiver/tests/integ_tests
=================================================================================== 33 passed in 0.20s ====================================================================================
  • Backend tests, ROCm

Logs:

> python3 -m pytest ts/tests/unit_tests ts/torch_handler/unit_tests
============================ 113 passed, 21 warnings in 48.06s ============================
> cd workflow-archiver && python3 -m pytest workflow_archiver/tests/unit_tests workflow_archiver/tests/integ_tests
=================================== 20 passed in 0.32s ====================================
> cd model-archiver && python3 -m pytest model_archiver/tests/unit_tests model_archiver/tests/integ_tests
=================================== 33 passed in 0.16s ====================================
  • Regression tests, CPU

Logs:

> git submodule update --init --recursive
> python3 test/regression_tests.py
================================================================ 163 passed, 40 skipped, 15 warnings in 2014.67s (0:33:34) ================================================================
  • Regression tests, CUDA

Logs:

> git submodule update --init --recursive
> python3 test/regression_tests.py
====================================================== 156 passed, 47 skipped, 10 warnings in 8067.30s (2:14:27) =======================================================
  • Regression tests, ROCm

Logs:

> git submodule update --init --recursive
> python3 test/regression_tests.py
FAILED test_handler.py::test_huggingface_bert_model_parallel_inference - assert 'Bloomberg has decided to publish a new report on the global economy' in '{\n  ...
=========== 1 failed, 162 passed, 40 skipped, 11 warnings in 2085.45s (0:34:45) ===========

Note: the test test_handler.py::test_huggingface_bert_model_parallel_inference fails due to:

ValueError: Input length of input_ids is 150, but max_length is set to 50. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.

This indicates that preprocessing uses a different max_length than inference, which can be verified by looking at the handler as it was when the test was originally implemented: model.generate() uses max_length=50 by default, while the tokenizer uses max_length from setup_config (max_length=150). It seems that the BERT-based Textgeneration.mar needs an update.
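For context, a minimal sketch of the mismatch and one possible fix (illustrative only; the model name is a stand-in, not the handler's actual code):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model for illustration; the failing test uses a BERT-based .mar.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Bloomberg has decided to publish a new report on the global economy."

# The handler tokenizes with max_length from setup_config (150 in the test)...
inputs = tokenizer(text, truncation=True, max_length=150, return_tensors="pt")

# ...while the generation step caps max_length at 50 (per the description
# above), which raises the quoted ValueError once the prompt exceeds 50
# tokens. Bounding only the continuation with max_new_tokens avoids the
# conflict between prompt length and generation length.
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))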

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@jakki-amd force-pushed the 740-add-generic-support-for-different-GPU-hardware branch from 70c500c to 54ef2bd (December 2, 2024 09:38)
@jakki-amd marked this pull request as ready for review (December 2, 2024 10:49)
@jakki-amd force-pushed the 740-add-generic-support-for-different-GPU-hardware branch from 54ef2bd to bc96fa7 (December 2, 2024 10:59)
@agunapal (Collaborator) left a comment:

Thank you for the PR. We are still reviewing it

# For reference:
# https://docs.docker.com/develop/develop-images/build_enhancements/

ARG BASE_IMAGE=ubuntu:24.04

Collaborator comment:

The rest of the TorchServe images are still on Ubuntu 20.04, as we had issues with GitHub runners on newer versions. Haven't tried this in a while.

--index-url https://download.pytorch.org/whl/rocm6.2
torch==2.5.1+rocm6.2; sys_platform == 'linux'
torchvision==0.20.1+rocm6.2; sys_platform == 'linux'
torchaudio==2.5.1+rocm6.2; sys_platform == 'linux'

Collaborator comment:

We haven't yet updated the PyTorch version for the rest of the project, but this should be OK. I will update it for other platforms too.
