Add AMD documentation
jakki-amd committed Dec 2, 2024
1 parent d3dcde6 commit 54ef2bd
Showing 8 changed files with 138 additions and 46 deletions.
43 changes: 18 additions & 25 deletions CONTRIBUTING.md
@@ -11,18 +11,7 @@ Your contributions will fall into two categories:
- Search for your issue here: https://github.com/pytorch/serve/issues (look for the "good first issue" tag if you're a first time contributor)
- Pick an issue and comment on the task that you want to work on this feature.
- To ensure your changes don't break any of the existing features, run the sanity suite as follows from the serve directory:
- Install dependencies (if not already installed)
For CPU

```bash
python ts_scripts/install_dependencies.py --environment=dev
```

For GPU
```bash
python ts_scripts/install_dependencies.py --environment=dev --cuda=cu121
```
> Supported CUDA versions are cu121, cu118, cu117, cu116, cu113, cu111, cu102, cu101, cu92
- [Install dependencies](#Install-TorchServe-for-development) (if not already installed)
- Install `pre-commit` to your Git flow:
```bash
pre-commit install
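# Optionally run all hooks once against the whole repo (standard pre-commit usage)
pre-commit run --all-files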
@@ -60,26 +49,30 @@ pytest -k test/pytest/test_mnist_template.py

If you plan to develop with TorchServe and change some source code, you must install it from source.

1. Clone the repository, including third-party modules, with `git clone --recurse-submodules --remote-submodules [email protected]:pytorch/serve.git`
2. Ensure that you have `python3` installed, and the user has access to the site-packages or `~/.local/bin` is added to the `PATH` environment variable.
3. Run the following script from the top of the source directory. NOTE: This script force re-installs `torchserve`, `torch-model-archiver` and `torch-workflow-archiver` if existing installations are found

#### For Debian Based Systems/MacOS

```
python ./ts_scripts/install_dependencies.py --environment=dev
python ./ts_scripts/install_from_src.py --environment=dev
```

##### Installing Dependencies for Accelerator Support

Use the optional `--rocm` or `--cuda` flag with `install_dependencies.py` for installing accelerator-specific dependencies.
Possible values are
- rocm: `rocm61`, `rocm60`
- cuda: `cu111`, `cu102`, `cu101`, `cu92`

For example, `python ./ts_scripts/install_dependencies.py --environment=dev --rocm=rocm61`

#### For Windows

Refer to the documentation [here](docs/torchserve_on_win_native.md).

For information about the model archiver, see [detailed documentation](model-archiver/README.md).

### What to Contribute?

10 changes: 8 additions & 2 deletions README.md
@@ -22,7 +22,10 @@ curl http://127.0.0.1:8080/predictions/bert -T input.txt

```bash
# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py

# Include dependencies for accelerator support with the relevant optional flags
python ./ts_scripts/install_dependencies.py --rocm=rocm61
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
@@ -36,7 +39,10 @@ pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archi

```bash
# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py

# Include dependencies for accelerator support with the relevant optional flags
python ./ts_scripts/install_dependencies.py --rocm=rocm61
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
8 changes: 6 additions & 2 deletions docs/contents.rst
@@ -16,9 +16,7 @@
model_zoo
request_envelopes
server
nvidia_mps
snapshot
intel_extension_for_pytorch <https://github.com/pytorch/serve/tree/master/examples/intel_extension_for_pytorch>
torchserve_on_win_native
torchserve_on_wsl
use_cases
@@ -27,6 +25,12 @@
Security
FAQs

.. toctree::
:maxdepth: 0
:caption: Hardware Support:

hardware_support/hardware_support

.. toctree::
:maxdepth: 0
:caption: Service APIs:
81 changes: 81 additions & 0 deletions docs/hardware_support/amd_support.md
@@ -0,0 +1,81 @@
# AMD Support

TorchServe can be run on any combination of operating system and device that is
[supported by ROCm](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility.html).

## Supported Versions of ROCm

The current stable `major.patch` version of ROCm and the previous patch version will be supported. For example, versions `N.2` and `N.1`, where `N` is the current major version.

## Installation

- Make sure you have **python >= 3.8** installed on your system.
- Clone the repo
```bash
git clone [email protected]:pytorch/serve.git
```

- `cd` into the cloned folder

```bash
cd serve
```

- Create a virtual environment for Python

```bash
python -m venv venv
```

- Activate the virtual environment. If you use another shell (fish, csh, powershell), use the relevant activation script from `venv/bin/`
```bash
source venv/bin/activate
```

- Install the dependencies needed for ROCm support.

```bash
python ./ts_scripts/install_dependencies.py --rocm=rocm61
python ./ts_scripts/install_from_src.py
```
- Enable `amd-smi` in the Python virtual environment
```bash
sudo chown -R $USER:$USER /opt/rocm/share/amd_smi/
pip install -e /opt/rocm/share/amd_smi/
```
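If the install succeeded, a quick sanity check along these lines should print the number of visible accelerators (a sketch using the `amdsmi` Python bindings installed above; function names are from AMD's library and may vary between ROCm versions):

```bash
# Initialize amd-smi, count the accelerator handles it returns, then shut down
python -c "import amdsmi; amdsmi.amdsmi_init(); print(len(amdsmi.amdsmi_get_processor_handles()), 'accelerators visible'); amdsmi.amdsmi_shut_down()"
```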

### Selecting Accelerators Using `HIP_VISIBLE_DEVICES`

If you have multiple accelerators on the system where you are running TorchServe, you can select which accelerators should be visible to TorchServe
by setting the environment variable `HIP_VISIBLE_DEVICES` to a string of 0-indexed, comma-separated integers representing the IDs of the accelerators.

If you have 8 accelerators but only want TorchServe to see the last four of them, run `export HIP_VISIBLE_DEVICES=4,5,6,7`.

> ℹ️ **Not setting** `HIP_VISIBLE_DEVICES` will cause TorchServe to use all available accelerators on the system it is running on.

> ⚠️ You can run into trouble if you set `HIP_VISIBLE_DEVICES` to an empty string,
> e.g. `export HIP_VISIBLE_DEVICES=` or `export HIP_VISIBLE_DEVICES=""`.
> Use `unset HIP_VISIBLE_DEVICES` if you want to remove its effect.

> ⚠️ Setting both `CUDA_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES` may cause unintended behaviour and should be avoided.
> Doing so may cause an exception in the future.
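One way to check that the variable has taken effect is to ask the ROCm build of PyTorch how many devices it sees (PyTorch exposes HIP devices through the `torch.cuda` API):

```bash
export HIP_VISIBLE_DEVICES=4,5,6,7
# With the setting above, this should print 4
python -c "import torch; print(torch.cuda.device_count())"
```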

## Docker

**In Development**

`Dockerfile.rocm` provides preliminary ROCm support for TorchServe.

Building and running `dev-image`:

```bash
docker build --file docker/Dockerfile.rocm --target dev-image -t torch-serve-dev-image-rocm --build-arg USE_ROCM_VERSION=rocm62 --build-arg BUILD_FROM_SRC=true .
docker run -it --rm --device=/dev/kfd --device=/dev/dri torch-serve-dev-image-rocm bash
```
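The `--device=/dev/kfd` and `--device=/dev/dri` flags pass the ROCm compute (KFD) and render device nodes through to the container; without them the container cannot see the accelerators.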

## Example Usage

After installing TorchServe with the required dependencies for ROCm, you should be ready to serve your model.

For a simple example, refer to `serve/examples/image_classifier/mnist/`.
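A minimal sketch of that workflow, assuming you are in the repository root and the file names in the mnist example folder are unchanged:

```bash
# Package the example model into a model archive (.mar)
torch-model-archiver --model-name mnist --version 1.0 \
  --model-file examples/image_classifier/mnist/mnist.py \
  --serialized-file examples/image_classifier/mnist/mnist_cnn.pt \
  --handler examples/image_classifier/mnist/mnist_handler.py

# Serve it and send a test image
mkdir -p model_store && mv mnist.mar model_store/
torchserve --start --ncs --model-store model_store --models mnist=mnist.mar
curl http://127.0.0.1:8080/predictions/mnist -T examples/image_classifier/mnist/test_data/0.png
torchserve --stop
```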
docs/hardware_support/apple_silicon_support.md
@@ -1,19 +1,19 @@
# Apple Silicon Support

## What is supported
* TorchServe CI jobs now include M1 hardware to ensure support; see the [documentation](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories) on GitHub-hosted M1 runners.
    - [Regression Tests](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu.yml)
    - [Regression binaries Test](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu_binaries.yml)
* For [Docker](https://docs.docker.com/desktop/install/mac-install/), ensure Docker for Apple silicon is installed, then follow the [setup steps](https://github.com/pytorch/serve/tree/master/docker)

## Experimental Support

* For GPU jobs on Apple Silicon, [MPS](https://pytorch.org/docs/master/notes/mps.html) is now auto detected and enabled. To prevent TorchServe from using MPS, users have to set `deviceType: "cpu"` in model-config.yaml (see the snippet after this list).
* This is an experimental feature, and NOT ALL models are guaranteed to work.
* Number of GPUs now reports GPUs on Apple Silicon
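As a minimal, illustrative sketch of that opt-out, the config file only needs the one key named above:

```bash
# Write a minimal model-config.yaml that forces CPU execution (illustrative)
cat <<'EOF' > model-config.yaml
deviceType: "cpu"
EOF
```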

### Testing
* [Pytests](https://github.com/pytorch/serve/tree/master/test/pytest/test_device_config.py) that check for MPS on MacOS M1 devices
* Models that have been tested and work: Resnet-18, Densenet161, Alexnet
* Models that have been tested and DO NOT work: MNIST

@@ -31,10 +31,10 @@ Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store:
Initial Models: resnet-18=resnet-18.mar
Log dir:
Metrics dir:
Netty threads: 0
Netty client threads: 0
Default workers per model: 16
Expand All @@ -48,7 +48,7 @@ Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store:
CPP log config: N/A
Model config: N/A
2024-04-08T14:18:02,380 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
@@ -69,17 +69,17 @@ serve % curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_cla
}
...
```
#### Conda Example

```
(myenv) serve % pip list | grep torch
torch 2.2.1
torchaudio 2.2.1
torchdata 0.7.1
torchtext 0.17.1
torchvision 0.17.1
(myenv3) serve % conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver
(myenv3) serve % pip list | grep torch
torch 2.2.1
torch-model-archiver 0.10.0b20240312
torch-workflow-archiver 0.2.12b20240312
@@ -119,11 +119,11 @@ System metrics command: default
2024-03-12T15:58:54,702 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: densenet161, count: 10
Model server started.
...
(myenv3) serve % curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg
{
"tabby": 0.46661922335624695,
"tiger_cat": 0.46449029445648193,
"Egyptian_cat": 0.0661405548453331,
"lynx": 0.001292439759708941,
"plastic_bag": 0.00022909720428287983
}
8 changes: 8 additions & 0 deletions docs/hardware_support/hardware_support.rst
@@ -0,0 +1,8 @@
.. toctree::
:caption: Hardware Support:

amd_support
apple_silicon_support
linux_aarch64
nvidia_mps
Intel Extension for PyTorch <https://github.com/pytorch/serve/tree/master/examples/intel_extension_for_pytorch>
File renamed without changes.
File renamed without changes.
