diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index a25e754761..952bb1fb5b 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -11,18 +11,7 @@ Your contributions will fall into two categories:
 - Search for your issue here: https://github.com/pytorch/serve/issues (look for the "good first issue" tag if you're a first time contributor)
 - Pick an issue and comment on the task that you want to work on this feature.
 - To ensure your changes doesn't break any of the existing features run the sanity suite as follows from serve directory:
-  - Install dependencies (if not already installed)
-    For CPU
-
-    ```bash
-    python ts_scripts/install_dependencies.py --environment=dev
-    ```
-
-    For GPU
-    ```bash
-    python ts_scripts/install_dependencies.py --environment=dev --cuda=cu121
-    ```
-    > Supported cuda versions as cu121, cu118, cu117, cu116, cu113, cu111, cu102, cu101, cu92
+  - [Install dependencies](#install-torchserve-for-development) (if not already installed)
 - Install `pre-commit` to your Git flow:
   ```bash
   pre-commit install
@@ -60,26 +49,30 @@ pytest -k test/pytest/test_mnist_template.py
 
 If you plan to develop with TorchServe and change some source code, you must install it from source code.
 
-Ensure that you have `python3` installed, and the user has access to the site-packages or `~/.local/bin` is added to the `PATH` environment variable.
+1. Clone the repository, including third-party modules, with `git clone --recurse-submodules --remote-submodules git@github.com:pytorch/serve.git`
+2. Ensure that you have `python3` installed, and the user has access to the site-packages or `~/.local/bin` is added to the `PATH` environment variable.
+3. Run the following script from the top of the source directory. NOTE: This script force re-installs `torchserve`, `torch-model-archiver` and `torch-workflow-archiver` if existing installations are found.
 
-Run the following script from the top of the source directory.
+    #### For Debian Based Systems/MacOS
 
-NOTE: This script force re-installs `torchserve`, `torch-model-archiver` and `torch-workflow-archiver` if existing installations are found
+    ```
+    python ./ts_scripts/install_dependencies.py --environment=dev
+    python ./ts_scripts/install_from_src.py --environment=dev
+    ```
+    ##### Installing Dependencies for Accelerator Support
+    Use the optional `--rocm` or `--cuda` flag with `install_dependencies.py` to install accelerator-specific dependencies.
 
-#### For Debian Based Systems/ MacOS
-
-```
-python ./ts_scripts/install_dependencies.py --environment=dev
-python ./ts_scripts/install_from_src.py --environment=dev
-```
+    Possible values are:
+    - rocm: `rocm61`, `rocm60`
+    - cuda: `cu111`, `cu102`, `cu101`, `cu92`
 
-Use `--cuda` flag with `install_dependencies.py` for installing cuda version specific dependencies. Possible values are `cu111`, `cu102`, `cu101`, `cu92`
+    For example: `python ./ts_scripts/install_dependencies.py --environment=dev --rocm=rocm61`
 
-#### For Windows
+    #### For Windows
 
-Refer to the documentation [here](docs/torchserve_on_win_native.md).
+    Refer to the documentation [here](docs/torchserve_on_win_native.md).
 
-For information about the model archiver, see [detailed documentation](model-archiver/README.md).
+    For information about the model archiver, see [detailed documentation](model-archiver/README.md).
 
 ### What to Contribute?
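Taken together, the revised contributor setup above reduces to a short shell session. A minimal sketch assembled from the commands in the diff (Debian-based systems or macOS; add `--cuda=cu121` or `--rocm=rocm61` as your hardware requires):

```bash
# Clone with third-party submodules (step 1 of the new install flow)
git clone --recurse-submodules --remote-submodules git@github.com:pytorch/serve.git
cd serve

# Install dev dependencies, then TorchServe itself from source
python ./ts_scripts/install_dependencies.py --environment=dev
python ./ts_scripts/install_from_src.py --environment=dev

# Hook the linters into your Git flow
pre-commit install
```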
diff --git a/README.md b/README.md
index a74b952708..200dcc5269 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,10 @@ curl http://127.0.0.1:8080/predictions/bert -T input.txt
 
 ```bash
 # Install dependencies
-# cuda is optional
+python ./ts_scripts/install_dependencies.py
+
+# Include dependencies for accelerator support with the relevant optional flags
+python ./ts_scripts/install_dependencies.py --rocm=rocm61
 python ./ts_scripts/install_dependencies.py --cuda=cu121
 
 # Latest release
@@ -36,7 +39,10 @@ pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archi
 
 ```bash
 # Install dependencies
-# cuda is optional
+python ./ts_scripts/install_dependencies.py
+
+# Include dependencies for accelerator support with the relevant optional flags
+python ./ts_scripts/install_dependencies.py --rocm=rocm61
 python ./ts_scripts/install_dependencies.py --cuda=cu121
 
 # Latest release
diff --git a/docs/contents.rst b/docs/contents.rst
index 1ba7e83e32..c42a6a3076 100644
--- a/docs/contents.rst
+++ b/docs/contents.rst
@@ -16,9 +16,7 @@
    model_zoo
    request_envelopes
    server
-   nvidia_mps
    snapshot
-   intel_extension_for_pytorch
    torchserve_on_win_native
    torchserve_on_wsl
    use_cases
@@ -27,6 +25,12 @@
    Security
    FAQs
 
+.. toctree::
+   :maxdepth: 0
+   :caption: Hardware Support:
+
+   hardware_support/hardware_support
+
 .. toctree::
    :maxdepth: 0
    :caption: Service APIs:
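A note on the new `--rocm` flag above: ROCm builds of PyTorch report through the same `torch.cuda` API as CUDA builds, so a post-install check looks identical for both accelerators. A quick sketch, assuming the dependency script completed successfully:

```bash
# Verify which backend install_dependencies.py actually installed
python -c "import torch; print(torch.__version__)"                        # e.g. a "+rocm6.1" or "+cu121" suffix
python -c "import torch; print(torch.cuda.is_available())"                # True if an accelerator is visible
python -c "import torch; print(torch.version.hip or torch.version.cuda)"  # ROCm or CUDA toolkit version
```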
diff --git a/docs/hardware_support/amd_support.md b/docs/hardware_support/amd_support.md
new file mode 100644
index 0000000000..55de40f6d4
--- /dev/null
+++ b/docs/hardware_support/amd_support.md
@@ -0,0 +1,81 @@
+# AMD Support
+
+TorchServe can be run on any combination of operating system and device that is
+[supported by ROCm](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility.html).
+
+## Supported Versions of ROCm
+
+The current stable `major.patch` version of ROCm and the previous patch version will be supported. For example, version `N.2` and `N.1`, where `N` is the current major version.
+
+## Installation
+
+ - Make sure you have **python >= 3.8** installed on your system.
+ - Clone the repo
+   ```bash
+   git clone git@github.com:pytorch/serve.git
+   ```
+
+ - cd into the cloned folder
+
+   ```bash
+   cd serve
+   ```
+
+ - Create a virtual environment for python
+
+   ```bash
+   python -m venv venv
+   ```
+
+ - Activate the virtual environment. If you use another shell (fish, csh, powershell) use the relevant option from `/venv/bin/`
+   ```bash
+   source venv/bin/activate
+   ```
+
+ - Install the dependencies needed for ROCm support.
+
+   ```bash
+   python ./ts_scripts/install_dependencies.py --rocm=rocm61
+   python ./ts_scripts/install_from_src.py
+   ```
+ - Enable amd-smi in the python virtual environment
+   ```bash
+   sudo chown -R $USER:$USER /opt/rocm/share/amd_smi/
+   pip install -e /opt/rocm/share/amd_smi/
+   ```
+
+### Selecting Accelerators Using `HIP_VISIBLE_DEVICES`
+
+If you have multiple accelerators on the system where you are running TorchServe, you can select which accelerators should be visible to TorchServe
+by setting the environment variable `HIP_VISIBLE_DEVICES` to a string of 0-indexed, comma-separated integers representing the IDs of the accelerators.
+
+If you have 8 accelerators but only want TorchServe to see the last four of them, run `export HIP_VISIBLE_DEVICES=4,5,6,7`.
+
+> ℹ️ **Not setting** `HIP_VISIBLE_DEVICES` will cause TorchServe to use all available accelerators on the system it is running on.
+
+> ⚠️ You can run into trouble if you set `HIP_VISIBLE_DEVICES` to an empty string,
+> e.g. `export HIP_VISIBLE_DEVICES=` or `export HIP_VISIBLE_DEVICES=""`.
+> Use `unset HIP_VISIBLE_DEVICES` if you want to remove its effect.
+
+> ⚠️ Setting both `CUDA_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES` may cause unintended behaviour and should be avoided.
+> Doing so may cause an exception in the future.
+
+## Docker
+
+**In Development**
+
+`Dockerfile.rocm` provides preliminary ROCm support for TorchServe.
+
+Building and running `dev-image`:
+
+```bash
+docker build --file docker/Dockerfile.rocm --target dev-image -t torch-serve-dev-image-rocm --build-arg USE_ROCM_VERSION=rocm62 --build-arg BUILD_FROM_SRC=true .
+
+docker run -it --rm --device=/dev/kfd --device=/dev/dri torch-serve-dev-image-rocm bash
+```
+
+## Example Usage
+
+After installing TorchServe with the required dependencies for ROCm, you should be ready to serve your model.
+
+For a simple example, refer to `serve/examples/image_classifier/mnist/`.
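Putting the `HIP_VISIBLE_DEVICES` guidance into practice, a minimal sketch of restricting TorchServe to two accelerators; the model name and store path are illustrative, following the mnist example the new page points to:

```bash
# Expose only the first two accelerators, then start TorchServe
export HIP_VISIBLE_DEVICES=0,1
torchserve --start --ncs --model-store model_store --models mnist=mnist.mar

# Tear down and clear the restriction; never set the variable to an empty string
torchserve --stop
unset HIP_VISIBLE_DEVICES
```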
diff --git a/docs/apple_silicon_support.md b/docs/hardware_support/apple_silicon_support.md
similarity index 90%
rename from docs/apple_silicon_support.md
rename to docs/hardware_support/apple_silicon_support.md
index facd8a7f28..6e0f479b8a 100644
--- a/docs/apple_silicon_support.md
+++ b/docs/hardware_support/apple_silicon_support.md
@@ -1,19 +1,19 @@
-# Apple Silicon Support 
+# Apple Silicon Support
 
-## What is supported 
+## What is supported
 * TorchServe CI jobs now include M1 hardware in order to ensure support, [documentation](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories) on github M1 hardware.
-    - [Regression Tests](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu.yml) 
-    - [Regression binaries Test](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu_binaries.yml) 
+    - [Regression Tests](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu.yml)
+    - [Regression binaries Test](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu_binaries.yml)
 * For [Docker](https://docs.docker.com/desktop/install/mac-install/) ensure Docker for Apple silicon is installed then follow [setup steps](https://github.com/pytorch/serve/tree/master/docker)
 
 ## Experimental Support
 
-* For GPU jobs on Apple Silicon, [MPS](https://pytorch.org/docs/master/notes/mps.html) is now auto detected and enabled. To prevent TorchServe from using MPS, users have to set `deviceType: "cpu"` in model-config.yaml. 
-    * This is an experimental feature and NOT ALL models are guaranteed to work. 
+* For GPU jobs on Apple Silicon, [MPS](https://pytorch.org/docs/master/notes/mps.html) is now auto detected and enabled. To prevent TorchServe from using MPS, users have to set `deviceType: "cpu"` in model-config.yaml.
+    * This is an experimental feature and NOT ALL models are guaranteed to work.
 * Number of GPUs now reports GPUs on Apple Silicon
 
-### Testing 
-* [Pytests](https://github.com/pytorch/serve/tree/master/test/pytest/test_device_config.py) that checks for MPS on MacOS M1 devices 
+### Testing
+* [Pytests](https://github.com/pytorch/serve/tree/master/test/pytest/test_device_config.py) that check for MPS on MacOS M1 devices
 * Models that have been tested and work: Resnet-18, Densenet161, Alexnet
 * Models that have been tested and DO NOT work: MNIST
 
@@ -31,10 +31,10 @@ Config file: N/A
 Inference address: http://127.0.0.1:8080
 Management address: http://127.0.0.1:8081
 Metrics address: http://127.0.0.1:8082
-Model Store: 
+Model Store:
 Initial Models: resnet-18=resnet-18.mar
-Log dir: 
-Metrics dir: 
+Log dir:
+Metrics dir:
 Netty threads: 0
 Netty client threads: 0
 Default workers per model: 16
@@ -48,7 +48,7 @@ Custom python dependency for model allowed: false
 Enable metrics API: true
 Metrics mode: LOG
 Disable system metrics: false
-Workflow Store: 
+Workflow Store:
 CPP log config: N/A
 Model config: N/A
 024-04-08T14:18:02,380 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
@@ -69,17 +69,17 @@ serve % curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_cla
 }
 ...
 ```
-#### Conda Example 
+#### Conda Example
 
 ```
-(myenv) serve % pip list | grep torch 
+(myenv) serve % pip list | grep torch
 torch                     2.2.1
 torchaudio                2.2.1
 torchdata                 0.7.1
 torchtext                 0.17.1
 torchvision               0.17.1
 
 (myenv3) serve % conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver
-(myenv3) serve % pip list | grep torch 
+(myenv3) serve % pip list | grep torch
 torch                     2.2.1
 torch-model-archiver      0.10.0b20240312
 torch-workflow-archiver   0.2.12b20240312
@@ -119,11 +119,11 @@ System metrics command: default
 2024-03-12T15:58:54,702 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: densenet161, count: 10
 Model server started.
 ...
-(myenv3) serve % curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg 
+(myenv3) serve % curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg
 {
     "tabby": 0.46661922335624695,
     "tiger_cat": 0.46449029445648193,
     "Egyptian_cat": 0.0661405548453331,
     "lynx": 0.001292439759708941,
     "plastic_bag": 0.00022909720428287983
-}
\ No newline at end of file
+}
diff --git a/docs/hardware_support/hardware_support.rst b/docs/hardware_support/hardware_support.rst
new file mode 100644
index 0000000000..267525fc65
--- /dev/null
+++ b/docs/hardware_support/hardware_support.rst
@@ -0,0 +1,8 @@
+.. toctree::
+   :caption: Hardware Support:
+
+   amd_support
+   apple_silicon_support
+   linux_aarch64
+   nvidia_mps
+   Intel Extension for PyTorch
diff --git a/docs/linux_aarch64.md b/docs/hardware_support/linux_aarch64.md
similarity index 100%
rename from docs/linux_aarch64.md
rename to docs/hardware_support/linux_aarch64.md
diff --git a/docs/nvidia_mps.md b/docs/hardware_support/nvidia_mps.md
similarity index 100%
rename from docs/nvidia_mps.md
rename to docs/hardware_support/nvidia_mps.md
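To close the loop on the MPS opt-out mentioned in the Apple Silicon page, a sketch of packaging `deviceType: "cpu"` into a model archive and serving it; it follows the repo's resnet-18 example and assumes the `resnet18-f37072fd.pth` weights were downloaded beforehand:

```bash
# Opt this model out of MPS by declaring a CPU device type in its config
printf 'deviceType: "cpu"\n' > model-config.yaml

# Package the config into the .mar, then serve and query as in the log excerpts above
torch-model-archiver --model-name resnet-18 --version 1.0 \
  --model-file examples/image_classifier/resnet_18/model.py \
  --serialized-file resnet18-f37072fd.pth \
  --handler image_classifier \
  --extra-files examples/image_classifier/index_to_name.json \
  --config-file model-config.yaml
mkdir -p model_store && mv resnet-18.mar model_store/
torchserve --start --ncs --model-store model_store --models resnet-18=resnet-18.mar
curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg
torchserve --stop
```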