Cluster Setup

Based on the Ray documentation, Ray supports the following providers out of the box: AWS, Azure, GCP, Aliyun, vSphere, and Kubernetes (via KubeRay). We could also implement the node provider interface to run Ray on other cloud providers such as Oracle Cloud, but that requires writing the node provider manually, which is a bit more work.

Kubernetes with KubeRay is therefore the best choice if you want to run Ray on Oracle Cloud.

Aana on Kubernetes

Step 1: Create a Kubernetes cluster

The first step is to create a Kubernetes cluster on the cloud provider of your choice. Ray has instructions on how to do this for AWS, Azure, and GCP in the Managed Kubernetes services docs.

Step 2: Deploy Ray on Kubernetes

Once you have a Kubernetes cluster, you need to install KubeRay on it. KubeRay is a Kubernetes operator that manages Ray clusters on Kubernetes. You can install KubeRay using Helm. Here is an example of how to install KubeRay on a Kubernetes cluster:

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update

# Install both CRDs and KubeRay operator v1.1.1.
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.1

# Confirm that the operator is running in the namespace `default`.
kubectl get pods
# NAME                                READY   STATUS    RESTARTS   AGE
# kuberay-operator-7fbdbf8c89-pt8bk   1/1     Running   0          27s

KubeRay offers multiple options for operator installations, such as Helm, Kustomize, and a single-namespaced operator. For further information, please refer to the installation instructions in the KubeRay documentation.

Step 3: Create a YAML file for your application

Next, you need to create a YAML file that describes your Ray application. See the example below to get an idea of what the YAML file should look like:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: <service-name>
spec:
  serviceUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray Serve applications. Default value is 900.
  deploymentUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray dashboard agent. Default value is 900.
  serveConfigV2: |
    <serve config generated by aana build>

  rayClusterConfig:
    rayVersion: '2.20.0' # Should match the Ray version in the image of the containers
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams:
        dashboard-host: '0.0.0.0'
      # Pod template
      template:
        spec:
          containers:
          - name: ray-head
            image: <base image for the application>
            ports:
            - containerPort: 6379
              name: gcs
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            resources:
              limits:
                cpu: "3" # CPU limit for the head pod
                memory: "28G" # Memory limit for the head pod
                ephemeral-storage: "95Gi" # Ephemeral storage limit for the head pod
              requests:
                cpu: "3" # CPU request for the head pod
                memory: "28G" # Memory request for the head pod
                ephemeral-storage: "95Gi" # Ephemeral storage request for the head pod
    workerGroupSpecs:
    # The number of pod replicas in this worker group
    - replicas: 1 # Number of worker nodes
      minReplicas: 1
      maxReplicas: 10
      groupName: gpu-group
      rayStartParams: {}
      # Pod template
      template:
        spec:
          containers:
          - name: ray-worker
            image: <base image for the application>
            resources:
              limits:
                cpu: "3" # CPU limit for the worker pod
                memory: "28G" # Memory limit for the worker pod
                ephemeral-storage: "95Gi" # Ephemeral storage limit for the worker pod
              requests:
                cpu: "3" # CPU request for the worker pod
                memory: "28G" # Memory request for the worker pod
                ephemeral-storage: "95Gi" # Ephemeral storage request for the worker pod
          # Add the matching taint (ray.io/node-type=worker:NoSchedule) to the GPU nodes
          # so that only Ray worker pods are scheduled on them.
          tolerations:
            - key: "ray.io/node-type"
              operator: "Equal"
              value: "worker"
              effect: "NoSchedule"

serveConfigV2 can be generated by the aana build command. It contains the configuration for the Ray Serve applications.

The full file will look like this:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: aana-sdk
spec:
  serviceUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray Serve applications. Default value is 900.
  deploymentUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray dashboard agent. Default value is 900.
  serveConfigV2: |
    applications:

    - name: asr_deployment

      route_prefix: /asr_deployment

      import_path: test_project.app_config:asr_deployment

      runtime_env:
        working_dir: "https://mobius-public.s3.eu-west-1.amazonaws.com/test_project.zip"
        env_vars: 
          DB_CONFIG: '{"datastore_type": "sqlite", "datastore_config": {"path": "/tmp/aana_db.sqlite"}}'

      deployments:

      - name: WhisperDeployment
        num_replicas: 1
        max_ongoing_requests: 1000
        user_config:
          model_size: tiny
          compute_type: float32
        ray_actor_options:
          num_cpus: 1.0

    - name: vad_deployment

      route_prefix: /vad_deployment

      import_path: test_project.app_config:vad_deployment

      runtime_env:
        working_dir: "https://mobius-public.s3.eu-west-1.amazonaws.com/test_project.zip"
        env_vars: 
          DB_CONFIG: '{"datastore_type": "sqlite", "datastore_config": {"path": "/tmp/aana_db.sqlite"}}'

      deployments:

      - name: VadDeployment
        num_replicas: 1
        max_ongoing_requests: 1000
        user_config:
          model: https://whisperx.s3.eu-west-2.amazonaws.com/model_weights/segmentation/0b5b3216d60a2d32fc086b47ea8c67589aaeb26b7e07fcbe620d6d0b83e209ea/pytorch_model.bin
          onset: 0.5
          offset: 0.363
          min_duration_on: 0.1
          min_duration_off: 0.1
          sample_rate: 16000
        ray_actor_options:
          num_cpus: 1.0

    - name: whisper_app

      route_prefix: /

      import_path: test_project.app_config:whisper_app

      runtime_env:
        working_dir: "https://mobius-public.s3.eu-west-1.amazonaws.com/test_project.zip"
        env_vars: 
          DB_CONFIG: '{"datastore_type": "sqlite", "datastore_config": {"path": "/tmp/aana_db.sqlite"}}'

      deployments:

      - name: RequestHandler
        num_replicas: 2
        ray_actor_options:
          num_cpus: 0.1


  rayClusterConfig:
    rayVersion: '2.20.0' # Should match the Ray version in the image of the containers
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams:
        dashboard-host: '0.0.0.0'
      # Pod template
      template:
        spec:
          containers:
          - name: ray-head
            image: europe-docker.pkg.dev/customised-training-app/eu.gcr.io/aana/aana:0.2-ray-2.20@sha256:8814a3c12c6249a3c2bb216c0cba6eef01267d4c91bb58700f7ffc2311d21a3d
            ports:
            - containerPort: 6379
              name: gcs
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            resources:
              limits:
                cpu: "3"
                memory: "28G"
                ephemeral-storage: "95Gi"
              requests:
                cpu: "3"
                memory: "28G"
                ephemeral-storage: "95Gi"
    workerGroupSpecs:
    # The number of pod replicas in this worker group
    - replicas: 1
      minReplicas: 1
      maxReplicas: 10
      groupName: gpu-group
      rayStartParams: {}
      # Pod template
      template:
        spec:
          containers:
          - name: ray-worker
            image: europe-docker.pkg.dev/customised-training-app/eu.gcr.io/aana/aana:0.2-ray-2.20@sha256:8814a3c12c6249a3c2bb216c0cba6eef01267d4c91bb58700f7ffc2311d21a3d
            resources:
              limits:
                cpu: "3"
                memory: "28G"
                ephemeral-storage: "95Gi"
              requests:
                cpu: "3"
                memory: "28G"
                ephemeral-storage: "95Gi"
          # Add the matching taint (ray.io/node-type=worker:NoSchedule) to the GPU nodes
          # so that only Ray worker pods are scheduled on them.
          tolerations:
            - key: "ray.io/node-type"
              operator: "Equal"
              value: "worker"
              effect: "NoSchedule"

Let's take a look at a few critical sections of the YAML file:

runtime_env: This section specifies the runtime environment for the application. It includes the working directory, environment variables, and, optionally, Python packages that need to be installed.

The working directory should be a URL pointing to a zip file containing the application code. It is possible to bake the working directory directly into the Docker image, but this is not recommended because it makes it harder to update the application code. See the Remote URIs docs for more information.
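
For example, one way to produce such a zip and host it (assuming an S3 bucket the cluster can reach over HTTPS; the bucket name and paths are placeholders) is:

# Package the application code (placeholder project name; adjust to your layout).
zip -r test_project.zip test_project/

# Upload it somewhere reachable over HTTPS, e.g. an S3 bucket.
aws s3 cp test_project.zip s3://<your-bucket>/test_project.zip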

The environment variables are passed to the application as a dictionary. In this example, we are passing a configuration for a SQLite database.

You can also specify additional Python dependencies using keys like py_modules, pip, and conda. For more information, see the docs about handling dependencies.
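
For example, a runtime_env with an extra pip package might look like the sketch below (the package and version are placeholders, not actual Aana requirements):

runtime_env:
  working_dir: "https://mobius-public.s3.eu-west-1.amazonaws.com/test_project.zip"
  pip:
    - "scipy==1.13.0"  # placeholder extra dependency
  env_vars:
    DB_CONFIG: '{"datastore_type": "sqlite", "datastore_config": {"path": "/tmp/aana_db.sqlite"}}'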

You can also change the deployment parameters if needed: for example, specify the number of replicas for each deployment or change the model parameters, as in the sketch below.
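
A sketch of such a change, based on the WhisperDeployment from the example above; the alternative model_size and compute_type values are assumptions and depend on which user_config options the deployment supports:

deployments:
- name: WhisperDeployment
  num_replicas: 1
  max_ongoing_requests: 1000
  user_config:
    model_size: medium      # assumed alternative to "tiny"
    compute_type: float16   # assumed alternative to "float32"
  ray_actor_options:
    num_cpus: 1.0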

Another important setting is the base image for the application. Usually, you can use a pre-built image from the Ray project; however, Aana requires some additional dependencies to be installed. It also makes sense to include Aana and all other Python dependencies in the image.

Here is an example of a Dockerfile that includes Aana and Ray:

FROM rayproject/ray:2.20.0.0ae93f-py310
RUN sudo apt-get update && sudo apt-get install -y libgl1 libglib2.0-0 ffmpeg
RUN pip install https://test-files.pythonhosted.org/packages/2e/e7/822893595c45f91acec902612c458fec9ed2684567dcd57bd3ba1770f2ed/aana-0.2.0-py3-none-any.whl
RUN pip install ray[serve]==2.20

Keep in mind that this image does not have GPU support. If you need GPU support, choose a different base image from the Ray project, as in the sketch below.
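
For instance, a GPU-enabled variant might look like this; the exact -gpu tag is an assumption, so check the available rayproject/ray tags for your Ray version:

# Assumed GPU tag; verify it exists on Docker Hub before using it.
FROM rayproject/ray:2.20.0-py310-gpu
RUN sudo apt-get update && sudo apt-get install -y libgl1 libglib2.0-0 ffmpeg
RUN pip install https://test-files.pythonhosted.org/packages/2e/e7/822893595c45f91acec902612c458fec9ed2684567dcd57bd3ba1770f2ed/aana-0.2.0-py3-none-any.whl
RUN pip install ray[serve]==2.20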

Ideally, we should build a few base images for Aana so they can be used directly in the YAML file without any additional build steps or pushes to the registry.

In the example, we are using Google Cloud Artifact Registry. You can use any registry that supports Docker images, such as Docker Hub or GitHub Container Registry.

The resource limits and requests also need adjustment based on your application requirements. Keep in mind that the ephemeral storage needs to be set to a reasonably high value, otherwise the application will not deploy.

Step 4: Deploy the application

After creating the YAML file, you can deploy the application to the Kubernetes cluster using the following command:

kubectl apply -f <your-yaml-file>.yaml

This will create the necessary resources in the Kubernetes cluster to run your Ray application.

You can also use the same command to update the application after changing the YAML file. For example, to scale the number of replicas for the ASR deployment, set num_replicas: 2 in the WhisperDeployment section and run kubectl apply -f <your-yaml-file>.yaml again; Kubernetes will start another replica of the ASR deployment, as sketched below.
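
For example, assuming the full manifest above is saved as aana-sdk.yaml, edit the WhisperDeployment entry in serveConfigV2:

- name: WhisperDeployment
  num_replicas: 2           # scaled up from 1
  max_ongoing_requests: 1000
  user_config:
    model_size: tiny
    compute_type: float32
  ray_actor_options:
    num_cpus: 1.0

and re-apply the manifest:

kubectl apply -f aana-sdk.yaml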

Step 5: Monitor the application

To access the Ray dashboard, you can use port forwarding to access it locally:

kubectl port-forward service/aana-sdk-head-svc 8265:8265 8000:8000

This will forward ports 8265 and 8000 from the Ray head service to your local machine. You can then access the Ray dashboard by opening a browser and going to http://localhost:8265. The application will be available at http://localhost:8000, with the API documentation at http://localhost:8000/docs and http://localhost:8000/redoc.

Problems

We encountered a few problems while trying to deploy the application.

Base image

Initially, we used the base image from the Ray project, but it didn't have all the dependencies needed for the application to work. We had to create a custom image that includes Aana and all other dependencies.

Mainly, we had to install libgl1, which is required by OpenCV.

We think it also makes sense to include Aana and all other Python dependencies in the image, so the application can be deployed without installing any additional Python packages.

We would recommend building a few base images for Aana so they can be used directly in the YAML file without any additional build steps or pushes to the registry. We can use Ray as the base image and add Aana and all other dependencies on top. Supporting different versions of Ray, Aana, Python, and CUDA would require quite a few images, but we can start with the most popular combinations and add more as needed. Docker Hub is free for public images, so we can host them there at no additional cost. Building the images can be automated using GitHub Actions, as sketched below, although that adds cost for build minutes on GitHub.
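
A minimal workflow sketch, assuming the Dockerfile above sits at the repository root and Docker Hub credentials are stored as repository secrets (the workflow name, secret names, and image tag are placeholders):

# .github/workflows/build-base-image.yaml
name: Build Aana base image
on:
  push:
    tags: ["v*"]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: <dockerhub-namespace>/aana:0.2-ray-2.20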

SQL database

Right now we are using SQLite as the database for the application, with the default location /var/lib/aana_data. The problem we encountered is that the application didn't have access to the /var/lib directory. We changed it to /tmp/aana_data/db.sqlite, but that didn't work either because the /tmp/aana_data directory didn't exist. We then changed it to /tmp/aana_db.sqlite, which worked. Ideally, the database should be stored in a persistent volume so the data is not lost when the pod is restarted (see the sketch below), but for testing purposes a temporary directory is fine.
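
A possible sketch of moving the SQLite file onto a persistent volume; the PVC name, size, and mount path are assumptions, and this only helps if everything that writes to the database runs on the pod that mounts the volume:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: aana-db
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi

Then, in the head pod template of the RayService, mount the claim and point DB_CONFIG at a path on it (e.g. /data/aana_db.sqlite):

containers:
- name: ray-head
  volumeMounts:
  - name: aana-db
    mountPath: /data
volumes:
- name: aana-db
  persistentVolumeClaim:
    claimName: aana-db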

SQL migrations running multiple times

The problem we encountered is that when we deployed the Aana application, it tried to run the migrations multiple times, leading to a "table media already exists" error. This happened because the Aana application is deployed as multiple Ray Serve applications and each of them tried to run the migrations. We would suggest changing this so the migrations are run only once, by the application that deploys the request handler.

Shared storage

As discussed before, we store some files on the local disk, and those files are not accessible from other nodes in the cluster. We ran into this problem when we tried to deploy the application on a multi-node cluster.

The solution would be to use shared storage such as NFS; this is the recommendation in the Ray documentation. On GKE, Filestore can be used as shared storage, as in the sketch below.
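
A rough sketch of mounting an NFS share (e.g. a Filestore instance) into the head and worker pod templates; the server address, export path, and mount path are placeholders:

containers:
- name: ray-worker
  volumeMounts:
  - name: shared-storage
    mountPath: /shared
volumes:
- name: shared-storage
  nfs:
    server: <filestore-ip>
    path: /share1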

OCI specific problems

We had a lot of issues with OCI and had to switch to GCP to make it work. Some of the problems we encountered are listed below.

Ephemeral storage

The first problem was with ephemeral storage. The default disk size on Oracle Cloud (OCI) is 50 GB, of which only ~30 GB is available to the application, and that was not enough for the application to deploy. We had to increase the disk size to 100 GB to make it work.

The problem with OCI is that you need to resize the filesystem "manually" using the oci-growfs command, as explained in the documentation. This can probably be automated using Terraform or Ansible, but it is an additional step that needs to be done, and you have to be aware of it if you want to deploy the application on OCI.
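
For reference, on the Oracle Linux node images the resize is roughly the following (the exact path may differ depending on the image):

# Run on the node after increasing the boot volume size in the OCI console.
sudo /usr/libexec/oci-growfs -y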

Pods don't have access to the internet

The second problem was that the pods didn't have access to the internet. We never identified the root cause, but out of the box the pods could not reach the internet. We weren't able to find a solution to this problem and had to switch to GCP to make it work.

Actionable items

  • Create a few base images for Aana so they can be used directly in the Kubernetes setup and Dockerfiles. We can use Ray as the base image and add Aana and all other dependencies on top. Supporting different versions of Ray, Aana, Python, and CUDA would require quite a few images, but we can start with the most popular combinations and add more as needed. Docker Hub is free for public images, so we can host them there at no additional cost. Building the images can be automated using GitHub Actions, but that adds cost for build minutes on GitHub.

  • Change the database location to /tmp/aana_db.sqlite so it works out of the box. Ideally, the database should be stored in a persistent volume so the data is not lost when the pod is restarted, but for testing purposes a temporary directory is fine.

  • Change the migrations so they are run only once by the application that deploys the request handler.

  • Decide on the shared storage solution. Either we require the application to use shared storage such as NFS, or we need to find a way to make it work without shared storage.