docs: added documentation on Spark tuning
aleconf committed Aug 5, 2024
1 parent f3df086 commit 08d0e59
Showing 3 changed files with 58 additions and 5 deletions.
14 changes: 9 additions & 5 deletions README.md
@@ -90,14 +90,18 @@ To remove everything including container images:
docker compose --profile ui --profile k9s --profile init-data down -v --remove-orphans --rmi all
```

+## Spark tuning
+We use Spark jobs to calculate metrics. If you need to tune the Spark configuration to optimize performance for large files or to accelerate computations, please refer to the corresponding section of this [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/main/api/README.md).
+
## 📖 Documentation
You can find the following documentation:
-* An extensive [step-by-step guide](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/user-guide-installation) to install the development/testing version of the platform, followed by all [key concepts](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/user-guide-keyconcepts) and a [hands-on guide](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/how_to) on how to use the GUI.
-* A practical [guide](https://docs.oss-monitoring.radicalbit.ai/l1_quickstart) that walks users through monitoring an AI solution on the platform.
+* An extensive [step-by-step guide](https://docs.oss-monitoring.radicalbit.ai/user-guide/installation) to install the development/testing version of the platform, followed by all [key concepts](https://docs.oss-monitoring.radicalbit.ai/user-guide/key-concepts) and a [hands-on guide](https://docs.oss-monitoring.radicalbit.ai/user-guide/how-to) on how to use the GUI.
+* A practical [guide](https://docs.oss-monitoring.radicalbit.ai/quickstart) that walks users through monitoring an AI solution on the platform.
* A detailed [explanation](https://docs.oss-monitoring.radicalbit.ai/category/model-sections) on the three main model sections.
-* An exhaustive [description](https://docs.oss-monitoring.radicalbit.ai/l5_python_sdk) of all classes implemented inside the Python SDK.
-* A list of [all available metrics and charts](https://docs.oss-monitoring.radicalbit.ai/l6_all_metrics).
-* A [community support](https://docs.oss-monitoring.radicalbit.ai/l7_support) page.
+* An exhaustive [description](https://docs.oss-monitoring.radicalbit.ai/python-sdk) of all classes implemented in the Python SDK.
+* A list of [all available metrics and charts](https://docs.oss-monitoring.radicalbit.ai/all-metrics).
+* A page describing the [architecture](https://docs.oss-monitoring.radicalbit.ai/architecture) of the platform.
+* A [community support](https://docs.oss-monitoring.radicalbit.ai/support) page.

## 🤝 Community
Please join us on our [Discord server](https://discord.gg/x2Ze8TMRsD) to discuss the platform, share ideas, and help shape its future! Get help from experts and fellow users.
45 changes: 45 additions & 0 deletions api/README.md
@@ -95,3 +95,48 @@
By default, the PostgreSQL schema that is used in the platform and in the migrations is `public`. If you want to
store data and use another schema, you need to modify the environment variables of the schema in
the `docker-compose.yaml` accordingly, and you have to either manually modify the migrations script or re-create the
migrations with the above commands.
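
As a purely illustrative sketch, such an override could look like the snippet below. The variable name is a hypothetical placeholder; check the `environment` section of the project's `docker-compose.yaml` for the real one before changing anything.

```
services:
  api:
    environment:
      # Hypothetical variable name -- look up the actual schema variable
      # in docker-compose.yaml; this is an illustration, not the real key.
      DATABASE_SCHEMA: "monitoring"
```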

## Spark tuning

### Overview

When a reference or current dataset file is uploaded, the platform calculates all available metrics using a Spark job deployed in a Kubernetes cluster. It uses the [spark-on-k8s](https://github.com/hussein-awala/spark-on-k8s) library to launch these jobs.
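
For orientation, here is a minimal sketch of what a submission through spark-on-k8s typically looks like. This is not the platform's actual submission code: the image, application path, and namespace are placeholders, and parameter names may vary between library versions.

```
from spark_on_k8s.client import SparkOnK8S

client = SparkOnK8S()

# All values below are illustrative placeholders, not the platform's real ones.
client.submit_app(
    image="example-registry/spark-metrics:latest",      # Spark container image
    app_path="local:///opt/spark/jobs/metrics_job.py",  # PySpark entrypoint inside the image
    app_name="metrics-job",
    namespace="spark",
    app_waiter="log",  # follow the driver logs until the job completes
)
```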

### Spark job structure

A Spark job consists of two main components:

1. **Spark Driver**: Manages the job execution
2. **Spark Executors**: Instances that perform the actual calculations
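
In practice, the driver's resources matter for coordinating the job and collecting results, while the number and size of the executors determine how much of the computation runs in parallel. These settings correspond to standard Spark properties such as `spark.driver.memory`, `spark.executor.memory`, and `spark.executor.instances`, so general Spark sizing guidance applies here as well.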

### Default configuration

In the platform, the default resource configuration for Spark jobs is as follows:

- Spark Driver CPUs: 1
- Spark Driver memory: 1024 MB
- Spark Driver memory overhead: 512 MB
- Spark Executor CPUs: 1
- Spark Executor memory: 1024 MB
- Spark Executor memory overhead: 512 MB
- Spark Executor initial instances: 2
- Spark Executor min instances: 2
- Spark Executor max instances: 2
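
With these defaults, every metrics job runs three pods: one driver and two executors (the min, max, and initial instance counts are all 2, so the job does not scale). Since a Spark pod on Kubernetes requests its memory plus its memory overhead, each pod asks for roughly 1024 + 512 = 1536 MB of memory and one CPU, or about 4.5 GB and 3 CPUs for the job as a whole.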

### Resource tuning

To optimize performance for larger files or to accelerate computations, you can adjust the following environment variables in the backend container:

```
SPARK_ON_K8S_DRIVER_CPU: 1
SPARK_ON_K8S_DRIVER_MEMORY: 1024
SPARK_ON_K8S_DRIVER_MEMORY_OVERHEAD: 512
SPARK_ON_K8S_EXECUTOR_CPU: 1
SPARK_ON_K8S_EXECUTOR_MEMORY: 1024
SPARK_ON_K8S_EXECUTOR_MEMORY_OVERHEAD: 512
SPARK_ON_K8S_EXECUTOR_INITIAL_INSTANCES: 2
SPARK_ON_K8S_EXECUTOR_MIN_INSTANCES: 2
SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES: 2
```

Adjust these variables as needed to allocate more resources or modify the number of executor instances for your specific use case.
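
For example, to handle larger files you could double the executor memory and let the job scale out to four executors. The values below are illustrative, not recommendations:

```
SPARK_ON_K8S_EXECUTOR_MEMORY: 2048
SPARK_ON_K8S_EXECUTOR_MEMORY_OVERHEAD: 1024
SPARK_ON_K8S_EXECUTOR_INITIAL_INSTANCES: 2
SPARK_ON_K8S_EXECUTOR_MIN_INSTANCES: 2
SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES: 4
```

Keep in mind that each executor pod then requests about 2048 + 1024 = 3072 MB of memory, so the Kubernetes nodes must have enough allocatable memory and CPU to schedule every requested pod.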
4 changes: 4 additions & 0 deletions docs/docs/user-guide/installation.md
@@ -29,3 +29,7 @@
```

After all the containers are up & running, you can go to [http://localhost:5173](http://localhost:5173/) and play with the platform.

## Spark tuning

We use Spark jobs to calculate metrics. If you need to tune the Spark configuration to optimize performance for large files or to accelerate computations, please refer to the corresponding section of this [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/main/api/README.md).
