docs: added documentation on Spark tuning #158

Merged 1 commit on Aug 5, 2024
14 changes: 9 additions & 5 deletions README.md
@@ -90,14 +90,18 @@ To remove everything including container images:
```
docker compose --profile ui --profile k9s --profile init-data down -v --remove-orphans --rmi all
```

## Spark tuning
Metrics are calculated by Spark jobs. To tune the Spark configuration for large files or faster computations, see the Spark tuning section of the API [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/main/api/README.md).

## 📖 Documentation
You can find the following documentation:
* An extensive [step-by-step guide](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/user-guide-installation) to install the development/testing version of the platform, followed by all [key concepts](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/user-guide-keyconcepts) and a [hands-on guide](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/how_to) on how to use the GUI.
* A practical [guide](https://docs.oss-monitoring.radicalbit.ai/l1_quickstart) that walks users through monitoring an AI solution on the platform.
* An extensive [step-by-step guide](https://docs.oss-monitoring.radicalbit.ai/user-guide/installation) to install the development/testing version of the platform, followed by all [key concepts](https://docs.oss-monitoring.radicalbit.ai/user-guide/key-concepts) and a [hands-on guide](https://docs.oss-monitoring.radicalbit.ai/user-guide/how-to) on how to use the GUI.
* A practical [guide](https://docs.oss-monitoring.radicalbit.ai/quickstart) that walks users through monitoring an AI solution on the platform.
* A detailed [explanation](https://docs.oss-monitoring.radicalbit.ai/category/model-sections) on the three main model sections.
* An exhaustive [description](https://docs.oss-monitoring.radicalbit.ai/l5_python_sdk) of all classes implemented inside the Python SDK.
* A list of [all available metrics and charts](https://docs.oss-monitoring.radicalbit.ai/l6_all_metrics).
* A [community support](https://docs.oss-monitoring.radicalbit.ai/l7_support) page.
* An exhaustive [description](https://docs.oss-monitoring.radicalbit.ai/python-sdk) of all classes implemented inside the Python SDK.
* A list of [all available metrics and charts](https://docs.oss-monitoring.radicalbit.ai/all-metrics).
* A page related to the [architecture](https://docs.oss-monitoring.radicalbit.ai/architecture) of the platform.
* A [community support](https://docs.oss-monitoring.radicalbit.ai/support) page.

## 🤝 Community
Please join us on our [Discord server](https://discord.gg/x2Ze8TMRsD) to discuss the platform, share ideas, and help shape its future! Get help from experts and fellow users.
45 changes: 45 additions & 0 deletions api/README.md
@@ -95,3 +95,48 @@ By default, the PostgreSQL schema that is used in the platform and in the migrat
store data and use another schema, you need to modify the schema environment variables in
the `docker-compose.yaml` accordingly, and you have to either manually modify the migration scripts or re-create the
migrations with the commands above.

## Spark tuning

### Overview

When a reference or current dataset file is uploaded, the platform calculates all available metrics with a Spark job that runs in a Kubernetes cluster. The platform uses the [spark-on-k8s](https://github.com/hussein-awala/spark-on-k8s) library to launch these Spark jobs.

### Spark job structure

A Spark job consists of two main components:

1. **Spark Driver**: Manages the job execution
2. **Spark Executors**: Instances that perform the actual calculations

### Default configuration

The platform's default resource configuration for Spark jobs is as follows:

- Spark Driver CPUs: 1
- Spark Driver Memory: 1024 (in MB)
- Spark Driver Memory overhead: 512 (in MB)
- Spark Executor CPUs: 1
- Spark Executor Memory: 1024 (in MB)
- Spark Executor Memory overhead: 512 (in MB)
- Spark Executor Initial instances: 2
- Spark Executor Min instances: 2
- Spark Executor Max instances: 2
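
As a rough illustration of what these defaults add up to, the sketch below sums the CPU and memory requested by one driver and two executors. It assumes each pod requests its memory plus its memory overhead, as Spark on Kubernetes normally does. The `PodSize` class and the `cluster_footprint` function are hypothetical names used only for this example; they are not part of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSize:
    """Resources requested by a single Spark pod (memory values in MB)."""

    cpus: int
    memory_mb: int
    memory_overhead_mb: int

    @property
    def total_memory_mb(self) -> int:
        # Assumption: the pod requests memory plus memory overhead.
        return self.memory_mb + self.memory_overhead_mb


def cluster_footprint(driver: PodSize, executor: PodSize, executors: int) -> tuple:
    """Return (total CPUs, total memory in MB) for one driver plus N executors."""
    cpus = driver.cpus + executor.cpus * executors
    memory_mb = driver.total_memory_mb + executor.total_memory_mb * executors
    return cpus, memory_mb


# Default values taken from the list above.
default_driver = PodSize(cpus=1, memory_mb=1024, memory_overhead_mb=512)
default_executor = PodSize(cpus=1, memory_mb=1024, memory_overhead_mb=512)

total_cpus, total_memory_mb = cluster_footprint(default_driver, default_executor, executors=2)
print(f"{total_cpus} CPUs, {total_memory_mb} MB")  # prints "3 CPUs, 4608 MB"
```

Under that assumption, a job with the default settings needs roughly 3 CPUs and 4608 MB available in the cluster.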

### Resources tuning

To optimize performance for larger files or to accelerate computations, you can adjust the following environment variables in the backend container:

```
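SPARK_ON_K8S_DRIVER_CPU: 1                    # driver CPUs
SPARK_ON_K8S_DRIVER_MEMORY: 1024              # driver memory, in MB
SPARK_ON_K8S_DRIVER_MEMORY_OVERHEAD: 512      # driver memory overhead, in MB
SPARK_ON_K8S_EXECUTOR_CPU: 1                  # CPUs per executor
SPARK_ON_K8S_EXECUTOR_MEMORY: 1024            # memory per executor, in MB
SPARK_ON_K8S_EXECUTOR_MEMORY_OVERHEAD: 512    # memory overhead per executor, in MB
SPARK_ON_K8S_EXECUTOR_INITIAL_INSTANCES: 2    # executors started with the job
SPARK_ON_K8S_EXECUTOR_MIN_INSTANCES: 2        # minimum number of executors
SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES: 2        # maximum number of executors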
```

Adjust these variables as needed to allocate more resources or modify the number of executor instances for your specific use case.
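
The following is a minimal sketch of how a set of overrides for these variables could be parsed and sanity-checked before they are applied. It is not the platform's actual code: `load_spark_settings` and `check_executor_bounds` are hypothetical helpers, and the rule that the initial instance count should lie between the minimum and the maximum is an assumption made for this example rather than a documented constraint.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the documented configuration; memory values are in MB.
DEFAULTS = {
    "SPARK_ON_K8S_DRIVER_CPU": 1,
    "SPARK_ON_K8S_DRIVER_MEMORY": 1024,
    "SPARK_ON_K8S_DRIVER_MEMORY_OVERHEAD": 512,
    "SPARK_ON_K8S_EXECUTOR_CPU": 1,
    "SPARK_ON_K8S_EXECUTOR_MEMORY": 1024,
    "SPARK_ON_K8S_EXECUTOR_MEMORY_OVERHEAD": 512,
    "SPARK_ON_K8S_EXECUTOR_INITIAL_INSTANCES": 2,
    "SPARK_ON_K8S_EXECUTOR_MIN_INSTANCES": 2,
    "SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES": 2,
}


def load_spark_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the SPARK_ON_K8S_* variables, falling back to the defaults."""
    source = os.environ if env is None else env
    settings = {}
    for name, default in DEFAULTS.items():
        # int() raises ValueError if a variable is set to a non-integer value.
        settings[name] = int(source.get(name, default))
    return settings


def check_executor_bounds(settings: dict) -> None:
    """Assumed sanity check: min <= initial <= max executor instances."""
    low = settings["SPARK_ON_K8S_EXECUTOR_MIN_INSTANCES"]
    initial = settings["SPARK_ON_K8S_EXECUTOR_INITIAL_INSTANCES"]
    high = settings["SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES"]
    if not low <= initial <= high:
        raise ValueError(f"expected min <= initial <= max, got {low}, {initial}, {high}")


# Example: larger executors and room to scale up to 4 instances.
overrides = {
    "SPARK_ON_K8S_EXECUTOR_MEMORY": "4096",
    "SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES": "4",
}
settings = load_spark_settings(overrides)
check_executor_bounds(settings)
print(settings["SPARK_ON_K8S_EXECUTOR_MEMORY"], settings["SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES"])
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the example easy to try without changing the real environment.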
4 changes: 4 additions & 0 deletions docs/docs/user-guide/installation.md
@@ -29,3 +29,7 @@ See [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/ma
```
```

Once all the containers are up and running, you can go to [http://localhost:5173](http://127.0.0.1:5173/) and explore the platform.

## Spark tuning

Metrics are calculated by Spark jobs. To tune the Spark configuration for large files or faster computations, see the Spark tuning section of the API [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/main/api/README.md).