docs: added documentation on Spark tuning
aleconf committed Aug 5, 2024
1 parent f3df086 commit 08d0e59
Showing 3 changed files with 58 additions and 5 deletions.
14 changes: 9 additions & 5 deletions README.md
@@ -90,14 +90,18 @@ To remove everything including container images:
docker compose --profile ui --profile k9s --profile init-data down -v --remove-orphans --rmi all
```

+## Spark tuning
+We use Spark jobs to calculate metrics. If you need to tune the Spark configuration to optimize performance for large files or to accelerate computations, please refer to the corresponding section of this [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/main/api/README.md).
+
## 📖 Documentation
You can find the following documentation:
-* An extensive [step-by-step guide](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/user-guide-installation) to install the development/testing version of the platform, followed by all [key concepts](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/user-guide-keyconcepts) and a [hands-on guide](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/how_to) on how to use the GUI.
-* A practical [guide](https://docs.oss-monitoring.radicalbit.ai/l1_quickstart) that walks users through monitoring an AI solution on the platform.
+* An extensive [step-by-step guide](https://docs.oss-monitoring.radicalbit.ai/user-guide/installation) to install the development/testing version of the platform, followed by all [key concepts](https://docs.oss-monitoring.radicalbit.ai/user-guide/key-concepts) and a [hands-on guide](https://docs.oss-monitoring.radicalbit.ai/user-guide/how-to) on how to use the GUI.
+* A practical [guide](https://docs.oss-monitoring.radicalbit.ai/quickstart) that walks users through monitoring an AI solution on the platform.
* A detailed [explanation](https://docs.oss-monitoring.radicalbit.ai/category/model-sections) on the three main model sections.
-* An exhaustive [description](https://docs.oss-monitoring.radicalbit.ai/l5_python_sdk) of all classes implemented inside the Python SDK.
-* A list of [all available metrics and charts](https://docs.oss-monitoring.radicalbit.ai/l6_all_metrics).
-* A [community support](https://docs.oss-monitoring.radicalbit.ai/l7_support) page.
+* An exhaustive [description](https://docs.oss-monitoring.radicalbit.ai/python-sdk) of all classes implemented in the Python SDK.
+* A list of [all available metrics and charts](https://docs.oss-monitoring.radicalbit.ai/all-metrics).
+* A page describing the [architecture](https://docs.oss-monitoring.radicalbit.ai/architecture) of the platform.
+* A [community support](https://docs.oss-monitoring.radicalbit.ai/support) page.

## 🤝 Community
Please join us on our [Discord server](https://discord.gg/x2Ze8TMRsD) to discuss the platform, share ideas, and help shape its future! Get help from experts and fellow users.
45 changes: 45 additions & 0 deletions api/README.md
@@ -95,3 +95,48 @@
By default, the PostgreSQL schema that is used in the platform and in the migrations is `public`. If you want to
store data and use another schema, you need to modify the environment variables of the schema in
the `docker-compose.yaml` accordingly, and you have to either manually modify the migrations script or re-create the
migrations with the above commands.
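
As a purely illustrative sketch, such an override could look like the snippet below. The variable name is a hypothetical placeholder; check the `environment` section of the project's `docker-compose.yaml` for the real one before changing anything.

```
services:
  api:
    environment:
      # Hypothetical variable name -- look up the actual schema variable
      # in docker-compose.yaml; this is an illustration, not the real key.
      DATABASE_SCHEMA: "monitoring"
```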

## Spark tuning

### Overview

When a reference or current dataset file is uploaded, the platform calculates all available metrics using a Spark job deployed in a Kubernetes cluster. It uses the [spark-on-k8s](https://github.com/hussein-awala/spark-on-k8s) library to launch these jobs.
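
For orientation, here is a minimal sketch of what a submission through spark-on-k8s typically looks like. This is not the platform's actual submission code: the image, application path, and namespace are placeholders, and parameter names may vary between library versions.

```
from spark_on_k8s.client import SparkOnK8S

client = SparkOnK8S()

# All values below are illustrative placeholders, not the platform's real ones.
client.submit_app(
    image="example-registry/spark-metrics:latest",      # Spark container image
    app_path="local:///opt/spark/jobs/metrics_job.py",  # PySpark entrypoint inside the image
    app_name="metrics-job",
    namespace="spark",
    app_waiter="log",  # follow the driver logs until the job completes
)
```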

### Spark job structure

A Spark job consists of two main components:

1. **Spark Driver**: Manages the job execution
2. **Spark Executors**: Instances that perform the actual calculations
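
In practice, the driver's resources matter for coordinating the job and collecting results, while the number and size of the executors determine how much of the computation runs in parallel. These settings correspond to standard Spark properties such as `spark.driver.memory`, `spark.executor.memory`, and `spark.executor.instances`, so general Spark sizing guidance applies here as well.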

### Default configuration

In the platform, the default resource configuration for Spark jobs is as follows:

- Spark Driver CPUs: 1
- Spark Driver memory: 1024 MB
- Spark Driver memory overhead: 512 MB
- Spark Executor CPUs: 1
- Spark Executor memory: 1024 MB
- Spark Executor memory overhead: 512 MB
- Spark Executor initial instances: 2
- Spark Executor min instances: 2
- Spark Executor max instances: 2
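
With these defaults, every metrics job runs three pods: one driver and two executors (the min, max, and initial instance counts are all 2, so the job does not scale). Since a Spark pod on Kubernetes requests its memory plus its memory overhead, each pod asks for roughly 1024 + 512 = 1536 MB of memory and one CPU, or about 4.5 GB and 3 CPUs for the job as a whole.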

### Resource tuning

To optimize performance for larger files or to accelerate computations, you can adjust the following environment variables in the backend container:

```
SPARK_ON_K8S_DRIVER_CPU: 1
SPARK_ON_K8S_DRIVER_MEMORY: 1024
SPARK_ON_K8S_DRIVER_MEMORY_OVERHEAD: 512
SPARK_ON_K8S_EXECUTOR_CPU: 1
SPARK_ON_K8S_EXECUTOR_MEMORY: 1024
SPARK_ON_K8S_EXECUTOR_MEMORY_OVERHEAD: 512
SPARK_ON_K8S_EXECUTOR_INITIAL_INSTANCES: 2
SPARK_ON_K8S_EXECUTOR_MIN_INSTANCES: 2
SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES: 2
```

Adjust these variables as needed to allocate more resources or modify the number of executor instances for your specific use case.
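
For example, to handle larger files you could double the executor memory and let the job scale out to four executors. The values below are illustrative, not recommendations:

```
SPARK_ON_K8S_EXECUTOR_MEMORY: 2048
SPARK_ON_K8S_EXECUTOR_MEMORY_OVERHEAD: 1024
SPARK_ON_K8S_EXECUTOR_INITIAL_INSTANCES: 2
SPARK_ON_K8S_EXECUTOR_MIN_INSTANCES: 2
SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES: 4
```

Keep in mind that each executor pod then requests about 2048 + 1024 = 3072 MB of memory, so the Kubernetes nodes must have enough allocatable memory and CPU to schedule every requested pod.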
4 changes: 4 additions & 0 deletions docs/docs/user-guide/installation.md
@@ -29,3 +29,7 @@
```

After all the containers are up & running, you can go to [http://localhost:5173](http://localhost:5173/) and play with the platform.

## Spark tuning

We use Spark jobs to calculate metrics. If you need to tune the Spark configuration to optimize performance for large files or to accelerate computations, please refer to the corresponding section of this [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/main/api/README.md).
