diff --git a/README.md b/README.md
index 88492296..9966ea12 100644
--- a/README.md
+++ b/README.md
@@ -90,14 +90,18 @@ To remove everything including container images:
 docker compose --profile ui --profile k9s --profile init-data down -v --remove-orphans --rmi all
 ```
 
+## Spark tuning
+We use Spark jobs to calculate metrics. If you need to tune the Spark configuration to optimize performance for large files or to accelerate computations, please refer to the corresponding section of the API [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/main/api/README.md).
+
 ## 📖 Documentation
 You can find the following documentation:
-* An extensive [step-by-step guide](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/user-guide-installation) to install the development/testing version of the platform, followed by all [key concepts](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/user-guide-keyconcepts) and a [hands-on guide](https://docs.oss-monitoring.radicalbit.ai/l2_user_guide/how_to) on how to use the GUI.
-* A practical [guide](https://docs.oss-monitoring.radicalbit.ai/l1_quickstart) that walks users through monitoring an AI solution on the platform.
+* An extensive [step-by-step guide](https://docs.oss-monitoring.radicalbit.ai/user-guide/installation) to install the development/testing version of the platform, followed by all [key concepts](https://docs.oss-monitoring.radicalbit.ai/user-guide/key-concepts) and a [hands-on guide](https://docs.oss-monitoring.radicalbit.ai/user-guide/how-to) on how to use the GUI.
+* A practical [guide](https://docs.oss-monitoring.radicalbit.ai/quickstart) that walks users through monitoring an AI solution on the platform.
 * A detailed [explanation](https://docs.oss-monitoring.radicalbit.ai/category/model-sections) on the three main model sections.
-* An exhaustive [description](https://docs.oss-monitoring.radicalbit.ai/l5_python_sdk) of all classes implemented inside the Python SDK.
-* A list of [all available metrics and charts](https://docs.oss-monitoring.radicalbit.ai/l6_all_metrics).
-* A [community support](https://docs.oss-monitoring.radicalbit.ai/l7_support) page.
+* An exhaustive [description](https://docs.oss-monitoring.radicalbit.ai/python-sdk) of all classes implemented inside the Python SDK.
+* A list of [all available metrics and charts](https://docs.oss-monitoring.radicalbit.ai/all-metrics).
+* A page describing the [architecture](https://docs.oss-monitoring.radicalbit.ai/architecture) of the platform.
+* A [community support](https://docs.oss-monitoring.radicalbit.ai/support) page.
 
 ## 🤝 Community
 Please join us on our [Discord server](https://discord.gg/x2Ze8TMRsD), to discuss the platform, share ideas, and help shape its future! Get help from experts and fellow users.
diff --git a/api/README.md b/api/README.md
index dd4db5bf..c7cb4ca3 100644
--- a/api/README.md
+++ b/api/README.md
@@ -95,3 +95,48 @@ By default, the PostgreSQL schema that is used in the platform and in the migrat
 store data and use another schema, you need to modify the environment variables of the schema in the
 `docker-compose.yaml` accordingly, and you have to either manually modify migrations script or re-create the
 migrations with above commands.
+
+## Spark tuning
+
+### Overview
+
+When a reference or current dataset file is uploaded, the platform calculates all available metrics using a Spark job deployed in a Kubernetes cluster. The platform uses the [spark-on-k8s](https://github.com/hussein-awala/spark-on-k8s) library to launch Spark jobs.
+
+### Spark Job structure
+
+A Spark job consists of two main components:
+
+1. **Spark Driver**: Manages the job execution
+2. **Spark Executors**: Instances that perform the actual calculations
+
+### Default configuration
+
+In the platform, the default resource configuration for Spark jobs is as follows:
+
+- Spark Driver CPUs: 1
+- Spark Driver Memory: 1024 MB
+- Spark Driver Memory overhead: 512 MB
+- Spark Executor CPUs: 1
+- Spark Executor Memory: 1024 MB
+- Spark Executor Memory overhead: 512 MB
+- Spark Executor Initial instances: 2
+- Spark Executor Min instances: 2
+- Spark Executor Max instances: 2
+
+### Resources tuning
+
+To optimize performance for larger files or to accelerate computations, you can adjust the following environment variables in the backend container (memory values are in MB):
+
+```
+SPARK_ON_K8S_DRIVER_CPU: 1
+SPARK_ON_K8S_DRIVER_MEMORY: 1024
+SPARK_ON_K8S_DRIVER_MEMORY_OVERHEAD: 512
+SPARK_ON_K8S_EXECUTOR_CPU: 1
+SPARK_ON_K8S_EXECUTOR_MEMORY: 1024
+SPARK_ON_K8S_EXECUTOR_MEMORY_OVERHEAD: 512
+SPARK_ON_K8S_EXECUTOR_INITIAL_INSTANCES: 2
+SPARK_ON_K8S_EXECUTOR_MIN_INSTANCES: 2
+SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES: 2
+```
+
+Adjust these variables as needed to allocate more resources or to change the number of executor instances for your use case.
diff --git a/docs/docs/user-guide/installation.md b/docs/docs/user-guide/installation.md
index 6ccf5104..6be8a499 100644
--- a/docs/docs/user-guide/installation.md
+++ b/docs/docs/user-guide/installation.md
@@ -29,3 +29,7 @@ See [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/ma
 ```
 
 After all the containers are up & running, you can go to [http://localhost:5173](http://127.0.0.1:5173/) and play with the platform.
+
+## Spark tuning
+
+We use Spark jobs to calculate metrics. If you need to tune the Spark configuration to optimize performance for large files or to accelerate computations, please refer to the corresponding section of the API [README file](https://github.com/radicalbit/radicalbit-ai-monitoring/blob/main/api/README.md).
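When sizing the variables introduced in `api/README.md` above, it helps to know what the cluster must actually be able to schedule. The sketch below is illustrative only (the helper name is hypothetical, not platform code); it assumes the usual Spark-on-Kubernetes pod sizing, where each pod requests its memory setting plus its memory overhead, and a job at most runs the driver plus the configured maximum number of executors:

```python
def job_resource_footprint(driver_mem_mb=1024, driver_overhead_mb=512,
                           executor_mem_mb=1024, executor_overhead_mb=512,
                           driver_cpus=1, executor_cpus=1, max_executors=2):
    """Estimate the peak CPU/memory a Spark job requests from Kubernetes.

    Defaults mirror the platform's default configuration listed above.
    """
    # Each pod requests heap memory plus off-heap overhead.
    driver_pod_mb = driver_mem_mb + driver_overhead_mb
    executor_pod_mb = executor_mem_mb + executor_overhead_mb
    return {
        "driver_pod_mb": driver_pod_mb,
        "executor_pod_mb": executor_pod_mb,
        # Peak footprint: one driver plus up to max_executors executors.
        "max_total_mb": driver_pod_mb + max_executors * executor_pod_mb,
        "max_total_cpus": driver_cpus + max_executors * executor_cpus,
    }

print(job_resource_footprint())
# {'driver_pod_mb': 1536, 'executor_pod_mb': 1536, 'max_total_mb': 4608, 'max_total_cpus': 3}
```

With the defaults, each concurrent metrics job can request roughly 4.6 GB of memory and 3 CPUs; raising `SPARK_ON_K8S_EXECUTOR_MAX_INSTANCES` or the executor memory settings grows the executor term of that total linearly.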