chore(main): release 1.0.0 (#159)

* chore(main): release 1.0.0 --------- Co-authored-by: Paolo Filippelli <[email protected]> Co-authored-by: paoloyx <[email protected]>
radicalbit · Aug 5, 2024 · 28f485a
1 parent 1c51ace
commit 28f485a

Showing 23 changed files with 1,023 additions and 6 deletions.
diff --git a/.github/release-manifest.json b/.github/release-manifest.json
@@ -1,3 +1,3 @@
 {
-    ".": "0.9.0"
+    ".": "1.0.0"
 }
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,52 @@
 # Changelog
 
+## [1.0.0](https://github.com/radicalbit/radicalbit-ai-monitoring/compare/v0.9.0...v1.0.0) (2024-08-05)
+
+
+### Features
+
+* add check ([#145](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/145)) ([ba65d21](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/ba65d210ccf9637aa7d77f038a2453d700e4f081))
+* add log loss property for classification models ([#131](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/131)) ([d744235](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/d744235c5df742114474f53cf9c559f3f1dec5c8))
+* add regression line to residuals metric ([#111](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/111)) ([c2f5437](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/c2f54377c36c395b0daea351d31b6b86ace1cb99))
+* added CHI2 drift to float columns when cardinality &lt; 15 ([#121](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/121)) ([62f3852](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/62f3852cece909311e145fa6e1e5a16ee249e094))
+* added field type in drift return, removes threshold of 15 for numerical variables ([#141](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/141)) ([b08e97d](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/b08e97d722f65cc956c54f5c3d41925109c4d699))
+* added field type inside drift data ([#142](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/142)) ([a19cd20](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/a19cd20249beee7a2ddb73930401a29c970d9cfe))
+* added new log loss metric for binary - current and reference ([#129](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/129)) ([52fe9a7](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/52fe9a7cc86f46090c3f16c102740691bd3cb6f9))
+* added PSI drift method to int columns ([#114](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/114)) ([07746f5](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/07746f5d3abb639c8227e1ed106c3510731a6641))
+* added regression line calculation ([#109](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/109)) ([c5511ed](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/c5511edd9ef32a3d5cbdf7b51356dcbc076eead3))
+* align demo data ([#136](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/136)) ([039960b](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/039960b0e23eb7b883f65ee2eb371d0fbef01c19))
+* **api:** add field type as column definition property ([#133](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/133)) ([92342b9](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/92342b9c4ad57b047174682d72442c4f12a4074e))
+* **api:** add update model features API ([#143](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/143)) ([80875c2](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/80875c25f6dc013478571b8d6ad6b484d8e6eaa4))
+* **api:** bump libraries and fix log ([#139](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/139)) ([1bd820d](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/1bd820df4090b20c44f3918434429e2150654535))
+* changed indexing method, removed job tests ([#126](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/126)) ([a2aa7c5](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/a2aa7c5633d60c8bed3a6490989d76d6b83ea8a4))
+* improve chi2 ([#128](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/128)) ([ce76c86](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/ce76c86c5dc59afe5d338d3164db8f352c08818b))
+* new field_type added, fixed all tests accordingly ([#140](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/140)) ([dd6e85b](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/dd6e85bb42c1bd32b42991bea13b8d411f4bf795))
+* new PSI value, chi2 new method fixed ([#134](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/134)) ([d59f33c](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/d59f33c109e8a33adcad5a921a1414c582e56904))
+* refactor jobs to test output ([#120](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/120)) ([67e94e9](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/67e94e9da3e44b9f79f4cd27db38552d0decda81))
+* **sdk:** added update model features ([#146](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/146)) ([f439c17](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/f439c17596534bbb6f76382ef725942afc2bc042))
+* **sdk:** improved column definition class ([#138](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/138)) ([d126e15](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/d126e15d94103de9bdcece79b85e377e71539ac5))
+* **ui:** add grafana faro on UI ([#149](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/149)) ([eb440ec](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/eb440ec0755263ebd02be1f31025991dd1aa6b3c))
+* **ui:** add logLoss to modelQuality binary ([#135](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/135)) ([c9d407a](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/c9d407ac47f2f0611d80ea8f00bfe3582f70d9d6))
+* **ui:** add PSI label ([#137](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/137)) ([3c5e0a6](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/3c5e0a61b355278dc3c5f1d6ea0c86f96fd18f81))
+* **ui:** Edit fieldType in Overview -&gt; variables section ([#147](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/147)) ([3f228b2](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/3f228b2eac1559deef09a1efb5f96e59716472a4))
+* **ui:** improve regression section and add residuals charts ([#118](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/118)) ([ffa9947](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/ffa994700ee8820ab2c1e2d74c19817fe0eb0648))
+* **ui:** refactor drift section with field type ([#148](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/148)) ([433f8bd](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/433f8bdd2c8d69fe021f04c09714c639b7495dd7))
+* **ui:** remove counterLabel and hide element in header ([#123](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/123)) ([c3a744c](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/c3a744cb7c6cbb40e6d5935fc1fa823165b00052))
+* **ui:** update label and add legend on chart ([#127](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/127)) ([762e1ac](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/762e1acb0e2799c8ae1932728aba79d94b947f7b))
+* **ui:** use step 3 to select targets,features and predictions ([#130](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/130)) ([7eeedae](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/7eeedae8ce785cf50c9127bf4e325a22769aa8ac))
+
+
+### Bug Fixes
+
+* add schema in postgres ([#122](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/122)) ([e8e143c](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/e8e143c77203da15887529c73522b27f0c37cd9b))
+* added psi to enum drift dto ([#124](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/124)) ([e601c2d](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/e601c2dd116858e3173d845da4ae7157843364e4))
+* edit regression line structure in model quality dto ([#117](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/117)) ([bf9d612](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/bf9d612f9a04aeef42cf28411ed698d48b101061))
+* finished refactoring for withcolumn prefix ([#132](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/132)) ([48c1730](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/48c17305738dd47e8f80b40d0b88faea5e164d96))
+* return value for regression line casted to float ([#115](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/115)) ([d5d862d](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/d5d862deb02354112cafa15552ffc7973d451452))
+* return values for regression line changed from points to coefficients ([#116](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/116)) ([e475478](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/e475478926b86d8fbe7cbb3703e9be23cedb88bb))
+* **sdk:** fix documentation typos ([#125](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/125)) ([92ade35](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/92ade35e0436cb3e18981ab754b966ebd4664034))
+* **ui:** fix regression resize chart ([#119](https://github.com/radicalbit/radicalbit-ai-monitoring/issues/119)) ([43fc389](https://github.com/radicalbit/radicalbit-ai-monitoring/commit/43fc389e8c5ac9319aab419590944821213ed0c3))
+
 ## [0.9.0](https://github.com/radicalbit/radicalbit-ai-monitoring/compare/v0.8.2...v0.9.0) (2024-07-16)
 
 

diff --git a/api/pyproject.toml b/api/pyproject.toml
@@ -1,7 +1,7 @@
 [tool.poetry]
 name = "radicalbit-ai-monitoring"
 # x-release-please-start-version
-version = "0.9.0"
+version = "1.0.0"
 # x-release-please-end
 description = "radicalbit platform"
 authors = ["Radicalbit"]

diff --git a/docs/versioned_docs/version-v1.0.0/all-metrics.md b/docs/versioned_docs/version-v1.0.0/all-metrics.md
@@ -0,0 +1,82 @@
+---
+sidebar_position: 5
+---
+
+# All metrics
+List of all available Metrics and Charts.
+
+## CSV summary
+
+* Number of variables
+* Number of observations
+* Number of missing values
+* Percentage of missing values
+* Number of duplicated rows
+* Percentage of duplicated rows
+* Number of **numerical** variables
+* Number of **categorical** variables
+* Number of **datetime** variables
+
+Summary of all variable names and types (float, int, string, datetime).
+
+## Data quality
+
+* **Numerical** variables
+  * Average
+  * Standard deviation
+  * Minimum
+  * Maximum
+  * Percentile 25%
+  * Median
+  * Percentile 75%
+  * Number of missing values
+  * Histogram with 10 bins
+* **Categorical** variables
+  * Number of missing values
+  * Percentage of missing values
+  * Number of distinct values
+  * For each distinct value:
+    * count of observations
+    * percentage of observations
+* **Ground truth**
+  * if categorical, i.e. for a classification model: bar plot *(for both reference and current for an easy comparison)*
+  * if numerical, i.e. for a regression model: histogram with 10 bins *(for both reference and current for an easy comparison)*
+
+## Model quality
+
+* Classification model
+  * Number of classes
+  * Accuracy *(for both reference and current for an easy comparison)*
+  * Line chart of accuracy over time
+  * Confusion matrix
+  * Log loss, *only for binary classification at the moment*
+  * Line chart of log loss over time, *only for binary classification at the moment*
+  * For each class:
+    * Precision *(for both reference and current for an easy comparison)*
+    * Recall *(for both reference and current for an easy comparison)*
+    * F1 score *(for both reference and current for an easy comparison)*
+    * True Positive Rate *(for both reference and current for an easy comparison)*
+    * False Positive Rate *(for both reference and current for an easy comparison)*
+    * Support *(for both reference and current for an easy comparison)*
+* Regression model
+  * Mean squared error *(for both reference and current for an easy comparison)*
+  * Root mean squared error *(for both reference and current for an easy comparison)*
+  * Mean absolute error *(for both reference and current for an easy comparison)*
+  * Mean absolute percentage error *(for both reference and current for an easy comparison)*
+  * R-squared *(for both reference and current for an easy comparison)*
+  * Adjusted R-squared *(for both reference and current for an easy comparison)*
+  * Variance *(for both reference and current for an easy comparison)*
+  * Line charts for all of the above over time
+  * Residual analysis:
+    * Correlation prediction/ground_truth
+    * Residuals plot, i.e. scatter plot of standardised residuals against predictions
+    * Scatter plot for predictions vs ground truth and linear regression line
+    * Histogram of the residuals
+    * Kolmogorov-Smirnov test of normality for residuals
+
+## Data Drift
+
+Data drift is computed for all features, using different algorithms depending on the data type: float, int, categorical. The following algorithms are currently used (others will be added in the future):
+* [Chi-Square Test](https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test)
+* [Two-Sample Kolmogorov-Smirnov](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov%E2%80%93Smirnov_test)
+* [Population Stability Index](https://scholarworks.wmich.edu/dissertations/3208/)
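Of the three drift algorithms listed in all-metrics.md above, the Population Stability Index is the least standardized, so a minimal sketch may help. This is an illustration only, not the platform's implementation: the function name `psi`, the 10 equal-width bins derived from the reference sample, and the `eps` floor for empty bins are all assumptions.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two numeric samples (hypothetical helper)."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if the column is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clamped into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        total = len(values)
        # The eps floor avoids log(0) and division by zero for empty bins (an assumption).
        return [max(c / total, eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    # PSI = sum over bins of (current - reference) * ln(current / reference)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

Identical samples give a PSI of zero, and the value grows as the two distributions diverge. A common rule of thumb treats values above roughly 0.2 as a notable shift; the threshold the platform actually applies is not stated in these docs.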
diff --git a/docs/versioned_docs/version-v1.0.0/architecture.md b/docs/versioned_docs/version-v1.0.0/architecture.md
@@ -0,0 +1,27 @@
+---
+sidebar_position: 6
+---
+
+# Architecture
+
+In this section we explore the architecture of the Radicalbit AI platform.
+The image below shows all the components of the platform:
+
+![Alt text](/img/architecture/architecture.png "Architecture")
+
+## API
+
+The API is the core of the platform: it exposes all functionality via REST APIs.
+It requires a PostgreSQL database to store data and a Kubernetes cluster to run Spark jobs for metrics evaluations.
+A distributed storage is used to store all dataset files.
+The REST APIs can be used through the user interface or the provided Python SDK.
+
+## UI
+
+To use the REST APIs through a human-friendly interface, a UI is provided.
+It covers all the implemented APIs, from model creation to the visualization of all metrics.
+
+## SDK
+
+To interact with the API programmatically, a [_Python SDK_](python-sdk.md) is provided.
+The SDK implements all functionalities exposed via REST API.
diff --git a/docs/versioned_docs/version-v1.0.0/index.md b/docs/versioned_docs/version-v1.0.0/index.md
@@ -0,0 +1,33 @@
+---
+sidebar_position: 1
+---
+
+# Introduction
+Let's discover the **Radicalbit AI Monitoring Platform** in less than 5 minutes.
+
+## Welcome!
+This platform provides a comprehensive solution for monitoring and observing your Artificial Intelligence (AI) models in production.
+
+### Why Monitor AI Models?
+While models often perform well during development and validation, their effectiveness can degrade over time in production due to factors such as data shifts or concept drift. The Radicalbit AI Monitoring Platform helps you proactively identify and address potential performance issues.
+
+### Key Functionalities
+The platform provides comprehensive monitoring capabilities to ensure optimal performance of your AI models in production. It analyses both your reference dataset (used for pre-production validation) and the current datasets in use, allowing you to keep the following under control:
+* **Data Quality:** evaluate the quality of your data, as high-quality data is crucial for maintaining optimal model performance. The platform analyses both numerical and categorical features in your dataset to provide insights into
+    * *data distribution*
+    * *missing values*
+    * *target variable distribution* (for supervised learning).
+
+* **Model Quality Monitoring:** the platform provides a comprehensive suite of metrics, currently designed for classification and regression models. \
+For classification these metrics include:
+    * *Accuracy, Precision, Recall, and F1:* These metrics provide different perspectives on how well your model is classifying positive and negative cases.
+    * *False/True Negative/Positive Rates and Confusion Matrix:* These offer a detailed breakdown of your model's classification performance, including the number of correctly and incorrectly classified instances.
+    * *AUC-ROC and PR AUC:* These summarize performance curves that help visualize your model's ability to discriminate between positive and negative classes.
+
+    For regression these metrics include:
+    * *Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, R²:* These metrics provide different perspectives on how well your model is predicting a numerical value.
+    * *Residual Analysis:* This offers a detailed breakdown of your model's performance, comparing predictions with ground truth and predictions with residuals, i.e. the difference between predictions and ground truth.
+* **Model Drift Detection:** analyse model drift, which occurs when the underlying data distribution changes over time and can affect model performance.
+
+### Current Scope and Future Plans
+This version focuses on classification, both binary and multiclass, and regression models. Support for additional model types is planned for future releases.
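The regression metrics named in index.md above can be sketched in plain Python. This is an illustrative sketch rather than the platform's implementation (the architecture notes say metrics are evaluated by Spark jobs); the function name `regression_metrics` and the returned keys are hypothetical. Residuals follow the definition given in the text: prediction minus ground truth.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE and R² for paired ground truth and predictions (hypothetical helper)."""
    n = len(y_true)
    # Residual = prediction - ground truth, as defined in the docs.
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    # Total sum of squares; assumed non-zero, i.e. the ground truth is not constant.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}
```

For example, `regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])` has a single residual of 2, which yields an MAE of 0.5 and an MSE of 1.0.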
diff --git a/docs/versioned_docs/version-v1.0.0/model-sections/_category_.json b/docs/versioned_docs/version-v1.0.0/model-sections/_category_.json
@@ -0,0 +1,8 @@
+{
+  "label": "Model sections",
+  "position": 4,
+  "link": {
+    "type": "generated-index",
+    "description": "Each created model has three main sections: Overview, Reference, and Current. This document explains each of them in detail."
+  }
+}
diff --git a/docs/versioned_docs/version-v1.0.0/model-sections/current.md b/docs/versioned_docs/version-v1.0.0/model-sections/current.md
@@ -0,0 +1,54 @@
+---
+sidebar_position: 3
+---
+
+# Current
+The Current section stores all the information (statistics, model metrics and charts) related to the current dataset, placed side by side with the reference ones. The objective is to streamline and highlight every difference between the data over time. Throughout the platform, all the current information is coloured blue or in different shades.
+
+> NOTE: in this section, you will always see the last uploaded current dataset. If you need the analysis of a previous current dataset, you can browse them in the `Import` section.
+
+
+## Data Quality
+The **Data Quality** dashboard contains a descriptive analysis of the current variables (blue) placed side by side with the reference ones (grey). It adapts to the `Model Type` and shows information such as:
+
+- Number of observations
+- Number of classes (not in regression task)
+- Ground Truth Distribution
+- Histograms for Numerical Features
+- Descriptive Statistics for Numerical Features (average, standard deviation, ranges, percentiles, missing values)
+- Bar Charts for Categorical Features
+- Descriptive Statistics for Categorical Features (missing values, distinct values, frequencies)
+
+![Alt text](/img/current/current-data-quality.png "Current Data Quality")
+
+
+## Model Quality
+
+The **Model Quality** dashboard contains all the metrics used to evaluate the model performance on the current dataset and compare these values to the reference. Many of them are computed by comparing the `prediction`/`probability` with the `ground truth`. Naturally, the platform computes the proper metrics according to the chosen `Model Type`. \
+Unlike in the Reference section, here the metrics are computed over time, using the flagged `timestamp` columns and the `granularity` parameter chosen during model creation.
+
+![Alt text](/img/current/current-model-quality.png "Current Model Quality")
+
+
+## Data Drift
+
+The **Data Drift** section contains the outcome of the drift detectors executed on each variable.
+Depending on the field type (categorical or numerical), a specific drift test is computed:
+
+- Categorical: **Chi-Square Test**
+- Numerical: **Two-Sample KS Test** (for `float` variables), **PSI** (for `int` variables)
+
+If the dot next to the variable name is red, drift has been detected; the related chart (and statistical description) can be seen in the `Current/Data Quality` section.
+
+![Alt text](/img/current/current-data-drift.png "Current Data Drift")
+
+
+## Import
+
+The **Import** section lists the paths where your current CSVs are stored. If you have a private AWS account, the files are saved in a dedicated S3 bucket; otherwise, they are saved locally with Minio (which exposes an S3-compatible API).
+To see your current datasets stored in Minio, visit the address [http://localhost:9091](http://localhost:9091).
+
+Here, you can browse all the current datasets you have uploaded over time.
+
+![Alt text](/img/current/current-import.png "Current Import")
+
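The Model Quality section of current.md above says that current metrics are computed over time from a timestamp column and a `granularity` parameter. The idea can be sketched as bucketing rows by truncated timestamp and computing a metric per bucket. This is only a sketch under assumptions: the function name `accuracy_over_time`, the row layout, and the granularity values `HOUR`, `DAY` and `MONTH` are hypothetical and may not match the platform's actual options.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical granularity names mapped to the timestamp truncation they imply.
_BUCKET_FORMATS = {"HOUR": "%Y-%m-%d %H:00", "DAY": "%Y-%m-%d", "MONTH": "%Y-%m"}


def accuracy_over_time(rows: Iterable[Tuple[datetime, object, object]],
                       granularity: str = "DAY") -> Dict[str, float]:
    """Accuracy per time bucket for (timestamp, ground_truth, prediction) rows."""
    fmt = _BUCKET_FORMATS[granularity]
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for timestamp, ground_truth, prediction in rows:
        # Truncate the timestamp to the chosen granularity to get the bucket key.
        bucket = timestamp.strftime(fmt)
        totals[bucket] += 1
        hits[bucket] += int(ground_truth == prediction)
    # These key formats sort lexicographically in chronological order.
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}
```

Each entry of the returned mapping corresponds to one point on an "accuracy over time" line chart; the same bucketing would apply to any other metric computed per interval.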