diff --git a/README.md b/README.md index 02ef456296..cd57761fd5 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,7 @@ # :bar_chart: What is Evidently? -Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor the performance of ML models from validation to production. +Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor the performance of ML models from validation to production. It works with tabular and text data. Evidently has a modular approach with 3 interfaces on top of the shared `metrics` functionality. @@ -34,7 +34,7 @@ Tests are best for automated batch model checks. You can integrate them as a pip ## 2. Reports: interactive dashboards > **Note** -> We added a new Report object starting from v0.1.57.dev0. Reports unite the functionality of Dashboards and JSON profiles with a new, cleaner API. You can still use the old [Dashboards API](https://docs.evidentlyai.com/features/dashboards/generate_dashboards) but it will soon be depreciated. +> We added a new Report object starting from v0.1.57.dev0. Reports unite the functionality of Dashboards and JSON profiles with a new, cleaner API. The old Dashboards API is deprecated and will be removed. ![Report example](docs/images/evidently_reports_main-min.png) @@ -108,7 +108,7 @@ To run the **Data Stability** test suite and display the reports in the notebook data_stability= TestSuite(tests=[ DataStabilityTestPreset(), ]) -data_stability.run(current_data=iris_frame.iloc[:90], reference_data=iris_frame.iloc[90:], column_mapping=None) +data_stability.run(current_data=iris_frame.iloc[:60], reference_data=iris_frame.iloc[60:], column_mapping=None) data_stability ``` @@ -146,7 +146,7 @@ data_drift_report = Report(metrics=[ DataDriftPreset(), ]) -data_drift_report.run(current_data=iris_frame.iloc[:90], reference_data=iris_frame.iloc[90:], column_mapping=None) +data_drift_report.run(current_data=iris_frame.iloc[:60], reference_data=iris_frame.iloc[60:], column_mapping=None) data_drift_report ``` diff --git a/docs/book/.gitbook/assets/reports/metric_column_drift_text-min.png b/docs/book/.gitbook/assets/reports/metric_column_drift_text-min.png new file mode 100644 index 0000000000..a57bd7d2e8 Binary files /dev/null and b/docs/book/.gitbook/assets/reports/metric_column_drift_text-min.png differ diff --git a/docs/book/.gitbook/assets/reports/metric_column_summary_text-min.png b/docs/book/.gitbook/assets/reports/metric_column_summary_text-min.png new file mode 100644 index 0000000000..7074e49da7 Binary files /dev/null and b/docs/book/.gitbook/assets/reports/metric_column_summary_text-min.png differ diff --git a/docs/book/.gitbook/assets/reports/metric_text_descriptors_correlation_nlc-min.png b/docs/book/.gitbook/assets/reports/metric_text_descriptors_correlation_nlc-min.png new file mode 100644 index 0000000000..a4ac893cc6 Binary files /dev/null and b/docs/book/.gitbook/assets/reports/metric_text_descriptors_correlation_nlc-min.png differ diff --git a/docs/book/.gitbook/assets/reports/metric_text_descriptors_correlation_oov-min.png b/docs/book/.gitbook/assets/reports/metric_text_descriptors_correlation_oov-min.png new file mode 100644 index 0000000000..8ffe95dd11 Binary files /dev/null and b/docs/book/.gitbook/assets/reports/metric_text_descriptors_correlation_oov-min.png differ diff --git a/docs/book/.gitbook/assets/reports/metric_text_descriptors_correlation_text_length-min.png 
b/docs/book/.gitbook/assets/reports/metric_text_descriptors_correlation_text_length-min.png new file mode 100644 index 0000000000..dccc3ede0c Binary files /dev/null and b/docs/book/.gitbook/assets/reports/metric_text_descriptors_correlation_text_length-min.png differ diff --git a/docs/book/.gitbook/assets/reports/metric_text_descriptors_distribution_nlc-min.png b/docs/book/.gitbook/assets/reports/metric_text_descriptors_distribution_nlc-min.png new file mode 100644 index 0000000000..ac682f2507 Binary files /dev/null and b/docs/book/.gitbook/assets/reports/metric_text_descriptors_distribution_nlc-min.png differ diff --git a/docs/book/.gitbook/assets/reports/metric_text_descriptors_distribution_oov-min.png b/docs/book/.gitbook/assets/reports/metric_text_descriptors_distribution_oov-min.png new file mode 100644 index 0000000000..9c9c9bdf9d Binary files /dev/null and b/docs/book/.gitbook/assets/reports/metric_text_descriptors_distribution_oov-min.png differ diff --git a/docs/book/.gitbook/assets/reports/metric_text_descriptors_distribution_text_length-min.png b/docs/book/.gitbook/assets/reports/metric_text_descriptors_distribution_text_length-min.png new file mode 100644 index 0000000000..83bcc2b634 Binary files /dev/null and b/docs/book/.gitbook/assets/reports/metric_text_descriptors_distribution_text_length-min.png differ diff --git a/docs/book/.gitbook/assets/reports/metric_text_descriptors_drift-min.png b/docs/book/.gitbook/assets/reports/metric_text_descriptors_drift-min.png new file mode 100644 index 0000000000..a33af2565b Binary files /dev/null and b/docs/book/.gitbook/assets/reports/metric_text_descriptors_drift-min.png differ diff --git a/docs/book/README.md b/docs/book/README.md index 5c44e79229..23d85981f2 100644 --- a/docs/book/README.md +++ b/docs/book/README.md @@ -2,7 +2,7 @@ Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor the performance of ML models from validation to production. -# Quick Start +# Quick Start Quickly check it out (1 min): {% content-ref url="get-started/hello-world.md" %} @@ -29,9 +29,9 @@ You need to provide the data, choose what to evaluate, and the output format. Ev You can integrate Evidently into various ML stacks as a monitoring or evaluation component. -Evidently currently works with tabular data. +Evidently currently works with tabular and text data. -# 1. [Tests](tests/README.md): batch model checks +# 1. Tests suites: batch model checks Tests perform structured data and ML model quality checks. You typically compare two datasets: **reference** and **current**. You can set test parameters manually or let Evidently learn the expectations from the reference. Tests verify a condition and return an explicit **pass** or **fail** result. @@ -52,7 +52,7 @@ Tests are best for automated batch checks. * [User guide: how to generate tests](tests-and-reports/run-tests.md) * [Reference: available tests and presets](reference/all-tests.md) -# 2. [Reports](reports/README.md): interactive dashboards +# 2. Reports: interactive dashboards {% hint style="info" %} We added a new Report object starting from **v0.1.57.dev0**. Reports unite the functionality of Dashboards and JSON profiles with a new, cleaner API. @@ -81,7 +81,7 @@ Reports are best for exploratory analysis, debugging, and documentation. * [User guide: how to run reports](tests-and-reports/get-reports.md) * [Reference: available metrics and metric presets](reference/all-metrics.md) -# 3. 
[Real-time ML monitoring](integrations/evidently-and-grafana.md) +# 3. Monitors: real-time ML monitoring *Note: this functionality is in early development and subject to an API change*. diff --git a/docs/book/SUMMARY.md b/docs/book/SUMMARY.md index bdb74f7134..e72ad50519 100644 --- a/docs/book/SUMMARY.md +++ b/docs/book/SUMMARY.md @@ -13,6 +13,7 @@ * [Regression Performance](presets/reg-performance.md) * [Classification Performance](presets/class-performance.md) * [NoTargetPerformance](presets/no-target-performance.md) + * [Text Overview](presets/text-overview.md) * [Examples](examples/examples.md) * [Integrations](integrations/README.md) * [Notebook environments](integrations/notebook-environments.md) diff --git a/docs/book/customization/options-for-statistical-tests.md b/docs/book/customization/options-for-statistical-tests.md index 5357bb019a..ded47ddd6b 100644 --- a/docs/book/customization/options-for-statistical-tests.md +++ b/docs/book/customization/options-for-statistical-tests.md @@ -134,3 +134,10 @@ TestShareOfDriftedColumns(lt=0.5) - only for categorical features - returns `p-value` - drift detected when `p_value < threshold` +- `text_content_drift` - Text content drift (domain classifier) + - default for text features + - only for text features + - returns `roc_auc` as drift_score + - drift detected when roc_auc > roc_auc of the random classifier at a set percentile (`threshold`) + - default threshold: 0.05 + - `roc_auc` values can be 0 to 1 (typically 0.5 to 1); higher value mean more confident drift detection diff --git a/docs/book/examples/examples.md b/docs/book/examples/examples.md index 5caf6249cd..654f5ea119 100644 --- a/docs/book/examples/examples.md +++ b/docs/book/examples/examples.md @@ -6,7 +6,7 @@ description: Sample notebooks and tutorials ## Sample notebooks -Simple examples on toy datasets to quickly explore what Evidently can do right out of the box. +Simple examples on toy datasets to show what Evidently can do out of the box. Colab examples contain pre-rendered reports. Title| Jupyter notebook | Colab notebook | Contents --- | --- | --- | --- @@ -24,7 +24,6 @@ Title | Jupyter notebook | Colab notebook | Blog post | Data source Monitor production model decay | [link](../../../examples/data_stories/bicycle_demand_monitoring.ipynb) | [link](https://colab.research.google.com/drive/1xjAGInfh_LDenTxxTflazsKJp_YKmUiD) | [How to break a model in 20 days. A tutorial on production model analytics.](https://evidentlyai.com/blog/tutorial-1-model-analytics-in-production) | Bike sharing UCI: [link](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset) Compare two models before deployment | [link](../../../examples/data_stories/ibm_hr_attrition_model_validation.ipynb) | [link](https://colab.research.google.com/drive/12AyNh3RLSEchNx5_V-aFJ1_EnLIKkDfr) | [What Is Your Model Hiding? 
A Tutorial on Evaluating ML Models.](https://evidentlyai.com/blog/tutorial-2-model-evaluation-hr-attrition) | HR Employee Attrition: [link](https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset) Evaluate and visualize historical drift | [link](../../../examples/integrations/mlflow_logging/historical_drift_visualization.ipynb) | [link](https://colab.research.google.com/drive/12AyNh3RLSEchNx5_V-aFJ1_EnLIKkDfr) | [How to detect, evaluate and visualize historical drifts in the data.](https://evidentlyai.com/blog/tutorial-3-historical-data-drift) | Bike sharing UCI: [link](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset) -Create a custom report (tab) with PSI widget for drift detection | [link](../../../examples/data_stories/california_housing_custom_PSI_widget_and_tab.ipynb) | [link](https://colab.research.google.com/drive/1FuXId8p-lCP9Ho_gHeqxAdoxHRuvY9d0) | --- | California housing sklearn.datasets ## Integrations diff --git a/docs/book/get-started/hello-world.md b/docs/book/get-started/hello-world.md index 0436f10e5c..a3bc4da92f 100644 --- a/docs/book/get-started/hello-world.md +++ b/docs/book/get-started/hello-world.md @@ -26,11 +26,7 @@ $ jupyter nbextension enable evidently --py --sys-prefix Install **Evidently**: ```python -try: - import evidently -except: - !npm install -g yarn - !pip install git+https://github.com/evidentlyai/evidently.git +!pip install evidently ``` # Imports diff --git a/docs/book/get-started/tutorial.md b/docs/book/get-started/tutorial.md index e99a7f3629..9949b096ee 100644 --- a/docs/book/get-started/tutorial.md +++ b/docs/book/get-started/tutorial.md @@ -50,9 +50,9 @@ $ jupyter nbextension enable evidently --py --sys-prefix That's it! -### Google Colab, Kaggle Kernel, Deepnote +### Hosted notebooks -To install `evidently`, run the following command in the notebook cell: +If you are using Google Colab, Kaggle Kernel, Deepnote or Databricks notebooks, run the following command in the notebook cell: ``` !pip install evidently @@ -60,14 +60,14 @@ To install `evidently`, run the following command in the notebook cell: ### Windows -Unfortunately, building visual HTML reports inside a **Jupyter notebook** is **not yet possible** for Windows. You can still install Evidently and get the output as JSON or a separate HTML file. - -To install Evidently, run: +To install Evidently in Jupyter notebook on Windows, run: ```bash $ pip install evidently ``` +**Note**: Nbextension does not work on Windows. If you want to generate visual reports in Jupyter notebook on Windows, you will need to use a different visualization method (see instructions in step 4). This is a new functionality with limited testing. If you face issues, you can get the output as a separate HTML file and view it in a browser. + ## 2. Import Evidently After installing the tool, import `evidently` and the required components. In this tutorial, you will use several **test suites** and **reports**. Each corresponds to a specific type of analysis. @@ -148,7 +148,7 @@ report It will display the HTML report directly in the notebook. {% hint style="info" %} -**Visualizations might work differently in other notebook environments**. If you use Databricks, Kaggle and Deepnote notebooks, you should add an argument to display the report inline: report.show(mode='inline'). Consult [this section](../integrations/notebook-environments.md) for help. 
+**Note**: If you are using other notebook environments, e.g., Databricks, Kaggle and Deepnote notebooks, or Jupyter notebook on Windows, you should add an argument to display the report inline: `report.show(mode='inline')`. Consult [this section](../integrations/notebook-environments.md) for help. {% endhint %} First, you can see the Data Drift summary. @@ -173,7 +173,7 @@ The data drift report compares the distributions of each feature in the two data Evidently Reports are very configurable. You can define which Metrics to include and how to calculate them. -To create a custom Report, you need to list individual **Metrics**. Evidently has dozens of Metrics that help evaluate anything from descriptive feature statistics to model quality. You can calculate Metrics on the column level (e.g., mean value of a specific column) or dataset-level (e.g., share of drifted features in the dataset). +To create a custom Report, you need to list individual **Metrics**. Evidently has dozens of Metrics that evaluate anything from descriptive feature statistics to model quality. You can calculate Metrics on the column level (e.g., mean value of a specific column) or dataset-level (e.g., share of drifted features in the dataset). In this example, you can list several Metrics that evaluate individual statistics for the defined column. @@ -191,7 +191,7 @@ You will see a combined report that includes multiple Metrics: ![Part of the custom report, ColumnSummaryMetric.](../.gitbook/assets/tutorial/get-started-column-summary_metric-min.png) -If you want to generate multiple column-level Metrics, there is a helper function. For example, in order to to calculate the same quantile value for all the columns in the list, you can use the generator: +If you want to generate multiple column-level Metrics, there is a helper function. For example, in order to calculate the same quantile value for all the columns in the list, you can use the generator: ``` report = Report(metrics=[ @@ -221,9 +221,9 @@ report ## 6. Define the report output format -You can render the visualizations directly in the notebook as shown above. There are also alternative options. +You can render the visualizations in the notebook as shown above. There are also alternative options. -If you only want to log the calculated metrics and test results, you can export the results as a Python dictionary. +If you only want to log the metrics and test results, you can get the output as a Python dictionary. ```python report.as_dict() ``` @@ -234,7 +234,7 @@ You can also get the output as JSON. report.json() ``` -You can also save HTML or JSON externally. +You can also save HTML or JSON externally and specify a path and file name: ```python report.save_html("file.html") ``` ## 7. Run data stability tests -Reports are useful when you want to visually explore the data or model quality or share results with the team. However, it is less convenient if you want to run your checks automatically and only react to meaningful issues. +Reports help visually explore the data or model quality or share results with the team. However, they are less convenient if you want to run your checks automatically and only react to meaningful issues. To integrate Evidently checks in the prediction pipeline, you can use the **Test Suites** functionality. They are also better suited to handle large datasets. Test Suites help compare the two datasets in a structured way. A **Test Suite** contains several individual tests.
Each **Test** compares a specific value against a defined condition and returns an explicit pass/fail result. You can apply Tests to the whole dataset or individual columns. -Just like with Reports, you can create a custom Test Suite or use one of the **Presets** that work out of the box. +Just like with Reports, you can create a custom Test Suite or use one of the **Presets**. Let's create a custom one! Imagine you received a new batch of data. Before generating the predictions, you want to check if the quality is good enough to run your model. You can combine several Tests to check missing values, duplicate columns, and so on. @@ -324,19 +324,19 @@ To integrate Evidently checks in the prediction pipeline, you can get the output suite.as_dict() ``` -You can extract necessary information from the JSON or Python dictionary output and design a conditional workflow around it. For example, if some tests fail, you can trigger an alert, retrain the model or generate the report. +You can extract necessary information from the JSON or Python dictionary output and design a conditional workflow around it. For example, if tests fail, you can trigger an alert, retrain the model or generate the report. ## 8. What else is there? * **Go through the steps in more detail** -If you want to walk through all the described steps in more detail, refer to the **User Guide** section of the docs. A good next step is to explore how to pass custom test parameters to define your own [test conditions](../tests-and-reports/custom-test-suite.md). +To understand the described flow in more detail, refer to the **User Guide** section of the docs. A good next step is to explore how to pass custom test parameters to define your own [test conditions](../tests-and-reports/custom-test-suite.md). * **Explore available presets** Both **Tests** and **Reports** have multiple Presets available. Some, like Data Quality, require only input data. You can use them even without the reference dataset. When you have the true labels, you can run Presets like **Regression Performance** and **Classification Performance** to evaluate the model quality and errors. -To understand the contents of each Preset, head to the [Preset overview](../presets). If you want to see the pre-rendered examples of the reports, browse Colab notebooks in the [Examples](../examples/examples.md) section. +To understand the contents of each Preset, head to the [Preset overview](../presets/all-presets.md). If you want to see the pre-rendered examples of the reports, browse Colab notebooks in the [Examples](../examples/examples.md) section. * **Explore available integrations** diff --git a/docs/book/how-to-guides/README.md b/docs/book/how-to-guides/README.md index c018e69385..aa7a46fe5c 100644 --- a/docs/book/how-to-guides/README.md +++ b/docs/book/how-to-guides/README.md @@ -7,8 +7,11 @@ These example notebooks and how-to guides show how to solve specific tasks. Topic | Question| Guide or example | --- | --- | --- Input data | How to load data from different sources to pandas.Dataframes? | -Test and reports | How to generate multiple tests or metrics quickly? | +Test and reports | How to generate multiple Tests or Metrics quickly? | +Test and reports | How to run evaluations on raw text data? | Customization | How to assign a particular method for Data Drift detection?| Customization | How to define a custom list of Missing Values?| -Customization | How to specify a color scheme in Reports and Test Suites? (Needs an update for new API). 
| -Customization | How to add a custom metric or test? | +Customization | How to specify a color scheme in Reports and Test Suites? (Needs an update). | +Customization | How to add a custom Metric or Test? | +Outputs | How to get Report or Test Suite output in CSV? | + diff --git a/docs/book/input-data/column-mapping.md index c04fa07bd5..3d10449455 100644 --- a/docs/book/input-data/column-mapping.md +++ b/docs/book/input-data/column-mapping.md @@ -69,6 +69,17 @@ column_mapping.categorical_features = ['season', 'holiday'] #list of categorical **Why map them:** the column types affect some of the tests, metrics and visualizations. For example, the [drift algorithm](../reference/data-drift-algorithm.md) selects a statistical test based on the column type and ignores DateTime features. Some of the data quality visualizations are different for specific feature types. Some of the tests (e.g. on value ranges) only considers numeral columns, etc. {% endhint %} +## Text data + +To specify that columns contain raw text data: + +```python +column_mapping.text_features = ['email_subject', 'email_body'] +``` + +{% hint style="info" %} +**Why map them:** if you want to apply text-specific drift detection methods or call other metrics relevant to text data, you should specify them explicitly. Text columns are also excluded from certain tests and metrics, similar to the ID column. +{% endhint %} + # Additional mapping options There are additional mapping options that apply to specific test suites and reports. diff --git a/docs/book/input-data/data-requirements.md index 84b37b91a0..a59544cd2e 100644 --- a/docs/book/input-data/data-requirements.md +++ b/docs/book/input-data/data-requirements.md @@ -37,7 +37,7 @@ To use Evidently, you need a dataset that contains model prediction logs. It mig * Input feature columns * Prediction column -* Target column +* Target column (if known) * Additional columns such as DateTime and ID The exact schema requirements differ based on the contents of the report or test suite. For example, to evaluate Data Drift or Data Quality, you can pass only the feature columns. To evaluate Model Performance, you also need model prediction and target (true labels or actuals). @@ -58,4 +58,4 @@ If the dataset is too large, you might need to downsample it before passing the ## Supported data types -Right now, Evidently works only with tabular data. We are working to cover other data types. +Right now, Evidently works with tabular and raw text data. You can also pass a dataset that contains different data types: for example, some columns may contain numerical or categorical data, while others contain text. diff --git a/docs/book/installation/install-evidently.md index 4f6727cc7b..d2f09fd6f5 100644 --- a/docs/book/installation/install-evidently.md +++ b/docs/book/installation/install-evidently.md @@ -1,10 +1,8 @@ -# Install Evidently +# Installing from PyPI -## Installing from PyPI +## MAC OS and Linux -### MAC OS and Linux - -Evidently is available as a PyPI package. +Evidently is available as a PyPI package. To install it using the pip package manager, run: @@ -12,11 +10,7 @@ $ pip install evidently ``` -The tool helps build interactive reports in a Jupyter notebook or as a separate HTML file, and generate JSON profiles. - -If you only want to generate **interactive reports as HTML files or JSON profiles**, the installation is now complete.
- -To display dashboards **in a Jupyter notebook**, we use jupyter nbextension. If you want to see reports inside a Jupyter notebook, then after installing `evidently` you should run the **two following commands** in the terminal from the Evidently directory. +To display dashboards **in a Jupyter notebook**, we use jupyter nbextension. If you want to display reports inside a Jupyter notebook, then after installing `evidently` you should run the **two following commands** in the terminal from the Evidently directory. To install jupyter nbextension, run: @@ -30,19 +24,19 @@ To enable it, run: $ jupyter nbextension enable evidently --py --sys-prefix ``` -That's it! +That's it! A single run after the installation is enough. {% hint style="info" %} -**Note**: a single run after the installation is enough. There is no need to repeat the last two commands every time. +**Note**: if you **do not install nbextension**, you can still use Evidently. You can get the outputs as JSON or a Python dictionary, or generate standalone HTML files to view in the browser. {% endhint %} {% hint style="info" %} **Note**: if you use **Jupyter Lab**, you may experience difficulties with exploring reports inside a Jupyter notebook. However, the report generation in a separate HTML file will work correctly. {% endhint %} -#### Google Colab, Kaggle Kernel, Deepnote +## Hosted notebooks -You can run `evidently` in [Google Colab](https://colab.research.google.com), [Kaggle Notebook](https://www.kaggle.com/code) and [Deepnote](https://deepnote.com). +You can run `evidently` in [Google Colab](https://colab.research.google.com), [Kaggle Notebook](https://www.kaggle.com/code), [Deepnote](https://deepnote.com) or Databricks notebooks. To install `evidently`, run the following command in the notebook cell: @@ -50,11 +44,11 @@ !pip install evidently ``` -There is no need to enable nbextension for this case. `Evidently` uses an alternative way to display visuals in the hosted notebooks. +There is no need to enable nbextension for this case. `Evidently` uses an alternative way to display visuals in the hosted notebooks. Consult [this section](../integrations/notebook-environments.md) for help. -### Windows +## Windows -Evidently is available as a PyPI package. +Evidently is available as a PyPI package. To install it using the pip package manager, run: @@ -62,10 +56,4 @@ $ pip install evidently ``` -The tool helps build interactive reports in a Jupyter notebook or as a separate HTML file, and generate JSON profiles. - -Unfortunately, building reports inside a **Jupyter notebook** is **not yet possible** for Windows. The reason is Windows requires administrator privileges to create symlink. In later versions, we will address this issue. - - - -### +**Note**: Nbextension does not work on Windows. If you want to generate visual reports in Jupyter notebook on Windows, you will need to use a different visualization method when calling the report. Consult [this section](../integrations/notebook-environments.md) for help. This is a new functionality with limited testing. If you face issues, you can get the output as a separate HTML file and view it in a browser.
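As a quick illustration of the Windows note above, here is a minimal sketch of the two fallback options, assuming a `report` object has already been created and run as shown in the tutorial:

```python
# Option 1: render the report inline in the notebook cell
# (does not rely on the jupyter nbextension)
report.show(mode='inline')

# Option 2: save a standalone HTML file and open it in the browser
report.save_html("file.html")
```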
diff --git a/docs/book/integrations/notebook-environments.md index c6c0445c8f..e7716da0b3 100644 --- a/docs/book/integrations/notebook-environments.md +++ b/docs/book/integrations/notebook-environments.md @@ -2,41 +2,86 @@ description: Using Evidently in Colab and other notebook environments. --- -## Jupyter notebooks +You can use the Evidently Python library to generate visual HTML reports, JSON, and Python dictionary output directly in the notebook environment. You can also save the HTML reports externally and open them in the browser. -You can generate the dashboards in **Jupyter notebooks**. +Evidently is tested to work in **Jupyter notebook** on MAC OS and Linux and in **Google Colab**. + +Generating visual reports might work differently in other notebook environments. + +# Jupyter notebooks + +You can generate the visual reports in **Jupyter notebooks** on MAC OS and Linux. {% hint style="info" %} -If you want to display the dashboards in Jupyter notebook, make sure you [installed](../get-started/install-evidently.md) the Jupyter **nbextension**. +If you want to display the dashboards in Jupyter notebook, make sure that in addition to installing Evidently you [installed](../installation/install-evidently.md) the Jupyter **nbextension**. It will be used for visualizations. {% endhint %} -## Colab, Kaggle, Deepnote +You should then follow the steps described in the User Guide to [generate reports](../tests-and-reports/get-reports.md) and [run test suites](../tests-and-reports/run-tests.md). -You can also use **Google Colab**, **Kaggle Kernel**, or **Deepnote**. +# Google Colab -To install `evidently` in these environments, run the following command in the notebook cell: +You can also generate visual reports in **Google Colaboratory**. + +To install `evidently`, run the following command in the notebook cell: ``` !pip install evidently ``` -You should then follow the steps described in the User Guide to [get reports](get-reports.md) and [run tests](run-tests.md). +Then follow the steps described in the User Guide. + +# Other notebook environments -**Troubleshooting**: +You can also use Evidently in other notebook environments, including **Jupyter notebooks on Windows**, **Jupyter Lab** and hosted notebooks such as **Kaggle Kernel**, **Databricks** or **Deepnote** notebooks. Consult the [installation instructions for details](../installation/install-evidently.md). -Sometimes, you might need to explicitly add an argument to display the report inline: +For most hosted environments, you would need to run the following command in the notebook cell: ``` -iris_data_drift_report.show(mode='inline'). +!pip install evidently ``` -The `show()` method has the argument `mode` which can take the following options: +**Note**: Nbextension is not available on Windows and in hosted notebook environments. Evidently will use a different visualization method in this case. However, it is not possible to thoroughly test in all different environments, and the ability to generate visual reports is **not guaranteed**. -* **auto** - the default option. Ideally, you will not need to specify the value for `mode` and can use the default. But if it does not work (in case we failed to determine the environment automatically), consider setting the correct value explicitly. -* **nbextention** - to show the UI using nbextension. Use this option to display dashboards in Jupyter notebooks (it should work automatically).
-* **inline** - to insert the UI directly into the cell. Use this option for Google Colab, Kaggle Kernels, and Deepnote. +## Visual reports in the notebook cell + +To get the visual reports in different notebook environments, you should explicitly add an argument to display the report or test suite `inline` when calling it: + +```python +report.show(mode='inline') +``` + +You can also use this method if you cannot install nbextension. +Here is a complete example of how you can call the report after installation, imports, and data preparation: -## Jupyter Lab +```python +report = Report(metrics=[ + DataDriftPreset(), +]) -If you use **Jupyter Lab**, you won't be able to explore the reports inside a Jupyter notebook. However, the report generation in a separate HTML file will work correctly. +report.run(reference_data=reference, current_data=current) +report.show(mode='inline') +``` + +## Standalone HTML + +If the report does not appear in the cell, consider generating a standalone HTML file and opening it in a browser. + +```python +report = Report(metrics=[ + DataDriftPreset(), +]) + +report.run(reference_data=reference, current_data=current) +report.save_html("file.html") +``` + +You can also specify the path where to save the file. + +## Troubleshooting + +The `show()` method has the argument `mode` which can take the following options: + +* **auto** - the default option. Ideally, you will not need to specify the value for `mode` and can use the default. But if it does not work (in case Evidently failed to determine the environment automatically), consider setting the correct value explicitly. +* **nbextention** - to show the UI using nbextension. Use this option to display dashboards in Jupyter notebooks (it should work automatically). +* **inline** - to insert the UI directly into the cell. Use this option for Google Colab (it should work automatically), Kaggle Kernels, Databricks and Deepnote. diff --git a/docs/book/presets/all-presets.md b/docs/book/presets/all-presets.md index 9474f10ddd..c2698eb440 100644 --- a/docs/book/presets/all-presets.md +++ b/docs/book/presets/all-presets.md @@ -4,20 +4,16 @@ description: An overview of the evaluations you can do with Evidently. Evidently has several pre-built reports and test suites. We call them **Presets**. Each preset evaluates or tests a particular aspect of the data or model quality. -This page links to the **description** of each preset. To see the code and interactive examples in Jupyter notebook or Colab, head here instead: - -{% content-ref url="../examples/examples.md" %} -[Examples](../examples/examples.md). -{% endcontent-ref %} +This page links to the **description** of each preset. To see the code and interactive examples, head to [example notebooks](../examples/examples.md) instead. # Metric Presets -Metric presets are **pre-built reports** that help with visual exploration, debugging and documentation of the data and model performance. +Metric presets are **pre-built reports** that help with visual exploration, debugging and documentation of the data and model performance. You can also use them to calculate and log metrics as JSON or Python dictionary. | | | | | ------- | ------------------------------------------------------ | - | | [**Data Quality**](data-quality.md)

Shows the dataset statistics and feature behavior.<br><br>**Requirements**: model inputs. | [**Data Drift**](data-drift.md)<br><br>Explores the distribution shift in the model features.<br><br>**Requirements**: model inputs, a reference dataset. | [**Target Drift**](target-drift.md)<br><br>Explores the distribution shift in the model predictions.<br><br>**Requirements:** model predictions and/or target, a reference dataset. |
-| [**Classification**](class-performance.md)<br><br>Evaluates the classification model quality and errors.<br><br>**Requirements**: model predictions and true labels. | [**Regression**](reg-performance.md)<br><br>Evaluates the regression model quality and errors.<br><br>**Requirements**: model predictions and actuals. | |
+| [**Classification**](class-performance.md)<br><br>Evaluates the classification model quality and errors.<br><br>**Requirements**: model predictions and true labels. | [**Regression**](reg-performance.md)<br><br>Evaluates the regression model quality and errors.<br><br>**Requirements**: model predictions and actuals. | [**Text Overview**](text-overview.md)<br><br>Evaluates text data drift and descriptive statistics.<br><br>
**Requirements**: model inputs (raw text data) | # Test Presets diff --git a/docs/book/presets/class-performance.md b/docs/book/presets/class-performance.md index 2e343eb420..ce89ec7381 100644 --- a/docs/book/presets/class-performance.md +++ b/docs/book/presets/class-performance.md @@ -1,7 +1,7 @@ **TL;DR:** You can use the pre-built Reports and Test suites to analyze the performance of a classification model. The Presets work for binary and multi-class classification, probabilistic and non-probabilistic classification. -* For visual analysis using Reports, use the `ClassificationPreset`. -* For pipeline checks using Test Suites, use the `MulticlassClassificationTestPreset`, `BinaryClassificationTopKTestPreset` or `BinaryClassificationTestPreset`. +* **Report**: for visual analysis or metrics export, use the `ClassificationPreset`. +* **Test Suite**: for pipeline checks, use the `MulticlassClassificationTestPreset`, `BinaryClassificationTopKTestPreset` or `BinaryClassificationTestPreset`. # Use Case @@ -46,7 +46,7 @@ This report evaluates the quality of a classification model. To run this report, you need to have **both target and prediction** columns available. Input features are optional. Pass them if you want to explore the relations between features and target. -Refer to the [column mapping section](../tests-and-reports/column-mapping.md) to see how to pass model predictions and labels in different cases. +Refer to the [column mapping section](../input-data/column-mapping.md) to see how to pass model predictions and labels in different cases. The tool does not yet work for multi-label classification. It expects a single true label. @@ -300,5 +300,5 @@ Head here to the [All tests](../reference/all-tests.md) table to see the composi ## Examples -* Browse the [examples](../get-started/examples.md) for sample Jupyter notebooks and Colabs. +* Browse the [examples](../examples/examples.md) for sample Jupyter notebooks and Colabs. * See a blog post and a tutorial "[What is your model hiding](https://evidentlyai.com/blog/tutorial-2-model-evaluation-hr-attrition)" where we analyze the performance of two models with identical ROC AUC to choose between the two. diff --git a/docs/book/presets/data-drift.md b/docs/book/presets/data-drift.md index 4e0707ad09..bd55f58ce8 100644 --- a/docs/book/presets/data-drift.md +++ b/docs/book/presets/data-drift.md @@ -1,7 +1,7 @@ -**TL;DR:** You can detect and analyze changes in the input feature distributions. +**TL;DR:** You can detect and analyze changes in the input feature distributions. -* For visual analysis using Reports, use the `DataDriftPreset`. -* For pipeline checks using Test Suites, use the `DataDriftTestPreset`. +* **Report**: for visual analysis or metrics export, use the `DataDriftPreset`. +* **Test Suite**: for pipeline checks, use the `DataDriftTestPreset`. # Use Case @@ -36,7 +36,7 @@ data_drift_report The **Data Drift** report helps detect and explore changes in the input data. -* Applies as suitable **drift detection method** for numerical and categorical features. +* Applies as suitable **drift detection method** for numerical, categorical or text features. * Plots **feature values and distributions** for the two datasets. ## Data Requirements @@ -45,7 +45,7 @@ The **Data Drift** report helps detect and explore changes in the input data. * **Input features**. The dataset should include the features you want to evaluate for drift. The schema of both datasets should be identical. 
If your dataset contains target or prediction column, they will also be analyzed for drift. -* **Column mapping**. Evidently can evaluate drift both for numerical and categorical features. You can explicitly specify the type of the column in column mapping. If it is not specified, Evidently will define the column type automatically. +* **Column mapping**. Evidently can evaluate drift both for numerical, categorical and text features. You can explicitly specify the type of each column using [column mapping object](../input-data/column-mapping.md). If it is not specified, Evidently will try to identify the numerical and categorical features automatically. It is recommended to use column mapping to avoid errors. If you have text data, you must always specify it. ## How it looks @@ -62,7 +62,7 @@ Dataset Drift sets a rule on top of the results of the statistical tests for ind Evidently uses the default [data drift detection algorithm](../reference/data-drift-algorithm.md) to select the drift detection method based on feature type and the number of observations in the reference dataset. {% hint style="info" %} -You can modify the drift detection logic by selecting a different method, including PSI, K–L divergence, Jensen-Shannon distance, Wasserstein distance, setting a different threshold and condition for the dataset drift. See more details about [setting data drift options](../customization/options-for-statistical-tests.md). You can also implement a [custom drift detection method](../customization/add-custom-metric-or-test.md). +You can modify the drift detection logic by selecting a different method, including PSI, K–L divergence, Jensen-Shannon distance, Wasserstein distance, setting a different threshold and condition for the dataset drift. See more details about [setting data drift parameters](../customization/options-for-statistical-tests.md). You can also implement a [custom drift detection method](../customization/add-custom-metric-or-test.md). {% endhint %} To build up a better intuition for which tests are better in different kinds of use cases, visit our blog to read [an in-depth guide](https://evidentlyai.com/blog/data-drift-detection-large-datasets) to the tradeoffs when choosing the statistical test for data drift. @@ -75,7 +75,7 @@ The table shows the drifting features first. You can also choose to sort the row ### 3. Data Distribution by Feature -By clicking on each feature, you can explore the distributions. +By clicking on each feature, you can explore the distributions or top characteristic words (for text features). ![](../.gitbook/assets/reports/metric_data_drift_table_expand_1-min.png) @@ -188,7 +188,7 @@ If you want to compare descriptive statistics between the two datasets, you can # Examples -* Browse the [examples](../get-started/examples.md) for sample Jupyter notebooks and Colabs. +* Browse the [examples](../examples/examples.md) for sample Jupyter notebooks and Colabs. You can also explore [blog posts](https://www.evidentlyai.com/tags/data-drift) about drift detection, including [How to handle drift](https://www.evidentlyai.com/blog/ml-monitoring-data-drift-how-to-handle) or [how to analyze historical drift patterns](https://evidentlyai.com/blog/tutorial-3-historical-data-drift). 
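To make the column mapping requirement above concrete, here is a minimal sketch of running the Data Drift report on a dataset that mixes tabular and text columns. The file paths and column names (`age`, `plan`, `review_text`, and so on) are hypothetical placeholders:

```python
import pandas as pd

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Hypothetical data: reference batch (e.g. training data) and current batch
reference = pd.read_csv("reference.csv")
current = pd.read_csv("current.csv")

# Text columns must always be listed explicitly in the column mapping;
# numerical and categorical columns can also be declared to avoid type guessing
column_mapping = ColumnMapping()
column_mapping.numerical_features = ["age", "tenure"]
column_mapping.categorical_features = ["plan"]
column_mapping.text_features = ["review_text"]

data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(
    reference_data=reference,
    current_data=current,
    column_mapping=column_mapping,
)
data_drift_report.save_html("data_drift_report.html")
```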
diff --git a/docs/book/presets/data-quality.md b/docs/book/presets/data-quality.md index 8fa4956848..18fe95719e 100644 --- a/docs/book/presets/data-quality.md +++ b/docs/book/presets/data-quality.md @@ -1,7 +1,7 @@ **TL;DR:** You can explore and track various dataset and feature statistics. -* For visual analysis using Reports, use the `DataQualityPreset`. -* For pipeline checks using Test Suites, use the `DataQualityTestPreset`or `DataStabilityTestPreset`. +* **Report**: for visual analysis or metrics export, use the `DataQualityPreset`. +* **Test Suite**: for pipeline checks, use the `DataQualityTestPreset`or `DataStabilityTestPreset`. # Use Cases @@ -48,15 +48,16 @@ The Data Quality report provides detailed feature statistics and a feature behav * **Input features**. You need to pass only the input features. Target and prediction are optional. * **One or two datasets**. If you want to perform a side-by-side comparison, pass two datasets with identical schema. You can also pass a single dataset. -* **Column mapping**. Feature types (numerical, categorical, datetime) will be parsed based on pandas column type. If you want to specify a different feature mapping strategy, you can explicitly set the feature type using `column_mapping`. +* **Column mapping**. Feature types (numerical, categorical, datetime) will be parsed based on pandas column type. If you want to specify a different feature mapping strategy, you can explicitly set the feature type using `column_mapping`. You might also need to specify additional column mapping: -* If you have a **datetime** column and want to learn how features change with time, specify the datetime column in the `column_mapping`. +* If you have a **datetime** index column and want to learn how features change with time, specify the datetime column in the `column_mapping`. * If you have a **target** column and want to see features distribution by target, specify the target column in the `column_mapping`. * Specify the **task** if you want to explore interactions between the features and the target. This section looks slightly different for classification and regression tasks. By default, if the target has a numeric type and has >5 unique values, Evidently will treat it as a regression problem. Everything else is treated as a classification problem. If you want to explicitly define your task as `regression` or `classification`, you should set the `task` parameter in the `column_mapping` object. +* If you have **text** features, you should specify it in the column mapping to generate descriptive statistics specific to text. {% hint style="info" %} -You can read more to understand [column mapping](../tests-and-reports/column-mapping.md) and [data requirements](../tests-and-reports/input-data.md) for Evidently reports in the corresponding sections of documentation. +You can read more to understand [column mapping](../input-data/column-mapping.md) and [data requirements](../input-data/data-requirements.md) for Evidently reports in the corresponding sections of documentation. {% endhint %} ## How it looks @@ -89,6 +90,10 @@ The table shows relevant statistical summaries for each feature based on its typ ![](../.gitbook/assets/reports_data_quality_overview_datetime.png) +##### Example for a text feature: + +![](../.gitbook/assets/reports/metric_column_summary_text-min.png) + #### 2.2. Feature in time If you click on "details", each feature would include additional visualization to show feature behavior in time. 
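The Data Quality column mapping options described in this file can be combined in a single object. A minimal sketch, assuming hypothetical column names and pandas DataFrames `reference` and `current` prepared beforehand:

```python
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataQualityPreset

# All of these mappings are optional, but each one unlocks the matching
# section of the report (feature-in-time, feature-by-target, text statistics)
column_mapping = ColumnMapping()
column_mapping.datetime = "timestamp"              # datetime column
column_mapping.target = "churn"                    # target column
column_mapping.task = "classification"             # or "regression"
column_mapping.text_features = ["support_ticket"]  # raw text columns

data_quality_report = Report(metrics=[DataQualityPreset()])
data_quality_report.run(
    reference_data=reference,
    current_data=current,
    column_mapping=column_mapping,
)
```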
@@ -309,4 +314,4 @@ You can use the `DataStabilityTestPreset` when you receive a new batch of input ## Examples -* Browse our [example](../get-started/examples.md) notebooks to see sample Reports and Test Suites. +* Browse our [example](../examples/examples.md) notebooks to see sample Reports and Test Suites. diff --git a/docs/book/presets/no-target-performance.md b/docs/book/presets/no-target-performance.md index 33f16fc317..7c767c3427 100644 --- a/docs/book/presets/no-target-performance.md +++ b/docs/book/presets/no-target-performance.md @@ -1,6 +1,6 @@ **TL;DR:** You can combine different checks to test data quality, stability, and drift when you have a model with delayed feedback. -For Test Suite, use the `NoTargetPerformanceTestPreset`. +* **Test Suite**: for pipeline checks, use the `NoTargetPerformanceTestPreset`. # Use Case @@ -27,7 +27,9 @@ no_target_performance ## Data requirements -You need to provide **two** datasets with identical schema. They should include input features and predictions. The **reference** dataset serves as a benchmark (e.g., previous data batch). Evidently analyzes the change by comparing the **current** production data to the **reference** data. +* You need to provide **two** datasets with identical schema. The **reference** dataset serves as a benchmark (e.g., previous data batch). Evidently analyzes the change by comparing the **current** production data to the **reference** data. +* They should include **input features and predictions**. +* **Column mapping**. You can explicitly specify the types input columns and target using [column mapping](../input-data/column-mapping.md). If it is not specified, Evidently will try to guess the types automatically. It is recommended to use column mapping to avoid errors. If you have text features, you must always specify this in column mapping. ## How it works @@ -61,4 +63,4 @@ Head here to the [All tests](../reference/all-tests.md) table to see the descrip ## Examples -* Browse the [example](../get-started/examples.md) notebooks to see sample Test Suite. +* Browse the [example](../examples/examples.md) notebooks to see a sample Test Suite. diff --git a/docs/book/presets/reg-performance.md b/docs/book/presets/reg-performance.md index 638ee689bb..3d32967294 100644 --- a/docs/book/presets/reg-performance.md +++ b/docs/book/presets/reg-performance.md @@ -1,7 +1,7 @@ **TL;DR:** You can monitor and analyze the performance of a regression model. -* For visual analysis using Reports, use `RegressionPreset`. -* For pipeline checks using Test Suites, use `RegressionTestPreset`. +* **Report**: for visual analysis or metrics export, use the `RegressionPreset`. +* **Test Suite**: for pipeline checks, use the `RegressionTestPreset`. # Use case @@ -43,9 +43,9 @@ It can also compare the performance against the past, or the performance of an a ## Data Requirements -To run this report, you need to have input features, and **both target and prediction** columns available. Input features are optional. Pass them if you want to explore the relations between features and target. +* To run this report, you need to have input features, and **both target and prediction** columns available. Input features are optional. Pass them if you want to explore the relations between features and target. -To generate a comparative report, you will need **two** datasets. The **reference** dataset serves as a benchmark. Evidently analyzes the change by comparing the **current** production data to the **reference** data. 
+* To generate a comparative report, you will need **two** datasets. The **reference** dataset serves as a benchmark. Evidently analyzes the change by comparing the **current** production data to the **reference** data. ![](<../.gitbook/assets/two\_datasets\_regression (1).png>) @@ -309,5 +309,5 @@ Head here to the [All tests](../reference/all-tests.md) table to see the composi # Examples -* Browse the [examples](../get-started/examples.md) for sample Jupyter notebooks and Colabs. +* Browse the [examples](../examples/examples.md) for sample Jupyter notebooks and Colabs. * See a tutorial "[How to break a model in 20 days](https://evidentlyai.com/blog/tutorial-1-model-analytics-in-production)" where we create a demand prediction model and analyze its gradual decay. diff --git a/docs/book/presets/target-drift.md b/docs/book/presets/target-drift.md index 9901aab4c5..987af1db4d 100644 --- a/docs/book/presets/target-drift.md +++ b/docs/book/presets/target-drift.md @@ -1,7 +1,7 @@ **TL;DR:** You can detect and explore changes in the target function (prediction) and detect distribution drift. -* For visual analysis using Reports, use `TargetDriftPreset`. -* For pipeline checks using Test Suites, use a `TestColumnDrift` test and apply it to the prediction or target column. Since it is a single test, there is no need for a Preset. +* **Report**: for visual analysis or metrics export, use the`TargetDriftPreset`. +* **Test Suite**: for pipeline checks, use a `TestColumnDrift` test and apply it to the prediction or target column. Since it is a single test, there is no need for a Preset. # Use case @@ -9,9 +9,9 @@ You can analyze target or prediction drift: **1. To monitor the model performance without ground truth.** When you do not have true labels or actuals, you can monitor Prediction Drift to react to meaningful changes. For example, to detect when there is a distribution shift in predicted values, probabilities, or classes. You can often combine it with the [Data Drift analysis.](data-drift.md) -**2. When you are debugging the model decay.** If you observe a drop in performance, you can evaluate Target Drift to see how the behavior of the target changed and explore the shift in the relationship between the features and prediction. +**2. When you are debugging the model decay.** If you observe a drop in performance, you can evaluate Target Drift to see how the behavior of the target changed and explore the shift in the relationship between the features and prediction (target). -**3. Before model retraining.** Before feeding fresh data into the model, you might want to verify whether it even makes sense. If there is no target drift, the concept is stable, and retraining might not be necessary. +**3. Before model retraining.** Before feeding fresh data into the model, you might want to verify whether it even makes sense. If there is no target drift and no data drift, the retraining might not be necessary. To run drift checks as part of the pipeline, use the Test Suite. To explore and debug, use the Report. @@ -41,11 +41,11 @@ You can generate this preset both for numerical targets (e.g. if you have a regr ## Data Requirements -To run this preset, you need to have **target and/or prediction** columns available. Input features are optional. Pass them if you want to analyze the correlations between the features and target (prediction). +* You will need **two** datasets. The **reference** dataset serves as a benchmark. 
Evidently analyzes the change by comparing the **current** production data to the **reference** data. -Evidently estimates the drift for the **target** and **predictions** in the same manner. If you pass both columns, Evidently will generate two sets of plots. If you pass only one of them (either target or predictions), Evidently will build one set of plots. +* To run this preset, you need to have **target and/or prediction** columns available. Input features are optional. Pass them if you want to analyze the correlations between the features and target (prediction). Evidently estimates the drift for the **target** and **predictions** in the same manner. If you pass both columns, Evidently will generate two sets of plots. If you pass only one of them (either target or predictions), Evidently will build one set of plots. -You will need **two** datasets. The **reference** dataset serves as a benchmark. Evidently analyzes the change by comparing the **current** production data to the **reference** data. +* **Column mapping**. Evidently can evaluate drift both for numerical and categorical targets. You can explicitly specify the type of target using the task parameter in [column mapping](../input-data/column-mapping.md). If it is not specified, Evidently will try to identify the target type automatically. It is recommended to use column mapping to avoid errors. ## How it looks @@ -60,7 +60,7 @@ Evidently uses the default [data drift detection algorithm](../reference/data-dr ![](<../.gitbook/assets/num\_targ\_drift (1).png>) {% hint style="info" %} -You can modify the drift detection logic by selecting a different method already available in the library, including PSI, K–L divergence, Jensen-Shannon distance, Wasserstein distance, and/or by setting a different threshold. See more details about [setting data drift options](../customization/options-for-statistical-tests.md). You can also implement a [custom drift detection method](../customization/add-custom-metric-or-test.md). +You can modify the drift detection logic by selecting a different method already available in the library, including PSI, K–L divergence, Jensen-Shannon distance, Wasserstein distance, and/or by setting a different threshold. See more details about [setting data drift parameters](../customization/options-for-statistical-tests.md). You can also implement a [custom drift detection method](../customization/add-custom-metric-or-test.md). {% endhint %} ### 2. Target (Prediction) Correlations @@ -147,4 +147,4 @@ You can get the report output as a JSON or a Python dictionary: # Examples -* Browse the [examples](../get-started/examples.md) for sample Jupyter notebooks and Colabs. +* Browse the [examples](../examples/examples.md) for sample Jupyter notebooks and Colabs. diff --git a/docs/book/presets/text-overview.md b/docs/book/presets/text-overview.md new file mode 100644 index 0000000000..af479ce264 --- /dev/null +++ b/docs/book/presets/text-overview.md @@ -0,0 +1,123 @@ +**TL;DR:** You can explore and compare text datasets. + +* **Report**: for visual analysis or metrics export, use the `TextOverviewPreset`. + +# Use case + +You can evaluate and explore text data: + +**1. To monitor input data for NLP models.** When you do not have true labels or actuals, you can monitor changes in the input data (data drift) and descriptive text characteristics. You can run batch checks, for example, comparing the latest batch of text data to earlier or training data. 
You can often combine it with evaluating [Prediction Drift](target-drift.md). + +**2. When you are debugging the model decay.** If you observe a drop in the model performance, you can use this report to understand changes in the input data patterns. + +**3. Exploratory data analysis.** You can use the visual report to explore the text data you want to use for training. You can also use it to compare any two datasets. + +# Text Overview Report + +If you want to visually explore the text data, you can create a new Report object and use the `TextOverviewPreset`. + +## Code example + +```python +text_overview_report = Report(metrics=[ + TextOverviewPreset(column_name="Review_Text") +]) + +text_overview_report.run(reference_data=ref, current_data=cur) +text_overview_report +``` + +Note that to calculate text-related metrics, you must also import additional libraries: + +``` +import nltk +nltk.download('words') +nltk.download('wordnet') +nltk.download('omw-1.4') +``` + +## How it works + +The `TextOverviewPreset` provides an overview and comparison of text datasets. +* Generates a **descriptive summary** of the text columns in the dataset. +* Performs **data drift detection** to compare the two texts using the domain classifier approach. +* Shows distributions of the **text descriptors** in two datasets, and their **correlations** with other features. +* Performs **drift detection for text descriptors**. + +## Data Requirements + +* You can pass **one or two** datasets. The **reference** dataset serves as a benchmark. Evidently analyzes the change by comparing the **current** production data to the **reference** data. If you pass a single dataset, there will be no comparison. + +* To run this preset, you must have **text columns** in your dataset. Additional features and prediction/target are optional. Pass them if you want to analyze the correlations with text descriptors. + +* **Column mapping**. You must explicitly specify the columns that contain text features in [column mapping](../input-data/column-mapping.md) to run this report. + +## How it looks + +The report includes 5 components. All plots are interactive. + +### 1. Text Column Summary + +The report first shows the **descriptive statistics** for the text column(s). + +![](<../.gitbook/assets/reports/metric_column_summary_text-min.png>) + +### 2. Text Descriptors Distribution + +The report generates several features that describe different text properties and shows the distributions of these text descriptors. + +#### Text length + +![](<../.gitbook/assets/reports/metric_text_descriptors_distribution_text_length-min.png>) + +#### Non-letter characters + +![](<../.gitbook/assets/reports/metric_text_descriptors_distribution_nlc-min.png>) + +#### Out-of-vocabulary words + +![](<../.gitbook/assets/reports/metric_text_descriptors_distribution_oov-min.png>) + +### 3. Text Descriptors Correlations + +If the dataset contains numerical features and/or target, the report will show the **correlations between features and text descriptors** in the current and reference dataset. It helps detects shifts in the relationship. + +#### Text length + +![](<../.gitbook/assets/reports/metric_text_descriptors_correlation_text_length-min.png>) + +#### Non-letter characters + +![](<../.gitbook/assets/reports/metric_text_descriptors_correlation_nlc-min.png>) + +#### Out-of-vocabulary words + +![](<../.gitbook/assets/reports/metric_text_descriptors_correlation_oov-min.png>) + + +### 4. 
+### 4. Text Column Drift
+
+If you pass two datasets, the report performs drift detection using the default [data drift method for texts](../reference/data-drift-algorithm.md) (domain classifier). It returns the ROC AUC of the binary classifier model that can discriminate between the reference and current data. If drift is detected, it also shows the top words that help distinguish between the reference and current dataset.
+
+![](<../.gitbook/assets/reports/metric_column_drift_text-min.png>)
+
+### 5. Text Descriptors Drift
+
+If you pass two datasets, the report also performs drift detection for text descriptors to show statistical shifts in the text characteristics.
+
+![](<../.gitbook/assets/reports/metric_text_descriptors_drift-min.png>)
+
+## Metrics output
+
+You can also get the report output as a JSON or a Python dictionary.
+
+## Report customization
+
+* You can [specify a different drift detection threshold](../customization/options-for-statistical-tests.md).
+* You can use a [different color schema for the report](../customization/options-for-color-schema.md).
+* You can create a different report or test suite from scratch, using this one as inspiration.
+
+# Examples
+
+* Head to an [example how-to notebook](https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/how_to_run_calculations_over_text_data.ipynb) to see the Text Overview preset and other metrics and tests for text data in action.
+
diff --git a/docs/book/reference/all-metrics.md b/docs/book/reference/all-metrics.md index ef0c1d5a32..05dbd6a281 100644 --- a/docs/book/reference/all-metrics.md +++ b/docs/book/reference/all-metrics.md @@ -37,6 +37,7 @@ We are doing our best to maintain this page up to date. In case of discrepancies
| `TargetDriftPreset` | Evaluates the prediction or target drift.

Target or prediction is required. Input features are optional.

**Contents**:
`ColumnDriftMetric(column_name=target, prediction)`
`ColumnCorrelationsMetric(column_name=target, prediction)`
`TargetByFeaturesTable(columns=columns)` or `all` if not listed

If regression:
`ColumnValuePlot(column_name=target, prediction)` | **Optional**: [How to set data drift parameters](../customization/options-for-statistical-tests.md). | | `RegressionPreset` | Evaluates the quality of a regression model.

Prediction and target are required. Input features are optional.

**Contents**:
`RegressionQualityMetric()`
`RegressionPredictedVsActualScatter()`
`RegressionPredictedVsActualPlot()`
`RegressionErrorPlot()`
`RegressionAbsPercentageErrorPlot()`
`RegressionErrorDistribution()`
`RegressionErrorNormality()`
`RegressionTopErrorMetric()`
`RegressionErrorBiasTable(columns=columns)`or `all` if not listed | **Optional**:
`columns` | | `ClassificationPreset` | Evaluates the quality of a classification model.

Prediction and target are required. Input features are optional.

**Contents**:
`ClassificationQualityMetric()`
`ClassificationClassBalance()`
`ClassificationConfusionMatrix()`
`ClassificationQualityByClass()`

If probabilistic classification, also:
`ClassificationClassSeparationPlot()`
`ClassificationProbDistribution()`
`ClassificationRocCurve()`
`ClassificationPRCurve()`
`ClassificationPRTable()`
`ClassificationQualityByFeatureTable(columns=columns)` or `all` if not listed | **Optional**: | +| `TextOverviewPreset(column_name="text")` | Evaluates data drift and descriptive statistics for text data.

Input features (text) are required.

**Contents**:
`ColumnSummaryMetric()`
`TextDescriptorsDistribution()`
`TextDescriptorsCorrelation()`

If reference data is provided, also:
`ColumnDriftMetric()`
`TextDescriptorsDriftMetric()` | **Required**:
`column_name` | # Data Integrity @@ -49,7 +50,7 @@ DatasetMissingValuesMetric(missing_values=["", 0, "n/a", -9999, None], replace=T |---|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `DatasetSummaryMetric()` | Dataset-level.

Calculates various descriptive statistics for the dataset, incl. the number of columns, rows, cat/num features, missing values, empty values, and duplicate values. | **Required**:
n/a

**Optional:** | | `DatasetMissingValuesMetric()` | Dataset-level.

Calculates the number and share of missing values in the dataset. Displays the number of missing values per column. | **Required**:
n/a

**Optional:**| -| `ColumnSummaryMetric(column_name="age")` | Column-level.

Calculates various descriptive statistics for the column, incl. the number of missing, empty, duplicate values, etc.

The stats depend on the column type: numerical, categorical, or DateTime. | **Required**:
`column_name`

**Optional:**
n/a| +| `ColumnSummaryMetric(column_name="age")` | Column-level.

Calculates various descriptive statistics for the column, incl. the number of missing, empty, duplicate values, etc.

The stats depend on the column type: numerical, categorical, text, or DateTime. | **Required**:
`column_name`

**Optional:**
n/a| | `ColumnMissingValuesMetric(column_name="education")`
| Column-level.

Calculates the number and share of missing values in the column. | **Required**:
n/a

**Optional:**| | `ColumnRegExpMetric(column_name="relationship", reg_exp=r".*child.*")` | Column-level.

Calculates the number and share of the values that do not match a defined regular expression. | **Required:****Optional:**| @@ -65,6 +66,8 @@ DatasetMissingValuesMetric(missing_values=["", 0, "n/a", -9999, None], replace=T | `ColumnCorrelationsMetric(column_name="education")` | Column-level.

Calculates the correlations between the defined column and all the other columns in the dataset. | **Required:**
`column_name`

**Optional:**
n/a | | `ColumnValueListMetric(column_name="relationship", values=["Husband", "Unmarried"])` | Column-level.

Calculates the number of values in the list / out of the list / not found in a given column. The value list should be specified. | **Required:****Optional:**
n/a | | `ColumnValueRangeMetric(column_name="age", left=10, right=20)` | Column-level.

Calculates the number and share of values in the specified range / out of range in a given column. Plots the distributions. | **Required:** | +| `TextDescriptorsDistribution(column_name="text")` | Column-level.

Calculates and visualizes distributions for auto-generated text descriptors (text length, the share of out-of-vocabulary words, etc.). | **Required:** | +| `TextDescriptorsCorrelationMetric(column_name="text")` | Column-level.

Calculates and visualizes correlations between auto-generated text descriptors and other columns in the dataset.| **Required:** | # Data Drift @@ -77,6 +80,7 @@ To modify the logic or select a different test, you should set [data drift param | `DatasetDriftMetric()`
| Dataset-level.

Calculates the number and share of drifted features. Returns true/false for the dataset drift at a given threshold (defined by the share of drifting features). Each feature is tested for drift individually using the default algorithm, unless a custom approach is specified.| **Required:**
n/a

**Optional:**[How to set data drift parameters](../customization/options-for-statistical-tests.md). | | `DataDriftTable()` | Dataset-level.

Calculates data drift for all columns in the dataset, or for a defined list of columns. Returns drift detection results for each column and visualizes distributions in a table. Uses the default drift algorithm of test selection, unless a custom approach is specified.| **Required:**
n/a

**Optional:** [How to set data drift parameters](../customization/options-for-statistical-tests.md)| | `ColumnDriftMetric('age')` | Column-level.

Calculates data drift for a defined column. Visualizes distributions. Uses the default-selected test unless a custom is specified. | **Required:**
**Optional:** [How to set data drift parameters](../customization/options-for-statistical-tests.md)|| +| `TextDescriptorsDriftMetric(column_name="text")` | Column-level.

Calculates data drift for auto-generated text descriptors and visualizes the distributions of text characteristics. | **Required:**
**Optional:**| # Classification diff --git a/docs/book/reference/data-drift-algorithm.md b/docs/book/reference/data-drift-algorithm.md index d7ca8b0b06..41ac21450e 100644 --- a/docs/book/reference/data-drift-algorithm.md +++ b/docs/book/reference/data-drift-algorithm.md @@ -1,4 +1,4 @@ -In some tests and metrics, Evidently uses the default Data Drift Detection algorithm. It helps detect the distribution drift in the individual features, prediction, or target. +In some tests and metrics, Evidently uses the default Data Drift Detection algorithm. It helps detect the distribution drift in the individual features, prediction, or target. This page describes how the **default** algorithm works. ## How it works @@ -6,7 +6,7 @@ Evidently compares the distributions of the values in a given column (or columns There is a default logic to choosing the appropriate drift test for each column. It is based on: -* column type: categorical or numerical +* column type: categorical, numerical or text data * the number of observations in the reference dataset * the number of unique values in the column (n\_unique) @@ -26,14 +26,23 @@ For **larger data with \> 1000 observations** in the reference dataset: All metrics use a threshold = 0.1 by default. {% hint style="info" %} -**You can always modify this drift detection logic**. You can select any of the statistical tests available in the library (including PSI, K-L divergence, Jensen-Shannon distance, Wasserstein distance, etc.), specify custom thresholds, or pass a custom test. To do that, use the [DataDriftOptions](../customization/options-for-statistical-tests.md) object. +**You can always modify this drift detection logic**. You can select any of the statistical tests available in the library (including PSI, K-L divergence, Jensen-Shannon distance, Wasserstein distance, etc.), specify custom thresholds, or pass a custom test. You can read more about using [data drift parameters and available drift detection methods](../customization/options-for-statistical-tests.md). {% endhint %} +For **text data**: + +* Text content drift using a **domain classifier**. Evidently trains a binary classification model to discriminate between data from reference and current distributions. + +
+
Text content drift detection method
The drift score, in this case, is the ROC AUC score of the domain classifier computed on a validation dataset. The ROC AUC of the created classifier is compared to the ROC AUC of the random classifier at a set percentile (threshold). To ensure the result is statistically meaningful, we repeat the calculation 1000 times with randomly assigned target class probabilities. This produces a distribution with a mean of 0.5. We then take the 95th percentile (default) of this distribution and compare it to the ROC AUC score of the classifier. If the classifier score is higher, we consider the data drift to be detected. You can also set a different percentile as a parameter.
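To make the idea concrete, here is a minimal sketch of this domain-classifier check built with scikit-learn. It is an illustration only, not Evidently's internal implementation: the `text_drift_check` helper, its default parameters, and the TF-IDF plus logistic regression model are assumptions made for the example.

```python
# Illustrative sketch of the domain-classifier drift check described above.
# Not Evidently's implementation; names and defaults are assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def text_drift_check(reference_texts, current_texts, percentile=95, n_runs=1000, seed=42):
    # Label reference rows 0 and current rows 1, then train a classifier
    # that tries to tell the two datasets apart.
    texts = list(reference_texts) + list(current_texts)
    labels = np.array([0] * len(reference_texts) + [1] * len(current_texts))
    x_train, x_valid, y_train, y_valid = train_test_split(
        texts, labels, test_size=0.3, random_state=seed, stratify=labels
    )

    vectorizer = TfidfVectorizer()
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(vectorizer.fit_transform(x_train), y_train)

    # Drift score: ROC AUC of the domain classifier on the held-out split.
    scores = classifier.predict_proba(vectorizer.transform(x_valid))[:, 1]
    drift_score = roc_auc_score(y_valid, scores)

    # Null distribution: ROC AUC of random probability assignments, centered at 0.5.
    # Drift is flagged if the real classifier beats the chosen percentile of it.
    rng = np.random.default_rng(seed)
    random_aucs = [roc_auc_score(y_valid, rng.random(len(y_valid))) for _ in range(n_runs)]
    threshold = np.percentile(random_aucs, percentile)

    return drift_score, threshold, bool(drift_score > threshold)
```

Calling `text_drift_check(reference_texts, current_texts)` would return the drift score (the domain classifier ROC AUC), the percentile-based threshold, and whether drift is flagged under these assumptions.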
+ ## Dataset-level drift The method above calculates drift for each column individually. -To detect dataset-level drift, you can set a rule on top of the individual feature results. For example, you can declare dataset drift if at least 50% of all features drifted or if ⅓ of the most important features drifted. Some of the Evidently tests and presets include such defaults. You can always modify them and set custom parameters. +To detect dataset-level drift, you can set a rule on top of the individual feature results. For example, you can declare dataset drift if at least 50% of all features (columns) drifted or if ⅓ of the most important features drifted. Some of the Evidently tests and presets include such defaults. You can always modify them and set custom parameters. ## Nulls in the input data