Text data support in Reports and TestSuites
Breaking Changes:
- Python 3.6 is no longer supported
Updates:
- New parameter "text_features" was added to ColumnMapping. It takes a list of text feature names: "column_mapping.text_features=['text_feature_1', 'text_feature_2', …, 'text_feature_k']"
- The following metrics now support text features:
  - DatasetSummaryMetric()
  - DatasetMissingValuesMetric()
  - ColumnSummaryMetric(column_name="name")
  - ColumnMissingValuesMetric(column_name="name")
  - ColumnRegExpMetric(column_name="name", reg_exp=r".*ticket.*")
  - ConflictPredictionMetric()
  - ConflictTargetMetric()
  - DatasetCorrelationsMetric()
  - DatasetDriftMetric()
  - DataDriftTable()
  - ColumnDriftMetric(column_name="name")
  - TargetByFeaturesTable(columns=columns)
  - ClassificationQualityByFeatureTable()
  - RegressionErrorBiasTable()
- All metric presets now support text features
- All tests based on metrics that support text features also support text data
- The following test presets now support text features:
  - NoTargetPerformanceTestPreset
  - DataStabilityTestPreset
  - DataQualityTestPreset
  - DataDriftTestPreset
- Added metric TextDescriptorsDriftMetric for text data
- Added metric TextDescriptorsDistribution for text data
- Added metric TextDescriptorsCorrelationMetric for text data
- Added TextOverviewPreset(column_name="name") for text data. The preset includes:
  - ColumnSummaryMetric
  - TextDescriptorsDistribution
  - TextDescriptorsCorrelationMetric
  - ColumnDriftMetric (if reference dataset is provided)
  - TextDescriptorsDriftMetric (if reference dataset is provided)
Changes:
- The get_parameters(self) -> Optional[tuple] method of the Metric(Generic[TResult]) class is now optional. The default algorithm, which collects a metric's parameters and their values into a tuple, was significantly updated to cover more parameter types. Implement get_parameters in a custom metric class only if you need specific behaviour; for most standard scenarios, the default version is sufficient.
- Metric deduplication now takes features into account as well as metric calculation parameters. The same metric requested on the same features is calculated only once, which reduces calculation time.
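
The two changes above can be pictured with a small standalone sketch. It is not Evidently's implementation: SketchMetric, ColumnStat, LabelledStat, freeze and deduplicate are hypothetical names, and the real algorithm covers more parameter types. The sketch assumes that a default get_parameters derives a hashable tuple from the metric's attributes (including the column it works on), and that this tuple is used as a deduplication key.

```python
from typing import Any, Dict, List, Tuple


def freeze(value: Any) -> Any:
    """Turn common containers into hashable equivalents (string keys assumed for dicts)."""
    if isinstance(value, dict):
        return tuple(sorted((key, freeze(item)) for key, item in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(freeze(item) for item in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(freeze(item) for item in value)
    return value


class SketchMetric:
    """Hypothetical stand-in for a metric base class."""

    def get_parameters(self) -> tuple:
        # Default behaviour: every public attribute, sorted by name.
        return tuple(
            (name, freeze(value))
            for name, value in sorted(vars(self).items())
            if not name.startswith("_")
        )


class ColumnStat(SketchMetric):
    """Relies on the default get_parameters."""

    def __init__(self, column_name: str, quantile: float = 0.5) -> None:
        self.column_name = column_name
        self.quantile = quantile


class LabelledStat(ColumnStat):
    """Overrides get_parameters because `label` must not affect deduplication."""

    def __init__(self, column_name: str, label: str) -> None:
        super().__init__(column_name)
        self.label = label  # display-only attribute

    def get_parameters(self) -> tuple:
        return (("column_name", self.column_name), ("quantile", self.quantile))


def deduplicate(metrics: List[SketchMetric]) -> List[SketchMetric]:
    """Keep the first metric for each (type, parameters) key."""
    unique: Dict[Tuple[type, tuple], SketchMetric] = {}
    for metric in metrics:
        unique.setdefault((type(metric), metric.get_parameters()), metric)
    return list(unique.values())


requested = [
    ColumnStat("review_text"),
    ColumnStat("review_text"),                # same column, same parameters
    ColumnStat("title"),                      # different column
    ColumnStat("review_text", quantile=0.9),  # different parameter
]
unique_metrics = deduplicate(requested)
print(len(unique_metrics))  # 3
```

In this sketch the column name is part of the key, so the same metric on two different columns is kept twice, while an exact repeat is dropped. LabelledStat shows the kind of case where a custom get_parameters is still useful: an attribute that should be ignored when comparing metrics.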
Fixes: