
Text data support in Reports and TestSuites

@emeli-dral emeli-dral released this 26 Jan 17:33

Breaking Changes:

  • Python 3.6 is no longer supported

Updates:

  • Added a new "text_features" parameter to ColumnMapping. It takes a list of feature names: column_mapping.text_features=['text_feature_1', 'text_feature_2', …, 'text_feature_k']
  • The following metrics now support text features:
    • DatasetSummaryMetric()
    • DatasetMissingValuesMetric()
    • ColumnSummaryMetric(column_name="name")
    • ColumnMissingValuesMetric(column_name="name")
    • ColumnRegExpMetric(column_name="name", reg_exp=r".*ticket.*")
    • ConflictPredictionMetric()
    • ConflictTargetMetric()
    • DatasetCorrelationsMetric()
    • DatasetDriftMetric()
    • DataDriftTable()
    • ColumnDriftMetric(column_name="name")
    • TargetByFeaturesTable(columns=columns)
    • ClassificationQualityByFeatureTable()
    • RegressionErrorBiasTable()
  • All metric presets now support text features
  • All tests based on metrics that support text features also support text features
  • The following test presets now support text features:
    • NoTargetPerformanceTestPreset
    • DataStabilityTestPreset
    • DataQualityTestPreset
    • DataDriftTestPreset
  • Added metric TextDescriptorsDriftMetric for text data
  • Added metric TextDescriptorsDistribution for text data
  • Added metric TextDescriptorsCorrelationMetric for text data
  • Added TextOverviewPreset(column_name="name") for text data. The preset includes:
    • ColumnSummaryMetric
    • TextDescriptorsDistribution
    • TextDescriptorsCorrelationMetric
    • ColumnDriftMetric (if reference dataset is provided)
    • TextDescriptorsDriftMetric (if reference dataset is provided)
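
The text-related updates above can be combined roughly as follows. This is a minimal sketch, not an official example: the import paths are assumed to match the evidently package layout of this release line, the column names in TEXT_FEATURES are hypothetical, and the helper build_text_report is introduced here only for illustration. Text descriptors may also need NLTK resources to be downloaded before the report runs.

```python
from typing import List

# Hypothetical text column names, used only for illustration.
TEXT_FEATURES: List[str] = ["review_text", "title"]


def build_text_report(current, reference=None):
    """Run TextOverviewPreset on the first text column of a pandas DataFrame.

    `current` and `reference` are expected to be pandas DataFrames that
    contain the columns listed in TEXT_FEATURES. `reference` may be None.
    """
    # Imported lazily so this sketch can be loaded without evidently installed.
    # Import paths are an assumption about the package layout.
    from evidently import ColumnMapping
    from evidently.metric_preset import TextOverviewPreset
    from evidently.report import Report

    # Tell Evidently which columns hold raw text.
    column_mapping = ColumnMapping()
    column_mapping.text_features = TEXT_FEATURES

    # The preset is built for a single text column.
    report = Report(metrics=[TextOverviewPreset(column_name=TEXT_FEATURES[0])])
    report.run(
        reference_data=reference,
        current_data=current,
        column_mapping=column_mapping,
    )
    return report
```

Without a reference dataset, only the summary, distribution, and correlation parts of the preset apply; the two drift metrics need reference data, as listed above.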

Changes:

  • The get_parameters(self) -> Optional[tuple] method of the Metric(Generic[TResult]) class is now optional to implement. The default algorithm, which builds a tuple of metric parameters and their values, was updated to cover more parameter types. Implement get_parameters in a custom metric class only if you need specific behaviour; for most standard scenarios the default version is sufficient.
  • Metric deduplication now takes features into account as well as metric calculation parameters. The same metric on the same features is calculated only once, which reduces calculation time.
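
The idea behind both changes can be illustrated with a small standalone sketch. It does not reproduce Evidently's implementation: SketchMetric, ColumnDriftSketch, and deduplicate are hypothetical names, and the rule "derive parameters from instance attributes" is an assumption chosen to keep the example short. Here the column name stands in for the feature, so it becomes part of the deduplication key.

```python
from typing import List, Optional


class SketchMetric:
    """Hypothetical stand-in for a metric; not Evidently's Metric class."""

    def get_parameters(self) -> Optional[tuple]:
        # Default behaviour in this sketch: build a tuple of (name, value)
        # pairs from instance attributes, sorted by name for stability.
        params = []
        for name, value in sorted(vars(self).items()):
            if isinstance(value, list):
                value = tuple(value)  # lists are not hashable, tuples are
            try:
                hash(value)
            except TypeError:
                return None  # parameters cannot be used as a key
            params.append((name, value))
        return tuple(params)


class ColumnDriftSketch(SketchMetric):
    """Hypothetical metric with one feature and one calculation parameter."""

    def __init__(self, column_name: str, threshold: float = 0.5):
        self.column_name = column_name
        self.threshold = threshold


def deduplicate(metrics: List[SketchMetric]) -> List[SketchMetric]:
    """Keep the first metric for each (type, parameters) key."""
    seen = set()
    unique = []
    for metric in metrics:
        params = metric.get_parameters()
        if params is None:
            unique.append(metric)  # no key available, so always keep it
            continue
        key = (type(metric), params)
        if key not in seen:
            seen.add(key)
            unique.append(metric)
    return unique


metrics = [
    ColumnDriftSketch("review"),
    ColumnDriftSketch("review"),                 # same feature, same parameters
    ColumnDriftSketch("title"),                  # different feature
    ColumnDriftSketch("review", threshold=0.1),  # different parameter
]
unique_metrics = deduplicate(metrics)
```

A custom metric would override get_parameters only when this attribute-based default gives the wrong key, for example when an attribute should not influence whether two metrics count as the same calculation.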

Fixes: