
v0.140.0

@tlento tlento released this 26 Jan 23:10
· 1657 commits to main since this release
a3af95b

Highlights

We've added a number of new features, including:

  • Derived metrics
  • Support for joining against versioned dimensions (Slowly Changing Dimensions, or SCD)
  • Percentile measures
  • dbt Cloud support

Breaking Changes

  • The result layout is changing from one row per (metric, null dimension value) pair to one row per null dimension value, regardless of the number of metrics in the query. This only affects queries for multiple metrics where the requested dimensions contain null values. See the description on the relevant PR for more detailed information and an example illustrating how the output will change.
  • Updates to the required SqlClient protocol could cause typechecking failures for users injecting a custom SqlClient implementation into the MetricFlowClient.
  • Minimum version changes for SQLAlchemy and snowflake-sqlalchemy could cause dependency conflicts when MetricFlow is installed in Python environments with libraries requiring an older version of either dependency.

New Feature Details

Derived Metrics (@WilliamDee)

MetricFlow now lets you reference other metrics in the definition of a metric, so a metric can be written as an expression over metrics. This further simplifies and DRYs out configuration by removing the need for pre-aggregated subqueries in the data source definition or for duplicated measure definitions. For example:

metric:
  name: net_sales_per_user
  owners:
    - [email protected]
  type: derived
  type_params:
    expr: (gross_sales - cogs) / active_users
    metrics:
      # Each entry is a metric; an input can itself be a derived metric
      - name: gross_sales
      - name: cogs
      - name: users
        constraint: is_active # Optional additional constraint
        alias: active_users # Optional alias to use in the expr
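
As a rough illustration of the arithmetic only (not MetricFlow's implementation, which renders SQL), the sketch below combines hypothetical, already-aggregated values for the three input metrics. The function name and the sample numbers are assumptions made for this example. It also shows why grouping matters: division binds tighter than subtraction, so net sales per user needs explicit parentheses.

```python
from typing import Dict


def net_sales_per_user(inputs: Dict[str, float]) -> float:
    """Combine input metric values, keyed by metric name or alias."""
    # Hypothetical helper: mirrors an expr of (gross_sales - cogs) / active_users
    return (inputs["gross_sales"] - inputs["cogs"]) / inputs["active_users"]


# Made-up values for a single output row
row = {"gross_sales": 1000.0, "cogs": 400.0, "active_users": 20.0}

grouped = net_sales_per_user(row)
# Without parentheses, only cogs is divided by active_users
ungrouped = row["gross_sales"] - row["cogs"] / row["active_users"]

print(grouped, ungrouped)
```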

Versioned dimension (SCD Type II) join support (@tlento)

MetricFlow now supports versioned dimension (Slowly Changing Dimension (SCD) Type II) joins!
Given an SCD Type II table with an entity key and dimension values bounded by start and end timestamp columns, you can now fetch the slowly changing dimension values from that table through extra configuration in your data source.
For specific details and examples, please see the documentation on slowly changing dimensions.
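
To make the idea of a versioned join concrete, here is a minimal sketch in plain Python of the usual SCD Type II lookup: pick the dimension version whose validity window contains the event time. This is not MetricFlow's implementation. The class, field names, half-open window convention, and sample data are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class DimensionVersion:
    entity_key: str
    value: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None marks the currently active version


def lookup_version(
    versions: List[DimensionVersion], entity_key: str, at: datetime
) -> Optional[str]:
    """Return the dimension value in effect for entity_key at time `at`."""
    for version in versions:
        if version.entity_key != entity_key:
            continue
        # Assumed half-open window: valid_from <= at < valid_to
        if version.valid_from <= at and (
            version.valid_to is None or at < version.valid_to
        ):
            return version.value
    return None  # No version covers this time


# Made-up history: a user moves from the "free" plan to the "paid" plan
history = [
    DimensionVersion("u1", "free", datetime(2022, 1, 1), datetime(2022, 6, 1)),
    DimensionVersion("u1", "paid", datetime(2022, 6, 1), None),
]

print(lookup_version(history, "u1", datetime(2022, 3, 15)))
```

A measure row dated in March would be grouped under the "free" plan here, while a row dated after the change would be grouped under "paid".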

Percentile measures (@kyleli626)

MetricFlow now supports percentile calculations in measures! Simply specify percentile for the agg type in your data sources and input the desired percentile within agg_params as seen in the documentation for configuring measures. This feature also provides a median aggregation type as a convenience around the appropriate percentile configuration. For example:

measures:
  - name: p99_transaction_value
    description: The 99th percentile transaction value
    expr: transaction_amount_usd
    agg: percentile
    agg_params:
      percentile: .99
      # False will calculate the discrete percentile and True will calculate the continuous percentile
      use_discrete_percentile: False
    create_metric: True
  - name: median_transaction_value
    description: The median transaction value
    expr: transaction_amount_usd
    agg: median
    create_metric: True

Note that MetricFlow allows for choosing between continuous or discrete percentiles via the use_discrete_percentile parameter.
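
The difference between the two modes can be sketched in plain Python. The functions below follow the common SQL definitions of PERCENTILE_DISC and PERCENTILE_CONT. They are illustrative helpers, not MetricFlow code, and individual engines may differ in edge cases.

```python
import math
from typing import Sequence


def percentile_disc(values: Sequence[float], p: float) -> float:
    """Discrete percentile: always returns a value present in the input."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # First value whose cumulative share of rows is at least p
    index = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[index]


def percentile_cont(values: Sequence[float], p: float) -> float:
    """Continuous percentile: linearly interpolates between neighbors."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = p * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Made-up transaction amounts
amounts = [10.0, 20.0, 30.0, 40.0]
print(percentile_disc(amounts, 0.5), percentile_cont(amounts, 0.5))
```

With an even number of rows, the discrete median is one of the observed values, while the continuous median falls between the two middle values.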

dbt Cloud support (@QMalcolm)

MetricFlow is now available to use with dbt Cloud. Instead of requiring additional MetricFlow YAML config files to enable dbt in your MetricFlow model (as in the previous dbt metrics release), the MetricFlow CLI can work from a semantic model built from dbt Cloud. To use it, follow these two steps:

  1. Install our dbt Cloud package (e.g., by running pip install "metricflow[dbt-cloud]")
  2. Add the following to .metricflow/config.yaml:
dbt_cloud_job_id: <job_id>
# The following service token MUST have dbt Metadata API access for the project containing the specified job
dbt_cloud_service_token: <dbt_service_token>

Other changes

Added

  • Support for querying metrics without grouping by dimensions (@WilliamDee)
  • A cancel_request API in the SQL client for canceling running queries, with the necessary support for SQL isolation levels and asynchronous query submission (@plypaul)
  • Support for passing in query tags for Snowflake queries (@plypaul)
  • DataFlowPlan optimization to reduce source table scans (@plypaul)
  • Internal API to enable developers to fetch joinable data source targets from an input data source (@courtneyholcomb)

Updated

  • Improved readability of validation error messages (@QMalcolm)
  • Made Postgres engine tests merge-blocking in CI to reduce cycle time on detecting engine-specific errors (@tlento)
  • Updated poetry and python versions in CI to align with our build process and verify all supported Python versions (@tlento)
  • Eliminated data source level primary time dimension requirement in cases where all measures have an aggregation time dimension set (@QMalcolm)
  • Extended support for typed values for bind parameters (@courtneyholcomb)
  • Removed the optional Python levenshtein package from build dependencies in order to streamline package version requirements (@plypaul)
  • Consolidated join validation logic to eliminate code duplication and speed development (@plypaul)
  • Factored join building logic out of DataflowToSqlQueryPlanBuilder to streamline development (@tlento)
  • Improved visibility into underlying errors thrown by SQL client requests (@courtneyholcomb)
  • Updated SQLAlchemy and snowflake-sqlalchemy minimum version requirements to resolve a version incompatibility introduced with SQLAlchemy 1.4.42 (@tlento)
  • Added CI coverage for Databricks SQL Warehouse execution environments (@tlento)

Fixed

  • Resolved error encountered in Databricks whenever table rename methods were invoked (@courtneyholcomb)
  • Fixed bug with warehouse measure validation where an error would be inappropriately thrown when users with measure-specific agg_time_dimension configurations attempted to run the full validation suite (@WilliamDee)
  • Issue with parsing explain output for Databricks SQL warehouse configurations (@courtneyholcomb)
  • Floating point comparison errors in CI tests (@tlento)
  • Issue with incomplete time range constraint validation that could result in invalid queries (@plypaul)
  • Resolved GitHub upgrade warnings on use of deprecated APIs and node.js build versions (@tlento)
  • Resolved python-levenshtein optimization warning on CLI startup (@jzhu13)
  • Resolved SQLAlchemy warning about the impending deprecation of the engine.table_names method (@Jstein77)
  • Improved error message for queries with time range constraints which were too narrow for the chosen time granularity (@kyleli626)
  • Eliminated SQL rendering error in BigQuery which would intermittently produce invalid GROUP BY specifications (@tlento)