diff --git a/README.md b/README.md index 40ed1730..6a100826 100644 --- a/README.md +++ b/README.md @@ -6,9 +6,9 @@ [![Downloads](https://static.pepy.tech/badge/semantic-link-labs)](https://pepy.tech/project/semantic-link-labs) -This is a python library intended to be used in [Microsoft Fabric notebooks](https://learn.microsoft.com/fabric/data-engineering/how-to-use-notebook). This library was originally intended to contain functions used for [migrating semantic models to Direct Lake mode](https://github.com/microsoft/semantic-link-labs?tab=readme-ov-file#direct-lake-migration-1). However, it quickly became apparent that functions within such a library could support many other useful activities in the realm of semantic models, reports, lakehouses and really anything Fabric-related. As such, this library contains a variety of functions ranging from running [Vertipaq Analyzer](https://github.com/microsoft/semantic-link-labs?tab=readme-ov-file#vertipaq_analyzer) or the [Best Practice Analyzer](https://github.com/microsoft/semantic-link-labs?tab=readme-ov-file#run_model_bpa) against a semantic model to seeing if any [lakehouse tables hit Direct Lake guardrails](https://github.com/microsoft/semantic-link-labs?tab=readme-ov-file#get_lakehouse_tables) or accessing the [Tabular Object Model](https://github.com/microsoft/semantic-link-labs/#tabular-object-model-tom) and more! +This is a python library intended to be used in [Microsoft Fabric notebooks](https://learn.microsoft.com/fabric/data-engineering/how-to-use-notebook). This library was originally intended to contain functions used for [migrating semantic models to Direct Lake mode](https://github.com/microsoft/semantic-link-labs?tab=readme-ov-file#direct-lake-migration). However, it quickly became apparent that functions within such a library could support many other useful activities in the realm of semantic models, reports, lakehouses and really anything Fabric-related. As such, this library contains a variety of functions ranging from running [Vertipaq Analyzer](https://semantic-link-labs.readthedocs.io/en/0.4.2/sempy_labs.html#sempy_labs.import_vertipaq_analyzer) or the [Best Practice Analyzer](https://semantic-link-labs.readthedocs.io/en/0.4.2/sempy_labs.html#sempy_labs.run_model_bpa) against a semantic model to seeing if any [lakehouse tables hit Direct Lake guardrails](https://semantic-link-labs.readthedocs.io/en/0.4.2/sempy_labs.lakehouse.html#sempy_labs.lakehouse.get_lakehouse_tables) or accessing the [Tabular Object Model](https://semantic-link-labs.readthedocs.io/en/0.4.2/sempy_labs.tom.html) and more! -Instructions for migrating import/DirectQuery semantic models to Direct Lake mode can be found [here](https://github.com/microsoft/semantic-link-labs?tab=readme-ov-file#direct-lake-migration-1). +Instructions for migrating import/DirectQuery semantic models to Direct Lake mode can be found [here](https://github.com/microsoft/semantic-link-labs?tab=readme-ov-file#direct-lake-migration). If you encounter any issues, please [raise a bug](https://github.com/microsoft/semantic-link-labs/issues/new?assignees=&labels=&projects=&template=bug_report.md&title=). diff --git a/notebooks/Migration to Direct Lake.ipynb b/notebooks/Migration to Direct Lake.ipynb index 5afe11ef..c567b215 100644 --- a/notebooks/Migration to Direct Lake.ipynb +++ b/notebooks/Migration to Direct Lake.ipynb @@ -1 +1 @@ -{"cells":[{"cell_type":"markdown","id":"5c27dfd1-4fe0-4a97-92e6-ddf78889aa93","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Install the latest .whl package\n","\n","Check [here](https://pypi.org/project/semantic-link-labs/) to see the latest version."]},{"cell_type":"code","execution_count":null,"id":"d5cae9db-cef9-48a8-a351-9c5fcc99645c","metadata":{"jupyter":{"outputs_hidden":true,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["%pip install semantic-link-labs"]},{"cell_type":"markdown","id":"969a29bf","metadata":{},"source":["### Import the library and set initial parameters"]},{"cell_type":"code","execution_count":null,"id":"29c923f8","metadata":{},"outputs":[],"source":["import sempy_labs as labs\n","from sempy_labs import migration, report, directlake\n","\n","dataset_name = '' #Enter the import/DQ semantic model name\n","workspace_name = None #Enter the workspace of the import/DQ semantic model. It set to none it will use the current workspace.\n","new_dataset_name = '' #Enter the new Direct Lake semantic model name\n","new_dataset_workspace_name = None #Enter the workspace where the Direct Lake model will be created. If set to None it will use the current workspace.\n","lakehouse_name = None #Enter the lakehouse to be used for the Direct Lake model. If set to None it will use the lakehouse attached to the notebook.\n","lakehouse_workspace_name = None #Enter the lakehouse workspace. If set to None it will use the new_dataset_workspace_name."]},{"cell_type":"markdown","id":"5a3fe6e8-b8aa-4447-812b-7931831e07fe","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Create the [Power Query Template](https://learn.microsoft.com/power-query/power-query-template) file\n","\n","This encapsulates all of the semantic model's Power Query logic into a single file."]},{"cell_type":"code","execution_count":null,"id":"cde43b47-4ecc-46ae-9125-9674819c7eab","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["migration.create_pqt_file(dataset = dataset_name, workspace = workspace_name)"]},{"cell_type":"markdown","id":"bf945d07-544c-4934-b7a6-cfdb90ca725e","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Import the Power Query Template to Dataflows Gen2\n","\n","- Open the [OneLake file explorer](https://www.microsoft.com/download/details.aspx?id=105222) and sync your files (right click -> Sync from OneLake)\n","\n","- Navigate to your lakehouse. From this window, create a new Dataflows Gen2 and import the Power Query Template file from OneLake (OneLake -> Workspace -> Lakehouse -> Files...), and publish the Dataflows Gen2.\n","\n","
\n","Important!: Make sure to create the Dataflows Gen2 from within the lakehouse window. That will ensure that all the tables automatically map to that lakehouse as the destination. Otherwise, you will have to manually map each table to its destination individually.\n","
"]},{"cell_type":"markdown","id":"9975db7d","metadata":{},"source":["### Create the Direct Lake model based on the import/DQ semantic model\n","\n","Calculated columns are not migrated to the Direct Lake model as they are not supported in Direct Lake mode."]},{"cell_type":"code","execution_count":null,"id":"0a3616b5-566e-414e-a225-fb850d6418dc","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["import time\n","labs.create_blank_semantic_model(dataset = new_dataset_name, workspace = new_dataset_workspace_name)\n","\n","time.sleep(2)\n","\n","migration.migrate_calc_tables_to_lakehouse(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name,\n"," lakehouse = lakehouse_name,\n"," lakehouse_workspace = lakehouse_workspace_name)\n","migration.migrate_tables_columns_to_semantic_model(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name,\n"," lakehouse = lakehouse_name,\n"," lakehouse_workspace = lakehouse_workspace_name)\n","migration.migrate_calc_tables_to_semantic_model(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name,\n"," lakehouse = lakehouse_name,\n"," lakehouse_workspace = lakehouse_workspace_name)\n","migration.migrate_model_objects_to_semantic_model(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name)\n","migration.migrate_field_parameters(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name)\n","time.sleep(2)\n","migration.refresh_semantic_model(dataset = new_dataset_name, workspace = new_dataset_workspace_name)\n","migration.refresh_calc_tables(dataset = new_dataset_name, workspace = new_dataset_workspace_name)\n","migration.refresh_semantic_model(dataset = new_dataset_name, workspace = new_dataset_workspace_name)"]},{"cell_type":"markdown","id":"bb98bb13","metadata":{},"source":["### Show migrated/unmigrated objects"]},{"cell_type":"code","execution_count":null,"id":"5db2f22c","metadata":{},"outputs":[],"source":["migration.migration_validation(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name, \n"," workspace = workspace_name, \n"," new_dataset_workspace = new_dataset_workspace_name)"]},{"cell_type":"markdown","id":"fa244e9d-87c2-4a66-a7e0-be539a0ac7de","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Rebind all reports using the old semantic model to the new Direct Lake semantic model"]},{"cell_type":"code","execution_count":null,"id":"d4e867cc","metadata":{},"outputs":[],"source":["report.report_rebind_all(\n"," dataset = dataset_name,\n"," dataset_workspace = workspace_name,\n"," new_dataset = new_dataset_name,\n"," new_dataset_workpace = new_dataset_workspace_name,\n"," report_workspace = workspace_name)"]},{"cell_type":"markdown","id":"3365d20d","metadata":{},"source":["### Rebind reports one-by-one (optional)"]},{"cell_type":"code","execution_count":null,"id":"056b7180-d7ac-492c-87e7-ac7d0e4bb929","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["report_name = '' # Enter report name which you want to rebind to the new Direct Lake model\n","\n","report.report_rebind(\n"," report = report_name,\n"," dataset = new_dataset_name,\n"," report_workspace=workspace_name,\n"," dataset_workspace = new_dataset_workspace_name)"]},{"cell_type":"markdown","id":"526f2327","metadata":{},"source":["### Show unsupported objects"]},{"cell_type":"code","execution_count":null,"id":"a47376d7","metadata":{},"outputs":[],"source":["dfT, dfC, dfR = directlake.show_unsupported_direct_lake_objects(dataset = dataset_name, workspace = workspace_name)\n","\n","print('Calculated Tables are not supported...')\n","display(dfT)\n","print(\"Learn more about Direct Lake limitations here: https://learn.microsoft.com/power-bi/enterprise/directlake-overview#known-issues-and-limitations\")\n","print('Calculated columns are not supported. Columns of binary data type are not supported.')\n","display(dfC)\n","print('Columns used for relationship cannot be of data type datetime and they also must be of the same data type.')\n","display(dfR)"]},{"cell_type":"markdown","id":"ed08ba4c","metadata":{},"source":["### Schema check between semantic model tables/columns and lakehouse tables/columns\n","\n","This will list any tables/columns which are in the new semantic model but do not exist in the lakehouse"]},{"cell_type":"code","execution_count":null,"id":"03889ba4","metadata":{},"outputs":[],"source":["directlake.direct_lake_schema_compare(dataset = new_dataset_name, workspace = new_dataset_workspace_name)"]},{"cell_type":"markdown","id":"2229963b","metadata":{},"source":["### Show calculated tables which have been migrated to the Direct Lake semantic model as regular tables"]},{"cell_type":"code","execution_count":null,"id":"dd537d90","metadata":{},"outputs":[],"source":["directlake.list_direct_lake_model_calc_tables(dataset = new_dataset_name, workspace = new_dataset_workspace_name)"]}],"metadata":{"kernel_info":{"name":"synapse_pyspark"},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.12.3"},"microsoft":{"language":"python"},"nteract":{"version":"nteract-front-end@1.0.0"},"spark_compute":{"compute_id":"/trident/default"},"synapse_widget":{"state":{},"version":"0.1"},"widgets":{}},"nbformat":4,"nbformat_minor":5} +{"cells":[{"cell_type":"markdown","id":"5c27dfd1-4fe0-4a97-92e6-ddf78889aa93","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Install the latest .whl package\n","\n","Check [here](https://pypi.org/project/semantic-link-labs/) to see the latest version."]},{"cell_type":"code","execution_count":null,"id":"d5cae9db-cef9-48a8-a351-9c5fcc99645c","metadata":{"jupyter":{"outputs_hidden":true,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["%pip install semantic-link-labs"]},{"cell_type":"markdown","id":"969a29bf","metadata":{},"source":["### Import the library and set initial parameters"]},{"cell_type":"code","execution_count":null,"id":"29c923f8","metadata":{},"outputs":[],"source":["import sempy_labs as labs\n","from sempy_labs import migration, report, directlake\n","\n","dataset_name = '' #Enter the import/DQ semantic model name\n","workspace_name = None #Enter the workspace of the import/DQ semantic model. It set to none it will use the current workspace.\n","new_dataset_name = '' #Enter the new Direct Lake semantic model name\n","new_dataset_workspace_name = None #Enter the workspace where the Direct Lake model will be created. If set to None it will use the current workspace.\n","lakehouse_name = None #Enter the lakehouse to be used for the Direct Lake model. If set to None it will use the lakehouse attached to the notebook.\n","lakehouse_workspace_name = None #Enter the lakehouse workspace. If set to None it will use the new_dataset_workspace_name."]},{"cell_type":"markdown","id":"5a3fe6e8-b8aa-4447-812b-7931831e07fe","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Create the [Power Query Template](https://learn.microsoft.com/power-query/power-query-template) file\n","\n","This encapsulates all of the semantic model's Power Query logic into a single file."]},{"cell_type":"code","execution_count":null,"id":"cde43b47-4ecc-46ae-9125-9674819c7eab","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["migration.create_pqt_file(dataset = dataset_name, workspace = workspace_name)"]},{"cell_type":"markdown","id":"bf945d07-544c-4934-b7a6-cfdb90ca725e","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Import the Power Query Template to Dataflows Gen2\n","\n","- Open the [OneLake file explorer](https://www.microsoft.com/download/details.aspx?id=105222) and sync your files (right click -> Sync from OneLake)\n","\n","- Navigate to your lakehouse. From this window, create a new Dataflows Gen2 and import the Power Query Template file from OneLake (OneLake -> Workspace -> Lakehouse -> Files...), and publish the Dataflows Gen2.\n","\n","
\n","Important!: Make sure to create the Dataflows Gen2 from within the lakehouse window. That will ensure that all the tables automatically map to that lakehouse as the destination. Otherwise, you will have to manually map each table to its destination individually.\n","
"]},{"cell_type":"markdown","id":"9975db7d","metadata":{},"source":["### Create the Direct Lake model based on the import/DQ semantic model\n","\n","Calculated columns are not migrated to the Direct Lake model as they are not supported in Direct Lake mode."]},{"cell_type":"code","execution_count":null,"id":"0a3616b5-566e-414e-a225-fb850d6418dc","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["import time\n","labs.create_blank_semantic_model(dataset = new_dataset_name, workspace = new_dataset_workspace_name)\n","\n","time.sleep(2)\n","\n","migration.migrate_calc_tables_to_lakehouse(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name,\n"," lakehouse = lakehouse_name,\n"," lakehouse_workspace = lakehouse_workspace_name)\n","migration.migrate_tables_columns_to_semantic_model(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name,\n"," lakehouse = lakehouse_name,\n"," lakehouse_workspace = lakehouse_workspace_name)\n","migration.migrate_calc_tables_to_semantic_model(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name,\n"," lakehouse = lakehouse_name,\n"," lakehouse_workspace = lakehouse_workspace_name)\n","migration.migrate_model_objects_to_semantic_model(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name)\n","migration.migrate_field_parameters(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name,\n"," workspace = workspace_name,\n"," new_dataset_workspace = new_dataset_workspace_name)\n","time.sleep(2)\n","labs.refresh_semantic_model(dataset = new_dataset_name, workspace = new_dataset_workspace_name)\n","migration.refresh_calc_tables(dataset = new_dataset_name, workspace = new_dataset_workspace_name)\n","labs.refresh_semantic_model(dataset = new_dataset_name, workspace = new_dataset_workspace_name)"]},{"cell_type":"markdown","id":"bb98bb13","metadata":{},"source":["### Show migrated/unmigrated objects"]},{"cell_type":"code","execution_count":null,"id":"5db2f22c","metadata":{},"outputs":[],"source":["migration.migration_validation(\n"," dataset = dataset_name,\n"," new_dataset = new_dataset_name, \n"," workspace = workspace_name, \n"," new_dataset_workspace = new_dataset_workspace_name)"]},{"cell_type":"markdown","id":"fa244e9d-87c2-4a66-a7e0-be539a0ac7de","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Rebind all reports using the old semantic model to the new Direct Lake semantic model"]},{"cell_type":"code","execution_count":null,"id":"d4e867cc","metadata":{},"outputs":[],"source":["report.report_rebind_all(\n"," dataset = dataset_name,\n"," dataset_workspace = workspace_name,\n"," new_dataset = new_dataset_name,\n"," new_dataset_workpace = new_dataset_workspace_name,\n"," report_workspace = workspace_name)"]},{"cell_type":"markdown","id":"3365d20d","metadata":{},"source":["### Rebind reports one-by-one (optional)"]},{"cell_type":"code","execution_count":null,"id":"056b7180-d7ac-492c-87e7-ac7d0e4bb929","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["report_name = '' # Enter report name which you want to rebind to the new Direct Lake model\n","\n","report.report_rebind(\n"," report = report_name,\n"," dataset = new_dataset_name,\n"," report_workspace=workspace_name,\n"," dataset_workspace = new_dataset_workspace_name)"]},{"cell_type":"markdown","id":"526f2327","metadata":{},"source":["### Show unsupported objects"]},{"cell_type":"code","execution_count":null,"id":"a47376d7","metadata":{},"outputs":[],"source":["dfT, dfC, dfR = directlake.show_unsupported_direct_lake_objects(dataset = dataset_name, workspace = workspace_name)\n","\n","print('Calculated Tables are not supported...')\n","display(dfT)\n","print(\"Learn more about Direct Lake limitations here: https://learn.microsoft.com/power-bi/enterprise/directlake-overview#known-issues-and-limitations\")\n","print('Calculated columns are not supported. Columns of binary data type are not supported.')\n","display(dfC)\n","print('Columns used for relationship cannot be of data type datetime and they also must be of the same data type.')\n","display(dfR)"]},{"cell_type":"markdown","id":"ed08ba4c","metadata":{},"source":["### Schema check between semantic model tables/columns and lakehouse tables/columns\n","\n","This will list any tables/columns which are in the new semantic model but do not exist in the lakehouse"]},{"cell_type":"code","execution_count":null,"id":"03889ba4","metadata":{},"outputs":[],"source":["directlake.direct_lake_schema_compare(dataset = new_dataset_name, workspace = new_dataset_workspace_name)"]},{"cell_type":"markdown","id":"2229963b","metadata":{},"source":["### Show calculated tables which have been migrated to the Direct Lake semantic model as regular tables"]},{"cell_type":"code","execution_count":null,"id":"dd537d90","metadata":{},"outputs":[],"source":["directlake.list_direct_lake_model_calc_tables(dataset = new_dataset_name, workspace = new_dataset_workspace_name)"]}],"metadata":{"kernel_info":{"name":"synapse_pyspark"},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.12.3"},"microsoft":{"language":"python"},"nteract":{"version":"nteract-front-end@1.0.0"},"spark_compute":{"compute_id":"/trident/default"},"synapse_widget":{"state":{},"version":"0.1"},"widgets":{}},"nbformat":4,"nbformat_minor":5}