leverage ibis expression for getting readablerelations #2046

sh-rp · 2024-11-10T18:11:28Z

Description

Adds support for using ibis expressions to generate the SQL statements for relations. To this end, this PR implements a new type of readable relation that wraps an ibis unbound table expression but still accesses data the old way.

Todos:

Figure out typing

netlify · 2024-11-10T18:11:42Z

✅ Deploy Preview for dlt-hub-docs ready!

Name	Link
🔨 Latest commit	`acd329f`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/6745ad7a7bf17200086e948e
😎 Deploy Preview	https://deploy-preview-2046--dlt-hub-docs.netlify.app

sh-rp · 2024-11-11T08:51:47Z

Another note to self: we probably need to run column names through the normalizer. Otherwise we assume the user builds these expressions with normalized names, as they are present in the schema.

sh-rp · 2024-11-11T08:52:36Z

tests/load/test_read_interfaces.py

+    df = items_table.df()
+    assert len(df.index) == total_records
+
+    df = double_items_table.df()


regular dlt dataset execution methods (df, arrow, iter_arrow...) work everywhere

rudolfix

this is really cool! IMO we should keep ibis as optional dependency (it only works with python 3.10+). so we have two options:

1. separate relation
2. enable the proxy behavior if ibis is found, if not we fall back to the current behavior

i'd probably go for the second option. I'm just a little bit worried about the typing

in both cases we should implement a few common expressions we already have in our existing relation (limit, head, column selection).
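A minimal sketch of the second option, assuming placeholder class names and a hypothetical `make_relation` factory (none of these are dlt's actual API): the ibis-backed relation is only chosen when the `ibis` package can be found, otherwise the current behavior is kept.

```python
import importlib.util
from typing import Optional


class PlainRelation:
    """Hypothetical stand-in for the existing SQL-based relation."""

    def __init__(self, table_name: str) -> None:
        self.table_name = table_name


class IbisBackedRelation(PlainRelation):
    """Hypothetical stand-in for a relation that wraps an ibis expression."""


def ibis_available() -> bool:
    # find_spec returns None when no top-level package named "ibis" is installed
    return importlib.util.find_spec("ibis") is not None


def make_relation(table_name: str, have_ibis: Optional[bool] = None) -> PlainRelation:
    # have_ibis can be forced for testing; by default it is detected
    if have_ibis is None:
        have_ibis = ibis_available()
    if have_ibis:
        return IbisBackedRelation(table_name)
    return PlainRelation(table_name)
```

Because both classes share a base, callers that only use the common methods (limit, head, column selection) would not need to know which one they received.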

regarding the schema: column lineage you can do with sqlglot. it makes sense to invest a little bit of time to understand how it is done:

https://github.com/tobymao/sqlglot/blob/main/posts/ast_primer.md (btw. it seems that we are not finding tables in expression in a correct way)
https://sqlglot.com/sqlglot/lineage.html

we can add sqlglot as a regular dependency. and use it everywhere we have sql SELECT statements.

sh-rp · 2024-11-25T20:41:52Z

dlt/destinations/dataset/ibis_relation.py

+
+    def select(self, *columns: str) -> "ReadableIbisRelation":
+        """set which columns will be selected"""
+        return self._proxy_expression_method("select", *columns)  # type: ignore


all the other ibis methods are not defined here yet. we'd need to add them to the interface and raise an error if they are called when the regular relation is returned I think. wdyt?

I think we just need to type dlt.dataset and pipeline.dataset properly (see my other comment). and create a Protocol for ibis expressions to type this class

sh-rp · 2024-11-26T12:58:08Z

@rudolfix the only thing still missing here is proper typing for this ibis wrapper. The "brute force" way of doing it would be to define all the methods of the ibis expression we can make use of on the SupportsReadableRelation interface, forward those calls in the ibis relation like we do with limit etc., and raise on the regular relation. Alternatively I could do some kind of wildcard typing to make the linter shut up, but it would be less strict. Maybe you can give me your opinion, if you don't mind, then I'll just decide on one.
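To make the two options concrete, here is a rough sketch with made-up class names (`FakeExpr` stands in for an ibis expression so the snippet runs without ibis; none of this is the PR's actual code). The strict variant forwards each method by hand and the regular relation raises; the loose variant forwards everything through `__getattr__` typed as `Any`.

```python
from typing import Any, Tuple


class FakeExpr:
    """Stand-in for an ibis expression; it only records the calls made on it."""

    def __init__(self, ops: Tuple[Any, ...] = ()) -> None:
        self.ops = ops

    def order_by(self, *keys: str) -> "FakeExpr":
        return FakeExpr(self.ops + (("order_by", keys),))


class RegularRelation:
    """Regular relation: ibis-only methods exist on the interface but raise."""

    def order_by(self, *keys: str) -> "RegularRelation":
        raise NotImplementedError("order_by requires the ibis-backed relation")


class ExplicitIbisRelation(RegularRelation):
    """Strict variant: every supported method is forwarded explicitly."""

    def __init__(self, expr: Any) -> None:
        self._expr = expr

    def order_by(self, *keys: str) -> "ExplicitIbisRelation":
        return ExplicitIbisRelation(self._expr.order_by(*keys))


class WildcardIbisRelation:
    """Loose variant: unknown attributes are forwarded and typed as Any."""

    def __init__(self, expr: Any) -> None:
        self._expr = expr

    def __getattr__(self, name: str) -> Any:
        # __getattr__ only runs when normal lookup fails; guard against
        # recursion if _expr itself has not been set yet
        if name == "_expr":
            raise AttributeError(name)
        return getattr(self._expr, name)
```

The strict variant keeps the linter useful at the cost of listing every method; the wildcard variant needs no list but gives up all checking on the forwarded calls, and here it also returns the raw expression rather than a wrapped relation.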

rudolfix

very cool! tl;dr:

we can track schema changes in 90% of cases easily (or at least as good as in upstream object)
we can do better with typing but maybe in separate PR
tests need to be better
can we do without dynamic installation for ibis?

rudolfix · 2024-11-26T15:13:53Z

dlt/common/libs/ibis.py

@@ -119,3 +137,37 @@ def create_ibis_backend(
        con = ibis.duckdb.from_connection(duck)

    return con
+
+
+def create_unbound_ibis_table(


should we move the ibis module to helpers? you are importing modules that are not in common. do we really need to refer to ibis in common?

rudolfix · 2024-11-26T15:31:14Z

dlt/destinations/dataset/ibis_relation.py

+    "dlt.destinations.filesystem": "duckdb",  # works
+    "dlt.destinations.dremio": "presto",  # works
+    # NOTE: can we discover the current dialect in sqlalchemy?
+    "dlt.destinations.sqlalchemy": "mysql",  # may work


by running configure() on the factory with the partial option. OFC we need the dialect part configured somewhere, i.e. in a partial connection string (without password)
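As an illustration of the "dialect part" idea, SQLAlchemy-style connection strings have the form `dialect[+driver]://...`, so the dialect can be read from the URL scheme without a password or a live connection. The helper name below is hypothetical, and mapping the result onto a sqlglot dialect name is a separate step not shown here.

```python
from urllib.parse import urlsplit


def dialect_from_connection_string(url: str) -> str:
    """Return the dialect part of a SQLAlchemy-style URL (dialect[+driver]://...)."""
    scheme = urlsplit(url).scheme
    if not scheme:
        raise ValueError(f"no dialect found in {url!r}")
    # drop the optional driver suffix: "mysql+pymysql" becomes "mysql"
    return scheme.split("+", 1)[0]
```

For example, `dialect_from_connection_string("mysql+pymysql://loader@localhost:3306/dlt_data")` yields the `mysql` part, which could replace the hardcoded `"mysql"` guess in the mapping above.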

rudolfix · 2024-11-26T15:35:53Z

dlt/destinations/dataset/ibis_relation.py

+
+    def select(self, *columns: str) -> "ReadableIbisRelation":
+        """set which columns will be selected"""
+        return self._proxy_expression_method("select", *columns)  # type: ignore


I think we just need to type dlt.dataset and pipeline.dataset properly (see my other comment). and create a Protocol for ibis expressions to type this class

rudolfix · 2024-11-26T15:55:01Z

dlt/destinations/dataset/ibis_relation.py

+
+
+# TODO: provide ibis expression typing for the readable relation
+class ReadableIbisRelation(BaseReadableDBAPIRelation):


Is there any protocol or base class in ibis that we can add as a base to get correct typing? otherwise we'd need to add all the methods ourselves.
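Whether ibis ships a suitable base class is left open here. If it does not, one fallback is a hand-written `typing.Protocol` that lists only the methods we want to promise. The sketch below uses made-up names (`SupportsTableExpression`, `DemoRelation`) and just two methods to show the mechanics.

```python
from typing import Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class SupportsTableExpression(Protocol):
    """Hypothetical subset of table-expression methods a relation promises."""

    def select(self, *columns: str) -> "SupportsTableExpression": ...

    def limit(self, n: int) -> "SupportsTableExpression": ...


class DemoRelation:
    """Does not inherit from the protocol; it matches structurally."""

    def __init__(self, columns: Tuple[str, ...] = (), max_rows: Optional[int] = None) -> None:
        self.columns = columns
        self.max_rows = max_rows

    def select(self, *columns: str) -> "DemoRelation":
        return DemoRelation(columns, self.max_rows)

    def limit(self, n: int) -> "DemoRelation":
        return DemoRelation(self.columns, n)


def first_rows(rel: SupportsTableExpression, n: int) -> SupportsTableExpression:
    # type checkers accept any object that has matching select/limit methods
    return rel.limit(n)
```

The downside is the one already mentioned: every method we want typed has to be added to the protocol by hand.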

rudolfix · 2024-11-27T13:28:25Z

dlt/destinations/dataset/factory.py

+from dlt.destinations.dataset.dataset import ReadableDBAPIDataset
+
+
+def dataset(


do you want to improve typing in this PR? if so we could

1. make SupportsReadableDataset generic where it is parametrized by relation type (bound to SupportsReadableRelation)

2. use overload on the literal TDatasetType: when in auto/ibis you return SupportsReadableDataset parametrized by ReadableIbisRelation, otherwise by just SupportsReadableRelation

3. we can make (2) dynamic, dependent on whether ibis is present
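A rough sketch of the generic-plus-overload idea, with simplified stand-in classes instead of the real dlt ones. The `"auto"` and `"ibis"` values come from the comment above; the `"default"` value and all class names are assumptions made for the example.

```python
from typing import Any, Generic, Literal, Type, TypeVar, overload


class Relation:
    def __init__(self, table_name: str) -> None:
        self.table_name = table_name


class IbisRelation(Relation):
    pass


# the dataset is parametrized by the relation type it hands out
TRelation = TypeVar("TRelation", bound=Relation)


class Dataset(Generic[TRelation]):
    def __init__(self, relation_cls: Type[TRelation]) -> None:
        self._relation_cls = relation_cls

    def table(self, name: str) -> TRelation:
        return self._relation_cls(name)


@overload
def dataset(dataset_type: Literal["auto", "ibis"] = ...) -> Dataset[IbisRelation]: ...


@overload
def dataset(dataset_type: Literal["default"]) -> Dataset[Relation]: ...


def dataset(dataset_type: str = "auto") -> Dataset[Any]:
    # the overloads above only affect static typing; this is the runtime path
    relation_cls: Type[Relation] = (
        IbisRelation if dataset_type in ("auto", "ibis") else Relation
    )
    return Dataset(relation_cls)
```

With this shape a type checker infers `IbisRelation` for `dataset("ibis").table("items")` and plain `Relation` for `dataset("default").table("items")`.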

rudolfix · 2024-11-27T13:34:36Z

dlt/destinations/dataset/ibis_relation.py

+
+    def __getitem__(self, columns: Union[str, Sequence[str]]) -> "ReadableIbisRelation":
+        # casefold column-names
+        columns = [self.sql_client.capabilities.casefold_identifier(col) for col in columns]


this is column getter right? so you will receive ibis Column object(s) right? then maybe select them from the schema like we do in regular relation?

this will cover many cases already and make ibis backward compatible with the base expression
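Selecting the columns from the schema could look roughly like this. The function name and the plain-dict schema shape are assumptions for the sketch; `casefold` stands in for the destination's `casefold_identifier`. Note that a bare string is treated as a single column name, since iterating over it would yield characters.

```python
from typing import Callable, Dict, Sequence, Union

ColumnsSchema = Dict[str, Dict[str, str]]


def select_columns_schema(
    schema: ColumnsSchema,
    columns: Union[str, Sequence[str]],
    casefold: Callable[[str], str] = lambda name: name,
) -> ColumnsSchema:
    # a bare string is one column name, not a sequence of characters
    names = [columns] if isinstance(columns, str) else list(columns)
    names = [casefold(name) for name in names]
    unknown = [name for name in names if name not in schema]
    if unknown:
        raise KeyError(f"columns not found in schema: {unknown}")
    # keep the order the caller asked for
    return {name: schema[name] for name in names}
```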

rudolfix · 2024-11-27T13:37:39Z

dlt/destinations/dataset/ibis_relation.py

+    def compute_columns_schema(self) -> TTableSchemaColumns:
+        """provide schema columns for the cursor, may be filtered by selected columns"""
+        # TODO: provide column lineage tracing with sqlglot lineage
+        return None


maybe we could try some simple implementation:

we preserve table schema until first expression is made

we preserve table schema for head limit and select method

maybe we also recognize a few exceptions that do not modify schema ie filter and free form filter expressions as here: https://ibis-project.org/tutorials/ibis-for-pandas-users#filtering-rows
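The simple implementation could be sketched as a small function that carries the schema through the operations we understand and gives up (returns `None`, as the current `compute_columns_schema` does) on everything else. The operation names and the function itself are illustrative assumptions, not the PR's code.

```python
from typing import Dict, Optional, Tuple

ColumnsSchema = Dict[str, Dict[str, str]]

# operations assumed, for this sketch, to leave the set of columns unchanged
SCHEMA_PRESERVING = {"head", "limit", "filter"}


def track_schema(
    schema: Optional[ColumnsSchema],
    operation: str,
    args: Tuple[object, ...] = (),
) -> Optional[ColumnsSchema]:
    if schema is None:
        # the schema was already lost by an earlier expression
        return None
    if operation in SCHEMA_PRESERVING:
        return schema
    if operation == "select":
        # only plain column names that exist in the schema can be tracked
        if all(isinstance(arg, str) and arg in schema for arg in args):
            return {str(arg): schema[str(arg)] for arg in args}
        return None
    # anything else (mutate, join, aggregate, ...) is treated as unknown
    return None
```

Once the result is `None` it stays `None` for the rest of the chain, so an unknown operation never produces a wrong schema, only a missing one.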

rudolfix · 2024-11-27T13:39:06Z

dlt/destinations/dataset/ibis_relation.py

+        if attr is None:
+            raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{name}'")
+
+        if not callable(attr):


I think ibis supports a column getter with a dot notation. my take would be to

start supporting it upstream

modify the schema accordingly
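A small sketch of what the two points could mean together, using a made-up `SchemaAwareRelation` class: dot notation is routed through the same path as the item getter, and the returned relation carries a schema narrowed to that one column.

```python
from typing import Any, Dict

ColumnsSchema = Dict[str, Dict[str, str]]


class SchemaAwareRelation:
    """Hypothetical relation that exposes columns via rel["col"] and rel.col."""

    def __init__(self, columns_schema: ColumnsSchema) -> None:
        self.columns_schema = dict(columns_schema)

    def __getitem__(self, column: str) -> "SchemaAwareRelation":
        if column not in self.columns_schema:
            raise KeyError(column)
        # the result only knows about the selected column
        return SchemaAwareRelation({column: self.columns_schema[column]})

    def __getattr__(self, name: str) -> Any:
        # only invoked when normal attribute lookup fails
        if name.startswith("_") or name == "columns_schema":
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(
                f"'{type(self).__name__}' object has no attribute '{name}'"
            ) from None
```

One caveat with this approach: a column whose name collides with a real method or attribute is only reachable through the item getter.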

rudolfix · 2024-11-27T13:40:40Z

tests/load/test_read_interfaces.py

+    # ensure ibis is installed for these tests
+    import subprocess
+
+    subprocess.check_call(


this is hardcore. aren't we able to use a dependency group for ibis?
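Independent of how the dependency group ends up being declared, the tests themselves could skip instead of installing anything at runtime. A stdlib-only sketch with a hypothetical test class (under pytest, `pytest.importorskip("ibis")` serves the same purpose):

```python
import importlib.util
import unittest

# True only when a top-level package named "ibis" is installed
HAS_IBIS = importlib.util.find_spec("ibis") is not None


@unittest.skipUnless(HAS_IBIS, "ibis is not installed; install the optional dependencies")
class IbisRelationTests(unittest.TestCase):
    def test_ibis_can_be_imported(self) -> None:
        import ibis  # only reached when the package is present

        self.assertIsNotNone(ibis)
```

The suite then reports these tests as skipped on environments without ibis rather than shelling out to pip.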

rudolfix · 2024-11-27T13:44:19Z

tests/load/test_read_interfaces.py

+    assert len(df.index) == 5
+
+    # check chained expression with join, column selection, order by and limit
+    joined_table = (


you do not need to test all the expressions but you should test various expression forms:

those that return Table (like here)

those that select a column

a single column getter with . notation

free-form filters:

expr = penguins.bill_length_mm > 37.0

materializations like their dataframe and arrow getters

there are also expressions that add and remove columns to schema

expressions that return Expr (not Table; those do some weird things and I'm not sure SQL can be generated for them)

sh-rp self-assigned this Nov 10, 2024

sh-rp linked an issue Nov 10, 2024 that may be closed by this pull request

ibis support for datasets / destinations #2003

Closed

sh-rp changed the title ~~[Experiment] Leverage ibis expressions & sqlot do to the query building in our Readable Relations~~ [Experiment] Leverage ibis expressions & sqlglot do to the query building in our Readable Relations Nov 10, 2024

sh-rp commented Nov 11, 2024

rudolfix reviewed Nov 11, 2024

sh-rp mentioned this pull request Nov 13, 2024

Use ibis expressions in ReadableDatasets for better control over what is loaded #2058

Open

sh-rp linked an issue Nov 13, 2024 that may be closed by this pull request

Use ibis expressions in ReadableDatasets for better control over what is loaded #2058

Open

sh-rp force-pushed the exp/ibis_expressions branch from b636df4 to b9cf262 Compare November 13, 2024 15:53

sh-rp changed the title ~~[Experiment] Leverage ibis expressions & sqlglot do to the query building in our Readable Relations~~ leverage ibis expression for getting readablerelations Nov 13, 2024

sh-rp linked an issue Nov 19, 2024 that may be closed by this pull request

Release Dataset Feature #2074

Closed

5 tasks

sh-rp force-pushed the exp/ibis_expressions branch from 390f9a0 to 936db9c Compare November 19, 2024 09:55

sh-rp added 6 commits November 25, 2024 17:24

add ibis dataset in own class for now

8c111e9

make error clearer

f928279

fix some linting and fix broken test

4048b3c

make most destinations work with selecting the right db and catalog, …

830af97

…transpiling sql via postgres in some cases and selecting the right dialect in others

add missing motherduck and sqlalchemy mappings

e289ad1

casefold identifiers for ibis wrapper calss

b6850e8

sh-rp force-pushed the exp/ibis_expressions branch from 7762611 to b6850e8 Compare November 25, 2024 16:27

sh-rp added 2 commits November 25, 2024 18:24

re-organize existing dataset code to prepare ibis relation integration

34323da

integrate ibis relation into existing code

0eb6f58

sh-rp force-pushed the exp/ibis_expressions branch from ff330b6 to 0eb6f58 Compare November 25, 2024 20:24

re-order tests

c06525b

sh-rp commented Nov 25, 2024

sh-rp added 4 commits November 26, 2024 11:01

fall back to default dataset if table not in schema

48e4034

make dataset type selectable

1fb17e0

add dataset type selection test and fix bug in tests

f19a98d

update docs for ibis expressions use

acd329f

sh-rp force-pushed the exp/ibis_expressions branch from 1a8b80e to acd329f Compare November 26, 2024 11:14

sh-rp marked this pull request as ready for review November 26, 2024 11:23

sh-rp requested a review from rudolfix November 26, 2024 11:23

rudolfix requested changes Nov 27, 2024
