Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSSION] More SqlLogicTest test coverage for queries, including join queries #13470

Open
Tracked by #13525 ...
findepi opened this issue Nov 18, 2024 · 18 comments
Open
Tracked by #13525 ...

Comments

@findepi
Copy link
Member

findepi commented Nov 18, 2024

During SDF's upgrade to DataFusion 43 we found a bug #13425 . This was possible thanks to extensive test coverage for certain query shapes that we have internally, which is good. The bad part is that the bug could be caught at the PR stage, or before the release, should similar tests exist in DataFusion project.

Would it be useful if there were more SLT query tests, including the ones that allowed to catch #13425?

cc @alamb @schulte-lukas @gliga

@alamb
Copy link
Contributor

alamb commented Nov 19, 2024

Yes, 100% I think adding additional test coverage in all areas would be super helpful

We / I also regularly hit bugs that are caught by InfluxData's test suite during upgrades of DataFusion but were not caught during the DataFusion PR

Other Areas that would benefit from improved testing from my experience:

  1. Metadata handling (like attaching field level metadata, etc)
  2. UNION coverage (which we use extensively)
  3. Dictionary handling (which we also use a lot)

@Dandandan
Copy link
Contributor

Dandandan commented Nov 22, 2024

One way to extend join coverage is to have some more "answer checking" for different benchmarks, such as:

(Earlier doing this for TPC-H found many issues in joins handling, decimal issues, etc.)

We can do the same for the imdb join order benchmark, clickbench, etc.

@Omega359
Copy link
Contributor

I have been looking over the sqllogictests this weekend and I think first of all that they need a good reorganization to clean them up, split them into smaller chunks testing just specific portions of functionality and to reduce the size of some of the files.

I also have started looking at other projects's tests to see how they've done their testing. I am unsure about actually incorporating some of those tests directly into DF (duckdb for example, MIT license vs Apache) but they could be used as a basis for seeing where DF's test suite is falling short and expanding the test suite some more.

@Omega359
Copy link
Contributor

I came across this paper which I found interesting for join testing. It won't work for DF yet since DF doesn't support query hints (at least as best as I know).

@schulte-lukas
Copy link

I would also love to see a separate test-runner (maybe nightly) of the official SQLLite SLTs (where applicable).

At this point DF seems to be able to pass the majority without modification (except delete). Since they are (a) pre-vetted by sqllite and (b) test foundational capability it would make sense to me that they are included in datafusion

@Omega359
Copy link
Contributor

Omega359 commented Dec 1, 2024

I would also love to see a separate test-runner (maybe nightly) of the official SQLLite SLTs (where applicable).

At this point DF seems to be able to pass the majority without modification (except delete). Since they are (a) pre-vetted by sqllite and (b) test foundational capability it would make sense to me that they are included in datafusion

I actually am part way through porting those slt's to something that df's slt test runner can run

@alamb
Copy link
Contributor

alamb commented Dec 2, 2024

Can't wait to see what happens here

@comphead
Copy link
Contributor

comphead commented Dec 2, 2024

We / I also regularly hit bugs that are caught by InfluxData's test suite during upgrades of DataFusion but were not caught during the DataFusion PR

I'm wondering what are test strategies in Influx? is it something we can reuse in DF as well to avoid regression

@alamb
Copy link
Contributor

alamb commented Dec 2, 2024

I'm wondering what are test strategies in Influx? is it something we can reuse in DF as well to avoid regression

We have a bunch of our own basic tests for end to end functionality in our system (some is SQL based, some is "InfluxQL" based)

We have often seen issues where features we use heavily (metadata, and UnionExec, for example) are not as heavily covered in the existing DataFusion test suites 🤔

@comphead
Copy link
Contributor

comphead commented Dec 2, 2024

I'm wondering what are test strategies in Influx? is it something we can reuse in DF as well to avoid regression

We have a bunch of our own basic tests for end to end functionality in our system (some is SQL based, some is "InfluxQL" based)

We have often seen issues where features we use heavily (metadata, and UnionExec, for example) are not as heavily covered in the existing DataFusion test suites 🤔

is it real world use cases ?

if its not I found something used for MariaDB and get investigate it https://mariadb.com/kb/en/random-query-generator-tests/

@Omega359
Copy link
Contributor

Omega359 commented Dec 2, 2024

After the slt work I was thinking of taking a look at the work that @2010YOUY01 has done with sqlancer to see if there are areas that I could add into it. I noticed that the datafusion-contrib for that was merged into sqlancer main a few months back - https://github.com/sqlancer/sqlancer/tree/main/src/sqlancer/datafusion.

sqlancer has quite a few options wrt test oracles that might be useful to look at. The one I was looking at (DQP) I think would need support for query hints (or corresponding set options) to function but if we could get it to work would allow a lot of join testing to happen.

@alamb
Copy link
Contributor

alamb commented Dec 2, 2024

I agree -- I think using sqlancer is a great idea -- I suspect it simply needs someone to maintain and file tickets

@Omega359
Copy link
Contributor

Omega359 commented Dec 3, 2024

I would also love to see a separate test-runner (maybe nightly) of the official SQLLite SLTs (where applicable).
At this point DF seems to be able to pass the majority without modification (except delete). Since they are (a) pre-vetted by sqllite and (b) test foundational capability it would make sense to me that they are included in datafusion

I actually am part way through porting those slt's to something that df's slt test runner can run

I've gotten many of them to run so far after removing or changing troublesome features (create index, valuesort, etc). There are differences with sqllite vs DataFusion wrt how sqllite handles integers vs real numbers that I'm still working through slowly. I do see a number of failures that I believe are real or at least point to limitations of DF. I'm hoping to have a full 'clean' run sometime in the next week that only fails on likely valid queries.

Here is an example of something that is currently failing:

CREATE TABLE tab0(pk INTEGER PRIMARY KEY, col0 INTEGER, col1 FLOAT, col2 TEXT, col3 INTEGER, col4 FLOAT, col5 TEXT);
INSERT INTO tab0 VALUES(0,1164,1149.64,'eyxra',1147,1174.8,'fpigl');
INSERT INTO tab0 VALUES(1,1165,1150.72,'zqxbf',1148,1175.52,'zmukh');
INSERT INTO tab0 VALUES(2,1166,1154.12,'itglg',1149,1176.1,'cfmkm');
INSERT INTO tab0 VALUES(3,1167,1156.44,'xmaca',1150,1177.85,'ggpiv');
INSERT INTO tab0 VALUES(4,1168,1157.98,'lewrx',1151,1178.59,'hjscv');
INSERT INTO tab0 VALUES(5,1169,1158.81,'enubf',1152,1180.13,'apmjh');
INSERT INTO tab0 VALUES(6,1171,1159.96,'olnov',1153,1181.89,'smeyx');
INSERT INTO tab0 VALUES(7,1172,1160.51,'yngfz',1154,1182.90,'vakuy');
INSERT INTO tab0 VALUES(8,1174,1161.72,'awjqq',1155,1183.93,'qzqaw');
SELECT pk, col0 FROM tab0 WHERE col0 IN (SELECT col3 FROM tab0 WHERE col4 > 73.18) ORDER BY 2 DESC;

SanityCheckPlan
caused by
Error during planning: Plan: ["SortPreservingMergeExec: [col0@1 DESC]", "  CoalesceBatchesExec: target_batch_size=8192", "    HashJoinExec: mode=Partitioned, join_type=RightSemi, on=[(col3@0, col0@1)]", "      CoalesceBatchesExec: target_batch_size=8192", "        RepartitionExec: partitioning=Hash([col3@0], 20), input_partitions=20", "          RepartitionExec: partitioning=RoundRobinBatch(20), input_partitions=1", "            CoalesceBatchesExec: target_batch_size=8192", "              FilterExec: CAST(col4@1 AS Float64) > 73.18, projection=[col3@0]", "                MemoryExec: partitions=1, partition_sizes=[9]", "      SortExec: expr=[pk@0 DESC], preserve_partitioning=[true]", "        CoalesceBatchesExec: target_batch_size=8192", "          RepartitionExec: partitioning=Hash([col0@1], 20), input_partitions=1", "            MemoryExec: partitions=1, partition_sizes=[9]"] does not satisfy order requirements: [col0@1 DESC]. Child-0 order: [[pk@0 DESC]]

@gliga
Copy link

gliga commented Dec 4, 2024

It would be great if there was a common infrastructure across
databases for slt. High level shown below.

------------     --------------   ----------------
|  DF slt  |     | SQLite slt |   | whatever slt |
------------     --------------   ----------------
    |                 |                 |
 ----------------------------------------------    --------------
 |     slt transpiler                         |<---| slt bank   |
 ----------------------------------------------    --------------
    |                 |                 |
    v                 v                 v
------------     --------------   ----------------
|  DF slt  |     | SQLite slt |   | whatever slt |
------------     --------------   ----------------

Boxes on the top are existing slt tests available in various
databases/repositories, e.g., SQLite (which is mostly discussed in
this thread).

slt bank is a set of slt tests independent of any dialect (or even
potentially dialect specific) that is contributed by third party
groups, e.g., researchers.

slt transpiler takes existing slt tests and transpiles to the
desired target (e.g., DF). Translation steps could be:

  • Take one record at a time
  • Transpile SQL to the target dialect (using one of the existing tools)
  • Adjust output (likely most challenging)
  • Output record: if it works keep it if it does not comment out in the output

The transpiler would also be a repository of links to existing slts.

Boxes at the bottom are the resulting slt tests.

Ideally, the transpilation would not be a one off process, but it
would be done for example every time slt tests are run. Benefit of
such an approach would be to ensure that any updates made to existing
tests (e.g., new tests added to SQLite) are reflected in the target
run (e.g., DF runs).

@gliga
Copy link

gliga commented Dec 4, 2024

@Omega359 I see you went quite far and already had some good findings.
Would you need any help at this point?

@Omega359
Copy link
Contributor

Omega359 commented Dec 4, 2024

@Omega359 I see you went quite far and already had some good findings. Would you need any help at this point?

Right now I'm just working through issues with the sqlite test files. My biggest issue actually is that the test runner seems to get very very very very slow sometimes (most of the time actually). I don't know why atm - I forked off the rust sqllogictest repo and tried a few things to try and automate converting value-oriented results to row-oriented results and at one point it ran super fast but otherwise it's 'fire off a run and come back in 1/2 hour'.

I'll push up my changes to my repo on github in the next day or so. I suspect it's a good few days of solid work going through all the tests and fixing/changing things.

@Omega359
Copy link
Contributor

Omega359 commented Dec 4, 2024

It would be great if there was a common infrastructure across databases for slt.

Boxes on the top are existing slt tests available in various databases/repositories, e.g., SQLite (which is mostly discussed in this thread).

slt bank is a set of slt tests independent of any dialect (or even potentially dialect specific) that is contributed by third party groups, e.g., researchers.

slt transpiler takes existing slt tests and transpiles to the desired target (e.g., DF).

The transpiler would also be a repository of links to existing slts.

Ideally, the transpilation would not be a one off process, but it would be done for example every time slt tests are run. Benefit of such an approach would be to ensure that any updates made to existing tests (e.g., new tests added to SQLite) are reflected in the target run (e.g., DF runs).

I am absolutely not against this at all - but it's a pretty large undertaking.

Currently the definition of what the format of a slt file is .. fluid. The original format from sqlite isn't directly compatible with the rust crate which has been a large part of my work. Fixing that would be step 1.

Having a base set of slt tests that work around most every db would be nice however given what I've seen so far those would be very basic tests. Even things like what the field type of a Float field isn't stable across db's (or at least the sqllogictest runner impl's). Support for create index .. not in DF (yet). Mysql using DIV for integer division vs / (mysql/oracle treat / as always float result iirc). Just a few basic examples

@2010YOUY01
Copy link
Contributor

After the slt work I was thinking of taking a look at the work that @2010YOUY01 has done with sqlancer to see if there are areas that I could add into it. I noticed that the datafusion-contrib for that was merged into sqlancer main a few months back - https://github.com/sqlancer/sqlancer/tree/main/src/sqlancer/datafusion.

sqlancer has quite a few options wrt test oracles that might be useful to look at. The one I was looking at (DQP) I think would need support for query hints (or corresponding set options) to function but if we could get it to work would allow a lot of join testing to happen.

I think adding test oracles like DQP to sqllogictests is also a project worth to do.

SQLancer is a fuzzer implementation with query generator + test oracles (for property tests), and these two component are loosely coupled: test oracles don't necessarily have to be implemented with SQLancer's query generator, the invariants can be tested as long as there is a valid query, maybe the best strategy is to implement them both in existing test cases(slt) and SQLancer's randomly generated queries.

For DQP, I remember there is a configuration we can turn off 🤔, and there will be no join-reordering, so join order can be controlled by writing table order in SQL queries differently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants