Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicates in some selections with GROUP+HAVING #666

Closed
stepapo opened this issue Apr 10, 2024 · 8 comments · Fixed by #685
Closed

Duplicates in some selections with GROUP+HAVING #666

stepapo opened this issue Apr 10, 2024 · 8 comments · Fixed by #685
Milestone

Comments

@stepapo
Copy link

stepapo commented Apr 10, 2024

Describe the bug

Sometimes ORM adds columns to GROUP statement, which leads to retrieving duplicate rows.

To Reproduce

Looking for books with any of Tags ID 2 or 3, which at the same time have more than one tag. Should be two books, but results in three, including one duplicate.

$books = $this->orm->books->findBy([
    ICollection::AND,
    [CompareGreaterThanFunction::class, [CountAggregateFunction::class, 'tags->id'], 1],
    ['tags->id' => [2, 3]],
]);
Assert::same($books->count(), 2);
SELECT "books".* 
FROM "books" AS "books" 
LEFT JOIN "books_x_tags" AS "books_x_tags__COUNT" ON ("books"."id" = "books_x_tags__COUNT"."book_id") 
LEFT JOIN "tags" AS "tags__COUNT" ON ("books_x_tags__COUNT"."tag_id" = "tags__COUNT"."id") 
LEFT JOIN "books_x_tags" AS "books_x_tags_any" ON ("books"."id" = "books_x_tags_any"."book_id") 
LEFT JOIN "tags" AS "tags_any" ON ("books_x_tags_any"."tag_id" = "tags_any"."id") 
GROUP BY "books"."id", "tags_any"."id" 
HAVING ((COUNT("tags__COUNT"."id") > 1) AND (("tags_any"."id" IN (2, 3))))

Expected behavior

Adding "tags_any"."id" IN (2, 3) to HAVING requires column "tags_any"."id" to be added in GROUP, which leads to duplicates in selection. In MySQL, the column could be moved from GROUP to SELECT, but in Postgres, that does not do the trick. In both it would be probably ok to SELECT DISTINCT, or to remove column from GROUP and move condition to WHERE (as it was in 4.0 version).

Versions

  • Database: PostgreSQL 13.13.0
  • Orm: dev-main
  • Dbal: 5.0.0-rc4
@stepapo stepapo added the bug label Apr 10, 2024
@hrach
Copy link
Member

hrach commented Apr 10, 2024

Thank you for another great report. I hope this is the last one, not because you would stop, but because it is that way :D

However, this time the fix... is complicated.

@stepapo
Copy link
Author

stepapo commented Apr 10, 2024

Undestood. I also think it's the last one. I found only one more and it was already in v4.0. Now it's fixed, sort of, by NotSupportedException "Relationship cannot be fetched...". I'm worried that the same is going to happen to this one too :-D

hrach added a commit that referenced this issue Apr 11, 2024
@hrach
Copy link
Member

hrach commented Apr 11, 2024

Hm, MySQL is not working correctly (#667) with DISTINCT. Sadly, I cannot move it from having to where, because it would stop working with OR.

@stepapo
Copy link
Author

stepapo commented Apr 12, 2024

I cannot move it from having to where, because it would stop working with OR.

Seems to me that OR is already solved by including condition "tags_any"."id" IN (2, 3) in LEFT JOIN ... ON. Works well. Maybe the same solution could be used for AND?

$books = $this->orm->books->findBy([
    ICollection::OR,
    [CompareGreaterThanFunction::class, [CountAggregateFunction::class, 'tags->id'], 1],
    ['tags->id' => [2, 3]],
])->fetchAll();
SELECT "books".* 
FROM "books" AS "books" 
LEFT JOIN "books_x_tags" AS "books_x_tags__COUNT" ON ("books"."id" = "books_x_tags__COUNT"."book_id") 
LEFT JOIN "tags" AS "tags__COUNT" ON ("books_x_tags__COUNT"."tag_id" = "tags__COUNT"."id") 
LEFT JOIN "books_x_tags" AS "books_x_tags_any" ON ("books"."id" = "books_x_tags_any"."book_id")
LEFT JOIN "tags" AS "tags_any" ON (
    ("books_x_tags_any"."tag_id" = "tags_any"."id") 
    AND "tags_any"."id" IN (2, 3)
) 
GROUP BY "books"."id" 
HAVING ((COUNT("tags__COUNT"."id") > 1) OR ((COUNT("tags_any"."id") > 0)))

@stepapo
Copy link
Author

stepapo commented Apr 15, 2024

Just for reference, I tried simply commenting out this line and both OR and AND cases now work as well as all other tests in MySQL and Postgres. However, it brings down performance significantly for simple AND filters that use AnyAggregator.

I figured that current DbalExpressionResult probably does not allow combining both WHERE and HAVING conditions within a single query. Without that, it might be hard to achieve correct results and optimal performance at the same time. Understandably, complex filters need to be all in one or the other.

It is still a great job making most of this work correctly and this case is not a "must have" for me.

@hrach
Copy link
Member

hrach commented Apr 16, 2024

Thanks for exploring, yes, I had a few ideas but didn't have a chance to explore them. I hope I'll get to it at the end of this week :)

@hrach
Copy link
Member

hrach commented Oct 28, 2024

@stepapo finally fixed; if you could test/gave feedback if this is fixed for you, it would be awesome :) thx

@stepapo
Copy link
Author

stepapo commented Oct 29, 2024

Firstly thank you for moving this further. I have few observations, not necessarily fatal errors, might create issues later. For now just one thing:

I was able to update my project from ORM 4 to 5, finally getting my complex query to show correct results. However, the performance drop is significant. Probably due to all the HAVING clauses and ON clause filters. Even when using only simple "any" aggregations that worked in version 4.

$this->model->books->findBy([
	ICollection::OR,
	['id' => 1],
	['tags->id' => 2],
]);

ORM 4:

SELECT "books".* 
FROM "books" AS "books" 
LEFT JOIN "books_x_tags" AS "books_x_tags" ON ("books"."id" = "books_x_tags"."book_id") 
LEFT JOIN "tags" AS "tags" ON ("books_x_tags"."tag_id" = "tags"."id") 
WHERE (("books"."id" = 3)) OR (("tags"."id" = 1)) 
GROUP BY "books"."id"

ORM 5:

SELECT "books".* 
FROM "books" AS "books" 
LEFT JOIN "books_x_tags" AS "books_x_tags_any" ON ("books"."id" = "books_x_tags_any"."book_id") 
LEFT JOIN "tags" AS "tags_any" ON (("books_x_tags_any"."tag_id" = "tags_any"."id") AND "tags_any"."id" = 1) 
GROUP BY "books"."id", "books"."title" 
HAVING ((("books"."id" = 3)) OR ((COUNT("tags_any"."id") > 0)))

This is just a simple query, but in complex queries, the drop is from miliseconds to seconds. The results are correct in both. While I am eager to use new aggregation features, I also need my current queries to work fast without change. I don't know how hard and whether even achievable it would be to optimize this. Not to delay release further, my suggestion is a "legacy mode" that would allow user to use old ORM 4 queries (possibly by overriding AnyAggregator::aggregateExpression to always just return $expression;).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants