-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug preventing orderedAggregate
and distinctAggregator
from being used together (alternative approach)
#578
base: master
Are you sure you want to change the base?
Conversation
f5c179b
to
66c8f7e
Compare
This is based on the refactoring in #575. |
I think this is my preferred approach but I won't be able to get to it immediately. |
86f19ab
to
ce8015a
Compare
ce8015a
to
b4d0804
Compare
Hi @shane-circuithub, please ping me if this is is urgent for you. I'll get to it in due course but I'm short of spare mental cycles at the moment. |
b4d0804
to
ad27062
Compare
It's not necessarily urgent that you merge this, because I can just pin to this commit for now, but I am actually using this. I didn't realise when I first wrote this, but this is actually needed for ordered-set aggregation functions to work. Take the following query for example: SELECT
percentile_cont(0.5) WITHIN GROUP (ORDER BY column1)
FROM (
VALUES
(1),
(2),
(3),
(4)
) _;
If you try to recreate that with Opaleye (without this commit), it will produce the following (paraphrased): SELECT
percentile_cont(inner_1) WITHIN GROUP (ORDER BY inner_2)
FROM (
SELECT
0.5 AS inner_1,
column1 AS inner_2
FROM (
VALUES
(1),
(2),
(3),
(4)
) _
) _;
With this PR, this becomes: SELECT
percentile_cont(0.5) WITHIN GROUP (ORDER BY inner_1)
FROM (
SELECT
column1 AS inner_1
FROM (
VALUES
(1),
(2),
(3),
(4)
) _
) _;
With this PR, The other "valid" way to construct ordered set aggregation functions (which admittedly I care much less about supporting) is with the direct argument in a SELECT
percentile_cont(column1) WITHIN GROUP (ORDER BY column2)
FROM (
VALUES
(0.5, 1),
(0.5, 2),
(0.5, 3),
(0.5, 4),
(0.8, 5),
(0.8, 6),
(0.8, 7),
(0.8, 8)
) _
GROUP BY
column1; Without this PR, if you try to write this in Opaleye it will produce the following SQL (again paraphrased): SELECT
percentile_cont(inner_2) WITHIN GROUP (ORDER BY inner_3)
FROM (
SELECT
column1 AS inner_1,
column1 AS inner_2,
column2 AS inner_3
FROM (
VALUES
(0.5, 1),
(0.5, 2),
(0.5, 3),
(0.5, 4),
(0.8, 5),
(0.8, 6),
(0.8, 7),
(0.8, 8)
) _
) _
GROUP BY
inner_1;
Even though SELECT
percentile_cont(inner_1) WITHIN GROUP (ORDER BY inner_2)
FROM (
SELECT
column1 AS inner_1,
column2 AS inner_2
FROM (
VALUES
(0.5, 1),
(0.5, 2),
(0.5, 3),
(0.5, 4),
(0.8, 5),
(0.8, 6),
(0.8, 7),
(0.8, 8)
) _
) _
GROUP BY
inner_1;
Because this PR rewrites only references, and because it keeps track of already-rewritten references and makes sure to only to rename each reference once, both the This is not to put pressure on you to deal with this right now if you don't have the capacity (again I'm happy to just pin to this commit in the short-term), but it just explains an additional motivation that I wasn't aware of when I first wrote this. |
Could you double check? That test passes here. EDIT: No, it fails here (as expected)! |
Hmm, and if fails in CI. Maybe a difference between Postgres versions? I'm using 13 locally and CI is using 11. https://github.com/tomjaguarpaw/haskell-opaleye/actions/runs/6587659165/job/17898358429#step:17:627 |
Hmm, no it fails in CI in 13 too. I'm really confused why it doesn't fail for me locally! https://github.com/tomjaguarpaw/haskell-opaleye/actions/runs/6587744501/job/17898614746#step:17:631 |
No, it does fail locally. Not sure what I was thinking. Never mind. |
The error is "in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list", so I think I have to understand that first. |
I think we have a bigger problem, because if I slightly change your test case then your fix is no longer sufficient. I appreciate that your fix is enough to make it work with well-formed examples that should work but I'd much rather find a fix that doesn't allow generating incorrect SQL. @ Test/Test.hs:626 @ testStringArrayAggregateOrderedDistinct :: Test
testStringArrayAggregateOrderedDistinct = it "" $ q `selectShouldReturnSorted` expected
where q =
O.aggregateOrdered
- (O.asc snd)
+ (O.asc (\x -> snd x O..++ snd x))
(PP.p2 (O.arrayAgg, O.distinctAggregator . O.stringAgg . O.sqlString $ ","))
table7Q
expected = [ ( map fst sortedData
@ Test/Test.hs:1506 @ main = do |
I can now understand the aggregator
|
I think the correct API for this is probably something like makeAggregator ::
AggregatorFunction w a b ->
Order a ->
Aggregator a b
makeDistinctAggregator ::
AggregatorFunction w a b ->
Order w ->
Aggregator a b where the stringAgg ::
AggregatorFunction
(Wrap (Field SqlText), Wrap (Field SqlText))
(Field SqlText, Field SqlText)
(Field SqlText) and wrap :: Field a -> Wrap (Field a)
ascWrap :: Order (Wrap (Field a))
asc :: Order (Field a)
asc = contramap wrap ascWrap I think this would work. It's a bit heavyweight, but actually resolves my preexisting concerns about the semantics of I've no idea whether the same approach would work for the |
…tinctAggregator` to be used with `orderAggregate`
ad27062
to
56e733e
Compare
It's redundant because we only need to rebind PrimExprs that came from a lateral subquery. As explained at [1], rather than carefully analysing which PrimExprs came from a lateral subquery we can just rebind everything. A previous commit [2] changed things so that everything mentioned in aggregator order expressions is rebound. Instead we probably should have done what this commit does, that is, added an Unpackspec constraint to aggregate. Fixes #587 This is a simpler approach to resolving the issues discussed at * #585 * #578 This still suffers from the problem described at: #578 (comment) i.e. if we find a way of duplicating field names, such as O.asc (\x -> snd x O..++ snd x) then we can still create crashing queries. The benefit of this comment is that there is a way of generating non-crashing queries! [1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/52a4063188dd617ff91050dc0f2e27fc0570633c/src/Opaleye/Internal/Aggregate.hs#L111-L114 [2] d848317, part of #576
It's redundant because we only need to rebind PrimExprs that came from a lateral subquery. As explained at [1], rather than carefully analysing which PrimExprs came from a lateral subquery we can just rebind everything. A previous commit [2] changed things so that everything mentioned in aggregator order expressions is rebound. Instead we probably should have done what this commit does, that is, added an Unpackspec constraint to aggregate. Fixes #587 This is a simpler approach to resolving the issues discussed at * #585 * #578 This still suffers from the problem described at: #578 (comment) i.e. if we find a way of duplicating field names, such as O.asc (\x -> snd x O..++ snd x) then we can still create crashing queries. The benefit of this comment is that there is a way of generating non-crashing queries! [1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/52a4063188dd617ff91050dc0f2e27fc0570633c/src/Opaleye/Internal/Aggregate.hs#L111-L114 [2] d848317, part of #576
It's redundant because we only need to rebind PrimExprs that came from a lateral subquery. As explained at [1], rather than carefully analysing which PrimExprs came from a lateral subquery we can just rebind everything. A previous commit [2] changed things so that everything mentioned in aggregator order expressions is rebound. Instead we probably should have done what this commit does, that is, added an Unpackspec constraint to aggregate. Fixes #587 This is a simpler approach to resolving the issues discussed at * #585 * #578 This still suffers from the problem described at: #578 (comment) i.e. if we find a way of duplicating field names, such as O.asc (\x -> snd x O..++ snd x) then we can still create crashing queries. The benefit of this comment is that there is a way of generating non-crashing queries! [1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/52a4063188dd617ff91050dc0f2e27fc0570633c/src/Opaleye/Internal/Aggregate.hs#L111-L114 [2] d848317, part of #576
(This is an alternative fix to the bug described in #577 that doesn't require adding
Eq
andOrd
constraints toPrimExpr
.)This PR contains two commits.
The first commit adds a (failing) test case demonstrating the inability to combine
orderedAggregate
anddistinctAggregator
together, even in the limited cases allowed by PostgreSQL. The problem is that PostgreSQL only allows this if the expressions given in theORDER BY
clause match the expressions given as arguments to the aggregation function.The second commit fixes this bug. It does so by adding
Eq
andOrd
constraints toSymbol
(but notPrimExpr
) and using them to detect identical references anywhere within thePrimExpr
s contained with theAggregate
. Instead of renaming the entire expressions, we only rename references to fields (which could possibly be lateral references, though this isn't checked). We maintain aMap
of already-renamed symbols such that identical references will be renamed to the same symbol, fulfilling PostgreSQL's restriction.