-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serious performance degradation due to SQL query nesting #340
Comments
Hang on... pressed submit, by mistake. |
Regarding the It seems that this should be a once-per-connection query issued by |
Okay... I'm done with my monster post! |
I should give you notice up front that unless you can convince me that issuing hundreds of thousands of consecutive individual queries with one-row results is something you really need to do then I'm not going to be very inclined to work on this performance issue! |
Here are a few things you should check.
|
Thank you for providing the context about your request-response cycle which does indeed convince me that low latency is important for your application. At this point the most helpful thing you can do is write the benchmark alluded to in my previous comment. |
We moved away from using JOINs via Opaleye because the nesting is more severe with them. However, even with simple lookups by PK, the queries are way slower. Here's the log from a single request/response cycle along with the reason for why those queries are being made:
Does this query pattern seem reasonable for a typical request/response cycle? If the query times are added it comes to 56ms and the response was sent out in 61ms So, querying the DB efficiently is critical to the kind of webapps we're writing. A different endpoint written in Rails, which is making 10 SQL queries, is responding in 47.8ms out of which ActiveRecord is taking only 11.7ms! The DB layer is essentially 5x faster. |
Isn's this a foregone conclusion? Query strings generated by Opaleye are easily 40-50x longer than the ones generated by Rails, due to enumeration all column names across multiple levels of nesting.
Is nesting the query a necessity? Can't multiple levels of nesting be collapsed in the internal Opaleye AST itself, or while converting the AST to the query-string? Your thoughts? |
I'm not saying you should benchmark them to see who's faster. I'm saying you should benchmark them to see what proportion of Opaleye's time is spent in query generation as opposed to query execution.
No
Yes
My thought is I'm not going to implement that until I have a robust benchmark. |
If I'm going to spend my time doing something that I want to be able to very carefully assess its impact. |
Am I missing something here? Doesn't the data shared above demonstrate that query execution ( on PG side) is a perf-bottleneck? Do you have another hypothesis about query generation (on Opaleye side) also being a perf-bottleneck? |
Yes, I suspect query generation is also a bottleneck. It manipulates Haskell In order to fix this performance issue I need a robust way of getting performance numbers out for both the generation and execution stages. |
Execution stage would basically be an instrumented version of |
Yes, I would say split
|
Where are we on this? |
We are wrapping up all docs/fixes that we came across while building our
Haskell module. This will be picked once all of that is over. A few weeks.
On 30-Oct-2017 1:22 PM, "tomjaguarpaw" <[email protected]> wrote:
Where are we on this?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#340 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AABu0caXDGGxzyDUE42nje3NzvTHoqElks5sxYA-gaJpZM4QAlxZ>
.
|
I'm planning to pick this up now. How would you like the benchmarking tooling to work? What would you like your dev workflow to be? |
I need some function that I can run that will produce performance statistics for the cases you care about. I would like statistics about the run time of query generation and also the run time of query execution. I don't think I mind much more than that. |
So, just a function (or bunch of functions) that can be run via GHCi? Will running it via GHCi cause the perf numbers to vary wildly? Do you use byte-code or object-code in GHCi? Will that have any impact? |
You can also have a |
So, let's benchmark the following use-cases:
Anything else you (or anyone else?) can think of? |
Seems fine to me. We can always add new use-cases later. |
Why has |
It's been deprecated because it's going to be moved into an internal module. It's fine for you to use it in this benchmark harness. |
I think in view of #476 this issue should be re-opened. Probably prepare if for Hackoberfest so that we can tackle this as a community and get it fixed once and for all. |
Yes, would be great to have a solid benchmark suite so that we can see what difference any improvements make and ensure that we don't get performance regressions. |
I will look again at the benchmarks in #347 |
My idea about this is that instead of generating unique column names each time we use a select we instead generate unique tables names. That means that instead of
we could generate
An orthogonal simplification would be do combine clauses where valid. For example, instead of
we could generate
With transformations like this we could collapse a lot of the subquery nesting. |
There's a nice improvement on the branch mergeLimitOffset (cae2950). Before:
After
|
This certainly looks a lot easier to debug! I'm still sceptical it brings performance improvements, but definitely looks worth having |
Actually, I take that back - if the pretty printer is slow, then printing less is almost definitely going to be faster! |
@saurabhnanda wrote some benchmarks that do show performance improvements when cruft is removed from small queries. I haven't specifically targeted those benchmarks with these improvements but I'm hopeful that something good for performance will come out of it. |
Hi, here is another data point of where opaleye's nested queries make postgres slower than needed. We have a work-queue implemented in postgres, with the following code. It does the equivalent of: SELECT *
FROM myproject.jobs
WHERE "status_code" = 'pending' AND "queue" 'myqueue'
ORDER BY "priority" DESC NULLS FIRST,
"attempts" ASC NULLS LAST,
"creation_time" ASC NULLS LAST In Opaleye: proc () -> do
row <- orderBy (desc jrPriority <> asc jrAttempts <> asc jrCreationTime) (jobQuery hdl) -< ()
restrict -< jrQueue row .== pgStrictText queue
codeRestrict -< row
returnA -< row
where
codeRestrict = case mbCode of
Nothing -> proc _row -> returnA -< ()
Just code -> proc row -> restrict -< jrStatusCode row .== pgStrictText code Opaleye-generated query: SELECT
"id0_1" as "result1_2"
FROM (SELECT
*
FROM (SELECT
*
FROM (SELECT
*
FROM (SELECT
*
FROM (SELECT
*
FROM (SELECT
"id" as "id0_1",
"queue" as "queue1_1",
"priority" as "priority2_1",
"creation_time" as "creation_time3_1",
"payload_json" as "payload_json4_1",
"status_code" as "status_code5_1",
"attempts" as "attempts6_1",
"worker" as "worker7_1",
"start_time" as "start_time8_1",
"end_time" as "end_time9_1",
"last_heartbeat" as "last_heartbeat10_1",
"result_json" as "result_json11_1"
FROM "myproject"."jobs" as "T1") as "T1") as "T1"
ORDER BY "priority2_1" DESC NULLS FIRST,
"attempts6_1" ASC NULLS LAST,
"creation_time3_1" ASC NULLS LAST) as "T1"
WHERE (("status_code5_1") = (CAST(E'pending' AS text))) AND (("queue1_1") = (CAST(E'myqueue' AS text)))) as "T1"
LIMIT 1) as "T1") as "T1" Performance:
Simplifying the query, removing Opaleye's nesting, improves performance 2x (34 ms -> 18 ms), removing the SELECT
"id",
"queue",
"priority",
"creation_time",
"payload_json",
"status_code",
"attempts",
"worker",
"start_time",
"end_time",
"last_heartbeat",
"result_json"
FROM "myproject"."jobs"
WHERE ("status_code" = (CAST(E'pending' AS text))) AND (("queue") = (CAST(E'myqueue' AS text)))
ORDER BY "priority" DESC NULLS FIRST,
"attempts" ASC NULLS LAST,
"creation_time" ASC NULLS LAST
LIMIT 1;
|
Also note how the sort changed: -Sort Method: quicksort Memory: 3201kB
+Sort Method: top-N heapsort Memory: 25kB |
@nh2 Can you also share the PostgreSQL version this was ran on? |
Postgresql 9.6 with recommended settings for SSD (changing to |
Thanks. That's a fair few major versions back now, I wonder what PG 14 would make of the above query. https://dbfiddle.uk and friends might be able to help here, though it's a bit of work. (I'm also not saying "oh just upgrade" - obviously if we can fix this for PG 9.6, we should). |
Thanks for the report @nh2. Could you please check that the performance degradation is not due to the relative placement of orderBy (desc jrPriority <> asc jrAttempts <> asc jrCreationTime) (jobQuery hdl) could you please put the If that doesn't recapture the missing performance then I will investigate how to remove these nested queries to see if that does the job. I'm not sure how I can most effectively proceed though, given that I have neither Postgres 9.6 (Debian stable is on 13.7) nor any data to run the query on. Suggestions welcome. |
Oh, I see the plan does the EDIT: Hmm, I don't mean |
@tomjaguarpaw I can try to do that, but it may take some time. I should also be able to try the same query on a local Postgres 13 or 14.
That one is easy with Nix:
|
OK, I understand what is happening in the I have to say, I think this is a weakness of Postgres, and I wouldn't be surprised if it were fixed in a version later than 9.6 (the first version in the 9.6 series was released nearly 6 years ago). So please check with
In principle I'm willing to help try to fix this regardless, but I shall certainly be more motivated to do so if it's still bad when you try one, or especially both, of the above. |
I believe that upgrading to PG 14 won't fix this problem: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=943a8f083302602b734c290242066432 |
Here's a bit more info: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=4a1eb1d20a03aa3ca0cc8ca1bd17d000 It looks like the important thing is that the |
Hmm, the "optimal" version there uses quicksort though (and they all have the same predicted "cost", but I'm not if that's meaningful without data in the table). |
I was more looking at the overall plan and when a subquery scan happened. I agree populating the table will probably matter too |
|
Related to #334 & #284
Some comparative numbers while running the same "fetch by PK" type of query in Rails/ActiveRecord and Opaleye. The query generated by Opaleye is ~3-8x slower. (compare the query-timing from PG logs for the 3x difference. 8x difference is observed as soon as a similar query is re-fired -- it seems that bind/execute, which Rails is firing, caches better at the PG-level, compared to what Opaleye is firing).
Rails/ActiveRecord
Timing from PG Logs: 14.046ms (total of parse, bind, & execute step)
Opaleye
Timing from a wrapper written on top of Opaleye.runQuery: 53.026ms
Timing from PG Logs: 40.622 (total of all three queries given below)
Questions to ponder:
SELECT - FROM - WHERE - ORDER - LIMIT/OFFSET
be applied in a single SQL query without nesting? The same question can be extended to JOINs (Opaleye generated left join queries seems to slow down the Postgresql's query planner #284)Conclusion
For a RDBMS-backed webapp, which is making 4-10 queries in every request/response cycle, this performance degradation adds up really quickly. Similar endpoints written in Rails are responding under 80ms, and the average response time in Haskell is about 250-300ms, most of which can be attributed the queries being generated by Opaleye.
If this isn't fixed, Opaleye's killer feature (query composition), ends-up being a great PoC, but unusable in production. What can we do to fix this?
The text was updated successfully, but these errors were encountered: