Review query planner hack #499

Open
JimBacon opened this issue Nov 13, 2023 · 0 comments

There is a function called optimiseQueryPlan in ReportEngine.php, which refers to this thread as the reasoning: https://stackoverflow.com/questions/6037843/extremely-slow-postgresql-query-with-order-and-limit-clauses.

It’s a hack that tries to avoid a horribly slow approach the optimiser takes to some queries. It did improve the performance of many queries considerably, but it is less necessary now that most occurrence reporting is done against ES. In addition, we have just upgraded from PG11 to PG13, which may change the situation.
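
For context, the technique from the Stack Overflow thread is to wrap the sorted column in a no-op expression, so the planner can no longer satisfy the ORDER BY ... LIMIT by walking the primary key index. A rough sketch of the rewrite on a typical filtered report query (the filter column and value are illustrative assumptions, not taken from ReportEngine.php):

-- Un-hacked form: for a selective filter the planner may choose a backwards
-- scan of the index on id, reading vast numbers of non-matching rows before
-- it finds enough matches to satisfy the LIMIT.
SELECT o.*
FROM cache_occurrences_functional o
WHERE o.taxa_taxon_list_external_key = 'ABC0001'  -- hypothetical rare taxon
ORDER BY o.id DESC
LIMIT 10;

-- Hacked form: "id + 0" cannot be served by the index on id, so the planner
-- filters first (e.g. via an index on the taxon key) and then sorts the
-- small result set.
SELECT o.*
FROM cache_occurrences_functional o
WHERE o.taxa_taxon_list_external_key = 'ABC0001'
ORDER BY o.id + 0 DESC
LIMIT 10;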

An example was encountered where the hack did the opposite of what was intended and caused a query to take 2 minutes instead of 2 tenths of a second. That query was:

SELECT o.*
FROM cache_occurrences_functional o
WHERE 1=1
ORDER BY o.id+0 ASC
LIMIT 3;

What is not known is whether this hack could now be removed altogether, or whether the example was a rare case where the hack doesn’t work and it’s better left in place.

John says
We could probably test this quite easily by simulating a few queries that showed the problem once everything is in place. From memory a good example is to select records of a rare taxon, limit to a few records and sort by ID desc. A bad query plan assumes it will find all the records quickly if it starts at the end of the table and does an index scan backwards. A good plan would realise the records are scattered, so use an index to find the records of that taxon, then walk backwards through them to find the required limit.
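
A sketch of that test on PG13, again assuming a hypothetical taxon filter column; compare the plans and timings the planner chooses for the two forms:

-- Un-hacked form: at risk of a backwards index scan on id.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.*
FROM cache_occurrences_functional o
WHERE o.taxa_taxon_list_external_key = 'ABC0001'  -- hypothetical rare taxon
ORDER BY o.id DESC
LIMIT 10;

-- Hacked form, as produced by optimiseQueryPlan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.*
FROM cache_occurrences_functional o
WHERE o.taxa_taxon_list_external_key = 'ABC0001'
ORDER BY o.id + 0 DESC
LIMIT 10;

If PG13 chooses a good plan for the un-hacked form across a few such queries, that would support removing the hack; if not, the unfiltered example above may simply be a rare case where the hack misfires.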
