Drop and recreation of database indices on upgrade #3679
Comments
Since the update may change the database schema, it is necessary to remove the indices because of the constraints. And since updates are done rarely, I would say it is an acceptable cost.
Hello @dartcafe, to give some context: a very large instance that uses the Polls app had huge downtime when upgrading. Analyzing the problem, we discovered that it is related to this index handling. The ideal way a Nextcloud application manages the database schema is the following:
You should not have to worry about "the update may change the database schema": you know whether an update changes the schema, because that depends on the Migration classes you add. To ensure that Polls remains compatible with very large instances, we propose writing a PR that replaces the RepairSteps currently in place in the Polls app with standard migration classes. Would you be open to reviewing and merging it?
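For illustration, the proposed approach would boil down to a standard Nextcloud migration step that creates an index exactly once and skips it on later upgrades. This is only a sketch; the class, table and index names are hypothetical and not taken from the Polls code base:

```php
<?php

declare(strict_types=1);

namespace OCA\Polls\Migration;

use Closure;
use OCP\DB\ISchemaWrapper;
use OCP\Migration\IOutput;
use OCP\Migration\SimpleMigrationStep;

// Hypothetical migration step: the index is added only if it does not exist yet,
// so later upgrades leave it (and the DB) untouched.
class Version070200Date20240101000000 extends SimpleMigrationStep {
    public function changeSchema(IOutput $output, Closure $schemaClosure, array $options): ?ISchemaWrapper {
        /** @var ISchemaWrapper $schema */
        $schema = $schemaClosure();

        if ($schema->hasTable('polls_votes')) {
            $table = $schema->getTable('polls_votes');
            if (!$table->hasIndex('polls_votes_poll_idx')) {
                // Created once by this migration; no drop/recreate on every upgrade.
                $table->addIndex(['poll_id'], 'polls_votes_poll_idx');
            }
        }

        return $schema;
    }
}
```

Because the migration is versioned, the server records that it ran and never executes the schema change again, which is exactly the "add once, drop only when no longer needed" behaviour requested here.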
I know of the context. But:
Don't get me wrong, I totally understand the request. But there was an intention behind the current way of doing the migration, and I was fully aware of the differences to the recommended way, which I decided to deviate from. It just made my work easier, and it had never been an issue for years. Changing back to the recommended way is possible, but it has side effects and needs maintenance. So please give me time to think about it. Maybe the preconditions for a Vue 3 beta and possibly a release will be done by then.
I forgot, since it is not a bug by definition: it is always helpful, for understanding and possibly for reproduction, if details of the environment are provided. This way it is possible to gauge the value the change could bring. What I mean is: what times were measured, how many DB entries are responsible for the issue, etc.
So, in short: we are still investigating and have not yet found the root cause.
Furthermore, we have started working on first steps towards zero-downtime deployments, at least for some scenarios, so not making DB changes when they are not necessary would be one important aspect.
As for the number of records, I will see if I can get these values somehow and post them here. @ChristophWurst might remember the numbers.
Thanks for the first insight. 22 minutes is indeed huge. Are you able to identify any particular long-running index creation? I ask because if one index, or one kind of index, is responsible for the run time, maybe further action has to be taken to avoid long index recreation times even when recreating the index is necessary for a single migration. I suspect the foreign key constraints are the time killer, since the other indices are mostly simple ones. Am I right in guessing it is MariaDB/MySQL?
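One way to see which DDL statement is the long-running one while the upgrade is still executing (assuming direct access to the MySQL/MariaDB server; not something taken from the report above) would be:

```sql
-- Lists the statements currently executing, including any long-running
-- ALTER TABLE / index creation triggered by the upgrade, with their runtime
SHOW FULL PROCESSLIST;
```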
It is a MySQL cluster with 3 nodes in total, so the index also needs to be replicated in the end. If I remember correctly, there is ProxySQL between the web and DB tiers.
So this is the query that ran for 22 minutes, here run through an EXPLAIN.
It is the query that is triggered when opening the frontend.
Can you add the row count of the polls_* tables?
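Approximate per-table row counts can be read from information_schema on MySQL/MariaDB; this is only a sketch, assuming the schema name and oc_ table prefix that appear later in this thread:

```sql
-- TABLE_ROWS is only an estimate for InnoDB, but good enough to judge table sizes
SELECT TABLE_NAME, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'nextcloud'
  AND TABLE_NAME LIKE 'oc\_polls\_%';
```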
Is it true that there are 7566 polls in the database?
@dartcafe Yes, that is true, see rough numbers below:
😮 Oooookay. N..nice.
I would like to come back to the index creation. Can you tell which index creation was the time-consuming one?
I don't know exactly, but the DB was under heavy load during the upgrade (with the system in maintenance mode and cut off from client requests via the network) for an hour. That is all I can tell at the moment.
As a middle-ground proposal, or a first step, how about #3689?
Further reasoning on why indexes should not be replaced every time without an actual need for it, in the sense of "no structural changes, no need to recreate indexes": on the large instance, the following SQL command was run before an upgrade (in this case on 7.1.4) and more than 1.5 h after the upgrade (to 7.2.1).
Before the upgrade
After the upgrade
As you can see, even after such a long time the indexes that are present and were recreated are not picked up by the DB, because it lacks the relevant index stats (say goodbye to them when you drop an index). The query runs in 1.8 seconds with the indexes; without hitting all the indexes it takes 22 minutes.

So while #3688 is one fix or mitigation in terms of improving performance, not recreating indexes when there is no need to (via #3689) is equally valid, and I would propose doing both. The case above is by now so bad that I do not expect the system to ever come back to life, because it takes the DB down. How should the DB improve queries or their execution plans if the Nextcloud system in front of it was taken down once the DB peaked at 100% utilization?

So to finish my "speech" and rest my case: I vote for doing both, yet as a first quick fix, make the index recreation smarter (still needs some fixing) and ship that to unblock upgrades, then optimize the queries in general in a second step. That second step is potentially more complex, since @come-nc mentioned that splitting the queries up might not bring significant performance improvements. So ultimately, as seen above, the main performance boost comes from ensuring queries hit indexes in order to be fast (I know I am stating the obvious, but I wanted to make it explicit).
As seen above, even 1.5 h after recreating the indexes the query still fails to hit 3 indexes. Lacking the index stats, the DB believes the query will be faster without the index, which is proven wrong, but the DB doesn't know that (yet).
Christoph's (or our) interpretation: same query, same indices available, but the DB doesn't use UNIQ_votes. That means three of the query parts run without an index after the upgrade, while they ran with an index before.

MariaDB [nextcloud]> show index from oc_polls_votes;

^ that was after the upgrade: Cardinality = 0. Research suggests that this means the DB does not have accurate statistics about the index. Thus the DB might wrongly assume it is better to execute the query without the index.
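Not part of the original report, but for illustration: on MySQL/MariaDB the missing statistics can be inspected and rebuilt with standard commands, which is why the optimizer starts picking the indexes again once statistics exist:

```sql
-- Cardinality = 0 means the optimizer has no usable statistics for that index yet
SHOW INDEX FROM oc_polls_votes;

-- Rebuilds the table and index statistics so the recreated indexes are considered again
ANALYZE TABLE oc_polls_votes;
```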
For the sake of completeness, stats from prod
Happy to discuss all of the information above either here or at https://cloud.nextcloud.com/call/68qjowkv (the Polls chat).
Closing, since this is solved by the linked PR in 7.2.2.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
What went wrong, what did you observe?
Upgrades are heavy on the database due to the repair steps declared in polls/appinfo/info.xml (lines 61 to 68 at commit 7e41c26).
What did you expect, how polls should behave instead?
Use migration files or the optional indices APIs to add indices. Add them only once. Only drop them when they are not needed anymore.
What steps does it need to replay this bug?
Installation method
Installed/updated from the appstore (Apps section of your site)
Installation type
Updated from a minor version within same major version (i.e. 4.0.0 to 4.1.1)
Affected polls version
v7.2.1
Which browser did you use, when experiencing the bug?
Other browser
No response
Add your browser log here
No response
Additional client environment information
No response
NC version
Nextcloud 27
Other Nextcloud version
No response
PHP engine version
PHP 8.1
Other PHP version
No response
Database engine
MySQL
Database Engine version or other Database
No response
Which user-backends are you using?
Add your nextcloud server log here
No response
Additional environment informations
No response
Configuration report
No response
List of activated Apps
No response
Nextcloud Signing status
No response
Additional Information
No response