-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
We must migrate the mysql databases from utf8mb3 to utf8mb4 #938
Comments
This is blocked by the investigation currently being led by 2U. Migrating from utf8mb3 to utf8mb4 would require to extend the maximum length of certain columns. Until this investigation is complete, we should probably put all new users on utf8mb4, such that they don't have to migrate later. |
@regisb: I think we should be careful to distinguish between changing the connection settings and converting existing table data. We can change the connection settings without rebuilding tables or changing CharField lengths. The length of those fields might have been a problem a couple versions ago, but MySQL 8.0 is going to be making new tables with a row format of DYNAMIC by default anyway, and the indexing-length limits there are substantially higher. So changing the connection settings would leave a bunch of older tables as |
@regisb: I'm talking with folks at 2U about switching over the Studio connection initially, followed by LMS if that goes smoothly, hopefully in the next couple of weeks. Migrating existing tables from |
Yes, that would be great Dave. |
As discussed today in our Tutor users group meetup, we should tackle this issue in time for Redwood: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3583016961/Tutor+Users+Group#2024-04-08 |
Listing my findings here for future reference. Running
|
Hey @ormsbee. While working on a fix for this issue, I upgraded the MySQL character_set and collation in Tutor-nightly to utf8mb4 and utf8mb4_general_ci respectively which you can find in this commit for reference. This is working perfectly fine for me, however when I looked into the DB, I noticed some columns were not upgraded to the utf8mb4 character set. The list of these columns and tables is: Similarly, some columns seemed to force the utf8mb4_unicode_cicollation and some columns were forcing the utf8mb4_bin collation. The list of these columns and tables is: The rest followed the utf8mb4_general_ci collation as specified. If I used utf8mb4_unicode_ci instead as the default collation, then some columns seemed to force the utf8mb4_general_ci collation. The list of these columns and tables is: The rest followed the utf8mb4_unicode_ci collation as specified. What I wanted to confirm from you is if these are intentionally saved in this specific charset and collation by edx-platform for a reason or if they are just an issue on our end that we need to resolve. One more thing I wanted to confirm is if we are using the utf8mb4_general_ci collation or the utf8mb4_unicode_ci collation or any other collation for the Redwood release or later on. The previous releases in Tutor were using the utf8mb3_general_ci collation. CC: @DawoudSheraz |
Thank you so much for taking on this important work!
We should prefer
With respect to the explicit encodings set in the system...
Tagging @bradenmacdonald, since he wrote the split_modulestore stuff here. It should be fine to update these to
These are intentionally using the different collations to indicate whether something should be treated like a machine identifier (case-sensitive) or like normal char fields (case-insensitive). They should keep these encodings and collations.
You can ignore these tables. They will be removed shortly. |
Also, I'm not sure how we want to handle the migrations of really big tables for the community. Converting something like |
As background, the explicitly created encodings/collations for Those use the https://github.com/openedx/openedx-learning/blob/main/openedx_learning/lib/collations.py This also helps us normalize case sensitivity between sqlite and MySQL so tests behave more like production. |
Also, we need to summarize this work in the operator release notes for Redwood before the release goes out. |
@ormsbee Hi, thanks for the detailed input on this.
I was thinking of using factory-boy to generate dummy/test data for CSM, around 10-20M, and see how it plays out. Or should we increase the record count? I understand CSM is The heaviest table when it comes to data. |
Oh, actually I think I may have a hypothesis on the |
I think the important thing is that we're going to have to make it opt-in, i.e. we don't require that table be converted, and we don't trigger it automatically. Because on some sites, running that conversion could literally take days (or weeks if they're trying to do it while serving some traffic). |
@DawoudSheraz, @Danyal-Faheem: Is it still the intention to get this into Redwood? |
Hey Dave. So far, it is the plan. We will update the issue if we run into challenges. |
Hi @ormsbee, after discussing with @DawoudSheraz, we've decided that we're going to add the utf8mb4 upgrade only for new tutor installations, i.e. tutor 18.0.0 and onwards. There's already a PR up for that in #1065. A method to upgrade the individual tables one by one for existing tutor installations will be provided in a later release but hopefully pretty soon. |
Hi @ormsbee. The default charset and collation in Tutor 18.0.0 aka redwood and onwards will We've also developed a tutor
We're still in the process of testing how this command works out for earlier installations and how long it takes on tables with a lot of data. From my initial testing, I dumped about 1.2 Million rows into the The upgrade process had the following benchmarks:
For now, we've only tried to upgrade the tables the Feel free to share your thoughts on this matter and if you have something that we should add/remove. CC: @regisb @DawoudSheraz |
Update on this issue: For the redwood release, i.e, Tutor v18.0.0, we will only be upgrading the default charset and collations of MySQL for new installations. A migration script to upgrade the tables one by one will be added in the form of a Tutor do command in a later release or in the form of a plugin. CC: @ormsbee |
Currently, Open edX runs with MySQL tables that are encoded with utf8mb3. It is time that we upgrade to utf8mb4 (the default).
For reference:
#887
https://discuss.openedx.org/t/palm-upgrade-running-rdbms-in-tutor-mysql-vs-mariadb-utf8-vs-utf8mb4/11030
edX.org is currently upgrading their environment to MySQL 8.0 (and utf8mb4). We should use this opportunity to learn from them any pitfalls they faced during the upgrade.
The text was updated successfully, but these errors were encountered: