Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL log rotation can break under certain circumstances #523

Closed
barryprice opened this issue Sep 21, 2024 · 13 comments · Fixed by #543
Closed

MySQL log rotation can break under certain circumstances #523

barryprice opened this issue Sep 21, 2024 · 13 comments · Fixed by #543
Labels
bug Something isn't working

Comments

@barryprice
Copy link

barryprice commented Sep 21, 2024

Steps to reproduce

  1. Deploy (on jammy/amd64 here), and wait for enough workload to generate a lot of logs

Expected behavior

logrotate should rotate mysql's log files to avoid filling the disk

Actual behavior

Under certain circumstances, the charm appears to get into a confused state and MySQL starts writing to log files that should already have bee rotated, the bundled logrotate logic stops working and doesn't rotate the actual open log files - which eventually will fill the disk.

logrotate config:

$ grep ^/var /etc/logrotate.d/flush_mysql_logs 
/var/snap/charmed-mysql/common/var/log/mysql/error.log {
/var/snap/charmed-mysql/common/var/log/mysql/general.log {
/var/snap/charmed-mysql/common/var/log/mysql/slowquery.log {

Actual contents of the log directory when the charm's in this state:

$ ls -l /var/snap/charmed-mysql/common/var/log/mysql/
total 268
drwxrwx--- 2 snap_daemon snap_daemon 122880 Sep 12 07:34 archive_error
drwxrwx--- 2 snap_daemon snap_daemon 139264 Sep 12 07:33 archive_general
drwxrwx--- 2 snap_daemon snap_daemon   4096 Sep 10 13:29 archive_slowquery

We found this file under the archive_general dir recently:

-rw-r----- 1 snap_daemon root 44G Sep 15 23:28 /var/snap/charmed-mysql/common/var/log/mysql/archive_general/general.log-20240912_0733

Versions

Operating system: Ubuntu 22.04.4 LTS

Juju CLI: 2.9.51-ubuntu-amd64

Juju agent: 2.9.49

Charm revision: 8.0/stable / 240

LXD: none

Log output

Juju debug log: not relevant

Additional context

The installed mysql snap is:

charmed-mysql        8.0.36         103    latest/stable  dataplatformbot  held
@barryprice barryprice added the bug Something isn't working label Sep 21, 2024
Copy link

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5535.

This message was autogenerated

@paulomach
Copy link
Contributor

Thanks for the report @barryprice , we will check this asap

@paulomach
Copy link
Contributor

@barryprice doing a quick check, logrotate configuration seems to be working as expected.

The retention period is 7 days, which will account for max 7*24*60 = 10080 files per log type in each the archive_* directories.

Can you check if that big file was within this retention period? It can be the case to disable general log, and/or tweak the retention period.

@barryprice
Copy link
Author

The logrotate config only seems to refer to files that don't exist - there is no, e.g. /var/snap/charmed-mysql/common/var/log/mysql/error.log on the unit. But maybe I'm missing some magic there, or misunderstanding how that works.

Just to be clear, we expect 7 days total retention, and rotation once per hour?

The file in question began with data from three days prior via head ( which matches the filename, created at or around 20240912_0733) - but it was still being written to on 2024-09-15.

@paulomach
Copy link
Contributor

paulomach commented Sep 26, 2024

Hi @barryprice , thanks for the quick reply.
This seems to indicates some issue on the log flushing, that happens in tandem with the log rotation.
If a log file in the archive_* is still being written it means that the flush failed, since flush will close the current file descriptor and write a new file on standard path, e.g. /var/snap/charmed-mysql/common/var/log/mysql/error.log

The juju debug log should have entries for failed calls to the flush log routines, can you retrieve the logs, with debug log level?

Just to be clear, we expect 7 days total retention, and rotation once per hour?

Once per minute

@barryprice
Copy link
Author

Apologies for the delay here. I've provided the entire unit log (i.e. /var/log/juju/unit-mysql-29.log from the VM in question) to Paulo internally.

@barryprice
Copy link
Author

barryprice commented Oct 10, 2024

We've worked through this internally, for anyone following in the same situation who needs a quick fix:

Where x is the problem unit in question:

# for Juju 2:
juju run-action --wait mysql/x get-password username=serverconfig

# for Juju 3:
juju run --wait=2m mysql/x get-password username=serverconfig

Note the returned password, SSH to the unit and:

mysql -u serverconfig -p --protocol=tcp

Enter the password returned by the earlier run / run-action to get a MySQL client prompt, and then:

FLUSH GENERAL LOGS;
FLUSH ERROR LOGS;
FLUSH SLOW LOGS;

You should find that this creates the missing files and cleans up the older ones.

I think the charm could/should be better able to prevent this from happening in the first place, but that's the workaround AIUI.

Edit: I've updated the initial description and title to closer match our understanding after going over this, thanks!

@barryprice barryprice changed the title logrotate config doesn't match layout on disk MySQL log rotation can break under certain circumstances Oct 13, 2024
@hloeung
Copy link

hloeung commented Oct 14, 2024

Adding to this, on this unit I'm looking at, there are 14GiB of general logs with the oldest being:

-rw-r----- 1 snap_daemon root 16075781 Oct 13 14:20 general.log-20241013_1420

ubuntu@juju-730dff-prod-admin-insights-ubuntu-db-30:/var/snap/charmed-mysql/common/var/log/mysql$ sudo du --max-depth=1 -x -h
8.0K    ./archive_slowquery
128K    ./archive_error
14G     ./archive_general
15G     .

The flush general logs command above clears this freeing up some space.

There's also this:

ubuntu@juju-730dff-prod-admin-insights-ubuntu-db-30:/var/snap/charmed-mysql/common/var/log/mysql$ sudo lsof -p $(pidof mysqld)  | grep /var/snap
mysqld  31714 snap_daemon  cwd       DIR              252,1      4096 521573 /var/snap/charmed-mysql/common/var/lib/mysql
mysqld  31714 snap_daemon    1w      REG              252,1       144 523126 /var/snap/charmed-mysql/common/var/log/mysql/archive_error/error.log-20241011_1532 (deleted)
mysqld  31714 snap_daemon    2w      REG              252,1       144 523126 /var/snap/charmed-mysql/common/var/log/mysql/archive_error/error.log-20241011_1532 (deleted)

So the mysqld process is holding onto logs which likely has been rotated out by logrotate? The flush error logs seems to clear this closing out file descriptors opened to these to allow it to be freed.

@barryprice
Copy link
Author

We upgraded to the 8.0/stable r313 release a few days back, but I guess this fix either wasn't included or doesn't actually fix the problem here, as we were alerted for a 33G general.log-20241206_2212 over the weekend :(

@kot0dama
Copy link

And another one today for a 33GB /var/snap/charmed-mysql/common/var/log/mysql/archive_general/general.log-20241212_0013 logfile.

@taurus-forever
Copy link
Contributor

Dear @kot0dama , @barryprice , the logrotation fix
is available in charm revision 314/315 (8.0./edge only) :-(
Unfortunately it was merged late and was not included is SolQA testing (rev 312/313).

We are discussing the possible hotfixes here...

taurus-forever added a commit that referenced this issue Dec 13, 2024
The config option general_log is not compatible with the real
productions due to dumping all SQL queries to the general.log:
https://dev.mysql.com/doc/refman/8.0/en/query-log.html

> ... The server writes information to this log when clients connect or
> disconnect, and it logs each SQL statement received from clients...

In the past, it was a necessary evil as we had to keep record all
connects/disconnects from the clients for security reasons.
Nowadays we have audit_log plugin enabled by default, therefor no need to
pay such a high performance price and struggle from the general.log size

#523 (comment)
>  33GB /var/snap/charmed-mysql/common/var/log/mysql/archive_general/general.log-20241212_0013

Additionally, we are working on the logs compression spec (separate topic).

---

While working here, we have noticed unnecessary enabled plugin audit_log_filter
which also produces log files in /var/lib/mysql. Disabling it for now,
as it is not necessary for audit_log plugin announcer functionality.

We will consider re-enabling the plugin without rush in
https://warthogs.atlassian.net/browse/DPE-6235
@taurus-forever
Copy link
Contributor

The followup tickets / PR:
https://warthogs.atlassian.net/browse/DPE-6234
https://warthogs.atlassian.net/browse/DPE-6235
#566

The plan is to give 8.0/candidate with the fixed logs ASAP.

@barryprice
Copy link
Author

Thanks, it'd be great if we could get a hotfix release ahead of the EOY break so that folks aren't woken up by this when on-call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants