APCu stuck on FUTEX_WAIT #410
That's a deadlock. The first suspect would be an opcache restart taking place at the same time this occurred. Opcache may kill processes that fail to restart in time, and if a killed process holds an rwlock, the result is a deadlock.
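The failure mode described above can be illustrated with a minimal Python sketch. It is not APCu or opcache code, and the function names are hypothetical. A child process takes a lock that lives in shared memory and is then killed with SIGKILL before it can release it. The sketch swaps in `multiprocessing.Lock` (a POSIX semaphore on Linux) for APCu's rwlock, but the effect is the same: neither a semaphore nor a plain process-shared rwlock is released automatically when its holder dies. The timeout exists only so the demo returns. Without it, the acquire would block forever, which on Linux/glibc is a sleep in a futex wait.

```python
import multiprocessing as mp
import os
import signal


def _hold_lock_forever(lock, acquired):
    # Hypothetical worker: take the shared lock, report it, never release it.
    lock.acquire()
    acquired.set()
    while True:
        signal.pause()


def lock_usable_after_holder_killed(timeout=1.0):
    """Return True if the lock can be acquired after its holder was SIGKILLed."""
    ctx = mp.get_context("fork")            # Unix-only start method
    lock = ctx.Lock()                       # shared between parent and child
    acquired = ctx.Event()
    holder = ctx.Process(target=_hold_lock_forever, args=(lock, acquired))
    holder.start()
    if not acquired.wait(timeout=5):        # wait until the child owns the lock
        holder.kill()
        raise RuntimeError("child never acquired the lock")
    os.kill(holder.pid, signal.SIGKILL)     # no handler, no cleanup, no release
    holder.join()
    # Without a timeout this acquire would block forever; here it just fails.
    ok = lock.acquire(timeout=timeout)
    if ok:
        lock.release()
    return ok


if __name__ == "__main__":
    print("lock usable after SIGKILL:", lock_usable_after_holder_killed())
```

On Linux this reports that the lock is no longer usable: the only process that could release it no longer exists.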
So 5.1.18 still triggers the issue. For now we have disabled the Drupal cron cache flush to see whether that is really what triggers it. Something else that probably happens around the same time is log rotation, which reloads httpd. But since the FPM processes run under their own php-fpm service, they are independent of httpd and should not be affected. What would such an opcache restart look like?
It would result in "Attempting to kill locker" warnings in the opcache log.
I have not been able to find such an entry in any of the logs. Would I need to configure a dedicated opcache log to be sure not to lose it? Otherwise it just seems to go to stdout, a.k.a. the engine log.
@duritong I think it's necessary to set opcache.log_verbosity_level=2 (or higher) for these to show up.
Hi, has this issue been resolved? @duritong We ran into an issue quite similar to yours: apcu_inc was stuck.
I have tried to reproduce a deadlock under various conditions, but the only case where I succeeded was when a process was terminated using SIGKILL. I have landed php/php-src@8b7aaad on the opcache side, which tries terminating processes using SIGTERM first. However, I still do not know whether this is really the cause of the issue you're seeing.
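The SIGTERM-first, SIGKILL-as-fallback escalation can be sketched as follows. This is a Python illustration of the general pattern, not the actual opcache change in php/php-src@8b7aaad (which is C). The helper names, the 1-second grace period and the polling interval are assumptions. The point of trying SIGTERM first is that a process can catch or defer it, which gives it a chance to leave its critical section. SIGKILL cannot be intercepted at all.

```python
import os
import signal
import time


def terminate_then_kill(pid, grace=1.0, poll=0.05):
    """Ask a child process to exit with SIGTERM; fall back to SIGKILL.

    Returns True if the child went away within `grace` seconds of SIGTERM,
    False if it had to be SIGKILLed. `pid` must be a child of the caller.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return True
        time.sleep(poll)
    # Last resort: SIGKILL cannot be caught, so a lock the child holds in
    # shared memory stays held.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return False


def spawn_sleeper(ignore_sigterm):
    """Hypothetical stand-in for a worker that never exits on its own."""
    previous = signal.signal(
        signal.SIGTERM, signal.SIG_IGN if ignore_sigterm else signal.SIG_DFL)
    pid = os.fork()
    if pid == 0:  # child: inherits the SIGTERM disposition set above
        try:
            while True:
                time.sleep(60)
        finally:
            os._exit(0)
    # Parent: restore its own SIGTERM handler.
    signal.signal(signal.SIGTERM,
                  previous if previous is not None else signal.SIG_DFL)
    return pid


if __name__ == "__main__":
    print(terminate_then_kill(spawn_sleeper(ignore_sigterm=False)))  # True
    print(terminate_then_kill(spawn_sleeper(ignore_sigterm=True)))   # False
```

A child that ignores SIGTERM still ends up SIGKILLed, which matches the note below that there is only one SIGTERM attempt before SIGKILL.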
If you are using apcu_entry(), this issue may be resolved in APCu 5.1.20. The opcache restart issue should also be addressed in PHP 7.4.16 / 8.0.3 (although there is only a single SIGTERM attempt before SIGKILL).
Hi, since PHP 7.3 is no longer supported, does that mean this problem will not be fixed in PHP 7.3.x? Thanks in advance, and thanks for the update!
We tried again with PHP 7.4.16 and APCu 5.1.20. Unfortunately, we had another lockup after a few days. I'll try a run with more verbose opcache logging, but the issue still seems tricky to reproduce.
Hi, these processes no longer cause problems in the customer's store. I still have 2 processes "hanging around"; let me know if I can help with more information on this. strace -p 55317
@chrisddwrt Would it be possible to get a backtrace from gdb to find out which lock is hanging? Something like:
Here we go:
Same problem with PHP 7.2.13 + APCu 5.1.16. Has this issue been resolved?
Make sure you are not running an older libc version. It has been quite a while, but some libc versions had deadlock problems in FUTEX_WAIT, where a writer just sat waiting at a lock without ever taking it. In fact, IIRC, the reader-writer lock in libc was rewritten three times before they got it right.
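The comment does not say which libc releases were affected. This sketch therefore only reports the glibc version in use, so it can be checked against your distribution's errata. `ldd --version` shows the same information from the shell. The function name is hypothetical. The sketch assumes it runs on the same host and that Python links against the same system glibc that php-fpm uses, which is typical for distribution packages but not for bundled or statically linked builds.

```python
import os


def glibc_version():
    """Return the running glibc version as (major, minor), or None if unknown.

    Uses confstr(_CS_GNU_LIBC_VERSION), which glibc answers with a string
    such as "glibc 2.28"; other C libraries may not support it.
    """
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        return None  # no os.confstr on this platform, or name unsupported
    if not value or not value.startswith("glibc "):
        return None
    major, minor = value.split()[1].split(".")[:2]
    return int(major), int(minor)


if __name__ == "__main__":
    print("glibc:", glibc_version())
```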
For a while now, we have had one particular Drupal website that at some point (we think it's related to a cron job flushing the caches) suddenly consumes all php-fpm processes. This continues until it eventually fills the entire Apache scoreboard, leaving the whole of Apache blocked with:
If you look at the processes, they all appear idle:
When we stop the systemd service managing the FPM processes, httpd recovers, and the site itself also recovers and works fine until the lock is triggered again.
All processes are waiting on FUTEX:
If we get the backtrace of one of these processes, we see the following:
We had the issue with both PHP 7.2 and 7.4, both using APCu 5.1.19 from @remicollet's repositories.
This is:
The system is up to date and all patches have been applied (including those from @remicollet's SCL repos).
php-fpm config for that particular vhost:
The FPM process is managed through a systemd unit and started via socket-based activation.
For now we have downgraded APCu to 5.1.18 to see whether the issue comes from the changes in 5.1.19.
There is nothing in the php-fpm slow log, and we don't see anything particularly interesting in the FPM log itself, except that max children was exceeded.
Let me know if we can provide any additional infos.