Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expunges happen without hitting limits #508

Open
boppy opened this issue May 13, 2024 · 1 comment
Open

Expunges happen without hitting limits #508

boppy opened this issue May 13, 2024 · 1 comment

Comments

@boppy
Copy link

boppy commented May 13, 2024

Facts first:

  • APCu 5.1.23
  • PHP fpm 8.2.18 (inside a docker container based on php:8.2-fpm)
  • Memory-Type mmap

I observe that the cache expunges way too often without any hint why it's doing so.

Does anyone have an idea what's going on and how I can mitigate?


I implemented logging by calling the following script every 30s and writing its results to a log file:

<?php
$ci = apcu_cache_info(true);
$si = apcu_sma_info(true);

$info = [
    'now' => date('Y-m-d H:i:s'),
    'num_slots' => $ci['num_slots'],
    'num_hits' => $ci['num_hits'],
    'num_misses' => $ci['num_misses'],
    'num_inserts' => $ci['num_inserts'],
    'num_entries' => $ci['num_entries'],
    'expunges' => $ci['expunges'],
    'start_time' => date('c', $ci['start_time']),
    'mem_size_MB' => $ci['mem_size'] / 1024 / 1024,
    'avail_mem_MB' => $si['avail_mem'] / 1024 / 1024,
    'max_mem_MB' => ($si['num_seg'] * $si['seg_size']) / 1024 / 1024
];

echo implode("\t", $info);
echo PHP_EOL;

I then analyze the output with:

awk '$8 != acht { print lali; print; print "" } { acht=$8; lali=$0 }' apcu*log

My output is:

Day Time Slots Hit Miss Insert Entries KILLS Start Time Memory Size Memory Avail Max Memory
2024-05-12 00:46:17 163841 6353095 1 3997021 115772 1 2024-05-11T16:58:57+02:00 995.14669799805 1000.0436172485 1999.9998855591
2024-05-12 00:46:47 163841 4936 2 3738 429 2 2024-05-12T00:46:20+02:00 15.228881835938 1983.5077209473 1999.9998855591
2024-05-12 07:40:01 163841 5641487 2 3533330 110189 2 2024-05-12T00:46:20+02:00 995.19388580322 1000.1700973511 1999.9998855591
2024-05-12 07:40:31 163841 7425 1 6298 919 3 2024-05-12T07:40:06+02:00 21.448593139648 1969.1580581665 1999.9998855591
2024-05-12 13:18:07 163841 5769070 1 3800976 109887 3 2024-05-12T07:40:06+02:00 995.13430023193 1000.2423477173 1999.9998855591
2024-05-12 13:18:37 163841 9731 0 8950 1105 4 2024-05-12T13:18:12+02:00 25.843650817871 1963.471206665 1999.9998855591
2024-05-12 19:01:42 163841 5365891 0 3428149 112283 4 2024-05-12T13:18:12+02:00 994.80582427979 1000.4887771606 1999.9998855591
2024-05-12 19:02:12 163841 1818 0 1948 405 5 2024-05-12T19:02:03+02:00 16.132225036621 1982.6050491333 1999.9998855591
2024-05-13 00:32:43 163841 4912471 0 3072161 109755 5 2024-05-12T19:02:03+02:00 995.37220001221 1000.0014648438 1999.9998855591
2024-05-13 00:33:13 163841 12136 1 10872 1528 6 2024-05-13T00:32:43+02:00 33.372436523438 1955.4794082642 1999.9998855591

So basically it's always crashing if available space reaches 1000M.

I don't get why it's doing so. You see that my "Slots Hint" is 160k (shown as 163.841 in col 2 "Slots"), while "only" handling around 100-130k entries (col 6 "Entries"). I thought the problem was that I only assigned 1000M to apcu at first. So I upped it to 2000M, but the expunges still happen often.

Any hints are highly appreciated!

Full Config
[APCu]
apc.enabled = 1
apc.enable_cli = 1
apc.shm_size = 2000M
apc.shm_segments = 1
apc.shm_strings_buffer = 64M
apc.gc_ttl = 30
apc.entries_hint = 160000
@boppy
Copy link
Author

boppy commented May 13, 2024

Additional findings:

apcu/apc_cache.c

Lines 755 to 768 in 1ba5a2d

suitable = (cache->smart > 0L) ? (size_t) (cache->smart * size) : (size_t) (cache->sma->size/2);
/* gc */
apc_cache_wlocked_gc(cache);
/* get available */
available = apc_sma_get_avail_mem(cache->sma);
/* perform expunge processing */
if (!cache->ttl) {
/* check it is necessary to expunge */
if (available < suitable) {
apc_cache_wlocked_real_expunge(cache);
}

If the smart-Flag is not set (since it's not mentioned on the config page, I never set it to begin with - see #504), the code checks if available is HALF the full cache size, and purges if not. - Since I'm not that deep into the code, I assume I miss something here, because as it's a switched thing, there for sure is some sense I'm just not getting ;)


Update I: ~2 days later:

After setting the smart flag to 1, the purge does not happen anymore. but after reaching approx. 75% of usage, the FPM processes do not respond anymore. I assume it's because APC cannot find a big enough slot to store needed information. I do not see any hint on that in the log files, just a "max_children reached", because it piles up.


Update II: ~3 days later:

The problem seems to boil down to the massive fragmentation my implementation caused. After resetting smart to 0 again and upping the segment size to 3000M, I'm not running into purges or crashes anymore, because we still have a bunch of storage free at the end of the segment. I'm currently at > 54% usage without purge with 98% fragmentation. After another update that will run this night, the fragmentation should lower further, because I'm not inserting data multiple times per sec anymore.

Nevertheless I think that at least the hard crashes are something to look after. Also the docs could state, how and why the purge takes place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant