Expunges happen without hitting limits #508

boppy · 2024-05-13T13:29:35Z

Facts first:

APCu 5.1.23
PHP-FPM 8.2.18 (inside a Docker container based on php:8.2-fpm)
Memory-Type mmap

The cache is expunged far too often, without any hint as to why.

Does anyone have an idea what's going on and how I can mitigate it?

I implemented logging by calling the following script every 30s and writing its results to a log file:

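<?php
// Snapshot APCu cache and shared-memory stats as one tab-separated log line.
// Passing true omits the per-entry / per-block lists, which are not needed here.
$ci = apcu_cache_info(true);
$si = apcu_sma_info(true);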

$info = [
    'now' => date('Y-m-d H:i:s'),
    'num_slots' => $ci['num_slots'],
    'num_hits' => $ci['num_hits'],
    'num_misses' => $ci['num_misses'],
    'num_inserts' => $ci['num_inserts'],
    'num_entries' => $ci['num_entries'],
    'expunges' => $ci['expunges'],
    'start_time' => date('c', $ci['start_time']),
    'mem_size_MB' => $ci['mem_size'] / 1024 / 1024,
    'avail_mem_MB' => $si['avail_mem'] / 1024 / 1024,
    'max_mem_MB' => ($si['num_seg'] * $si['seg_size']) / 1024 / 1024
];

echo implode("\t", $info);
echo PHP_EOL;

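I then extract the log lines just before and just after each change of the expunge counter (field 8 when splitting on whitespace, because the timestamp itself contains a space) with: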

awk '$8 != acht { print lali; print; print "" } { acht=$8; lali=$0 }' apcu*log

My output is:

Date Time	Slots	Hits	Misses	Inserts	Entries	Expunges	Start Time	Memory Size (MB)	Memory Avail (MB)	Max Memory (MB)
2024-05-12 00:46:17	163841	6353095	1	3997021	115772	1	2024-05-11T16:58:57+02:00	995.14669799805	1000.0436172485	1999.9998855591
2024-05-12 00:46:47	163841	4936	2	3738	429	2	2024-05-12T00:46:20+02:00	15.228881835938	1983.5077209473	1999.9998855591

2024-05-12 07:40:01	163841	5641487	2	3533330	110189	2	2024-05-12T00:46:20+02:00	995.19388580322	1000.1700973511	1999.9998855591
2024-05-12 07:40:31	163841	7425	1	6298	919	3	2024-05-12T07:40:06+02:00	21.448593139648	1969.1580581665	1999.9998855591

2024-05-12 13:18:07	163841	5769070	1	3800976	109887	3	2024-05-12T07:40:06+02:00	995.13430023193	1000.2423477173	1999.9998855591
2024-05-12 13:18:37	163841	9731	0	8950	1105	4	2024-05-12T13:18:12+02:00	25.843650817871	1963.471206665	1999.9998855591

2024-05-12 19:01:42	163841	5365891	0	3428149	112283	4	2024-05-12T13:18:12+02:00	994.80582427979	1000.4887771606	1999.9998855591
2024-05-12 19:02:12	163841	1818	0	1948	405	5	2024-05-12T19:02:03+02:00	16.132225036621	1982.6050491333	1999.9998855591

2024-05-13 00:32:43	163841	4912471	0	3072161	109755	5	2024-05-12T19:02:03+02:00	995.37220001221	1000.0014648438	1999.9998855591
2024-05-13 00:33:13	163841	12136	1	10872	1528	6	2024-05-13T00:32:43+02:00	33.372436523438	1955.4794082642	1999.9998855591

So basically the cache is wiped every time the available memory drops to about 1000M.

I don't understand why it does this. My entries hint is 160k (shown as 163841 in column 2, "Slots"), while the cache only holds around 100-130k entries (column 6, "Entries"). At first I thought the problem was that I had only assigned 1000M to APCu, so I raised it to 2000M, but the expunges still happen often.

Any hints are highly appreciated!

Full Config

[APCu]
apc.enabled = 1
apc.enable_cli = 1
apc.shm_size = 2000M
apc.shm_segments = 1
apc.shm_strings_buffer = 64M
apc.gc_ttl = 30
apc.entries_hint = 160000


boppy · 2024-05-13T15:56:00Z

Additional findings:

apcu/apc_cache.c, lines 755 to 768 at commit 1ba5a2d:

suitable = (cache->smart > 0L) ? (size_t) (cache->smart * size) : (size_t) (cache->sma->size/2);

/* gc */
apc_cache_wlocked_gc(cache);

/* get available */
available = apc_sma_get_avail_mem(cache->sma);

/* perform expunge processing */
if (!cache->ttl) {
	/* check it is necessary to expunge */
	if (available < suitable) {
		apc_cache_wlocked_real_expunge(cache);
	}

If the smart flag is not set (it isn't mentioned on the configuration page, so I never set it in the first place; see #504), the code checks whether the available memory is at least HALF of the full cache size, and purges the whole cache if it is not. Since I'm not that deep into the code, I assume I'm missing something here: as this is a deliberate switch, there is surely a reason for it that I'm just not getting ;)

Update I: ~2 days later:

After setting the smart flag to 1, the purge no longer happens. But once usage reaches approx. 75%, the FPM processes stop responding. I assume this is because APCu cannot find a large enough free block to store the data. The log files give no hint of this, only "max_children reached", because the requests pile up.

Update II: ~3 days later:

The problem seems to boil down to the massive fragmentation caused by my implementation. After resetting smart to 0 and raising the segment size to 3000M, I no longer run into purges or crashes, because there is still plenty of free memory at the end of the segment. I'm currently at over 54% usage, with 98% fragmentation, and no purge so far. After another update that goes live tonight, fragmentation should drop further, because the code no longer inserts data several times per second.

Nevertheless, I think at least the hard crashes should be looked into. The docs could also explain how and why the purge takes place.
