memory defragmentation problems - added a best-fit algorithm to nano-malloc #599
It does not appear to support the separate malloc regions needed for dram versus iram allocation. With that change the sbrk() function became a nop; there is no memory to grow, rather it is all pre-allocated to the malloc free list.

There is another malloc implementation in newlib that does attempt to better handle fragmentation, but it seems to come at some cost, and it has not been extended to support the separate regions. From a quick look it appears to try to group objects of similar size in pools. Your strategy appears to be to search the free list for the chunk that has the least room but that still fits the new allocation? That seems quite different from trying to allocate objects of a similar size to the same pool.

A fallback strategy might be needed, to restart the code if fragmentation becomes a problem. This might be tested on a malloc failure: if the free space is still significant when a malloc fails, then flag the heap as hopelessly fragmented and schedule a restart.

It might be possible to extend the malloc region support to allow sub-pools to be defined, and some interface to allocate them. Then code could demand or prefer to allocate from a particular pool. Another alternative might be to allow sub-pools to be defined based on object size, and an interface to allow these to be allocated. |
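The failure-detection idea above could be sketched roughly as follows. This is a sketch only: `heap_free_bytes()`, `malloc_or_flag()`, `heap_fragmented`, and the threshold value are all hypothetical names and numbers, not existing newlib or esp-open-rtos APIs, and `heap_free_bytes()` is stubbed so the sketch is self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed threshold: a malloc failure counts as a fragmentation
 * failure when at least this much memory is nominally still free. */
#define FRAG_FREE_THRESHOLD 4096

/* Set when malloc fails despite significant free memory; the
 * application would poll this and schedule an orderly restart. */
static bool heap_fragmented = false;

/* heap_free_bytes() would come from the allocator; stubbed here so
 * the sketch compiles stand-alone. */
static size_t fake_free_bytes = 8192;
static size_t heap_free_bytes(void) { return fake_free_bytes; }

/* Wrapper illustrating the detection: on failure, check whether the
 * free space was still significant before flagging the heap as
 * hopelessly fragmented. */
static void *malloc_or_flag(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && heap_free_bytes() >= FRAG_FREE_THRESHOLD)
        heap_fragmented = true;
    return p;
}
```

The application then checks `heap_fragmented` at a convenient point and restarts in an orderly manner, rather than having malloc itself trigger the restart.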
As far as I have understood (but please correct me if I'm wrong), there are two regions: the first is the classic 'heap' defined by the linker, the second one is the chunk added to the free pool after the system boot. There is no way for 'nano-malloc' to know what kind of request is being made, because the only parameter is the size of the request, with no preference for a region.

The best-fit algorithm, as described in papers on malloc policies, tries to satisfy the request from the free chunk that fits it best in size. This helps reduce fragmentation when you have requests for small chunks that repeat over time: the small free chunks tend to be re-allocated, leaving bigger memory areas available for other requests.

In effect I've seen a great stabilization of the heap (I have some tasks running together), and despite some bugs that I'm solving now in this algorithm (as I've said, it is alpha at the moment... but I would like to have comments about it), the advantage is clearly evident, at least in my case, where small heap requests coming from buffers alternate with the big buffers from the http server.

Restarting the code might not be the right solution, especially if the software is serving an http page or providing an important service. Many thanks for your great work. |
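The best-fit policy described above can be sketched in a few lines. The chunk layout here is simplified for illustration; the real nano-malloc chunk structure in newlib differs.

```c
#include <stddef.h>

/* Simplified free-list chunk, loosely modelled on a malloc free
 * list; illustrative only, not the real newlib structures. */
typedef struct chunk {
    size_t size;          /* usable bytes in this free chunk */
    struct chunk *next;   /* next chunk on the free list */
} chunk_t;

/* Best fit: walk the whole free list and remember the smallest chunk
 * that is still large enough, instead of taking the first one that
 * fits (first fit). */
static chunk_t *best_fit(chunk_t *free_list, size_t want)
{
    chunk_t *best = NULL;
    for (chunk_t *c = free_list; c != NULL; c = c->next) {
        if (c->size >= want && (best == NULL || c->size < best->size)) {
            best = c;
            if (best->size == want)   /* exact fit: cannot do better */
                break;
        }
    }
    return best;
}
```

The trade-off is the full list walk on every allocation, which first fit avoids; on a small heap with short free lists that cost is usually modest.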
The current code implements sbrk. The wip changes to support multiple malloc regions make sbrk a nop; there is no sbrk reservation to allocate, and all the memory is added to the malloc free list. Some of the memory is allocated when available, for example the initial stack area - it is just pushed onto the malloc free list for use. The wip stores the regions to use in reent_ptr->malloc_region_mask. There is a callback into the esp-open-rtos code to check if a free list address is consistent with the region mask, see _malloc_region_masked(). This could all be improved, better abstracted, etc. - not claiming it is pretty. But an attempt has been made to set the higher level interface to this, which is by way of a thread-local preference.

There is a requirement to be able to allocate from either the dram or the iram, and it is desirable to be able to prefer one or the other. This requirement is due to hardware constraints, and perhaps software design decisions. The iram cannot be used for some hardware buffers, and using it for task stack memory might challenge the exception handlers etc. Also, the iram supports only 32 bit accesses - the 8 and 16 bit accesses are emulated in the exception handler - so there are performance reasons too for allocating to dram versus iram.

Generally heuristic allocation strategies are not going to prevent failure due to fragmentation, so detection of failure due to fragmentation is still needed in general. Memory compaction is not possible in C, not without effectively implementing a higher level memory manager on top of C, and no higher level alternative seems to have emerged. Thus simply restarting might be a practical strategy for these devices. Whether fragmentation is a problem depends on allocation patterns and the amount of free memory.

The 'best fit' strategy would appear to lead to fragmentation by placing small objects next to larger objects: if these small objects are long lived then they fragment the heap. It would probably be better to place small objects close to other small objects, i.e. to use allocation bins. Best fit also distributes temporally related allocations over the heap. These are just heuristics, and there are endless possibilities and problems.

Off topic, but this is an area that interests me. There are for example some JS VMs for small systems that can use 16 bit pointers for better memory efficiency, and these might also be able to compact their heaps - probably many potential VMs can compact the heap. These might better support general strategies for memory management, but at the cost of memory compaction pauses.

Overall it seems easiest for esp-open-rtos apps to pre-allocate large buffers to avoid fragmentation, so the heap is largely a 'bin' of small objects, and to add fragmentation failure detection with support for graceful restarts. |
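The region-mask check described above (a free-list address tested against the caller's thread-local mask) might look roughly like this. The mask bit names and the dram/iram address ranges below are illustrative assumptions for the ESP8266; check the actual linker script and the malloc-regions branch before relying on any of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative region bits; the real masks in the malloc-regions
 * branch may differ. */
#define MALLOC_REGION_DRAM 0x1u
#define MALLOC_REGION_IRAM 0x2u

/* Illustrative address ranges - verify against the ESP8266 memory
 * map and the linker script before using. */
#define DRAM_BASE 0x3FFE8000u
#define DRAM_END  0x40000000u
#define IRAM_BASE 0x40100000u
#define IRAM_END  0x40108000u

/* Sketch of the callback idea: is this free-list address usable
 * under the caller's (thread-local) region mask? */
static bool region_masked_ok(uintptr_t addr, uint32_t mask)
{
    if ((mask & MALLOC_REGION_DRAM) && addr >= DRAM_BASE && addr < DRAM_END)
        return true;
    if ((mask & MALLOC_REGION_IRAM) && addr >= IRAM_BASE && addr < IRAM_END)
        return true;
    return false;
}
```

A task preferring dram would set only `MALLOC_REGION_DRAM` in its mask; a mask with both bits set accepts chunks from either region.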
Thanks for the description.

Actually it seems the code is just allocating chunks of ram depending on the requested size. I have to say that I haven't found the code for _malloc_region_masked(), and I guessed it is only doing a consistency check.

Despite what it could seem, a lot of papers describe the best-fit algorithm as one of the best at reducing fragmentation, at least versus first fit, which is one of the worst (by the same measure), even though it is often the one implemented. Obviously we have to take into consideration that the allocation process in an embedded system is far from a 'random' process...

Anyway, I'm not saying that the current code is awful or anything like that, but I would like to point out that in a system with such constraints in terms of RAM, the 'right' algorithm and tool can sometimes save the day... Anyway we are lucky: with an open system anyone can implement what they think is better!

I'm fully aware that RAM compaction is not possible, and this is the first reason to avoid fragmenting the heap in the allocation process. By the way, I don't agree with the idea of the system restarting outside the control of the application, except in extreme situations.
On March 31, 2018 at 11:40:56 PM +0000, Our Air Quality <[email protected]> wrote: [quoted text of the previous comment trimmed]
|
If you want to add 'best fit' then I would be happy to add that as a newlib option in my own fork, but as noted it would need to support allocation to the different regions, and it should not require any changes on the esp-open-rtos code side.

One other issue that needs to be explored is a memory-pressure event. There are various resources in lwip that can be freed under pressure. There are hooks for pool-pressure events, but we do not use pools so these do not all work. If a fragmentation failure is hit then perhaps a memory-pressure event could free up some space. Application code might be able to respond to memory-pressure events too. This might give a 'restart'-like benefit, allowing some subsystems to free resources and restart to clear fragmentation.

It might not be necessary to restart outside the control of the application. It was suggested to 'flag' a restart as necessary, and then leave it to the application to restart in an orderly manner, rather than have malloc trigger a restart like a wdt event. For example: in my own code there is some core code doing data gathering and storage using mostly pre-allocated resources, and there is robust networking code that expects failures, so if the networking paths fail due to fragmentation then the core could schedule a restart in an orderly manner. If failure due to fragmentation is not acceptable then an application needs to manage memory well. |
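A memory-pressure event as discussed above could be as simple as a list of registered handlers that the allocator runs before giving up. Everything here is hypothetical - `register_pressure_handler()`, `notify_memory_pressure()`, and the example handlers are stand-ins, not existing lwip or esp-open-rtos hooks.

```c
#include <stddef.h>

#define MAX_PRESSURE_HANDLERS 8

/* A handler frees non-essential resources and reports bytes freed. */
typedef size_t (*pressure_handler_t)(void);

static pressure_handler_t handlers[MAX_PRESSURE_HANDLERS];
static int num_handlers = 0;

/* Subsystems (lwip, the application, ...) register a handler once at
 * startup. Returns 0 on success, -1 if the table is full. */
static int register_pressure_handler(pressure_handler_t h)
{
    if (num_handlers >= MAX_PRESSURE_HANDLERS)
        return -1;
    handlers[num_handlers++] = h;
    return 0;
}

/* Run all handlers; report total bytes reclaimed so the allocator
 * can decide whether retrying the failed allocation is worthwhile. */
static size_t notify_memory_pressure(void)
{
    size_t freed = 0;
    for (int i = 0; i < num_handlers; i++)
        freed += handlers[i]();
    return freed;
}

/* Example handlers - stand-ins for lwip / application hooks. */
static size_t drop_cached_pages(void) { return 128; }
static size_t shrink_log_buffer(void) { return 64; }
```

malloc could call `notify_memory_pressure()` once on failure and retry the allocation if a non-zero amount was reclaimed.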
If it could help anyone, I'll be happy to share my code.

By the way, there is no way I can find _malloc_region_masked()... not by a deep look into the file you mentioned, nor by searching with grep... Probably I'm missing something...

Anyway, I'll try to comply with the constraints you wrote. I just need to better understand what 'support allocation to the different regions' means. I think this means 'by application choice', but I haven't understood how the choice is made now...
|
Might you be using the esp-open-rtos master branch rather than the malloc-regions branch, which has the wip support for multiple regions? See #584. The newlib source code and the lwip source code are now out of sync with the esp-open-rtos master branch, but it should not even have compiled with your changes if these were not in sync. This might have been frustrating your efforts, sorry. Another possible frustration is that I rebase PRs, to keep the final PR clean, and it might take a few extra steps to check out the latest version. |
Hi,
I had a problem related to memory fragmentation that stopped the system from working after some runs.
To better manage it I have added a 'best-fit' algorithm to nano_malloc (I have forked newlib from the great ourquality!).
I have good results so far, but it is still an 'alpha'...
To work properly it also requires some slight changes in newlib_syscall.c.
Thank you for your work, and let me know if you need some insight into the changes.