Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dlmalloc: new and faster allocator #54

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

mikrosk
Copy link
Member

@mikrosk mikrosk commented Mar 22, 2023

Doug Lea's allocator has been discovered by Eero Tamminen during our ScummVM profiling session. It is used in quite a few libc variants and most importantly: it's freaking fast! How much faster? Obviously with applications doing a lot of allocations, typically C++'s vector & list operations.

Few examples (% of CPU time spent in malloc/free before and after):

Basically, allocation/release overhead is now much closer to what you would see on typical PC. Me and Eero have tested it quite a lot, not to mention that it is a well known project.

I have only two points to raise:

  • should we be using sbrk or __sbrk in malloc.c ? Previous implementation used the latter.
  • is replacing end = (char*)(CALL_MORECORE(0)); (basically sbrk(0)) with end = br + asize; the right choice?

The allocator supports also non-contiguous memory model via sbrk (however it is also possible to use the contiguous one when setting _stksize properly). The trouble is that our sbrk doesn't support sbrk(0) in the non-contiguous model because it may be not clear what to return in such case. By default, such call shall return the current pointer where the next allocation would occur. I've been thinking that maybe we could fake it by storing the last allocated pointer + size (basically same as I do in malloc.c) and return that value in case of sbrk(0) is called.

But IMHO it is fine as it is now, and it is also faster. :)

The PR is split into two commits so it's easy to see what I have changed.

@mikrosk
Copy link
Member Author

mikrosk commented Mar 22, 2023

Btw another bonus is that memalign and posix_memalign now work as expected, in stark comparison to their previous implementation (with memory leaks...).

@th-otto
Copy link
Contributor

th-otto commented Mar 22, 2023

There are a few objections that i have.

  • it might be faster, but at the cost of substantially larger code size (1k vs 12k)
  • it has to implement some user-settable minimum chunk size to allocate from the OS (that MINCHUNK & MAXCHUNK variables). Otherwise you quickly run into the FOLDR100 problem.
  • The MINCHUNK & MAXCHUNK variables (as well as some other symbols) must not be removed, since that will give linking errors (yes i know they don't appear in public headers, but they are referenced in other packages)
  • reference to abort should be avoided. That pulls in fclose() and as as consequence, lots of other stuff.
  • The file should be split to one function per source file, as before. Very few programs will need posix_memalign

conclusion:
If you really need that, than it should be an alternative implementation, not completely replace the old implementation. So there should be a way to give the user a choice, maybe some #define before any system header files are included.

@mikrosk
Copy link
Member Author

mikrosk commented Mar 22, 2023

it might be faster, but at the cost of substantially larger code size (1k vs 12k)

Indeed. Perhaps this is partially caused by the one file implementation.

it has to implement some user-settable minimum chunk size to allocate from the OS (that MINCHUNK & MAXCHUNK variables). Otherwise you quickly run into the FOLDR100 problem.

What is this? I've grep'ed the original source, I don't see anything with such names in there?

reference to abort should be avoided. That pulls in fclose() and as as consequence, lots of other stuff.

Good catch!

The file should be split to one function per source file, as before. Very few programs will need posix_memalign

Is it even possible? The main functions (malloc, free, ...) use an awful lot of those macros and defines, let alone static functions. It looks quite error prone to dissect it (you know how it is, one forgotten static variable in a header file and you've got suddenly two copies instead of shared variable).

If you really need that, than it should be an alternative implementation, not completely replace the old implementation. So there should be a way to give the user a choice, maybe some #define before any system header files are included.

Wouldn't make more sense to move (if it's not already there) the current allocator to libcmini and let this one for "heavy duty" ? That would be more aligned with philosophy of both projects.

@mikrosk
Copy link
Member Author

mikrosk commented Mar 22, 2023

Ah, I see. You meant MINHUNK, not MINCHUNK. But I still don't get it, how is this used externally? It is a static variable in a C file after all? Or you meant that the function mallocChunkSize should be present? That's easy, dlmalloc provides a similar define (even with the same default size) so it's easy to export the same function from there as well.

Btw I see that mintlib's allocator is copied in libcmini so I still like the idea to make mintlib's one the heavier one.

@th-otto
Copy link
Contributor

th-otto commented Mar 22, 2023

Yes, sorry for the typo. MINHUNK, MAXHUNK & mallocChunkSizemust remain.

You have to look at the original code to understand what they do. Basically they are used to determine the size that is obtained using GEMDOS Malloc. In ORCS for example i set MINHUNK such that no more than about 1000 Malloc calls are made to the OS, even if all available memory is requested. IIIR some other program do similar things. Remember that plain TOS (including EmuTOS) fails to allocate anything after about 300 Malloc calls, if you don't run FOLDR100.PRG.

Is it even possible?

I only took a brief look so far, but of course that is possible. If there are some variables that are supposed to be static, but will then be referenced from several modules, just rename them with some unique prefix, including "__"

Wouldn't make more sense to move (if it's not already there) the current allocator to libcmini and let this one for "heavy >duty" ? That would be more aligned with philosophy of both projects.

Mintlib is already quite bloated, one of the reasons why users don't like using it. We should not make that worse.

Until now, we got quite well with the old implementation (ok, to be fair, maybe just because noone took the effort to profile it yet). But it can't be that bad, there are lots of other applications that allocate memory in small chunks. Your current need in ScummVM to allocate & free lots of blocks in each display frame seems rather special to me.

@mikrosk
Copy link
Member Author

mikrosk commented Mar 22, 2023

But it can't be that bad, there are lots of other applications that allocate memory in small chunks. Your current need in ScummVM to allocate & free lots of blocks in each display frame seems rather special to me.

It's true it is rather special for typical Atari usage (where the programmer is taught to be very careful about system resources) but I bet it is quite common in any C++ based application. Basically every app has some kind of (main)loop and if it's C++, there's a pretty good chance some vector or list is used. I wouldn't be surprised if zView, OpenTTD (and other SDL-based projects) or maybe even gcc/g++ would show significant speedup.

The trouble with the #ifdef approach (to select old and new malloc) is that I can't imagine how this would work with new and delete without changing the actual C++ source code (i.e. defining a custom allocator within new) ? Or is it possible to achieve this with some weak_alias hackery?

@th-otto
Copy link
Contributor

th-otto commented Mar 22, 2023

I wouldn't be surprised if zView, OpenTTD (and other SDL-based projects) or maybe even gcc/g++ would show significant speedup.

IIRC neither zView nor SDL use c++ code.

(i.e. defining a custom allocator within new) ? Or is it possible to achieve this with some weak_alias hackery?

Theoretically that should work, by putting a default allocator in the library. Might only be a problem with the link-order, since libc is later in the chain, so you may have to place it in libstdc++ instead, but you don't need to change the code otherwise.

Might also be worth to take a look at other projects. IIRC that allocator is already in use as an alternative rather than replacement.

@mikrosk
Copy link
Member Author

mikrosk commented Mar 22, 2023

zView - take a look at libfreetype :), SDL itself not but take a look at OpenTTD for instance: https://github.com/OpenTTD/OpenTTD/tree/master/src ... heaps of C++ code as well.

Oh man, that sounds pretty terrible. It's easier just to maintain a separate mintlib for my purposes, it's a question of one -L switch.

@th-otto
Copy link
Contributor

th-otto commented Mar 22, 2023

Well, its just my opinion. Aren't there any others?

@mikrosk
Copy link
Member Author

mikrosk commented Mar 22, 2023

Isn't possible to do something like (with defined __malloc and __dlmalloc functions in mintlib):

#ifdef USE_DLMALLOC
weak_alias(__dlmalloc, malloc)
#else
weak_alias(__malloc, malloc)
#endif

in some header file? So new would automatically pick whatever got aliased.

@th-otto
Copy link
Contributor

th-otto commented Mar 22, 2023

I don't think that this will work, because

  • new references operator new resp. operator new[], not malloc
  • malloc will only be defined in libc, which is later in the link chain, so operator new will be found in libcstd++ first
  • IIRC you can only define weak aliases for symbols already seen when compiling, but you cannot assume that c++ source include <stdlib.h>
  • generally you cannot assume that they directly include any mintlib headers at all, they might instead include etc.

That are things that have to be tried.

@xdelatour
Copy link
Contributor

Maybe I'm wrong but XaAES seems to be able to use different versions of memcpy/memset (xcclib) depending on a constant (the version number of GCC). Isn't possible to do similar thing with malloc/mfree?

@th-otto
Copy link
Contributor

th-otto commented Mar 23, 2023

That only seems to be a relict for old (2.95?) gcc versions, and the actual code is identical. Also, for XaAES things are quite different, because most standard library functions are redirected through the function table that the kernel exports.

@mikrosk mikrosk force-pushed the dlmalloc branch 2 times, most recently from 0c9e4f5 to 9ec6a67 Compare August 30, 2023 04:35
@mikrosk mikrosk force-pushed the dlmalloc branch 3 times, most recently from 14b7111 to f98b048 Compare September 28, 2023 18:31
@mikrosk mikrosk force-pushed the dlmalloc branch 5 times, most recently from 55f6a05 to 1fc9c13 Compare October 25, 2023 20:46
mikrosk added 2 commits March 10, 2024 22:18
This is not supported in mintlib. In fact, even sbrk(0) isn't.
@th-otto th-otto force-pushed the master branch 5 times, most recently from b9a4f13 to 519e232 Compare April 29, 2024 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants