dlmalloc: new and faster allocator #54
Btw another bonus is that |
There are a few objections that I have.
conclusion: |
Indeed. Perhaps this is partially caused by the one file implementation.
What is this? I've grep'ed the original source, I don't see anything with such names in there?
Good catch!
Is it even possible? The main functions (malloc, free, ...) use an awful lot of those macros and defines, let alone static functions. It looks quite error prone to dissect it (you know how it is, one forgotten static variable in a header file and you've got suddenly two copies instead of shared variable).
Wouldn't it make more sense to move the current allocator to libcmini (if it's not already there) and keep this one for "heavy duty" use? That would be more aligned with the philosophy of both projects. |
Ah, I see. You meant MINHUNK, not MINCHUNK. But I still don't get it, how is this used externally? It is a static variable in a C file after all? Or you meant that the function Btw I see that mintlib's allocator is copied in libcmini so I still like the idea to make mintlib's one the heavier one. |
Yes, sorry for the typo. MINHUNK, MAXHUNK and mallocChunkSize must remain. You have to look at the original code to understand what they do. Basically, they determine the size of the blocks obtained through GEMDOS Malloc. In ORCS, for example, I set MINHUNK such that no more than about 1000 Malloc calls are made to the OS, even if all available memory is requested. IIRC some other programs do similar things. Remember that plain TOS (including EmuTOS) fails to allocate anything after about 300 Malloc calls if you don't run FOLDR100.PRG.
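To illustrate the sizing idea, here is a minimal sketch in C. It is not the mintlib or ORCS code: `min_hunk_for`, `os_request_size`, the 8 KiB granule and the 14 MiB figure are assumptions made up for this example. It only shows how a minimum hunk size could be derived from a cap on the number of OS calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a mintlib function): choose a minimum hunk size
 * such that consuming `total_free` bytes never needs more than `max_calls`
 * OS allocations. The result is rounded up to a multiple of `granule`. */
size_t min_hunk_for(size_t total_free, size_t max_calls, size_t granule)
{
    size_t hunk;

    assert(max_calls > 0 && granule > 0);

    /* ceil(total_free / max_calls) */
    hunk = (total_free + max_calls - 1) / max_calls;
    /* round up to the next multiple of granule */
    hunk = (hunk + granule - 1) / granule * granule;

    return hunk < granule ? granule : hunk;
}

/* Hypothetical helper: the size actually requested from the OS.
 * Small requests are padded up to the minimum hunk size. */
size_t os_request_size(size_t wanted, size_t min_hunk)
{
    return wanted < min_hunk ? min_hunk : wanted;
}
```

With 14 MiB free, a cap of 1000 calls and an 8 KiB granule, `min_hunk_for` returns 16 KiB, so exhausting memory would take at most 896 OS requests.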
I only took a brief look so far, but of course that is possible. If there are some variables that are supposed to be static, but will then be referenced from several modules, just rename them with some unique prefix, including "__"
Mintlib is already quite bloated, which is one of the reasons why users don't like using it. We should not make that worse. Until now, we got along quite well with the old implementation (ok, to be fair, maybe just because no one took the effort to profile it yet). But it can't be that bad, there are lots of other applications that allocate memory in small chunks. Your current need in ScummVM to allocate & free lots of blocks in each display frame seems rather special to me. |
It's true it is rather special for typical Atari usage (where the programmer is taught to be very careful about system resources) but I bet it is quite common in any C++ based application. Basically every app has some kind of (main)loop and if it's C++, there's a pretty good chance some The trouble with the #ifdef approach (to select old and new malloc) is that I can't imagine how this would work with |
IIRC neither zView nor SDL use C++ code.
Theoretically that should work, by putting a default allocator in the library. Might only be a problem with the link-order, since libc is later in the chain, so you may have to place it in libstdc++ instead, but you don't need to change the code otherwise. Might also be worth to take a look at other projects. IIRC that allocator is already in use as an alternative rather than replacement. |
zView - take a look at libfreetype :), SDL itself not but take a look at OpenTTD for instance: https://github.com/OpenTTD/OpenTTD/tree/master/src ... heaps of C++ code as well. Oh man, that sounds pretty terrible. It's easier just to maintain a separate mintlib for my purposes, it's a question of one |
Well, it's just my opinion. Aren't there any others? |
Isn't it possible to do something like (with defined
in some header file? So |
I don't think that this will work, because
Those are things that have to be tried. |
Maybe I'm wrong, but XaAES seems to be able to use different versions of memcpy/memset (xcclib) depending on a constant (the version number of GCC). Isn't it possible to do a similar thing with malloc/mfree? |
That only seems to be a relic from old (2.95?) gcc versions, and the actual code is identical. Also, for XaAES things are quite different, because most standard library functions are redirected through the function table that the kernel exports. |
This is not supported in mintlib. In fact, even sbrk(0) isn't.
Doug Lea's allocator was discovered by Eero Tamminen during our ScummVM profiling session. It is used in quite a few libc variants and, most importantly, it's freaking fast! How much faster? The gain shows mostly in applications doing a lot of allocations, typically C++ vector and list operations.
Few examples (% of CPU time spent in malloc/free before and after):
Basically, allocation/release overhead is now much closer to what you would see on a typical PC. Eero and I have tested it quite a lot, not to mention that it is a well-known project.
I have only two points to raise:

- `sbrk` or `__sbrk` in `malloc.c`? The previous implementation used the latter.
- Is replacing `end = (char*)(CALL_MORECORE(0));` (basically `sbrk(0)`) with `end = br + asize;` the right choice?

The allocator also supports a non-contiguous memory model via `sbrk` (however, it is also possible to use the contiguous one by setting `_stksize` properly). The trouble is that our `sbrk` doesn't support `sbrk(0)` in the non-contiguous model, because it may not be clear what to return in such a case. By default, such a call shall return the current pointer where the next allocation would occur. I've been thinking that maybe we could fake it by storing the last allocated pointer + size (basically the same as I do in `malloc.c`) and returning that value when `sbrk(0)` is called.

But IMHO it is fine as it is now, and it is also faster. :)
The PR is split into two commits so it's easy to see what I have changed.