Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary memory allocation limit checks and rename max-allocation related functions #1137

Merged

Conversation

micahsnyder
Copy link
Contributor

The goal of the ClamAV's functions like "cli_malloc" and "cli_realloc" are to check that the size of the allocation is not unreasonable. But we only need to do that check when the size is derived from untrusted user input (aka may vary based on values in the possibly malicious file that we're scanning).

There are many calls to allocate memory where the size of the allocation is either known at compile time or else is derived from trusted user input. Examples of variable but trusted allocation sizes include:

  • the number of parts of a signature
  • or the number of blocks in a bytecode function.

This PR seeks to eliminate those unnecessary max-allocation checks. This is mostly because checking the limits in these cases is a waste of CPU. But it also sets groundwork for a future improvement where we may pass in the memory allocation limit as a parameter, and make it generally configurable for the scanning service. This would be meaningful for anyone wishing to crank up the max-filesize to scan large archives that decompress to very large amounts of data.

Further, this PR renames the max-allocation functions and related functions and macros so it's obvious what they do. I suspect that the reason the likes of "cli_calloc" and "cli_realloc" were so widely used for fixed size allocations is that the developers knew we had a special wrapper for allocating memory and didn't know when they should be used.

What's also drawing attention to is the functions that wrap allocations but do NOT check the allocation-limit. These are cli_strdup, cli_realloc, and cli_realloc2. These functions are safer than the original strdup and realloc functions from the C "standard library".

  • For strdup, our wrapper adds a NULL check to prevent accidental NULL-deref crashes.
  • For realloc, the "standard library" one is not standard. Some variations may free the pointer if size == 0. Some do not. Ours wrapper does not. In this way, we know the behavior will be consistent. Our version also has two variants. The first, cli_realloc does not free the pointer on failure. The second, cli_realloc2, does free the pointer on failure. That second behavior is generally less desirable because you must remember to set the pointer to NULL before goto done; or else you risk a double-free. I have not removed the latter variant simply because it's a lot of work, so there will still be two variants. In this PR, I improve the documentation and I rename those two so you can easily tell the difference.

Also, for both the strdup and realloc wrappers, we now have a "cli_safer_" prefix on them when they do not check the the allocation limit. When we do check the allocation limit, we instead use a "cli_max_" prefix.

Finally, for the macros which wrap the allocation-related functions, I've added a "_OR_GOTO_DONE" suffix to the macro names. This makes it clear what is special about the macros, and that error handling is built int. If the allocation fails, then we goto done;.

In summary, the functions and macros are now:

void *cli_max_malloc(size_t nmemb);
void *cli_max_calloc(size_t nmemb, size_t size);
void *cli_max_realloc(void *ptr, size_t size);
void *cli_max_realloc_or_free(void *ptr, size_t size);
void *cli_safer_realloc(void *ptr, size_t size);
void *cli_safer_realloc_or_free(void *ptr, size_t size);
char *cli_safer_strdup(const char *s);

#define CLI_SAFER_STRDUP_OR_GOTO_DONE(buf, var, ...) \
...
#define CLI_FREE_AND_SET_NULL(var) \
...
#define CLI_MALLOC_OR_GOTO_DONE(var, size, ...) \
...
#define CLI_MAX_MALLOC_OR_GOTO_DONE(var, size, ...) \
...
#define CLI_CALLOC_OR_GOTO_DONE(var, nmemb, size, ...) \
...
#define CLI_MAX_CALLOC_OR_GOTO_DONE(var, nmemb, size, ...) \
...
#define CLI_VERIFY_POINTER_OR_GOTO_DONE(ptr, ...) \
...
#define CLI_MAX_REALLOC_OR_GOTO_DONE(ptr, size, ...) \
...
#define CLI_SAFER_REALLOC_OR_GOTO_DONE(ptr, size, ...) \
...

libclamav/bytecode_api.c Fixed Show fixed Hide fixed
libclamav/hashtab.c Fixed Show fixed Hide fixed
libclamav/nsis/bzlib.c Fixed Show fixed Hide fixed
libclamav/pe_icons.c Fixed Show fixed Hide fixed
libclamav/pe_icons.c Fixed Show fixed Hide fixed
libclamav/pe_icons.c Fixed Show fixed Hide fixed
@micahsnyder micahsnyder force-pushed the remove-unnecessary-cli_-allocations branch from c4eac44 to 81449c5 Compare January 10, 2024 15:51
@micahsnyder micahsnyder force-pushed the remove-unnecessary-cli_-allocations branch from 81449c5 to 1d09996 Compare February 28, 2024 21:02
clamonacc/inotif/inotif.c Show resolved Hide resolved
clamonacc/inotif/inotif.c Show resolved Hide resolved
clamonacc/inotif/inotif.c Show resolved Hide resolved
@micahsnyder micahsnyder requested a review from ragusaa March 12, 2024 16:22
There are a large number of allocations for fix sized buffers using the
`cli_malloc` and `cli_calloc` calls that check if the requested size is
larger than our allocation threshold for allocations based on untrusted
input. These allocations will *always* be higher than the threshold, so
the extra stack frame and check for these calls is a waste of CPU.

This commit replaces needless calls with A -> B:
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC

I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold, when MMAP is found/enabled. But the
alternative was set to cli_malloc / cli_calloc when disabled. I changed
those as well.

I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-free's.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocaitons for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.

There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2

... and these names do not convey their purpose

This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2

The realloc ones also have an additional feature in that they will not
free your pointer if you try to realloc to 0 bytes. Freeing the memory
is undefined by the C spec, and only done with some realloc
implementations, so this stabilizes on the behavior of not doing that,
which should prevent accidental double-free's.

So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2

These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
Should be safe to allocate a smaller string than one we already have.
The cli_max_malloc, cli_max_calloc, and cli_max_realloc functions
provide a way to protect against allocating too much memory
when the size of the allocation is derived from the untrusted input.
Specifically, we worry about values in the file being scanned being
manipulated to exhaust the RAM and crash the application.

There is no need to check the limits if the size of the allocation
is fixed, or if the size of the allocation is necessary for signature
loading, or the general operation of the applications.
E.g. checking the max-allocation limit for the size of a hash, or
for the size of the scan recursion stack, is a complete waste of
time.

Although we significantly increased the max-allocation limit in
a recent release, it is best not to check an allocation if the
allocation will be safe. It would be a waste of time.

I am also hopeful that if we can reduce the number allocations
that require a limit-check to those that require it for the safe
scan of a file, then eventually we can store the limit in the scan-
context, and make it configurable.
Some sort of code merge way-back-when resulted in two identical
max-allocation checks. I removed the noisy ones.
A code merge resulted in a duplicate copy of the CLI_STRDUP macro.
Also fixed formatting.
Bytecode signature's are able to allocate buffers, but should probably
adhere to clamav's max allocation limit.  This adds a check to make sure
they don't accidentally alloc too much based on untrusted user input.
Allocations for bytecode signatures to work need not check against the
memory allocation limit, as bytecode signatures are considered trusted
user input.

You may note that I did not remove allocation limits from the bytecode
API functions that may be called by the signatures such as adding json
objects, hashsets, lzma and bz2 decompressors, etc. This is because it
is likely that a bytecode signature may call them more times based on
the structure of the file being scanned - particularly for the json objects.
Variables like the number of signature parts are considered trusted user input
and so allocations based on those values need not check the memory allocation
limit.

Specifically for the allocation of the normalized buffer in cli_scanscript,
I determined that the size of SCANBUFF is fixed and so safe, and the maxpatlen
comes from the signature load and is therefore also trusted, so we do not
need to check the allocation limit.
We add the _OR_GOTO_DONE suffix to the macros that go to done if the
allocation fails. This makes it obvious what is different about the
macro versus the equivalent function, and that error handling is
built-in.

Renamed the cli_strdup to safer_strdup to make it obvious that it exists
because it is safer than regular strdup. Regular strdup doesn't have the
NULL check before trying to dup, and so may result in a NULL-deref
crash.

Also remove unused STRDUP (_OR_GOTO_DONE) macro, since the one with the
NULL-check is preferred.
@micahsnyder micahsnyder force-pushed the remove-unnecessary-cli_-allocations branch from 1d09996 to b3c9a56 Compare March 15, 2024 14:39
@micahsnyder micahsnyder merged commit 1e5ddef into Cisco-Talos:main Mar 15, 2024
22 of 24 checks passed
@micahsnyder micahsnyder deleted the remove-unnecessary-cli_-allocations branch March 15, 2024 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants