failure on elf32-littlearm #1133

Closed
ivg opened this issue Jun 16, 2020 · 1 comment · Fixed by #1134
ivg commented Jun 16, 2020

While this architecture is not supported by BAP, it would be nice if we didn't fail on it. So far it looks like we got confused by the semantics of the last instruction at the end of the block, which is a Thumb instruction (in fact, two Thumb instructions) that also has an interpretation in ARM. We are currently investigating whether we can make the disassembler more robust to random inputs (we should), but in general full support will be provided by #1122.

See fkie-cad/cwe_checker#61 for more details.

ivg added a commit to ivg/bap that referenced this issue Jun 16, 2020
TL;DR: jumps that cross segment and section boundaries are now
treated more thoroughly. If a jump instruction leads to an invalid
execution chain in some other segment, we now cancel both the chain
in the other segment and the chain that led to that jump in the
current segment (previously a chain was canceled only up to the
boundaries of its own segment).

Partially fixes BinaryAnalysisPlatform#1133; however, since it is a
Thumb 2.0 binary, for BAP it is still mostly random data rather than
something meaningful.

Problem
-------

Since 2.0 we have had an incremental disassembler that supports
cross-section/cross-segment jumps. As BinaryAnalysisPlatform#1133
shows, these jumps can sometimes go wrong, because they were treated
specially and had privileges that regular jumps within a section
didn't have. One of the invariants of our disassembler is that no
valid chain of execution hits the end of a segment or data; in other
words, no valid chain forces the CPU into the invalid instruction
state. We allow conservative chains, so the CPU can still hit an
invalid instruction because of a conditional branch (that is, we
allow conditional branches to hit data). To preserve this invariant
we maintain a tree of disassembling tasks, so that once we hit data
we can unroll the chain up to the root that started it (or up to the
first conditional branch) and cancel everything in between, marking
it as data as well.
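
The unrolling described above can be sketched as follows. This is only an illustration in Python, not BAP's actual OCaml implementation: the `Task` record, its `conditional` flag, and `cancel_chain` are hypothetical names, and the sketch assumes that the root of a fully unconditional chain is marked as data along with the rest of the chain.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Task:
    # Hypothetical model of a disassembling task, not BAP's real type.
    addr: int                        # address of the instruction to disassemble
    parent: Optional["Task"] = None  # task that led here; None for a chain root
    conditional: bool = False        # True if the instruction is a conditional branch


def cancel_chain(failed: Task, data: Set[int]) -> Optional[Task]:
    """Mark the failed task and its unconditional ancestors as data.

    Walks the parent links until it passes the root or reaches the first
    conditional branch, which survives because conditional branches are
    allowed to hit data. Returns the surviving branch, or None.
    """
    data.add(failed.addr)
    node = failed.parent
    while node is not None and not node.conditional:
        data.add(node.addr)
        node = node.parent
    return node


# A chain root -> conditional branch -> body -> bad, where `bad` hits data.
root = Task(0x100)
branch = Task(0x104, parent=root, conditional=True)
body = Task(0x108, parent=branch)
bad = Task(0x10C, parent=body)

data: Set[int] = set()
survivor = cancel_chain(bad, data)
```

Here only the two addresses below the conditional branch are reclassified as data; the branch and everything above it stay valid.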

This invariant doesn't hold for jumps between sections. When we see
a jump instruction that leaves the current memory region, we just
assume that the destination will disassemble nicely once we get that
other region of memory. Later, when we actually get access to the
memory region that contains the destination (our disassembler is
incremental and is applied to each chunk of memory as it is
discovered), we may find that the chain starting from this address
is invalid and cancel it. However, since we no longer have access to
the disassembler state of the original memory region, we can't
cancel the chain that led to that jump in the original memory
region. Therefore, later, when we build the whole-program CFG, we
follow that chain, eventually hit data, and end up with an
exception.

Solution
--------

The solution is that, instead of discarding a task that breaches the
segment boundaries, we accumulate it in a debt list, and every time
we are handed a new memory region we first try to pay off the debts.
If a task is now within the boundaries and we can prove that it hits
data, then we cancel the whole chain, which can now cross section
boundaries.
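
A minimal sketch of the debt list idea, again in Python rather than BAP's OCaml. Everything here is an assumption made for illustration: the `State` record, the `defer` and `pay_off` functions, the half-open `[lo, hi)` region, and the `hits_data` predicate that stands in for actually disassembling the destination.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Task:
    # Hypothetical model of a disassembling task, not BAP's real type.
    addr: int                        # address this task wants to disassemble
    parent: Optional["Task"] = None  # task that led here, possibly in another region
    conditional: bool = False        # True if the instruction is a conditional branch


@dataclass
class State:
    debt: List[Task] = field(default_factory=list)  # tasks that left the known memory
    data: Set[int] = field(default_factory=set)     # addresses proven to be data


def cancel_chain(failed: Task, data: Set[int]) -> None:
    """Mark the failed task and its unconditional ancestors as data."""
    data.add(failed.addr)
    node = failed.parent
    while node is not None and not node.conditional:
        data.add(node.addr)
        node = node.parent


def defer(state: State, task: Task) -> None:
    """Record a task whose destination lies outside the current region."""
    state.debt.append(task)


def pay_off(state: State, lo: int, hi: int,
            hits_data: Callable[[int], bool]) -> None:
    """Settle every debt whose destination falls in the new region [lo, hi)."""
    unpaid = []
    for task in state.debt:
        if not (lo <= task.addr < hi):
            unpaid.append(task)             # destination is still unknown
        elif hits_data(task.addr):
            cancel_chain(task, state.data)  # parents may live in earlier regions
        # otherwise the destination is valid code and the debt is settled
    state.debt = unpaid


state = State()
# A chain in the region starting at 0x1000 ends with a jump to 0x3000.
jump = Task(0x3000, parent=Task(0x1004, parent=Task(0x1000)))
other = Task(0x5000, parent=Task(0x1008))
defer(state, jump)
defer(state, other)
# The region [0x3000, 0x4000) arrives and 0x3000 turns out to be data.
pay_off(state, 0x3000, 0x4000, lambda addr: addr == 0x3000)
```

Because each deferred task keeps its parent links, the cancellation reaches back into the region where the jump originated, while the debt targeting 0x5000 stays on the list until its region shows up.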

Caveats
-------

The debt is a list of tasks, and each task references its parent
tasks, so in fact it is a tree of instructions covering the whole
program. We store the debt list in the disassembler state, which is
saved on the hard drive, and if the debt list is large (and since
the binary format can't preserve sharing) it can be quite expensive
to store and to load. So far the assumption is that the debt list is
either empty or very small after the project is fully disassembled.
If this hypothesis turns out to be false, we can either cancel all
unpaid debt at the end of disassembling or just ignore it and not
store it on the disk.
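
To see why the loss of sharing matters, the following sketch counts how many task records exist in memory versus how many would be written if every debt were stored with its own copy of the parent chain. It is a rough Python model under that stated assumption, with hypothetical helper names; it does not reproduce BAP's actual on-disk format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical model of a disassembling task, not BAP's real type.
    addr: int
    parent: Optional["Task"] = None


def unique_nodes(debt: List[Task]) -> int:
    """Number of distinct task objects reachable from the debt list."""
    seen = set()
    for task in debt:
        node = task
        # Once a node has been seen, all of its ancestors have been seen too.
        while node is not None and id(node) not in seen:
            seen.add(id(node))
            node = node.parent
    return len(seen)


def unshared_nodes(debt: List[Task]) -> int:
    """Number of task records written if each debt copies its whole chain."""
    total = 0
    for task in debt:
        node = task
        while node is not None:
            total += 1
            node = node.parent
    return total


# Ten debts hanging off the same chain of one hundred parent tasks.
tail = None
for i in range(100):
    tail = Task(0x1000 + 4 * i, parent=tail)
debt = [Task(0x8000 + 4 * i, parent=tail) for i in range(10)]

in_memory = unique_nodes(debt)
on_disk = unshared_nodes(debt)
```

In this toy setup the stored form is roughly nine times larger than the in-memory tree, and the gap grows with both the number of debts and the depth of the shared chain.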
ivg added a commit to ivg/bap that referenced this issue Jun 17, 2020
@ivg ivg closed this as completed in #1134 Jun 17, 2020
ivg added a commit that referenced this issue Jun 17, 2020

frakman1 commented Jul 6, 2020

I read in the overview that:

BAP supports x86, x86-64, ARM, MIPS, PowerPC

Is this not true for ARM? And what exactly is elf32-littlearm (vs. ARM)? I am unsure of the distinction.
