implements better support for cross memory disassembling
TL;DR: jumps that cross segment and section boundaries are now handled more thoroughly. If a jump instruction leads to an invalid execution chain in some other segment, then we cancel both the chain in the other segment and the chain that led to that jump in the current segment (previously, a chain was canceled only up to the boundaries of its own segment).

Partially fixes BinaryAnalysisPlatform#1133; however, since it is a Thumb 2.0 binary, for BAP it is still mostly random data rather than something meaningful.

Problem
-------

Since 2.0 we have the incremental disassembler that supports cross-section/cross-segment jumps. As BinaryAnalysisPlatform#1133 shows, they can sometimes go wrong, because they were treated specially and received preferential treatment that regular intra-section jumps didn't get.

One of the invariants of our disassembler is that there is no valid chain of execution that hits the end of a segment or data, in other words, that forces the CPU into the invalid instruction state. We allow conservative chains, so the CPU can still hit an invalid instruction because of a conditional branch (in other words, we allow conditional branches to hit data). To preserve this invariant we maintain a tree of disassembling tasks, so that once we hit data, we can unroll the chain up to the root that started it (or up to the first conditional branch) and cancel everything in between, marking it as data as well.

This invariant doesn't hold for jumps between sections: when we see a jump instruction that leaves the current memory region, we just assume that once we get the other region of memory, it will disassemble nicely. However, later, when we actually get access to the memory region that contains the destination (our disassembler is incremental and is applied to each chunk of memory as it is discovered), we may figure out that the chain starting from this address is invalid and cancel it. But since we no longer have access to the disassembler state of the original memory region, we can't cancel the chain that led to that jump in the original memory region. Therefore later, when we build the whole program CFG, we will follow that chain, eventually hit data, and end up with an exception.

Solution
--------

Instead of discarding a task that breaches the segment boundaries, we accumulate it in a debt list, and every time we are handed a new memory region we first try to pay off the debts. If a task is now within the boundaries and we can prove that it hits data, then we cancel the whole chain, which can now cross section boundaries.

Caveats
-------

The debt is a list of tasks, and each task references its parent task, so in effect it is a tree of instructions covering the whole program. We store the debt list in the disassembler state, which is saved on the hard drive; if the debt list is large, then (since the binary format can't preserve sharing) the state can be quite expensive to store and to load. So far the assumption is that the debt list is either empty or very small after the project is fully disassembled. If this hypothesis does not hold, we can either cancel all unpaid debt at the end of disassembling or simply ignore it and not store it on disk.
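
For illustration only, the debt mechanism from the Solution section can be modeled with a small Python sketch. This is not the BAP implementation (which is written in OCaml); every name here (`Task`, `Disassembler`, `scan`, `cancel`, `is_code`) is hypothetical, and memory is reduced to an address range plus a predicate that says whether an address decodes as code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class Task:
    """A toy disassembling task: a destination plus the task that led to it."""
    addr: int
    parent: Optional["Task"] = None
    conditional: bool = False  # True if reached through a conditional branch


class Disassembler:
    """Toy model (an assumption, not BAP's API) of chain cancellation with debts."""

    def __init__(self) -> None:
        self.data: Set[int] = set()   # addresses proven to be data
        self.debt: List[Task] = []    # tasks whose destination is not loaded yet

    def cancel(self, task: Optional[Task]) -> None:
        # Unroll the chain towards the root, marking every step as data.
        # Stop after a task that was reached via a conditional branch:
        # conditional branches are allowed to hit data.
        while task is not None:
            self.data.add(task.addr)
            if task.conditional:
                break
            task = task.parent

    def scan(self, lo: int, hi: int,
             is_code: Callable[[int], bool],
             tasks: Iterable[Task]) -> None:
        # Try to pay off old debts first, then process the new tasks.
        pending = self.debt + list(tasks)
        self.debt = []
        for task in pending:
            if not (lo <= task.addr < hi):
                self.debt.append(task)   # out of bounds: keep as debt
            elif not is_code(task.addr):
                self.cancel(task)        # hits data: cancel the whole chain


# Usage: a chain in one segment jumps into a second segment that is data.
dis = Disassembler()
root = Task(0x1000)
step = Task(0x1010, parent=root)
jump = Task(0x3000, parent=step)          # destination in another segment

dis.scan(0x1000, 0x2000, lambda a: True, [root, step, jump])
# The jump is now a debt; nothing is canceled yet.

dis.scan(0x3000, 0x4000, lambda a: False, [])
print([hex(a) for a in sorted(dis.data)])  # ['0x1000', '0x1010', '0x3000']
```

Because each task keeps a reference to its parent, the debt list transitively retains the chains that led to every outstanding jump, which is the storage concern raised in the Caveats section.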