Backend Rust/Foreign Language Support #69

julihoh · 2021-07-29T09:43:20Z

Hey,

I came up with the code in this PR in order to be able to implement new SymCC/SymQEMU backends in Rust for my GSoC project (which is about integrating LibAFL with SymCC as a concolic engine for fuzzing).

In a nutshell, it implements a new backend in SymCC which is supposed to wrap a backend implemented in another language. This wrapper implements some utility functionality and garbage collection. How this is achieved is described in the README in runtime/rust_backend/README. The wrapped runtime in turn can implement a reduced interface of the 'original' runtime and can also keep the libc wrappers.

I have put quite a bit of effort into making this code strictly additive to the current codebase of SymCC, which should make it easy to maintain. Also note that, while it may look like a lot of code, much of it is boilerplate and more or less a copy of the SimpleRuntime. I propose to merge this into SymCC purely for organisational reasons (i.e. not having to sync the fork with SymCC upstream).

I think the code could benefit SymCC/SymQEMU and related research by making it much easier to build a runtime that is not written in C++, but still keeps the libc wrappers and garbage collection feature from the common SymCC runtime code. Note that interfacing with the GC feature of SymCC more or less requires interoperability with C++, making an implementation in foreign languages quite cumbersome. The common backend code also assumes that the backend will always keep track of the bit-width of any expression, which the proposed code also implements transparently for the wrapped runtime.

Cheers


aurelf · 2021-08-07T15:22:12Z

Hi,
thanks for the PR. I'm going to look at this. Looks interesting.
To start, 2 comments:

Can you add a target to the Dockerfile to compile and run the tests on this backend? You can take inspiration from the simple backend again here.
Can you mention in the readme how this is intended to be used with LibAFL ? SymCC would be an executor ? Is there a minimal example that actually uses this Rust backend ?

Thanks.

julihoh · 2021-08-08T14:53:39Z

Thanks for taking a look :)

> Can you add a target to the Dockerfile to compile and run the tests on this backend? You can take inspiration from the simple backend again here.

This will be difficult, as the 'rust backend' code added in this repository delegates the solving, etc. to the backend that is implemented in a foreign language (and for which no 'canonical' version exists at this point). I presume most tests are irrelevant in this case. Note that the 'rust backend' in itself is also not usable on its own, because it compiles to a static archive instead of a shared library, so it is not directly compatible with the interface that SymCC (and SymQEMU) expect. The symcc_runtime crate in the LibAFL repo takes care of giving the user an actual interface for implementing the runtime and enables the production of a shared library that can be used as a SymCC backend. (This is achieved by linking against the static archive that is generated by building the rust backend in this repo, re-exporting the standard SymCC interface from it, and implementing the foreign symbols.) Finally, there is a smoke test for this whole process in the LibAFL repo.
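As a rough illustration of the "linking against the static archive" step, a Cargo build script can tell rustc where a prebuilt archive lives and ask for it to be linked statically. The sketch below only assembles the two standard Cargo directives; the helper names `link_directives` and `emit` are hypothetical, and the directory and library name are placeholders rather than the values symcc_runtime actually uses.

```rust
use std::path::Path;

/// Builds the Cargo build-script directives that link a prebuilt static archive.
/// `lib_dir` is the directory containing the archive and `lib_name` is the
/// library name without the `lib` prefix and `.a` suffix (both placeholders).
pub fn link_directives(lib_dir: &Path, lib_name: &str) -> Vec<String> {
    vec![
        // Adds the directory to the native library search path.
        format!("cargo:rustc-link-search=native={}", lib_dir.display()),
        // Requests static linking of the archive into the final artifact.
        format!("cargo:rustc-link-lib=static={}", lib_name),
    ]
}

/// What a `build.rs` would do with the directives: print one per line,
/// which is how Cargo reads instructions from a build script.
pub fn emit(lib_dir: &Path, lib_name: &str) {
    for line in link_directives(lib_dir, lib_name) {
        println!("{}", line);
    }
}
```

In an actual `build.rs`, `emit` would be called from `main` after the archive has been produced.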

> Can you mention in the readme how this is intended to be used with LibAFL ?

I'm writing up some more extensive documentation for LibAFL users in the coming days (this is basically the last missing piece for this whole project :) )

> SymCC would be an executor ?

Exactly. SymQEMU is supported as well, because they use the same interface. Garbage collection, which, AFAIU, is required for the massive amounts of expressions that the SymQEMU frontend produces, is implemented as part of this PR.

> Is there a minimal example that actually uses this Rust backend ?

Yes. Here is the runtime used in the aforementioned smoke test, which traces the calls made to the backend for processing outside of the traced target, and here is a backend that is used as part of an example hybrid fuzzer based on LibAFL.

Note that the runtimes that I linked to make use of the pre-built components that come with the symcc_runtime crate. This really only makes sense when reading the documentation for that crate. Unfortunately, that package is not published to crates.io yet, and therefore no publicly accessible HTML version of the documentation for this crate exists yet. You can read the documentation straight out of the source, but this is of course not ideal. Alternatively, you could clone LibAFL and build the documentation using `cargo doc -p symcc_runtime --open`, or just wait until symcc_runtime is published and read the docs at docs.rs.

Cheers

sebastianpoeplau

That's a very interesting project! (Apologies for the late reaction...) I've added a few suggestions inline.

Just out of curiosity: Why did you decide to do target -> SymCC runtime -> Rust code rather than target -> Rust code -> parts of SymCC runtime used as helpers? I'm not suggesting that one is better than the other, just curious :)

In principle, I have no objections against a backend that delegates to foreign-language implementations. But I agree with @aurelf that it would be nice to have a way to make sure we don't break implementations, other than receiving bug reports after the fact :P Do you think it's feasible to express the simple backend in terms of your generic backend as I suggested in the comments? That would give us a nice way to make sure we don't break anything...

sebastianpoeplau · 2021-08-22T16:21:45Z

runtime/CMakeLists.txt

@@ -35,7 +36,9 @@ set(SHARED_RUNTIME_SOURCES
  ${CMAKE_CURRENT_SOURCE_DIR}/Shadow.cpp
  ${CMAKE_CURRENT_SOURCE_DIR}/GarbageCollection.cpp)

-if (${QSYM_BACKEND})
+if (${RUST_BACKEND})


Minor: I think it would make sense to rename the backend in a way that prevents confusion with a possible future backend for analyzing Rust programs (as opposed to your approach of enabling a backend implemented in Rust).

sebastianpoeplau · 2021-08-22T16:23:58Z

runtime/rust_backend/Runtime.cpp

+#endif
+
+/// The set of all expressions we have ever passed to client code.
+std::set<SymExpr> allocatedExpressions;


It would be very nice if we could refactor the parts that are the same in the simple backend and your new backend into a shared header or something of the sort, in order to avoid code duplication. We could even go as far as expressing the simple backend as an implementation of your interface - which would nicely solve @aurelf's demand for a testable version of the new backend.

julihoh · 2021-09-13T09:34:45Z

Just a quick note: I haven't forgotten about this, just very busy at this time. I'll come back to this as soon as possible!

julihoh · 2021-09-20T11:09:45Z

> Just out of curiosity: Why did you decide to do target -> SymCC runtime -> Rust code rather than target -> Rust code -> parts of SymCC runtime used as helpers? I'm not suggesting that one is better than the other, just curious :)

That's a quite curious design decision indeed. It follows from the following requirements:

1. I (or we, as in "in collaboration with @domenuk and @andreafioraldi") wanted a solution that could be built entirely from cargo. To be specific, I didn't want to have to do a CMake build to build my otherwise rust-based backend.
2. Rust/cargo can't re-export symbols from external libraries.

It is possible to link the Rust parts of the backend into the final shared object using Corrosion (https://github.com/AndrewGaspar/corrosion). This would allow the scenario you describe. Note that, if we want to interface with SymCC's GC, we need some C++ (because it deals in std types). However, driving the build from CMake in this way violates requirement 1. (I know that this all works, because this is how I had implemented it initially.)

If we now want to invert the dependency between CMake and Cargo (i.e. call cmake from cargo), then we can't have cargo call into cmake to do the entire build. Instead, we need to build a static library out of the code that we want from the SymCC runtime and link that into the final shared object, which is now built and linked by cargo. This is where requirement 2 comes in. Since we can't simply re-export the symbols from the SymCC/C++ part of the backend, we need to define the required symbols in our Rust crate. Therefore, in the symcc_runtime crate, we (1) gather the exported symbols from the C++ runtime using bindgen, (2) generate a header that renames those symbols according to a known pattern, and (3) compile the C++ runtime with this renaming header included, producing a static library with the required code but renamed symbols. Then, in Rust, we generate the necessary symbol definitions and forward calls to the renamed symbols of the C++ runtime. The renaming is necessary because, again, we can't re-export external symbols. This technique basically works around requirement 2.

Then, to be able to interface with SymCC's GC (and to implement _sym_bits_helper for convenience), we have the C++ part of this PR, which does exactly those two things: it interfaces with the GC and implements _sym_bits_helper. The GC interface to the Rust runtime is then FFI-compatible (i.e. it does not use C++'s std::set).
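The rename-and-forward idea can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the `_cpp_` prefix is a hypothetical renaming pattern, the expression handle is modelled as a plain `usize`, and the "C++ side" is defined in Rust here purely so the snippet is self-contained. In the real setup it would be an `extern "C"` declaration resolved from the static archive.

```rust
/// Stand-in for the renamed symbol from the C++ runtime (hypothetical name).
/// In a real build this would be declared in an `extern "C"` block and
/// provided by the static library compiled with the renaming header.
pub extern "C" fn _cpp_sym_build_integer(value: u64, bits: u8) -> usize {
    // Dummy behaviour for the sketch: derive a handle from the inputs.
    (value as usize).wrapping_mul(256).wrapping_add(bits as usize)
}

/// The symbol the instrumented target expects. Because an external symbol
/// cannot simply be re-exported, it is defined here and forwards the call
/// to the renamed implementation.
#[no_mangle]
pub extern "C" fn _sym_build_integer(value: u64, bits: u8) -> usize {
    _cpp_sym_build_integer(value, bits)
}
```

One such forwarding definition would be needed per exported function, which is presumably why generating them is preferable to writing them by hand.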

I'm kind of annoyed that I can't seem to find a more concise explanation, but the issues are kind of intertwined in a weird way. I hope it is understandable. Also, another goal was to modify as little existing SymCC code as possible, so that a fork remains easy to maintain in case these changes are not merged.
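To make the "FFI-compatible GC interface" point more concrete, here is a minimal sketch of how a Rust runtime might receive garbage-collection information as a pointer-plus-length array instead of a std::set. The function name `release_unreachable`, the use of `usize` as the handle type, and the `HashSet` bookkeeping are assumptions for illustration, not the interface defined in this PR.

```rust
use std::collections::HashSet;

/// Removes the expressions reported as unreachable from the runtime's
/// bookkeeping. The handles arrive as a C-style array (pointer and length).
///
/// # Safety
/// `exprs` must point to `len` readable `usize` values; it may be null only
/// if `len` is 0.
pub unsafe fn release_unreachable(live: &mut HashSet<usize>, exprs: *const usize, len: usize) {
    // `from_raw_parts` requires a non-null pointer even for empty slices.
    if exprs.is_null() || len == 0 {
        return;
    }
    let dead = unsafe { std::slice::from_raw_parts(exprs, len) };
    for id in dead {
        live.remove(id);
    }
}
```

Passing a flat array keeps the boundary free of C++ standard library types, so the same signature could be declared from C or any other language with a C FFI.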

> In principle, I have no objections against a backend that delegates to foreign-language implementations. But I agree with @aurelf that it would be nice to have a way to make sure we don't break implementations, other than receiving bug reports after the fact :P Do you think it's feasible to express the simple backend in terms of your generic backend as I suggested in the comments? That would give us a nice way to make sure we don't break anything...

The way that the C++ part of the backend (specifically the implementation of _sym_bits_helper) is implemented at the moment means that 32-bit targets can't be supported. (In short: the bit-width of every expression is stored in the least significant byte of the "SymExpr" (i.e. pointer-sized) type. On 32-bit, I believe this leaves too few bits for the actual address.)
Other than that, I see no technical reason why the simple backend couldn't be implemented this way. However, I personally think that it would make the simple backend kind of convoluted. A potential solution could be to leave the simple backend as it is and re-implement it (i.e. create a second simple backend) in terms of the rust interface.
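The bit-width packing and its 32-bit limitation can be illustrated with a small sketch. It assumes the handle is shifted left by eight bits to make room for the width in the low byte; the names `pack`, `unpack`, and `WIDTH_BITS` are hypothetical and not taken from the PR.

```rust
/// Number of low bits reserved for the bit-width (one byte).
const WIDTH_BITS: u32 = 8;

/// Packs an expression handle and its bit-width into one pointer-sized value.
/// Returns `None` if the handle does not fit into the remaining upper bits.
pub fn pack(handle: usize, bits: u8) -> Option<usize> {
    let max_handle = usize::MAX >> WIDTH_BITS;
    if handle > max_handle {
        return None;
    }
    Some((handle << WIDTH_BITS) | bits as usize)
}

/// Splits a packed value back into (handle, bit-width).
pub fn unpack(packed: usize) -> (usize, u8) {
    (packed >> WIDTH_BITS, (packed & 0xff) as u8)
}
```

With a 32-bit `usize`, only 24 bits remain for the handle under this scheme, which is the kind of shortfall described above.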

To come back to the issue of testability: In general, the code in this PR doesn't implement a backend, as discussed before. What it does implement, however, is this whole forwarding business. Therefore my suggestion would be to test only the forwarding, decoupled from the actual backend logic. For example, a 'tracing' backend could be implemented that simply prints the calls that were made to it in text format to stdout. A test script could then check that the correct sequence of calls was made to the backend. In fact, this is basically what the tests inside the LibAFL repo do at this point: https://github.com/AFLplusplus/LibAFL/tree/main/libafl_concolic/test .
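A minimal sketch of such a tracing backend, assuming a simplified three-call interface: the type `TracingBackend`, its method names, and the text format are hypothetical and much smaller than the real runtime interface. It records each call as a line of text instead of printing it, so that a test can compare the sequence directly.

```rust
/// A backend that records every call it receives instead of solving anything.
#[derive(Default)]
pub struct TracingBackend {
    next_id: usize,
    /// One text line per call, in the order the calls were made.
    pub trace: Vec<String>,
}

impl TracingBackend {
    /// Hands out a fresh expression handle, starting at 1.
    fn fresh(&mut self) -> usize {
        self.next_id += 1;
        self.next_id
    }

    pub fn build_integer(&mut self, value: u64, bits: u8) -> usize {
        let id = self.fresh();
        self.trace.push(format!("build_integer({}, {}) -> {}", value, bits, id));
        id
    }

    pub fn build_add(&mut self, a: usize, b: usize) -> usize {
        let id = self.fresh();
        self.trace.push(format!("build_add({}, {}) -> {}", a, b, id));
        id
    }

    pub fn push_path_constraint(&mut self, constraint: usize, taken: bool) {
        self.trace.push(format!("push_path_constraint({}, {})", constraint, taken));
    }
}
```

A test would run a small instrumented program against this backend and compare `trace` with the expected call sequence, independently of any solver.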

Of course, in my humble opinion, the current C++ parts of the backend (not the LLVM pass of course) should simply be implemented in Rust, providing an FFI-compatible interface like the one in the symcc_runtime crate. But that's for another time ;)

…symcc fuzzing helper (#1)

* This is a temporary fix due to std::iterator depercation. This commit needs to be reverted once a proper fix is in place.
* symcc_fuzzing_helper: Move to clap3 (eurecom-s3#94)
* Revert "symcc_fuzzing_helper: Move to clap3 (eurecom-s3#94)" (eurecom-s3#101)
  This reverts commit 88b464c.
* Add some FAQs to the Readme
* changed from structopt to clap 3 (eurecom-s3#103)
* fix for issue eurecom-s3#108
* fix for issue eurecom-s3#108
* LLVM 12 works without changes
* Add a clang-format configuration
  This is just the output of "clang-format -style=llvm -dump-config".
* Add support for LLVM 13
  Clang now uses the new pass manager for the optimization pipeline, so we have to do the same to make Clang use our pass. Moreover, FileCheck now complains if a configured prefix doesn't appear in the checked file; added "ANY" in three tests where it was missing. Finally, printing arbitrary-precision integers in QSYM needed some changes.
* Add support for LLVM 14
* LLVM 15 works without changes
* fix issue eurecom-s3#109
* Run clang-format
  We should really automate this...
* Add a GitHub action that checks LLVM compatibility
* Prevent test failures in case of reordered solver output
  Z3 doesn't always output model constants in the same order; make sure that our tests don't depend on it.
* Accept symbolic input from memory
  This commit adds the option to mark symbolic input by calling symcc_make_symbolic from the program under test. The refactoring that was required to add the new feature has had the pleasant side effect that the QSYM backend now doesn't require the entire input upfront anymore, making it much more convenient to feed symbolic data through stdin.
* Run GitHub actions for pull requests only
  No need for "push": the "pull_request" event already triggers when new commits are pushed to the PR branch, and we expect all changes to go through a PR.

Co-authored-by: Aurelien Francillon <[email protected]>
Co-authored-by: Dominik Maier <[email protected]>
Co-authored-by: aurelf <[email protected]>
Co-authored-by: Dominik Maier <[email protected]>
Co-authored-by: Emilio Coppa <[email protected]>
Co-authored-by: Sebastian Poeplau <[email protected]>


julihoh and others added 16 commits July 1, 2021 13:37

add new runtime that compiles to static library and contains only com…

58ec03e

…mon runtime code

add the common only runtime library to installation output of cmake

d23f876

make cmake call cargo instead of cargo calling cmake

3968841

this is due to linking issues with rust

common backend is now the rust backend

1613c78

delegate call stack tracing to rust runtime

b67b70a

move panic mode configuration to cmake script

f1c6b0e

re-implement rust backend to support GC and _sym_bits_helper

e243f9a

pass gc information as array instead of of single values

1346fca

Merge branch 'eurecom-s3:master' into main

8840fda

fix missing include

2b62059

switch to building the rust backend into a static archive

ec05367

instead of a dynamic library

fix rust runtime header

ba0aeac

use more convenient types for rust runtime

1a1bf95

cleanup

8b35ef8

fix RuntimeCommon.h includes

f44262f

fix c+p bug in rust runtime (inadverntently turgnin trunc in zext)

45cde02

aurelf added the enhancement New feature or request label Aug 7, 2021

sebastianpoeplau reviewed Aug 22, 2021


domenukk and others added 8 commits January 4, 2022 14:36

Move to clap3 (#2)

3133c0b

Adds handling for afl-showmap failures, to avoid crashing the entire …

08c29c5

…symcc fuzzing helper (#1)

Merge remote-tracking branch 'eurecom/master'

8f87bba

more less bugs

5cccc33

adapt rust runtime to api changes from upstream (#4)

76d4e26

Follow up #4 (#5)

2a3229d

* actually fix interface * more fixes

Fix naming for afl++

a42e95e

tokatoka added 18 commits October 20, 2023 15:59

merge

379061f

Update: qsym_backend

bee13da

Merge pull request #7 from AFLplusplus/eurecom-s3-master

6e1a055

Merge Upstream

add

5cb76f1

Merge pull request #8 from AFLplusplus/upd

6909c3f

Adding more functions

Merge branch 'eurecom-s3:master' into main

d3870f3

remove extern block

fa54463

endif

7caf6aa

FMT

27734ff

Merge pull request #10 from AFLplusplus/no_extern

019a226

Remove extern block from RustRuntime.h

include

5db9e6b

boolean?

1e8f02b

fmt

950ab01

Merge pull request #11 from AFLplusplus/cpp_to_c

2d16373

Change to C

revert

4898f5b

Merge pull request #12 from AFLplusplus/revert

6010402

Revert #10 #11

Merge branch 'eurecom-s3:master' into main

f33f679

Update rust backend (#13)

1330e29

* push * add * FMT * f * bits
