Backend Rust/Foreign Language Support #69

Open
wants to merge 42 commits into
base: master

julihoh

@julihoh julihoh commented Jul 29, 2021

Hey,

I came up with the code in this PR in order to be able to implement new SymCC/SymQEMU backends in Rust for my GSoC project (which is about integrating LibAFL with SymCC as a concolic engine for fuzzing).

In a nutshell, it implements a new backend in SymCC which is supposed to wrap a backend implemented in another language. This wrapper implements some utility functionality and garbage collection. How this is achieved is described in the README in runtime/rust_backend/README. The wrapped runtime in turn can implement a reduced interface of the 'original' runtime and can also keep the libc wrappers.

I have put quite a bit of effort into making this code strictly additive to the current codebase of SymCC, which should make it easy to maintain. Also note that, while it may look like a lot of code, much of it is boilerplate and more or less a copy of the SimpleRuntime. I propose to merge this into SymCC purely for organisational reasons (i.e. not having to sync the fork with SymCC upstream).

I think the code could benefit SymCC/SymQEMU and related research by making it much easier to build a runtime that is not written in C++, but still keeps the libc wrappers and garbage collection feature from the common SymCC runtime code. Note that interfacing with the GC feature of SymCC more or less requires interoperability with C++, making an implementation in foreign languages quite cumbersome. The common backend code also assumes that the backend will always keep track of the bit-width of any expression, which the proposed code also implements transparently for the wrapped runtime.

Cheers

@aurelf
Member

aurelf commented Aug 7, 2021

Hi,
thanks for the PR. I'm going to look at this. Looks interesting.
To start, 2 comments:

  • Can you add a target to the Dockerfile to compile and run the tests on this backend? You can take inspiration from the simple backend again here.
  • Can you mention in the readme how this is intended to be used with LibAFL? SymCC would be an executor? Is there a minimal example that actually uses this Rust backend?

Thanks.

@aurelf aurelf added the enhancement New feature or request label Aug 7, 2021
@julihoh
Author

julihoh commented Aug 8, 2021

Thanks for taking a look :)

  • Can you add a target to the Dockerfile to compile and run the tests on this backend? You can take inspiration from the simple backend again here.

This will be difficult, as the 'rust backend' code added in this repository delegates the solving, etc. to the backend that is implemented in a foreign language (and for which no 'canonical' version exists at this point). I presume most tests are irrelevant in this case. Note that the 'rust backend' is also not usable on its own, because it compiles to a static archive instead of a shared library, so it is not directly compatible with the interface that SymCC (and SymQEMU) expect. The symcc_runtime crate in the LibAFL repo takes care of giving the user an actual interface for implementing the runtime and enables the production of a shared library that can be used as a SymCC backend. (This is achieved by linking against the static archive that is generated by building the rust backend in this repo, re-exporting the standard SymCC interface from it and implementing the foreign symbols.) Finally, there is a smoke test for this whole process in the LibAFL repo.

  • Can you mention in the readme how this is intended to be used with LibAFL?

I'm writing up some more extensive documentation for LibAFL users in the coming days (this is basically the last missing piece for this whole project :) )

  • SymCC would be an executor?

Exactly. SymQEMU is supported as well, because they use the same interface. Garbage collection, which, AFAIU, is required for the massive amounts of expressions that the SymQEMU frontend produces, is implemented as part of this PR.

  • Is there a minimal example that actually uses this Rust backend?

Yes. Here is the runtime used in the aforementioned smoke test, which traces the calls made to the backend for processing outside of the traced target, and here is a backend that is used as part of an example hybrid fuzzer based on LibAFL.

Note that the runtimes that I linked to make use of the pre-built components that come with the symcc_runtime crate. This really only makes sense when reading the documentation for that crate. Unfortunately, that package is not published to crates.io yet, and therefore no publicly accessible HTML version of its documentation exists yet. You can of course read the documentation straight out of the source, but this is not ideal. Alternatively, you could clone LibAFL and build the documentation using cargo doc -p symcc_runtime --open or just wait until symcc_runtime is published and read the docs at docs.rs.

Cheers

Collaborator

@sebastianpoeplau sebastianpoeplau left a comment

That's a very interesting project! (Apologies for the late reaction...) I've added a few suggestions inline.

Just out of curiosity: Why did you decide to do target -> SymCC runtime -> Rust code rather than target -> Rust code -> parts of SymCC runtime used as helpers? I'm not suggesting that one is better than the other, just curious :)

In principle, I have no objections against a backend that delegates to foreign-language implementations. But I agree with @aurelf that it would be nice to have a way to make sure we don't break implementations, other than receiving bug reports after the fact :P Do you think it's feasible to express the simple backend in terms of your generic backend as I suggested in the comments? That would give us a nice way to make sure we don't break anything...

@@ -35,7 +36,9 @@ set(SHARED_RUNTIME_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/Shadow.cpp
${CMAKE_CURRENT_SOURCE_DIR}/GarbageCollection.cpp)

if (${QSYM_BACKEND})
if (${RUST_BACKEND})
Collaborator

Minor: I think it would make sense to rename the backend in a way that prevents confusion with a possible future backend for analyzing Rust programs (as opposed to your approach of enabling a backend implemented in Rust).

#endif

/// The set of all expressions we have ever passed to client code.
std::set<SymExpr> allocatedExpressions;
Collaborator

It would be very nice if we could refactor the parts that are the same in the simple backend and your new backend into a shared header or something of the sort, in order to avoid code duplication. We could even go as far as expressing the simple backend as an implementation of your interface - which would nicely solve @aurelf's demand for a testable version of the new backend.

@julihoh
Author

julihoh commented Sep 13, 2021

Just a quick note: I haven't forgotten about this, just very busy at this time. I'll come back to this as soon as possible!

@julihoh
Author

julihoh commented Sep 20, 2021

Just out of curiosity: Why did you decide to do target -> SymCC runtime -> Rust code rather than target -> Rust code -> parts of SymCC runtime used as helpers? I'm not suggesting that one is better than the other, just curious :)

That's quite a curious design decision indeed. It follows from the following requirements:

  1. I (or we, as in "in collaboration with @domenuk and @andreafioraldi") wanted a solution that could be built entirely from cargo. To be specific, I didn't want to have to do a CMake build to build my otherwise rust-based backend.
  2. Rust/cargo can't re-export symbols from external libraries.

It is possible to link the Rust parts of the backend into the final shared object using Corrosion (https://github.com/AndrewGaspar/corrosion). This would allow the scenario you describe. Note that, if we want to interface with SymCC's GC, we need some C++ (because it deals in std types). However, this violates requirement 1. (I know that this all works, because this is how I had implemented it initially.)
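To illustrate the remark about std types: a GC notification that crosses a language boundary generally cannot pass a C++ std::set, so the set of live expressions would have to be flattened into a plain pointer and length. The following is a minimal Rust sketch of the receiving side under that assumption. The names `ExprStore` and `example_notify_live_expressions` are hypothetical and are not taken from this PR or from symcc_runtime.

```rust
use std::collections::HashMap;

/// Opaque, pointer-sized expression handle (an assumption for this sketch).
pub type ExprId = usize;

/// Hypothetical backend-side store: maps each handle to backend-owned data.
#[derive(Default)]
pub struct ExprStore {
    pub expressions: HashMap<ExprId, String>,
}

impl ExprStore {
    /// Drop every expression whose handle is not listed in `live`.
    pub fn retain_live(&mut self, live: &[ExprId]) {
        self.expressions.retain(|id, _| live.contains(id));
    }
}

/// Hypothetical FFI entry point. The C++ side would copy its set of
/// reachable expressions into a contiguous array and pass pointer + length.
/// A real exported symbol would additionally need an unmangled name.
///
/// # Safety
/// `store` must point to a valid `ExprStore`, and `live` must point to
/// `len` readable `ExprId` values (it is not read when `len` is 0).
pub unsafe extern "C" fn example_notify_live_expressions(
    store: *mut ExprStore,
    live: *const ExprId,
    len: usize,
) {
    let live: &[ExprId] = if len == 0 {
        &[]
    } else {
        // Safety: guaranteed by the caller, see the function contract.
        unsafe { std::slice::from_raw_parts(live, len) }
    };
    // Safety: guaranteed by the caller, see the function contract.
    unsafe { (*store).retain_live(live) };
}
```

The pointer-plus-length shape is the design point here: it can be expressed in a C header, whereas a std::set cannot.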

If we now want to invert the dependency between CMake and Cargo (i.e. call cmake from cargo), then we can't have cargo call into cmake to do the entire build. Instead, we need to build a static library out of the code that we want from the SymCC runtime and link that into the final shared object, which is now built and linked by cargo. This is where requirement 2 comes in. Since we can't simply re-export the symbols from the SymCC/C++ part of the backend, we need to define the required symbols in our Rust crate. Therefore, in the symcc_runtime crate, we (a) gather the exported symbols from the C++ runtime using bindgen, (b) generate a header that renames those symbols according to a known pattern, and (c) compile the C++ runtime with this renaming header included, producing a static library with the required code but renamed symbols. Then, in Rust, we generate the necessary symbol definitions and forward calls to the C++ runtime under its renamed symbols. The renaming is necessary because, again, we can't re-export external symbols. This technique basically works around requirement 2. Then, to be able to interface with SymCC's GC (and to implement _sym_bits_helper for convenience), we have the C++ part of this PR, which does exactly those two things: it interfaces with the GC and implements _sym_bits_helper. The GC interface to the Rust runtime is then FFI compatible (i.e. it does not use C++'s std::set).
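The rename-and-forward step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `_rsym_` prefix is a hypothetical renaming pattern, the signature of `_sym_build_integer` is simplified, and the renamed C++ function is replaced by a Rust stand-in so that the sketch compiles without the static library.

```rust
/// Pointer-sized expression handle, standing in for SymCC's `SymExpr`.
pub type SymExpr = usize;

/// Stand-in for the C++ runtime after it was compiled with the renaming
/// header. In a real build this module would be an `extern "C"` block whose
/// declarations resolve against the static library; it is written in Rust
/// here only to keep the sketch self-contained. The `_rsym_` prefix is a
/// hypothetical renaming pattern.
mod renamed_cpp_runtime {
    use super::SymExpr;

    pub extern "C" fn _rsym_build_integer(value: u64, bits: u8) -> SymExpr {
        // Fake "expression": combine the arguments into a single handle.
        ((value as usize) << 8) | bits as usize
    }
}

/// Definition under the original name, owned by the Rust crate. It only
/// forwards to the renamed implementation. A real exported symbol would
/// additionally need an unmangled name so the instrumented target finds it.
pub extern "C" fn _sym_build_integer(value: u64, bits: u8) -> SymExpr {
    renamed_cpp_runtime::_rsym_build_integer(value, bits)
}
```

Because the Rust crate defines the symbol itself, nothing has to be re-exported from the static library, which is the limitation named in requirement 2.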

I'm kind of annoyed that I can't seem to find a more concise explanation, but the issues are kind of intertwined in a weird way. I hope it is understandable. Also, another goal was to modify as little existing SymCC code as possible, so as to be able to easily maintain a fork in case these changes are not merged.

In principle, I have no objections against a backend that delegates to foreign-language implementations. But I agree with @aurelf that it would be nice to have a way to make sure we don't break implementations, other than receiving bug reports after the fact :P Do you think it's feasible to express the simple backend in terms of your generic backend as I suggested in the comments? That would give us a nice way to make sure we don't break anything...

The way that the C++ part of the backend (specifically the implementation of _sym_bits_helper) is implemented at the moment means that 32-bit targets can't be supported. (In short: the bit-width of every expression is stored in the least significant byte of the "SymExpr" (i.e. pointer-sized) type. On 32-bit, I believe this leaves too few bits for the actual address.)
Other than that, I see no technical reason why the simple backend couldn't be implemented this way. However, I personally think that it would make the simple backend kind of convoluted. A potential solution could be to leave the simple backend as it is and implement it a second time (i.e. create a second simple backend) in terms of the Rust interface.
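The bit-width encoding mentioned above can be illustrated with a small sketch. It assumes a layout where the low 8 bits of a pointer-sized handle hold the bit-width and the remaining bits hold the payload; the exact layout used in the PR is not shown in this thread, and the function names `pack`, `bits_of` and `payload_of` are hypothetical.

```rust
/// Pointer-sized expression handle, standing in for SymCC's `SymExpr`.
pub type SymExpr = usize;

/// Mask selecting the least significant byte, which holds the bit-width.
const BITS_MASK: usize = 0xff;

/// Pack a payload (e.g. address bits) and a bit-width into one handle.
/// Returns `None` if the payload does not fit into the remaining bits.
pub fn pack(payload: usize, bits: u8) -> Option<SymExpr> {
    let max_payload = usize::MAX >> 8;
    if payload > max_payload {
        return None;
    }
    Some((payload << 8) | bits as usize)
}

/// Recover the bit-width from the least significant byte.
pub fn bits_of(expr: SymExpr) -> u8 {
    (expr & BITS_MASK) as u8
}

/// Recover the payload from the remaining upper bits.
pub fn payload_of(expr: SymExpr) -> usize {
    expr >> 8
}
```

With this layout a 64-bit handle leaves 56 bits for the payload, while a 32-bit handle leaves only 24, which is the kind of shortage the 32-bit concern refers to.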

To come back to the issue of testability: In general, the code in this PR doesn't implement a backend, as discussed before. What it does implement, however, is this whole forwarding business. Therefore my suggestion would be to test only the forwarding, decoupled from the actual backend logic. For example, a 'tracing' backend could be implemented that simply writes the calls made to it to stdout in text format. A test script could then ensure that the correct sequence of calls was made to the backend. In fact, this is basically what the tests inside the LibAFL repo do at this point: https://github.com/AFLplusplus/LibAFL/tree/main/libafl_concolic/test .
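A 'tracing' backend of the kind suggested above might look roughly like this. It is a minimal sketch, not the code from the LibAFL tests: the `Backend` trait is a hypothetical, heavily reduced interface, and the calls are collected in a vector instead of being printed so that the sequence is easy to compare.

```rust
/// Hypothetical, heavily reduced backend interface. The real interface in
/// symcc_runtime has many more operations and different signatures.
pub trait Backend {
    fn build_integer(&mut self, value: u64, bits: u8) -> usize;
    fn build_add(&mut self, a: usize, b: usize) -> usize;
    fn push_path_constraint(&mut self, constraint: usize, taken: bool);
}

/// Backend that does no solving and only records the calls made to it.
#[derive(Default)]
pub struct TracingBackend {
    /// One text line per call, in call order.
    pub log: Vec<String>,
    next_id: usize,
}

impl TracingBackend {
    /// Hand out a fresh, non-zero expression identifier.
    fn fresh(&mut self) -> usize {
        self.next_id += 1;
        self.next_id
    }
}

impl Backend for TracingBackend {
    fn build_integer(&mut self, value: u64, bits: u8) -> usize {
        let id = self.fresh();
        self.log
            .push(format!("build_integer({}, {}) -> {}", value, bits, id));
        id
    }

    fn build_add(&mut self, a: usize, b: usize) -> usize {
        let id = self.fresh();
        self.log.push(format!("build_add({}, {}) -> {}", a, b, id));
        id
    }

    fn push_path_constraint(&mut self, constraint: usize, taken: bool) {
        self.log
            .push(format!("push_path_constraint({}, {})", constraint, taken));
    }
}
```

A test would then drive the instrumented target (or, as a stand-in, call the trait methods directly) and compare `log` line by line against the expected call sequence.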

Of course, in my humble opinion, the current C++ parts of the backend (not the LLVM pass, of course) should simply be implemented in Rust, providing an FFI-compatible interface like the one in the symcc_runtime crate. But that's for another time ;)

domenukk and others added 8 commits January 4, 2022 14:36
* This is a temporary fix due to std::iterator deprecation.

This commit needs to be reverted once a proper fix is in place.

* symcc_fuzzing_helper: Move to clap3 (eurecom-s3#94)

* Revert "symcc_fuzzing_helper: Move to clap3 (eurecom-s3#94)" (eurecom-s3#101)

This reverts commit 88b464c.

* Add some FAQs to the Readme

* changed from structopt to clap 3 (eurecom-s3#103)

* fix for issue eurecom-s3#108

* fix for issue eurecom-s3#108

* LLVM 12 works without changes

* Add a clang-format configuration

This is just the output of "clang-format -style=llvm -dump-config".

* Add support for LLVM 13

Clang now uses the new pass manager for the optimization pipeline, so we
have to do the same to make Clang use our pass. Moreover, FileCheck now
complains if a configured prefix doesn't appear in the checked file; added
"ANY" in three tests where it was missing. Finally, printing
arbitrary-precision integers in QSYM needed some changes.

* Add support for LLVM 14

* LLVM 15 works without changes

* fix issue eurecom-s3#109

* Run clang-format

We should really automate this...

* Add a GitHub action that checks LLVM compatibility

* Prevent test failures in case of reordered solver output

Z3 doesn't always output model constants in the same order; make sure
that our tests don't depend on it.

* Accept symbolic input from memory

This commit adds the option to mark symbolic input by calling
symcc_make_symbolic from the program under test.

The refactoring that was required to add the new feature has had the
pleasant side effect that the QSYM backend now doesn't require the
entire input upfront anymore, making it much more convenient to feed
symbolic data through stdin.

* Run GitHub actions for pull requests only

No need for "push": the "pull_request" event already triggers when new
commits are pushed to the PR branch, and we expect all changes to go
through a PR.

Co-authored-by: Aurelien Francillon
Co-authored-by: Dominik Maier
Co-authored-by: aurelf
Co-authored-by: Emilio Coppa
Co-authored-by: Sebastian Poeplau
* actually fix interface

* more fixes