Skip to content

Latest commit

 

History

History
243 lines (179 loc) · 12.4 KB

README.md

File metadata and controls

243 lines (179 loc) · 12.4 KB

Mariposa

Mariposa is a tool for testing and quantifying SMT-based proof stability. Given an SMT query and a solver, Mariposa:

  • creates multiple semantics-preserving mutations of the query
  • runs the solver on the mutants and collects the results and performance data
  • applies statistical tests to the data in order to assign a stability category
  • reports the stability category and relevant stability metrics

For more details, please see our paper:

The Mariposa Benchmark and Experimental Data

We used the Mariposa tool to create a benchmark set of SMT queries to use when evaluating SMT proof stability. These queries were selected from a larger set of queries generated by six large-scale program verification projects. The benchmark set, the original query set, and the experimental data from our Mariposa paper live in a separate repository.

Installation

  1. Clone this repository. Since this repository currently includes in the commit history the compressed database files from our past experiments, we recommend that you avoid fetching the full history when cloning:
git clone --filter=blob:none https://github.com/secure-foundations/mariposa.git
  1. You will need a working Rust toolchain to compile the Mariposa code that parses and mutates queries. To compile this code, run:
cd src/smt2action/
cargo build --release
cd -
  1. For preprocessing queries, building proof traces, creating unsat cores, and generating various log files, we use the ninja build system. Please make sure that is installed.

  2. You will need a working Python 3 installation to run the code that performs the experiments and analysis. This code was written using Python 3.8.10 (and has not been tested on other versions). To install the required packages, run

pip3 install -r doc/requirements.txt
  1. There are some solver binaries that are configured in config/solvers.json. Note that most of the solvers are for Linux. To install other versions of Z3, you can create a new solver configuration in the the config/solvers.json file and point it to the location of the binary. See the Solvers section for more information on editing config/solvers.json.

Quick Start

To perform a basic sanity check (on Linux):

python3 src/exper_wizard.py single -s z3_4_12_2 -i data/samples/single_check.smt2 --category-verbosity 3 --clear-existing

If you are on macOS, you might want to provide a different solver with the -s flag:

python3 src/exper_wizard.py single -s z3_4_12_2_osx -i data/samples/single_check.smt2 --category-verbosity 3 --clear-existing

This will test the stability of the query data/samples/single_check.smt2 on the solver Z3 4.12.2, using the default settings for experiments. The result should be something like this:

[INFO] running 1 original queries 
[INFO] exp config:	default 
[INFO] project dir:	gen/single_proj/ 
[INFO] solver path:	bin/z3-4.12.2 
[INFO] analyzer:	default 
| category   |   count | percentage   |
|------------|---------|--------------|
| stable     |       1 | 100.0 %      |
| total      |       1 | 100.00 %     |
[INFO] listing stable queries... 
[INFO] query path:		gen/single_proj/split.smt2 
[INFO] procedure name:		Function-Def bitvector_equivalence::equivalence_proof_lower_n 
plain query unsat in 0.022s
mutation      unsat    unknown    timeout    mean    std
----------  -------  ---------  ---------  ------  -----
shuffle          61          0          0    0.02      0
rename           61          0          0    0.02      0
reseed           61          0          0    0.02      0

The solver and query pair is expected to be stable, as shown above. Each row is a summary of results from a mutation method, which includes the success rate, mean of response times, and standard deviation of response times. The success count is also given. Using the default configuration,60 mutants are generated for each mutation method in addition to the original query. Therefore 61/61 means all the mutants (and the original) succeeded for a mutation method.

Mariposa Concepts

Mariposa's operations are organized around some concepts.

  • Solver $(S)$: an SMT solver such as z3 or cvc5. A solver must be configured before being used in Mariposa. Please see configuration for more details.
  • Query $(Q)$: an individual SMT query. This typically refers to a preprocessed query.
  • Project $(P)$: a collection of SMT queries, typically originated from the same verification project. For example, Komodo produces a set of SMT queries, which we consider as a project in our analysis. The queries from a verification project usually corresponds to a base project, which identifies a project group.
  • Project Group $(G)$: a collection of projects, originated from the same base project through some query action. For example, if we take apply the unsat-core action to queries in the Komodo base project, we obtain another project under this project group. Similarly, if we apply the shake query action to queries in the Komodo base project, we obtain yet another project under the same group.
  • Query Operation $(O)$: an operation at the query level. For example, the unsat-core action takes a query $Q_0$ as input, and produces a $Q_{core}$ that is a subset of the assertions (commands) in $Q_0$.
  • Experiment Config $(C) $: a set of parameters to configure how experiment is ran. Please see configuration for more details.
  • Experiment Instance $E$: an instance of the Mariposa experiment, identified by the a tuple of $(C,P,S)$, i.e., an experiment configuration, a project, and a solver.
  • Analyzer $(A)$: configures how the stability status should be decided. An analyzer can be applied to a existing experiment instance. Please see configuration for more details on how to customize the analyzer.

Mariposa Utilities

Mariposa interface is currently split into 4, based on the level of abstraction at which it operates:

src/query_wizard.py

This script handles query-level operations. It takes as input a query file $(Q)$, performs various query operations, and (typically) outputs another query or log file.

src/proj_wizard.py

This script handles project management. It takes as input a project $(P)$, and applies some transformation to create another project under the same project group $(G)$.

src/exper_wizard.py

This script handles running experiment. It takes a tuple of $(C,P,S)$ as input and performs experiment. It contains utilities for cluster management, which is not expected in common use cases.

src/analysis_wizard.py

This script handles post-experiment analysis. It takes a tuple of $(C,P,S,A)$ as input. It also has analysis that operates over a group $G$ or multiple groups, which is not expected in common use cases.

Typical Mariposa Workflows

The "common" way of using Mariposa is through the exper_wizard, which operates on a project $P$. However, if you are only interested in a particular SMT query, the exper_wizard also offers the single mode, generally used for a "quick" stability test of a single query and a solver, without needing to creating an explicit project.

Experiment with a Single Query

The two required arguments for the single mode are:

  • -i/--input-query-path, the path to the query
  • -s/--solver, the name of the solver (see config/solvers.json)

The results are stored in a temporary database under gen/. This mode can handle a query with multiple (check-sat) commands. In that case, the input query will be split for each (check-sat). For example:

python3 src/exper_wizard.py single -i data/samples/multiple_checks.smt2 --clear-existing

The query file actually contains three (check-sat) commands. The split queries are placed in gen/single_proj/.

[INFO] running 3 original queries 
exp config:	default
project dir:	gen/single_proj/
solver path:	bin/z3-4.12.2
analyzer:	default

=== Overall Report ===
| category   |   count | percentage   |
|------------|---------|--------------|
| stable     |       3 | 100.0 %      |
| total      |       3 | 100.00 %     |

The temporary database is also under gen/single_proj/. One can also repeat the analysis without the experiment, without the --clear-existing flag.

python3 src/exper_wizard.py single -i data/samples/multiple_checks.smt2 

The above will load the temporary database. Please note the temporary results will be overwritten by any further experiment that specifies --clear-existing.

Experiment with a Project

exper_wizard manages the experiments over a project. It maintains a directory structure for projects under data/. Please do not manually interfere with the sub-directories, including data/projs, data/dbs or data/logs, otherwise unexpected errors might occur. Mariposa should automatically handle the bookkeeping of project and experiment results (if not, that is a bug, and PR or issues are welcome).

1. Create a New Project

First, we need to create a project, otherwise Mariposa does not know how to manage the project group. Moreover, the queries in the wild typically does not conform to the standard SMT-LIB format, so it is really necessary for Mariposa to pre-process the queries at this step!

As an example, we can use the following command to create a project named sample out of the queries under data/samples/.

./src/proj_wizard.py create -i data/samples/ --new-project-name sample

It should ask for a confirmation to run the build script that generates the pre-processed files:

[INFO] output directory is set to data/projs/sample/base.z3 
[INFO] generated 8 targets in build.ninja 
[INFO] run `ninja -j 6 -k 0` to start building? [Y]  Y

If we answer Y, the output should look something like the following.

[INFO] generated 11 files in data/projs/sample/base.z3 

This means a new project group named sample is created, which contains a base project named base.z3, and that project contains 11 queries.

2. Running Experiment on a Project

Now that we have created a new project, we can run an experiment with that project using exper_wizard, by specifying the directory. Please note that any directory that is not created by the proj_wizard will likely cause problems.

./src/exper_wizard.py multiple -i data/projs/sample/base.z3/ -e debug

It might take a few minutes for exper_wizard to run. The expected result should look something as the following:

[INFO] running 11 original queries 
exp config:	debug
project dir:	data/projs/sample/base.z3
solver path:	bin/z3-4.12.2
analyzer:	default

=== Overall Report ===
| category   |   count | percentage   |
|------------|---------|--------------|
| stable     |       6 | 54.55 %      |
| unstable   |       3 | 27.27 %      |
| unsolvable |       2 | 18.18 %      |
| total      |      11 | 100.00 %     |

We can also list the available experiments:

python3 src/exper_wizard.py info

This should give something like the following, meaning we have a project group called sample, containing a base project named base.z3 with 11 queries, and we have ran the z3_4_12_2 using the default experiment configuration on it.

project group: sample
	base.z3 (11)
		debug - z3_4_12_2

3. Running Analysis on a Experiment

After an experiment is finished, we can run analysis on it.

./src/analysis_wizard.py basic -i data/projs/sample/base.z3 -e debug --analyzer 2sec

This should print out a similar report as above. However, we note there is one more query that is considered unstable, since we are applying a more stringent time limit in our analyzer. The results of course can differ depending on the machine Mariposa is running on.

exp config:	debug
project dir:	data/projs/sample/base.z3
solver path:	bin/z3-4.12.2
analyzer:	2sec

=== Overall Report ===
| category   |   count | percentage   |
|------------|---------|--------------|
| stable     |       5 | 45.45 %      |
| unstable   |       4 | 36.36 %      |
| unsolvable |       2 | 18.18 %      |
| total      |      11 | 100.00 %     |