Experimental translation of LLVM (3.5ish) IR to SMT-LIB.
This tool, llvm2smt, parses a llvm bitcode file (in its human readable form) and translates it to a symbolic SMT-LIB representation.
Currently the resulting SMT-LIB file uses the theory of bitvectors and arrays (QF_ABV).
The goal is to support symbolic analyses, such as bounded model checking, using SMT solvers.
The tool is in its infancy and only translates the llvm IR as it appears. Any logical properties one might want to verify must be added by hand.
The file test/shufflevector.ll
includes two simple functions written
by hand. The first function @lhs
takes two integers, stores them
in a two-element vector, shuffles the vector twice, then returns the first
vector element.
; Function Attrs: nounwind ssp uwtable
define i32 @lhs(i32 %a, i32 %b) #0 {
%1 = insertelement <2 x i32> undef, i32 %a, i32 0
%2 = insertelement <2 x i32> %1, i32 %b, i32 1
%3 = shufflevector <2 x i32> %2, <2 x i32> undef, <2 x i32> <i32 1, i32 0>
%4 = shufflevector <2 x i32> %3, <2 x i32> undef, <2 x i32> <i32 1, i32 0>
%5 = extractelement <2 x i32> %4, i32 0
ret i32 %5
}
The second function @rhs
does the same thing without the shuffles.
; Function Attrs: nounwind ssp uwtable
define i32 @rhs(i32 %a, i32 %b) #0 {
%1 = insertelement <2 x i32> undef, i32 %a, i32 0
%2 = insertelement <2 x i32> %1, i32 %b, i32 1
%3 = extractelement <2 x i32> %2, i32 0
ret i32 %3
}
We can show that these functions are equivalent by first translating the LLVM IR to SMT-LIB via:
> llvm2smt shufflevector.ll > shufflevector.smt
Function @rhs
is translated to the following SMT-LIB statements, in SMT-LIB the character @
can be controversial
so we replace it with _@
.
;; Function: |_@rhs|
;; (i32 %a, i32 %b)
(declare-fun memory2 () Mem)
(define-fun rsp2 () (_ BitVec 64) (_ bv0 64))
(declare-fun |%a_@rhs| () (_ BitVec 32))
(declare-fun |%b_@rhs| () (_ BitVec 32))
;; BLOCK %0 with index 0 and rank = 1
;; Predecessors:
;; |_@rhs_block_0_entry_condition|
(define-fun |_@rhs_block_0_entry_condition| () Bool true)
;; %1 = insertelement <2 x i32> undef, i32 %a, i32 0
(define-fun |%1_@rhs| () (Array (_ BitVec 1) (_ BitVec 32)) (store vzero_1_32 ((_ extract 0 0) (_ bv0 32)) |%a_@rhs|))
;; %2 = insertelement <2 x i32> %1, i32 %b, i32 1
(define-fun |%2_@rhs| () (Array (_ BitVec 1) (_ BitVec 32)) (store |%1_@rhs| ((_ extract 0 0) (_ bv1 32)) |%b_@rhs|))
;; %3 = extractelement <2 x i32> %2, i32 0
(define-fun |%3_@rhs| () (_ BitVec 32) (select |%2_@rhs| ((_ extract 0 0) (_ bv0 32))))
;; ret i32 %3
;; No backward arrows
(define-fun |_@rhs_result| () (_ BitVec 32) |%3_@rhs|)
The key points are:
-
The function takes two input arguments denoted by
|%a_@rhs|
and|%b_@rhs|
. Both are bitvectors of length 32. -
The return value of the function is denoted by
_@rhs_result
.
The other function is encoded similarly.
To check whether these two functions are equivalent, we add the following two SMT-LIB commands at the end of the file:
(assert (and (= |%a_@lhs| |%a_@rhs|) (= |%b_@lhs| |%b_@rhs|) (not (= |_@lhs_result| |_@rhs_result|))))
(check-sat)
This tests whether the functions @lhs
and @rhs
can produce different results when run of the same input.
We can then give the entire file to an SMT solver, such as yices-smt2
, to conclude:
> yices-smt2 shufflevector.smt
unsat
As expected, the assertion is not satisfiable: if we give both functions the same input, they produce the same result.
llvm2smt
is written in OCaml. It is known to compile with OCaml 4.02.1
but other versions may work too. Standard OCaml tools are required
including ocamllex
, ocamlyacc
, and ocamldep
.
Installing OCaml is reasonably easy. Check the instructions at https://ocaml.org/docs/install.html.
Once you have OCaml, go to the ./src
directory then type
> make
This will build two main executables:
-
parse
is based on Trevor Jim's parser for LLVM assembly language (.ll
suffix). It can be used to check that our tool properly parses LLVM. -
llvm2smt
is the main tool. It produces an SMT-LIB specification from a single.ll
input.
Examples and tests for both are included in the ./examples
,
./test
, and ./bitcode
directories. Check the Makefile for details.
On simple examples (i.e., one source file), you can generate bitvcode using clang -S -emit-llvm
. For
more complex builds, we typically use wllvm.
llvm2smt
translates every basic block in the LLVM file into a
sequence of SMT-LIB declarations and definitions. We use a global
array to represent memory. For a 64 bit address space, this array has
type
(Array (_ BitVec 64) (_ BitVec 8)).
Read operations are encoded using SMT-LIB select
and write
operations are encoded using store
. Each write operation produces a
new memory state, denoted by a fresh SMT-LIB constant.
We also use a global variable to denote the stack pointer. It is used to
encode the LLVM alloca
operations (i.e., create local variables on the stack).
We use a bit-precise representation: i1
variables are represented as
Boolean, all other integer types are converted to bitvectors of the
appropriate size. For example, i32
variables are represented as
bitvectors of length 32. We support all LLVM types except
floating-point numbers. For LLVM vector types, we use SMT-LIB
arrays. For example a register of type <2 x i32>
is represented as
an array of two bitvectors of length 32. The array itself is of type
(Array (_ BitVec 1) (_ BitVec 32)).
The SMT-LIB translation assumes that every basic block is executed at
most once. In most cases, this means that we must unroll loops before
the translation by using opt
with the following command switches:
> opt -loop-rotate -loop-unroll -unroll-count=3 ...
(Try opt --help-list-hidden
to see all the good things opt
can do for you.)
We do not handle function calls. A work around is to force the compiler to inline the calls to all relevant functions. This can be done by annotating function declarations as follows:
static __attribute__ ((__always_inline__)) int my_function(int x) {
...
}
We do not handle floating-point types in LLVM since the QF_ABV logic that we use does not support include floating point operations. Our crude approach for now is to convert all floating-point constants to zero and all floating-point register to uninterpreted constants in the SMT-LIB translation.
In addition to call
mentioned above, we do not handle the following
LLVM instructions invoke
, landingpad
, resume
, va_arg
,
indirectbr
, cmpxchg
, atomicrmw
, fence
, addrspacecast
,
extractvalue
, and insertvalue
. Some of these could be added but we
have not encountered them in our C-code examples.
Our code builds upon an OCaml-based parser for LLVM written by Trevor Jim:
https://github.com/tjim/smpcc/blob/master/compiler/
We diverged from this repository around February 2015.