This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Benchmarks for evaluating Checked C

David Tarditi edited this page Feb 25, 2019 · 9 revisions

This page lists real-world programs that we have chosen to use to evaluate Checked C. Our goal is to have a suite of 10 to 12 programs that we can use to evaluate Checked C and tools for converting existing code to use Checked C.

The criteria for considering a program for inclusion in the evaluation suite are:

  • The program is used in production.
  • Someone cares about the program. The program is associated with an active project and it is being developed or maintained.
  • The license allows us to redistribute a modified version of the program.
  • The source code is at least 10,000 lines long.
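
The size criterion can be checked mechanically. The sketch below is one possible way to do it and is not the method used to produce the line counts on this page, which is not stated here. It assumes that both `.c` and `.h` files count as C code and that blank lines are excluded; the function names are illustrative only.

```python
import os

# Assumption: headers are counted as C code along with .c files.
C_EXTENSIONS = (".c", ".h")


def count_c_lines(root):
    """Return the number of non-blank lines in C source files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(C_EXTENSIONS):
                path = os.path.join(dirpath, name)
                # errors="replace" tolerates files that are not valid UTF-8.
                with open(path, "r", encoding="utf-8", errors="replace") as f:
                    total += sum(1 for line in f if line.strip())
    return total


def meets_size_criterion(root, minimum=10_000):
    """Check the 'at least 10,000 lines' criterion for a source tree."""
    return count_c_lines(root) >= minimum
```

A dedicated counting tool would also separate comments from code, so the numbers from this sketch are an upper bound on actual code lines.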

We used the following criteria to decide on the mix of programs to include in the evaluation suite:

  • Prefer programs that are widely used.
  • Have an interesting mix of programs. Avoid more than 2 programs that "do the same thing" (compression, image processing, etc.).
  • Prefer programs that run on Windows in addition to Unix over those that just run on Unix.
  • Prefer programs that have well-defined testing and benchmarking processes.
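
As an illustration only, the inclusion criteria and the "no more than 2 programs that do the same thing" rule can be written as a small filter. This is a sketch under stated assumptions: the `Candidate` fields, the category labels, and the idea that candidates arrive already sorted by preference (widely used, runs on Windows, good test process) are hypothetical and not part of the actual selection process described on this page.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All fields are hypothetical; they mirror the criteria listed above.
    name: str
    category: str          # e.g. "compression", "image processing"
    lines_of_c: int
    in_production: bool
    actively_maintained: bool
    redistributable: bool


def is_eligible(c, min_lines=10_000):
    """Apply the four inclusion criteria to one candidate."""
    return (c.in_production
            and c.actively_maintained
            and c.redistributable
            and c.lines_of_c >= min_lines)


def pick_suite(candidates, max_per_category=2, suite_size=12):
    """Pick eligible candidates in order, capping each category.

    Assumes candidates are already sorted from most to least preferred.
    """
    suite = []
    per_category = {}
    for c in candidates:
        if not is_eligible(c):
            continue
        if per_category.get(c.category, 0) >= max_per_category:
            continue
        suite.append(c)
        per_category[c.category] = per_category.get(c.category, 0) + 1
        if len(suite) == suite_size:
            break
    return suite
```

The category cap is applied greedily, so the preference order of the input decides which programs in a crowded category are kept.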

Selected programs

  • GPLv2
  • 17,000 lines of C code
  • Internet service, probably in decline
  • Has had security issues.
  • Bugs have been found in this before using fuzzing.
  • 3-clause UC Regents license
  • 145,000 lines of C code
  • Used all over the place at a low level
  • "mid-modern" (2004)
  • Very permissive license
  • 66,000 lines of C code
  • Used all over the place at a low level
  • Initially developed in 1993?
  • Library for asynchronous I/O, originally developed as the platform layer for Node.js
  • About 70,000 lines of C code (note: this is an old number and should be re-checked).
  • Interpreter for Lua scripting language
  • About 22,000 lines of C code
  • Open source codec for interactive speech/audio transmission over the Internet.
  • 70,000 lines of C code.
  • Very permissive license
  • 94,000 lines of C code
  • Internet service in semi-popular use
  • Slightly more modern (2009)
  • Public domain
  • 130,000 lines of C code
  • Very robust and well tested
  • GPLv2
  • 16,000 lines of C code
  • Has a very good security track record and was written in a very defensive style with regard to memory management. It would be an interesting case study to contrast with a program that was not written that way, such as libarchive.
  • Lightweight SSL library used in IoT and embedded space.
  • About 100,000 lines of C code (this line count needs to be double-checked).
  • Pascal Cuoq of trust-in-soft is currently running a bug-finding effort on this project with tis-interpreter.
  • Very permissive license
  • 20,000 lines of C code
  • Used all over the place at a low level
  • Old (1995), C systems code

Other interesting programs

There are many other interesting C programs, ranging from tiny to very large.

  • Parson. A C library for parsing JSON files. We have a Checked C version of Parson.
  • CMake: this is a mixture of C and C++ code. Last time we checked, it had about 600,000 lines of C code.
  • Git: about 210,000 lines of C code
  • Linux kernel. To use this as a benchmark for Checked C, we need it to compile reliably with clang. The LLVM/clang community is working on this.
  • NeoVim.
  • OpenSSL.
  • X.org. About 3.3 million lines of code, most of which is C.