Skip to content

Latest commit

 

History

History
385 lines (330 loc) · 18.2 KB

README.md

File metadata and controls

385 lines (330 loc) · 18.2 KB

This tutorial will demonstrate the 3C conversion process by converting a small C program into Checked C. The program uses the C library libjpeg to read a JPEG image file provided as a command line argument, and write it to standard output in PPM format. The program being converted is adapted from sample code provided as part of the libjpeg distribution.

This tutorial is intended to specifically demonstrate how 3C can be used to port programs that interact with existing C libraries. For a first tutorial on 3C, see our port of tiny-bignum-c.

Requirements

For conversion of C code into Checked C and compilation of Checked C to C, both 3c and the Checked C version of clang must be built from the repository correctcomputation/checkedc-clang. The Checked C standard library headers from correctcomputation/checkedc are also required. These should already be available if 3C was built according to the 3C build instructions. This tutorial uses these repositories at commits 2a0df40 and 7c6f388 respectively.

The premise of this tutorial is to convert C code that depends on libjpeg, so the library should be installed. On Ubunutu, it can be installed with sudo apt install libjpeg-dev.

The makefile and conversion script included with this tutorial assume that the 3c Checked C version of clang are on your path. If you cloned 3c into your home directory and built it according to the instructions linked above, then the path can be configured with

export PATH="~/checkedc-clang/build/bin/:$PATH"

Initial Conversion

The tutorial begins with unconverted code on this commit. To convert the sample program, we will execute 3C through the provided convert_all.sh script. This script invokes 3C on the source file with some options to control the specifics of the conversion process. (The purpose of the options is explained in the tiny-bignum-c tutorial.

Here we assume that you do not know beforehand that you will need to create a new checked header file for libjpeg, and show how to determine that it is necessary. If this is known ahead of time, the conversion script could be modified to include the options required for converting with the libjpeg header file from the beginning.

The command in convert_all.sh can be simple because only one source file is being converted, and there are no complicated compiler flags that need to be provided.

#!/bin/bash
3c -alltypes -warn-root-cause -output-dir=out ./to_ppm.c --

Running the conversion script (./convert_all.sh) will execute the initial conversion and place the converted files in ./out/ according to the specified -output-dir. The converted code is left in the output directory for the moment because subsequent changes will be applied to the original, unconverted, code. The converted code will be copied from the output directory when we are happy with the progress of the conversion, and want to begin making final adjustments to the Checked C code.

./convert_all.sh
git add out
git commit -m "[3c] Initial conversion"

(view changes).

Converting jpeglib.h

Running the conversion script also emitted root cause diagnostics as specified by the -warn-root-cause option. These diagnostics identify code patterns that have specifically prevented a pointer from being made checked (it remains "wild"); this code is a root cause in the sense that it directly affects the identified pointer (which in turn may affect other pointers with a dataflow relationship). To decide how to proceed with the conversion, we can examine the information in the root-cause diagnostics. By far, the most common root cause of wildness is that there are function declarations in unwritable files. These functions are declared in the system version of the libjpeg header file (jpeglib.h) which is located outside of the current directory. By default, 3C considers files outside the current directory "unwritable", and so will never make changes to such files. Before we can have 3C infer Checked C declarations for these functions, we first need to create a local, writable copy of the header so that 3C can rewrite it.

For consistency, a copy of the header file is already included with this tutorial, but, in general, the header file should be a copy of the system version of the header file. In this case, we would need to create a local include directory and copy the system header into it:

mkdir include
cp /usr/include/jpeglib.h ./include

The local include directory and header file already exist, so we only need to update the conversion script so that the local header is found before the system header. Edit convert_all.sh to append -I ./include/ to the end of the argument list. This must be placed after the dashes (--) which denote the end of arguments passed to 3C and the start of arguments passed to clang. For this tutorial, we have included patch files for this and all other necessary manual changes.

git apply < patches/update_include.patch
git add convert_all.sh
git commit -m "[Manual] Update include path"

(view changes).

Re-running ./convert_all.sh will attempt conversion again, but this time with the writable copy of jpeglib.h, so 3C is capable of making changes to it. This resolves the "unwritable file" root causes, but they are replaced by root causes due to "undefined functions".

./convert_all.sh
git add out
git commit -m "[3C] Convert with local header"

(view changes).

By default, 3C will not rewrite the types of functions that are not defined. To make it infer the types of such functions, 3C can be given the flag -infer-types-for-undefs. This disables constraints that are typically generated which force all undefined functions to remain unchecked, allowing Checked C types to be inferred and added to the header as so-called "interop types" (itypes).

There are some important limitations to this flag:

  • 3C will still only infer types for functions that are declared in files that it is able to rewrite. In particular, this means that functions defined in system library headers will not be automatically converted to use itypes. This is why a local copy of jpeglib.h was created in the same directory as the code being converted in the previous steps. This allows 3C to rewrite the declaration, so checked types will be inferred.
  • 3C will still treat any undefined function as internally unsafe. This means that an undefined function will only be converted to an itype type, and never a fully checked type. If the body of the function is later made available, 3C can be rerun on the code and will rewrite itypes into fully checked types.
  • 3c is not able to examine the bodies of the functions, so inferred pointer types and array bounds are not guaranteed to be correct. The compiler will check that the functions are called in accordance with the inferred types, but doing so risks missing spatial safety violations when the types do not accurately specify the behavior of the implementation. For this reason, the inferred types must be validated against the implementation or the behavior specified in available documentation.

To proceed with conversion of the libjpeg header file (jpeglib.h) we must now update the options passed to 3C to enable conversion of undefined functions. Update convert_all.sh again to add -infer-types-for-undefs to the list of options passed to 3C. When making this edit, be careful to add the option before the two dashes -- which mark the end of the options passed to 3C.

git apply < patches/add_option.patch
git add convert_all.sh
git commit -m "[Manual] Add new option"

(view changes).

Running the 3C conversion for a third time and examining the root cause diagnostics will show that the undefined functions are no longer reported as causes of wildness.

./convert_all.sh
git add out
git commit -m "[3C] Convert with new option"

(view changes).

A Final Root Cause

After this point, the conversion can proceed using the same process described in the tiny-bignum-c tutorial. Any remaining root causes of unchecked pointers will be resolved before committing to using the converted version of the code, and finally placing the entire program in a checked region to finish conversion.

One of the root causes that remains indicates that a pointer is declared in a macro. 3C cannot rewrite inside macros, so we must either fully expand the macro, so that 3C can annotate each expansion individually, or manually add Checked C annotations inside the macros.

In this case, the macro jpeg_common_fields is used to declare the fields of a structure. It is easy to add an itype to the field declarations so that each structure defined using the macro is now defined with itypes on its fields. Only two of the fields (err and mem) are actually relevant to the current conversion. Itypes should be added onto these fields so that they are now declared as struct jpeg_error_mgr *err : itype(_Ptr<struct jpeg_error_mgr>) and struct jpeg_memory_mgr *mem : itype(_Ptr<struct jpeg_memory_mgr>) respectively. The conversion script should then be run again to propagate the change.

git apply < patches/insert_itypes.patch
git add ./include/jpeglib.h
git commit -m "[Manual] Add itypes in macro"

(view changes).

./convert_all.sh
git add out
git commit -m "[3C] Convert with itypes in macro"

(view changes).

At this point, the major issues preventing conversion of the example program (unwritable code, undefined functions, and struct fields defined in macros) have been resolved. A few unchecked pointers remain, but these are either in the header file only, or can be fixed individually in the partially converted code. To do this, the converted versions of the source files need to be copied into the working directory.

cp -r ./out/* .
git add include/jpeglib.h to_ppm.c
git commit -m "[Manual] Copy converted files"

(view changes).

Compiling the Converted Code

Attempting to compile the code by executing make in its current state will show that a few manual fixes need to be made to the converted code.

A few of the errors are syntax errors introduced by 3C while rewriting the header file. These are not interesting and can be fixed easily. The other, more interesting, error is raised in to_ppm.c.

3C infers that a pointer is an array, but is unable to infer any bounds. Checked C raises an error when the pointer is later accessed at constant index 0. Due to the constant index, the length of the array must be at least 1. It is not used anywhere else, so the bounds of the array can be exactly count(1). This pointer is actually a two dimensional array (_Array_ptr<_Array_ptr<unsigned char>>), but Checked C only supports bounds on the outer array pointer. The lack of bounds on the nested pointers will need to be addressed before the program can be considered fully checked.

With bounds added onto the array Checked C now checks that the bounds are preserved on assignment to the array. There is an assignment to the array from the return value of a function declared in jpeglib.h. 3C did not infer bounds for this function, so they must now be added manually. The function is passed a number of rows and the size of each row. As noted above, the array pointer is a two-dimensional array, and we cannot annotate the inner array, so its length is equal to the number of rows (count(numrows)). This is consistent with the actual arguments at the function call: numrows is 1, and the return value is expected to have bounds count(1).

The final error complains about an unsafe cast from a struct jpeg_error_mgr pointer to struct my_error_mgr. While the cast isn't safe, the size of struct my_error_mgr is at least as large as struct jpeg_error_mgr, so we can instruct Checked C to accept the cast with explicit bounds checking by replacing the cast (my_error_ptr) with a _Dynamic_bounds_cast<my_error_ptr>, so

my_error_ptr myerr = (my_error_ptr) cinfo->err;

becomes

my_error_ptr myerr = _Dynamic_bounds_cast<my_error_ptr>(cinfo->err);

The exact fixes are captured in the patch file:

git apply < patches/fix_errors.patch
git add include/jpeglib.h to_ppm.c
git commit -m  "[Manual] Fix errors in converted code."
make

(view changes).

Finishing the Conversion

With the initial conversion working, the final goal is to "fully convert" the source file, where full conversion means enabling the CHECKED_SCOPE pragma while having a minimal number of _Unchecked regions in the program.

Add #pragma CHECKED_SCOPE on at the top of to_ppm.c, before the function definitions but after all #includes. Attempting to compile again with make after adding the pragma reveals new errors raised by the Checked C compiler.

git apply < patches/add_pragma.patch
git add to_ppm.c
git commit -m "[Manual] Add CHECKED_SCOPE pragma."
make || true

(view changes).

The first error is a result of a bounds cast inserted by 3C during conversion. Before turning the CHECKED_SCOPE pragma on, we fixed a different error by adding a dynamic cast so that Checked C would check a bound at run time. Here, an _Assume_bounds_cast was added by 3C to avoid checking a bound that cannot be verified even at run time. This is clearly unsafe, and so it is not allowed in a checked region. Ideally, we would be able to refactor the code to avoid needing the _Assume_bounds_cast, but that is not possible without a substantial change to the functions exported by libjpeg. libjpeg works with two dimensional arrays, but Checked C currently does not have a mechanism for annotating bounds on the inner array pointers. We need to place the _Assume_bounds_cast in an unchecked region before the Checked C compiler will accept the program.

The simplest approach would be to place the whole statement with the cast in an unchecked block, but this would result in the function call and array access being unchecked, which should be avoided. Instead, the statement is split so that only the unsafe cast is unchecked, minimizing the amount of code in unchecked blocks. The statement

put_scanline_someplace(_Assume_bounds_cast<JSAMPROW >(buffer[0],  count(row_stride)), row_stride);

is split into four statements.

JSAMPROW row_unkb : bounds(unknown) = buffer[0];
JSAMPROW row : count(row_stride) = 0;
_Unchecked {
  row = _Assume_bounds_cast<JSAMPROW>(row_unkb,  count(row_stride));
}
put_scanline_someplace(row, row_stride);

Another compiler error is raised because libjpeg using the standard library header setjmp.h for error handling. This header file does not yet have a checked version, so we need to surround calls to its functions in _Unchecked blocks.

The exact fixes required are again provided in a patch file.

git apply < patches/pragma_fixes.patch
git add -u
git commit -m  "[Manual] Fix errors in checked scope."

(view changes).

With these changes, the program compiles successfully with the CHECKED_SCOPE pragma enabled. As a final step, we can check that the program can run without error on sample input.

make
./to_ppm test.jpg | display