-
Notifications
You must be signed in to change notification settings - Fork 3
New pointer and array types
The Checked C extension adds new array and pointer types. The new types include:
-
Single and
multi-dimensional arrays with bounds checking. These are specified by placing the keyword
_Checked
before the dimension of the array. For example,int x _Checked[10]
declares a 10-element integer array with bounds checking. - The
_Ptr
type for pointers to single objects. Pointer arithmetic is not allowed on these types. For example,_Ptr<int> p
declares a pointer to a single object. - The
_Array_ptr
type for pointers to elements of arrays. Pointer arithmetic is allowed on these types. When_Array_ptr
pointers are used to access memory, they are bounds checked and checked for non-nullness. Programmers must declare the bounds of_Array_ptr
variables to be able to use them to access memory. The compiler constructs bounds for_Array_ptr
typed expressions based on the bounds for variables. - The
_Nt_array_ptr
type for pointers to elements of null-terminated arrays. These are similar to_Array_ptr
types, with two differences. The element just beyond the declared bounds can be read or have a zero written to it. If the element is non-null, the bounds for the pointer can be widened by 1 element.
A programmer can declare a checked array by prefixing the dimension of the
array with the keyword _Checked
. Here is an example:
double a _Checked[10]
This declares an array with 10 elements. Accesses to this array will be
bounds checked at runtime. The index to the array must be be >= 0 and < 10.
If a runtime bounds check fails, a runtime error is signalled. This typically
causes the program to exit immediately:
a[0] = e; // works fine.
a[-1] = e; // program exits
Existing C array types do not check out-of-bounds array accesses, so they are called unchecked array types. In contrast to Checked C, they have undefined behavior when an out-of-bounds access occurs (typically they read or write the memory location that would correspond to the out-of-bounds array element, unless the memory is not mapped by the operating system).
Multi-dimensional arrays in C are simply arrays of arrays. The _Checked
prefix can be placed before the first array dimension size. It propagates
to the nested arrays in the declaration.
// checked 10 dimensional array of checked 5 element arrays
double m _Checked[10][5];
m[0][0] = e; // works fine.
int x = m[-1][0]; // program exits
The prefix can be specified explicitly if desired:
double m _Checked[10]_Checked[5]; // equivalent to original declaration.
The prefix does not propagate into typedef'ed types. It only propagates to arrays declared as part of the declaration.
The runtime bounds checking checks that multi-dimensional array accesses are within the entire bounds of the array. It does not check that accesses for an individual dimension are within the range for that dimension. This allows runtime bounds checking to be efficient for multi-dimensional arrays, while preserving type safety. (in our experience, the cost of checking individual dimensions can cause performance problems in Java and C# programs.)
m[-1][10]; // runtime check does not fail because this accesses m[0][0]
A multi-dimensional array is either checked or unchecked. If an array has an element type that is also an array, the array and the element array both must be checked or unchecked. If a typedef'ed array type is used an element of another array, the checkedness must match.
typedef double vec[10];
vec a _Checked[10]; // incorrect - mismatch in checked property.
Strings in C are arrays of characters that are null-terminated. A programmer
can declare a checked null-terminated array type by prefixing the dimension
with _Nt_checked
:
char s _Nt_checked[6] = "hello";
The array element type of an _Nt_checked
array must be an integer
type, an enumeration type, or a pointer type. The null terminator is defined
to be 0
:
int a _Nt_checked[5]; // OK
double b _Nt_checked[5]; // compile-time error
struct S {
int x;
int y;
};
S _Nt_checked[5]; // compile-time error
The bounds checking for null-terminated array accesses is slightly different than for regular arrays. All elements of the array can be read. For all elements of the array except the last element, any value can be written. For the element, only a null value can be written:
char c;
c = s[0]; // OK
c = s[5]; // OK.
s[0] = 'c'; // OK
s[5] = 'c'; // Program exits. Cannot ovewrite last element with non-zero value.
By themselves, null-terminated array types are not terribly useful. The sizes of the arrays are known, after all. Their usefulness will become apparent when we discuss pointers to null-terminated arrays.
There are many uses of pointers in C programs where pointers are just that:
pointers. The pointer may point to a single object or be null. Checked
C provides a new pointer type _Ptr
that declares a pointer to a singleton
object. The same syntax as C++ templates is used to declare these pointers:
int x = 0;
_Ptr<int> x = &x;
struct binary_tree_node {
int val;
_Ptr<struct binary_tree_node> left, right;
};
Pointer arithmetic is not allowed on these pointers:
x = x + 1; // compile-time error: pointer arithmetic not allowed on _Ptr
Measurements show that 30-60% of pointers declared in C programs could
be declared as _Ptr
types.
A runtime null check is done before accessing memory with a _Ptr type. If the check fails, a runtime error is signalled. This typically causes the program to exit immediately.
_Ptr<int> p = 0;
*p = 0; // program exits
int y = *p; // program exits
In C, most uses of array variables are converted implicitly to pointers
(there are a few cases, such as the &
and sizeof operators,
where they aren't). It makes sense to allow pointer arithmetic on such
pointers. Checked C provides a new pointer type _Array_ptr
that declares a
pointer to an array element. Pointer arithmetic is allowed on these pointers.
int a[5];
_Array_ptr<int> p = a;
_Array_ptr<int> t = p + 1;
A programmer must declare the bounds of an _Array_ptr
variable to be able
to access memory using the pointer or a pointer derived from the pointer.
We'll explain bounds declarations in detail elsewhere. We introduce a simple
bounds declaration here so that we can explain how _Array_ptr
s work. The count bounds
expression declares the number of array elements that a pointer
points to:
int a[5] = { 0, 1, 2, 3, 4};
_Array_ptr<int> p : count(5) = a; // p points to 5 elements.
The bounds are part of the declaration of a variable and are placed after the
declarator using a :
. A count of 5 means that it is valid to access memory
at or above p and below p + 5 (the lower bound is included and the upper bound
is excluded).
p // valid to access
p + 4 // valid to access
p + 5 // not valid to access
When the indirection operator (*
) is used to access memory
and the pointer value is an _Array_ptr
, the memory access
is bounds checked. The compiler constructs expressions that
represent at runtime the upper and lower bounds of memory
that can be accessed using the ponter. Those bounds are
use to check that the memory access is in range.
If the operand of the *
operator is a variable,
the compiler uses the declared bounds for the
variable:
int y = *p; // bounds of p are (p, p + 5), so this access is OK.
*p = y + 1; // same here.
If the operand is a pointer arithmetic expression, the compiler uses the bounds of the pointer-typed subexpression. Pointer arithmetic does not change the range of memory that is valid to access.
For example, given the operand p + i
, it has the same bounds as
p
, these bounds will be used to bounds check the memory access:
int sum = 0;
for (int = 0; i < 5; i++) {
int v = *(p + i); // check that p + i is within (p, p + 5)
sum += v;
}
If there was on off-by-one error in the for-loop, the
program would terminate when i == 5
.
for (int = 0; i < 6; i++) {
int v = *(p + i); // check that p + i is within (p, p + 5)
...
}
In C, arrays are converted implicitly to pointers at most uses.
When a _Checked
array is converted to a pointer, it is converted
to an _Array_ptr
. The computed bounds for the converted
value are the bounds of the _Checked
array:
int a _Checked[5];
... a ... // bounds are (a, a + 5)
C defines e1[e2]
to be the same as *(e1 + e2)
. Given:
int a _Checked[5];
a[i] = ...
this is the same as:
int a _Checked[5];
*(a + i) = ...
Because *
is a memory access using a checked pointer, it will be bounds
checked. The calculated bounds of a + i
are (a, a + 5)`, leading to:
int a _Checked[5];
*(a + i) = ... // check that a + i is within (a, a + 5).
C represents strings using null-terminated arrays of characters. A null-terminated array has a null element stored in its last elements. For strings, this is a null character. For strings, this avoids the need for a length field. C uses "pointer to character" types as string types, w with the convention that these pointers point to elements of null-terminated arrays.
This is a problem because "pointer to character" types can also point to elements of non-null-terminated arrays. A confusion about which convention applies to a specific pointer can result in a buffer overrun. A null terminator can be missing, if a pointer to a non-null terminated array is passed to a function expecting a null-terminated array. A null-terminator can be overwrriten if the reverse happens.
Strings are fundamental enough to C that the Checked C extension introduces
a special pointer type for pointers to null-terminated arrays called _Nt_array_ptr
.
This way, instead of relying on programming conventions, the information is
represented directly in the program and can be checked at compile time.
A use of an _Nt_checked
array converts implicitly to an _Nt_array_ptr
:
char s _Nt_checked[] = "hello";
s[0] = 'c'; // s converts to a _Nt_checked array.
If a programmer does not declare bounds for an Nt_array_ptr
variable, it has
a bounds of count(0)
. The bounds checking rules for _Nt_array_ptr
differ from the rules for _Array_ptr
. Given a bounds of (lower, upper)
,
- The memory locations that are >= lower and < upper can be read or written just like a regular _Array_ptr memory locations.
- The memory location at upper can be read and the null value can be written to it.
When a use of an_Nt_checked
array is converted implicitly to an
_Nt_array_ptr
, the count is one less than the number of elements in the
array. The null element is excluded.
Here are some examples:
char s _Nt_checked[1] = "";
char t _Nt_checked[6] = "hello";
_Nt_array_ptr<char> m : count(0) = s;
_Nt_array_ptr<char> n = t; // defaults to count(0)
_Nt_array_ptr<char> p : count(5) = t;
if (*m) { } // reading element at upper bound allowed
*p = 'c'; // OK.
*m = 'a'; // error - writing non-null value at upper bound.
p[5] = 'a'; // error - writing non-null value at upper bound.
_Nt_array_ptr
pointers have a special property that _Array_ptr
pointers do not: their bounds
can be widened. If control flow establishes that the value at the upper bound is non-null,
the bounds can be widened by 1.
_Nt_array_ptr<char> get_id_at(_Nt_array_ptr<char> str : count(i), unsigned i) : count(3) {
if (*(str + i) && *(str + i + 1) && *(str + i + 2)) {
return str + i;
}
return NULL;
}
Checked C Wiki