-
Notifications
You must be signed in to change notification settings - Fork 771
C Programming, Part 4: Strings and Structs
In C we have Null Terminated strings rather than Length Prefixed for historical reasons. What that means for your average everyday programming is that you need to remember the null character! A string in C is defined as a bunch of bytes until you reach '\0' or the Null Byte.
Whenever you define a constant string (ie one in the form char* str = "constant"
) That string is stored in the data or code segment that is read-only meaning that any attempt to modify the string will cause a segfault.
If one however malloc
's space, one can change that string to be whatever they want.
One common gotcha is when you write the following
char* hello_string = malloc(14);
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
// hello_string ----> | g | a | r | b | a | g | e | g | a | r | b | a | g | e |
‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾
hello_string = "Hello Bhuvan!";
// (constant string in the text segment)
// hello_string ----> [ "H" , "e" , "l" , "l" , "o" , " " , "B" , "h" , "u" , "v" , "a" , "n" , "!" , "\0" ]
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
// memory_leak -----> | g | a | r | b | a | g | e | g | a | r | b | a | g | e |
‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾
hello_string[9] = 't'; //segfault!!
What did we do? We allocated space for 14 bytes, reassigned the pointer and successfully segfaulted! Remember to keep track of what your pointers are doing. What you probably wanted to do was use a string.h
function strcpy
.
strcpy(hello_string, "Hello Bhuvan!");
Forgetting to NULL terminate a string is a big affect on the strings! Bounds checking
In low level terms, a struct is just a piece of contiguous memory, nothing more. Just like an array, a struct has enough space to keep all of its members. But unlike an array, it can store different types. Consider the contact struct declared above
struct contact {
char firstname[20];
char lastname[20];
unsigned int phone;
};
struct contact bhuvan;
Brief aside
/* a lot of times we will do the following typdef
so we can just write contact contact1 */
typedef struct contact contact;
contact bhuvan;
/* You can also declare the struct like this to get
it done in one statement
typedef struct optional_name {
...
} contact;
If you compile the code without any optimizations and reordering, you can expect the addresses of each of the variables to look like this.
&bhuvan // 0x100
&bhuvan.firstname // 0x100 = 0x100+0x00
&bhuvan.lastname // 0x114 = 0x100+0x14
&bhuvan.phone // 0x128 = 0x100+0x28
Because all your compiler does is say 'hey reserve this much space, and I will go and calculate the offsets of whatever variables you want to write to'.
The offsets are where the variable starts at. The phone variables starts at the 0x128
th bytes and continues for sizeof(int) bytes, but not always. Offsets don't determine where the variable ends though. Consider the following hack that you see in a lot of kernel code.
typedef struct {
int length;
char c_str[0];
} string;
const char* to_convert = "bhuvan";
int length = strlen(to_convert);
// Let's convert to a c string
string* bhuvan_name;
bhuvan_name = malloc(sizeof(string) + length+1);
/*
Currently, our memory looks like this with junk in those black spaces
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
bhuvan_name = | | | | | | | | | | | |
‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾
*/
bhuvan_name->length = length;
/*
This writes the following values to the first four bytes
The rest is still garbage
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
bhuvan_name = | 0 | 0 | 0 | 6 | | | | | | | |
‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾
*/
strcpy(bhuvan_name->c_str, to_convert);
/*
Now our string is filled in correctly at the end of the struct
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ____
bhuvan_name = | 0 | 0 | 0 | 6 | b | h | u | v | a | n | \0 |
‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾ ‾‾‾‾
*/
strcmp(bhuvan_name->c_str, "bhuvan") == 0 //The strings are equal!
Legal and Licensing information: Unless otherwise specified, submitted content to the wiki must be original work (including text, java code, and media) and you provide this material under a Creative Commons License. If you are not the copyright holder, please give proper attribution and credit to existing content and ensure that you have license to include the materials.