Lab 1: Buffer Overflows


Lab Overview

Important Note: This course's labs, including this one, ask you to design exploits and to perform attacks. These exploits and attacks are realistic enough that you might be able to use them to perform a real-world attack, but you should not do so. The only goal of designing these exploits is to teach you how to defend against them, not how to use them to attack others. Attacking computer systems is illegal and can get you into serious trouble. Don't do it.

You will do a sequence of labs in this course. This sequence of labs will give you practical experience with common attacks and countermeasures. The labs are as follows:

  • Lab 1: you'll learn the principle of buffer overflows and see how such attacks happen in a real-world application (say, a web server);
  • Lab 2: you'll explore return-oriented programming (ROP) techniques, which are widely used on systems with non-execution protections;
  • Lab 3: you'll study privilege separation, which ensures that when one component of a computer system is compromised, the whole system does not go down with it;
  • Lab 4: you'll design and implement a simple intrusion protection system (IPS).

This is lab 1. In this lab, you will study the basic principle of buffer overflows and then use this knowledge to attack a real-world application: a web server. Finally, you'll fix the buffer overflows by patching the source code.

Software Setup

You should complete all labs on the Linux OS we provide, so first install the Ubuntu OS listed on the tools page. Specifically, download and install Ubuntu 12.04. (Download this software to your machine and unpack it into some directory.) Important note: install the OS in a virtual machine (say, VMware) rather than on your bare machine, or else the attacks may end up damaging your own system.

Getting Started

We've provided some code here to start with. Download this code to your machine and unpack it into some directory.

Lab Requirement

There are two kinds of exercises: normal exercises and challenge ones. Challenge exercises may not be that hard, but may involve substantial code hacking. You are required to do ALL normal exercises. All challenge exercises are optional (but you're encouraged to try them).

Hand-in Procedure

When you have finished the lab, zip your solutions and submit them to the school's information system.

Part A: Buffer Overflow Principle

In this part of the lab, you will study the basic principle of buffer overflows and then learn how to use a buffer overflow to attack a simple vulnerable program. Some basic theory will help you reach this goal.

Now, browse the source code we provided and find the file stack1.c. This file contains a simple C program that has a buffer overflow vulnerability. You can compile this program using the gcc compiler:

    $ gcc -g stack1.c -o stack1
    $ ./stack1
Note the use of the -g option, which will be useful when you debug the executable using gdb.

Stack Layout and Buffers

In computer science, a call stack is a stack-like data structure that holds the information needed to control function calls and returns. The stack layout is the convention for how each stack frame is used. As an example, read the simple C program provided (the file stack1.c).

When the function main calls the function func, the stack layout looks like the following. Pay special attention to the positions of the local variables and arguments.

                
                     +------------------+ high address
                     |       ...        |
                     |  stack frame of  |
                     |   main           |
                     |       ...        |
                     +------------------+
                     |  str(a pointer)  | (4 bytes)
                     |  return address  | (4 bytes)
           %ebp----> |    saved %ebp    | (4 bytes)
                     +------------------+
                     |       buf[11]    |
                     |       ...        | (12 bytes)
            buf----> |       buf[0]     |
                     |     variable_a   | (4 bytes)
                     |       ...        |
                     +------------------+ low address 

 

Exercise 1. Now you can write some code. Your job is to print the address of the variable buffer in the C program stack1.c, and then compile the program as above. Run it three times, and write down the output addresses in address.txt. Are these three addresses the same or not?

Challenge! Read the file /proc/pid/maps on your machine (pid is the process ID) and observe the value of [stack]. You can read this article to learn how effective ASLR is on Linux systems.

Now you can investigate the stack layout and the C calling convention in detail. For this, you should use the debugger gdb. In this and future labs, you will use gdb heavily.

Exercise 2. Use gdb to debug the program, as follows. You may find the online gdb manual useful.

$ gdb stack1
(gdb) b  func 
Breakpoint 1 at 0x8048412: file stack.c, line 8.
(gdb) r 
Starting program: /tmp/stack1
8	  strcpy(buffer,str);
(gdb) info r  
eax            0x80484e8	134513896
ecx            0xbffff504	-1073744636
edx            0xbffff494	-1073744748
ebx            0xb7fc8000	-1208188928
esp            0xbffff410	0xbffff410
ebp            0xbffff438	0xbffff438
esi            0x0	0
edi            0x0	0
eip            0x8048412	0x8048412 
eflags         0x282	[ SF IF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51
(gdb) x/2s 0x80484e8 
0x80484e8:	 "I am greater than 12 bytes"
0x8048503:	 ""
(gdb) p &buffer 
$1 = (char (*)[12]) 0xbffff424
(gdb) x/4wx 0xbffff424 
0xbffff424:	0x08048320	0x00000000	0x080482bd	0xb7fc83e4
(gdb) x/8wx $ebp 
0xbffff438:	0xbffff468	0x08048443	0x080484e8	0xbffff504
0xbffff448:	0xbffff50c	0xb7e54225	0xb7fed280	0x00000000
(gdb) x/2i 0x08048443 
0x8048443 :	leave  
0x8048444 :	ret    
(gdb) disass func 
Dump of assembler code for function func:
0x0804840c :	push   %ebp
0x0804840d :	mov    %esp,%ebp
0x0804840f :	sub    $0x28,%esp
0x08048412 :	mov    0x8(%ebp),%eax
0x08048415 :	mov    %eax,0x4(%esp)

Address Space Layout Randomization

To protect against buffer overflows, most recent operating systems introduce a number of protection mechanisms, among which the most important one is address space layout randomization (ASLR). Basically, on a system with ASLR, the starting addresses of the heap and the stack, along with other segments, are randomized, so it is difficult for an attacker to know or guess the specific address of any memory segment, say the stack. This is only a brief introduction; in lab 2, you will study ASLR in detail and learn how to defeat it. For the purpose of this lab, you should simply turn off ASLR (in lab 2, you'll perform attacks with ASLR enabled), which will make your attack easier to achieve. To turn off ASLR, you can run these commands:
  $ su root
  Password : (enter root password)
  # sysctl -w kernel.randomize_va_space=0

Exercise 3. Turn off address space layout randomization, and then do exercise 1 again. Write down the three addresses in args.txt. Are these three addresses the same or not?

Buffer Overflow and Shellcode

A buffer overflow occurs when the data written to a buffer exceeds the length of the buffer, thereby corrupting data values at the memory addresses adjacent to the end of the buffer. This often occurs when copying data into a buffer without sufficient bounds checking. You can refer to Aleph One's famous article to figure out how buffer overflows work.

Now run the program stack1, as shown below.

   $ ./stack1 aaaaaaaaaa
   Returned Properly
   $ ./stack1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
   Segmentation fault
If you don't observe Segmentation fault, just increase the number of a's in the input. Here, the message Segmentation fault indicates that your program crashed due to an invalid memory access (for instance, a reference to memory address 0).

Exercise 4. Use gdb to print the value of the register %eip when the program crashes. How does the program end up at this address?

Challenge! Try to write a C program that prints every return address on the call stack, up to the invocation of the current function. This is often called a backtrace, and it behaves like the bt command in gdb. Hint: as the following picture shows, the stack is simply a linked list with the saved %ebp as the next pointer.

You can overwrite the return address with any valid address, instead of four a's (0x61616161). Interestingly, you can supply the starting address of the buffer being overflowed, which you examined in exercise 1. More interestingly, if the buffer contains some binary code, that code will be executed. Any binary code can be supplied. In particular, Aleph One provides binary code that pops a shell, so this kind of binary code is often called shellcode, although such code can do much more interesting things (attacks).

Now compile the supplied C program test-shell.c, and run it to pop a shell.

  $ gcc -z execstack test-shell.c -o test-shell   
  $ ./test-shell
  sh-3.2$ id
  uid=1000(seed) gid=1000(seed) groups=4(adm),20(dialout),24(cdrom),
  46(plugdev),106(lpadmin),121(admin),122(sambashare),1000(seed)
  sh-3.2$ exit
  $ 
The -z execstack option will mark the stack segment executable, which you'll study in detail in lab 2.

Exercise 5. The shellcode we provided can pop up a shell. Now it's your turn to attack the C program named stack2.c using shellcode; you will get a shell if you succeed. You should compile and run your program as follows:

   $ gcc -g -z execstack -fno-stack-protector stack2.c -o stack2
   $ ./stack2
   sh-3.2$ id
   uid=1000(seed) gid=1000(seed) groups=4(adm),20(dialout),24(cdrom),
   46(plugdev),106(lpadmin),121(admin),122(sambashare),1000(seed)
   sh-3.2$ exit
   $ 
Here, the -fno-stack-protector option disables gcc's stack canary. Hint: you can use gdb when necessary, but keep in mind that there are some minor differences between the results under gdb and those from the stand-alone executable.

Challenge! Write other kinds of (more interesting) shellcode, and do whatever you want to do.

Part B: Buffer Overflows in the Touchstone Web Server

In this part of the lab, you will explore how buffer overflows happen in the real world and how to exploit them. To make the discussion concrete and realistic, you will study a small web server called Touchstone. The Touchstone web server is realistic enough to serve static pages (though you can extend it with other features), yet small enough that its source code can be studied very quickly. We have left some bugs and vulnerabilities in the Touchstone web server, some of which will be studied in this lab, whereas others will be studied in future labs.

The Touchstone Web Server

All the source code for Touchstone is stored in the code repository. Now compile the Touchstone web server and deploy it:
   $ cd web-server
   $ make
   $ ./touchstone 
Open your browser and enter the URL http://127.0.0.1:8080; you will get a simple "hello, world" page. If that fails, try re-running the web server as ./touchstone 8899 and entering the URL http://127.0.0.1:8899 in your browser. Contact us if you still have problems.

Exercise 6. Study the web server's code, and look for vulnerabilities that can be exploited to crash the server through buffer overflows. Pay special attention to the file parse.c. Write down a description of each vulnerability in the file named bugs.txt.

For each vulnerability, describe how you would construct the input (i.e., the HTTP request) to overflow the buffer. Locate at least one vulnerability. Here is a tutorial on the HTTP protocol; you can focus on the GET request.

Attack the Web Server

Even though a vulnerability has been found in the web server, it is still difficult for you to crash the server, because your browser does most of the dirty work for you in ways that you cannot control; that is, you can only do good things with such a browser. So, as a hacker, you have to write your own browser from scratch. We have provided a simple browser in the file browser.c. Basically, this browser constructs an HTTP request, sends it to the web server, and waits for the server's response. The browser is simple enough that it does not come with an HTML renderer; however, it's not difficult to add an HTML rendering engine and a UI to make it more realistic. In fact, there are many open-source HTML rendering engines, say WebKit. You are encouraged to do so if you're interested.

Exercise 7. For the buffer overflow vulnerability you've found, construct an input to send to the Touchstone web server. Your goal is to crash the web server (the HTTP server daemon). Note: if you succeed in crashing the web server, your browser will hang, waiting to receive data from the server. Don't forget that any valid request must end with \r\n\r\n.

Crashing the web server is just the first step. Now you should try to do something more interesting, say, delete some sensitive files (for example, grades.txt). To start with, you can use the program create-shellcode.c to construct your shellcode; you may have to modify the file according to your needs. You can then copy your shellcode to the C program test-shell.c.

Exercise 8. Perform your attack by constructing an exploit that hijacks the control flow of the web server and unlinks (deletes) grades.txt. Remember that the web server is on your computer, so you should create a file named grades.txt first.

Challenge! Write a remote shellcode, so that you can gain control of a remote machine. Remote shellcode is used when an attacker wants to target a vulnerable process running on another machine on a local network or intranet. If successfully executed, the shellcode can provide the attacker access to the target machine across the network. Remote shellcodes normally use standard TCP/IP socket connections to allow the attacker access to the shell on the target machine. Such shellcode can be categorised based on how this connection is set up: if the shellcode can establish this connection, it is called a "reverse shell" or a connect-back shellcode because the shellcode connects back to the attacker's machine. To bypass the firewalls, you can use the port reuse techniques.

Part C: Fixing Buffer Overflows

The buffer overflow vulnerabilities originate in the web server's source code, so you should realize the importance of writing secure code in the first place, though, needless to say, it's not easy. For the specific buffer overflows in this lab, you can fix them relatively easily by modifying the source code. If you cannot gain access to the source code, say your Windows system has a buffer overflow (that's often the case), you will have to wait for M$ to publish a security update.

Exercise 9. Try to fix the buffer overflow vulnerabilities in the Touchstone web server. You can use whatever techniques you like to achieve this, say, using a bounded string-copying function such as strncpy, or allocating the buffer on the heap instead of the stack. Then redo the attack and observe whether or not it still succeeds.

Handin

This completes the lab. Remember to hand in your solution to the information system.