In this lab, you will start building a kernel called PIOS, for Parallel Instructional Operating System. PIOS uses some of the same source code as xv6, and is also derived from JOS, the 6.828 kernel, but is structured quite differently from either xv6 or JOS. Like xv6, PIOS supports multiprocessor and multicore systems. While xv6 closely follows a classic Unix design and JOS follows an exokernel philosophy, PIOS combines concepts from classic microkernels like L4 with new ideas inspired by the growing need to support large-scale parallelism on modern computers. You will learn more about the design of PIOS gradually as you work through this course.
This first lab focuses on getting a skeleton PIOS kernel up and running to a point where it provides a basic, functional kernel development environment. The lab contains four parts:
The files you will need for this and subsequent lab assignments in this course are distributed using the Git version control system. To learn more about Git, take a look at the Git tutorial and user's manual. Or if you are already familiar with other version control systems, you may find this CS-oriented overview of Git useful.
The initial PIOS repository is hosted on GitHub. First, check out it to your local repository (e.g. /home/john.doe/pios/):
$ mkdir ~/pios $ cd ~/pios $ git init Initialized empty Git repository in /home/john.doe/pios/.git/ $ git remote add upstream https://github.com/bford/PIOS.git $ git fetch upstream remote: Reusing existing pack: 217, done. remote: Total 217 (delta 0), reused 0 (delta 0) Receiving objects: 100% (217/217), 174.90 KiB | 56.00 KiB/s, done. Resolving deltas: 100% (55/55), done. From https://github.com/bford/PIOS * [new branch] lab1 -> upstream/lab1 * [new branch] lab2 -> upstream/lab2 * [new branch] lab3 -> upstream/lab3 * [new branch] lab4 -> upstream/lab4 * [new branch] lab5 -> upstream/lab5The code base of n'th lab is put in branch labn. To switch from branch of your own implementation to one of your team's solution, you can use 'git checkout' to jump between your local branches:
$ git checkout lab1and check which branch you are currently working on:
$ git branch * lab1
Git also allows you to keep track of the changes you make to the code. For example, if you are finished with one of the exercises, and want to checkpoint your progress, you can commit your changes by running:
$ git commit -am 'my solution for lab1 exercise1' Created commit 60d2135: my solution for lab1 exercise1 1 files changed, 1 insertions(+), 0 deletions(-) $It is recommended to commit your changes after finishing each exercise and before jumping to another branch.
You can keep track of your changes by using the git diff command. Running git diff will display the changes to your code since your last commit, and git diff upstream/lab1 will display the changes relative to the initial code supplied for this lab. Here, upstream/lab1 is the name of the git branch with the initial code you downloaded from GitHub.
The same compilers and simulators you are already using to explore xv6 will work for the labs. Once you have the Git repository cloned in your account, you can build the PIOS skeleton by typing
$ makeand you can run or debug it under QEMU by typing make qemu or make qemu-gdb, respectively, just as with xv6. The skeleton won't get very far, of course: it will quickly panic in one of the functions you will need to implement.
Note: If you can't create boot block and get error like
boot block too large: 600 bytes (max 510)which says that the boot block is too large to reside in the 0st disk sector. Please download and apply our patch:
$ git apply Makefrag_bootloader.patch $ make clean; makewhich should work. Contact the course staffs, if there is still problems.
In this lab you may find GCC's inline assembly language feature useful, although it is also possible to complete the lab without using it. At the very least, you will need to be able to understand the fragments of inline assembly language ("asm" statements) that already exist in the source code we gave you. You can find several sources of information on GCC inline assembly language on the reference page.
Before hand in your implementation, make sure you are in the correct branch and have already committed all modifications that you are going to hand in.
$ git branch * lab1 <-- * must be before labn for the n'th lab! $ one or more 'git commit' $ git archive --format=zip -o 'your student id'_lab1.zip --prefix='your student id'_lab1/ HEAD
The last command archives everything in the last commit to file 'your student id'_lab1.zip. Upload this file to the homework submission system.
You do not need to turn in answers to any of the questions in the text of the lab. (Do answer them for yourself though! They will help with the rest of the lab.)
We will be grading your solutions with a grading program. You can run make grade to test your solutions with the grading program.
We will now start to examine the minimal PIOS kernel in a bit more detail.
PIOS uses essentially the same boot loader code as xv6,
which you examined in the lecture homeworks, so we will skip that part.
Once the boot loader loads the kernel's ELF image,
it jumps to the kernel entrypoint,
namely the start
label in kern/entry.S.
This assembly language entrypoint code
just performs basic initialization
and then calls the C function init()
in kern/init.c.
After zeroing out the kernel's uninitialized data area
(which is expected by C program loading conventions
but the boot loader didn't bother to do when it was loading the kernel),
the first thing the kernel does is initialize the console device driver
so that your kernel can produce visible output.
Most people take functions like printf() for granted, sometimes even thinking of them as "primitives" of the C language. But in an OS kernel, we have to implement all I/O ourselves.
Read through kern/cons.c, lib/cprintf.c, lib/printfmt.c, dev/kbd.c, and dev/video.c, and make sure you understand their relationship. It will become clear in later labs why printfmt.c and cprintf.c are located in the separate lib directory.
Exercise 1. We have omitted a small fragment of code - the code necessary to print octal numbers using patterns of the form "%o". Find and fill in this code fragment.
Be able to answer the following questions:
1 if (crt_pos >= CRT_SIZE) { 2 int i; 3 memcpy(crt_buf, crt_buf + CRT_COLS, 4 (CRT_SIZE - CRT_COLS) * sizeof(uint16_t)); 5 for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++) 6 crt_buf[i] = 0x0700 | ' '; 7 crt_pos -= CRT_COLS; 8 }
Trace the execution of the following code step-by-step:
int x = 1, y = 3, z = 4; cprintf("x %d, y %x, z %d\n", x, y, z);
cprintf()
,
to what does fmt
point?
To what does ap
point?cons_putc
, va_arg
, and vcprintf
.
For cons_putc
, list its argument as well. For
va_arg
, list what ap
points to before and
after the call. For vcprintf
list the values of its
two arguments.unsigned int i = 0x00646c72; cprintf("H%x Wo%s", 57616, &i);What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise. Here's an ASCII table that maps bytes to characters.
The output depends on that fact that the x86 is little-endian. If
the x86 were instead big-endian what would you set i
to in
order to yield the same output? Would you need to change
57616
to a different value?
Here's a description of little- and big-endian and a more whimsical description.
'y='
? (note: the answer is not a specific value.) Why
does this happen?
cprintf("x=%d y=%d", 3);
cprintf
or its
interface so that it would still be possible to pass it a variable
number of arguments?
Challenge Enhance the console to allow text to be printed in different colors. The traditional way to do this is to make it interpret ANSI escape sequences embedded in the text strings printed to the console, but you may use any mechanism you like. There is plenty of information on the reference page and elsewhere on the web on programming the VGA display hardware. If you're feeling really adventurous, you could try switching the VGA hardware into a graphics mode and making the console draw text onto the graphical frame buffer.
We will now explore in more detail the way the C language uses the stack on the x86, and in the process write a useful function that computes a backtrace of the stack: a list of the saved Instruction Pointer (IP) values from the nested call instructions that led to the current point of execution.
Exercise 2. Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which "end" of this reserved area is the stack pointer initialized to point to?
The x86 stack pointer (esp register) points to the lowest location on the stack that is currently in use. Everything below that location in the region reserved for the stack is free. Pushing a value onto the stack involves decreasing the stack pointer and then writing the value to the place the stack pointer points to. Popping a value from the stack involves reading the value the stack pointer points to and then increasing the stack pointer. In 32-bit mode, the stack can only hold 32-bit values, and esp is always divisible by four. Various x86 instructions, such as call, are "hard-wired" to use the stack pointer register.
The ebp (base pointer) register, in contrast, is associated with the stack primarily by software convention. On entry to a C function, the function's prologue code normally saves the previous function's base pointer by pushing it onto the stack, and then copies the current esp value into ebp for the duration of the function. If all the functions in a program obey this convention, then at any given point during the program's execution, it is possible to trace back through the stack by following the chain of saved ebp pointers and determining exactly what nested sequence of function calls caused this particular point in the program to be reached. This capability can be particularly useful, for example, when a particular function causes an assert failure or panic because bad arguments were passed to it, but you aren't sure who passed the bad arguments. A stack backtrace lets you find the offending function.
Exercise 3.
To become familiar with the C calling conventions on the x86,
set a breakpoint at debug_trace
in kern/debug.c,
and examine the state of the stack at the point it gets called.
Get GDB to give you a backtrace via the bt command,
compare GDB's backtrace with the "raw" contents of the stack,
and make sure you understand how GDB got that backtrace.
How many 32-bit words does each nested function push on the stack,
and what are those words?
The above exercise should give you the information you need
to implement debug_trace(),
which collects into an array the EIPs from
up to DEBUG_TRACEFRAMES
(10) stack frames
starting at the EBP specified in the argument.
This function will be used in multiple ways in the kernel,
helping to provide meaningful context information for debugging
when something goes seriously wrong with the system.
Note that if there are fewer than 10 frames on the stack,
debug_trace
should set the rest of the EIPs to NULL.
By studying kern/entry.S
you'll find that there is an easy way to tell when to stop.
Besides collecting the EIPs in the array passed by the caller,
you may find it useful (at least for the moment)
to make debug_trace
print not only the EIP from each frame
but the frame pointer itself and a few of each function's arguments.
For example:
Stack backtrace: ebp f0109e58 eip f0100a62 args 00000001 f0109e80 f0109e98 f0100ed2 00000031 ebp f0109ed8 eip f01000d6 args 00000000 00000000 f0100058 f0109f28 00000061 ...
Within each line of this example, the ebp value indicates the base pointer into the stack used by that function: i.e., the position of the stack pointer just after the function was entered and the function prologue code set up the base pointer. The listed eip value is the function's return instruction pointer, where control will return when the function returns. The return instruction pointer typically points to the instruction after the call instruction (why?). Finally, the five hex values listed after args are the first five arguments to the function in question, which would have been pushed on the stack just before the function was called. If the function was called with fewer than five arguments, of course, then not all five of these values will be useful. (Why can't the backtrace code detect how many arguments there actually are? How could this limitation be fixed?)
Here are a few specific points you read about in K&R Chapter 5 that are worth remembering for the following exercise and for future labs.
int *p = (int*)100
, then
(int)p + 1
and (int)(p + 1)
are different numbers: the first is 101
but
the second is 104
.
When adding an integer to a pointer, as in the second case,
the integer is implicitly multiplied by the size of the object
the pointer points to.p[i]
is defined to be the same as *(p+i)
,
referring to the i'th object in the memory pointed to by p.
The above rule for addition helps this definition work
when the objects are larger than one byte.&p[i]
is the same as (p+i)
, yielding
the address of the i'th object in the memory pointed to by p.Although most C programs never need to cast between pointers and integers, operating systems frequently do. Whenever you see an addition involving a memory address, ask yourself whether it is an integer addition or pointer addition and make sure the value being added is appropriately multiplied or not.
Exercise 4.
Implement the debug_trace
function as specified above.
When you think you have it working right,
run make grade to see if the checking code agrees,
and fix it if it doesn't.
One of the most important functions of any OS kernel is to handle processor traps that occur during execution of either its own code or application code. For this part of the lab, the first thing you should do is thoroughly familiarize yourself with the x86 interrupt and exception mechanism.
Exercise 5. Read Chapter 5, Interrupt and Exception Handling, in the IA-32 System Programming Guide, if you haven't already.
There are many types and classifications of traps, and the terminology used to describe them is unfortunately have no fully standard meanings, even within a single processor architecture such as the x86. We will distinguish between two main classes of traps: exceptions such as illegal instruction or divide by zero, whose cause is directly related to the code the processor is running; and interrupts, which are generated asynchronously by external I/O devices or other CPUs and whose cause may have nothing to do with the code that happens to be running at the time the interrupt arrives. We will generally use the term trap as a generic term referring to both exceptions and interrupts, but note that the Intel manuals uses the term differently, to refer to a specific subcategory of exceptions. When you see these terms outside of this lab, the meanings may differ further.
This part of the lab focuses on handling traps that occur while the processor is running in the x86 processor's most privileged mode, ring 0, which by convention we call "kernel mode". The next part of the lab will use traps to implement protected control transfer, allowing the kernel to enter user mode (ring 3) and later recover control safely from traps that occur in user mode.
When a kernel bug does manifest itself, there is no guarantee anything can be done - the bug could have caused the system to go into practically any state, from which no recovery of any kind might be possible. But often the bug causes the processor to take a trap at some point, and it may be possible to recover from the bug at least enough to collect and output some potentially useful information about what happened. But the processor does not know how to generate a "kernel bug report": it only knows how to take a trap; the kernel itself must handle the trap and produce the bug report. We will now create a trap handler for PIOS to help catch and report on such bugs. (You will later extend this same trap handling mechanism to handle protected control transfer, virtual memory faults, and external device interrupts).
Every widespread operating system seems to develop an entire subculture surrounding its kernel trap handling mechanism: this is what a "blue screen of death" (Windows), a "Sad Mac", or a "guru meditation" (Amiga) is. Now it's time to implement your own screen of death.
The x86 processor uses a table known as the interrupt descriptor table (IDT) to determine how to transfer control when a trap occurs. The x86 allows up to 256 different interrupt or exception entry points into the kernel, each with a different interrupt vector. A vector is a number between 0 and 256. An interrupt's vector is determined by the source of the interrupt: different devices, error conditions, and application requests to the kernel generate interrupts with different vectors. The CPU uses the vector as an index into the processor's IDT, which the kernel sets up in kernel-private memory of the kernel's choosing, much like the GDT. From the appropriate entry in this table the processor loads:
When an x86 processor takes a trap while in kernel mode, it first pushes a trap frame onto the kernel stack, to save the old values of certain registers before the trap handling mechanism modifies them. The processor then looks up the CS and EIP of the trap handler in the IDT, and transfers control to that instruction address. The following diagram illustrates the format of the basic kernel trap frame, defining the state of the kernel stack on entry to the trap handler:
+--------------------+ <---- old ESP | old EFLAGS | " - 4 | 0x00000 | old CS | " - 8 | old EIP | " - 12 +--------------------+ <---- ESP
For certain types of x86 exceptions, in addition to the basic three 32-bit words above, the processor pushes onto the stack another word containing an error code. The page fault exception, number 14, is an important example. See the x86 manuals to determine for which exception numbers the processor pushes an error code, and what the error code means in that case. When the processor pushes an error code, the stack would look as follows at the beginning of the trap handler:
+--------------------+ <---- old ESP | old EFLAGS | " - 4 | 0x00000 | old CS | " - 8 | old EIP | " - 12 | error code | " - 16 +--------------------+ <---- ESP
The x86 processor provides a special instruction, iret, to return from trap handlers. It expects the kernel's stack to look like the first figure above, with ESP pointing to the old EIP. When the processor executes an iret instruction, pops the saved values of EIP, CS, and EFLAGS off the stack and back into the corresponding registers, and resumes instruction execution at the popped EIP.
Note that when returning from a trap, the processor doesn't actually know or care whether the "old" values it is popping off the stack are really the exact same values that it originally pushed onto the stack on entry to the trap handler. Think about what would happen - for better or worse - if the kernel trap handler changes these values during its execution.
Because the processor handles a trap by pushing the old EIP (and some other state) on the stack and then transferring control to a designated instruction address, trap handling in the x86 works very analogously to procedure calls, and like ordinary procedure calls, traps can be nested. That is, if the processor is already in the process of handling one trap, and the trap handler itself causes an exception or is interrupted by an external device, then the processor merely pushes another trap frame onto the stack and transfers control to the appropriate trap handler. The processor itself doesn't maintain any state reflecting the fact that the second trap handler is nested inside the first: as far as the processor is concerned it is simply pushing a standard frame onto the stack and transferring control in a standard way whenever a trap occurs. All of the state of both interrupted activities remains on the stack, and can later be "unwound" by returning from the two trap handlers in reverse order, again analogously to ordinary procedure returns.
Nested trap handling is often extremely important in kernel development, both for its utility and its danger: it is often highly useful to allow one trap handler to be interrupted by another, but unintentional nesting can cause extremely subtle and difficult bugs. There is also one important caveat to the processor's nested trap handling capability. If the processor takes a trap while already in kernel mode, and cannot push its old state onto the kernel stack for any reason such as lack of stack space, then there is nothing the processor can do to recover, so it simply resets itself. Needless to say, the kernel should be designed so that this can't happen.
You should now have the basic information you need in order to set up the IDT and handle exceptions in PIOS. For now, you will set up the IDT to handle interrupt vectors 0-31 (the processor exceptions).
The header files inc/trap.h and kern/trap.h contain important definitions that you will need to become familiar with. The file kern/trap.h contains definitions that are strictly private to the kernel, while inc/trap.h contains definitions that may also be useful to user-level programs and libraries. The skeleton source code for PIOS's trap handling mechanism is in kern/trap.c (the C part) and kern/trapasm.S (the assembly language part).
Note: Some of the exceptions in the range 0-31 are defined by Intel to be reserved. Since they will never be generated by the processor, it doesn't really matter how you handle them. Do whatever you think is cleanest.
The overall flow of control that you should achieve is depicted below:
IDT trapasm.S trap.c +----------------+ | &handler1 |---------> handler1: trap (struct Trapframe *tf) | | // do stuff { | | call trap // handle the exception/interrupt | | // undo stuff } +----------------+ | &handler2 |--------> handler2: | | // do stuff | | call trap | | // undo stuff +----------------+ . . . +----------------+ | &handlerX |--------> handlerX: | | // do stuff | | call trap | | // undo stuff +----------------+
Each exception should have
its own handler in trapasm.S
and trap_idt_init()
should initialize the IDT with the addresses
of these handlers.
Each of the handlers should build a struct trapframe
(see inc/trap.h) on the stack and call
trap()
(in trap.c)
with a pointer to that trapframe.
trap()
handles the
exception/interrupt or dispatches to a specific
handler function.
If and when trap()
returns,
the code in trapasm.S
restores the old CPU state saved in the Trapframe
and then uses the iret
instruction
to return from the exception.
Exercise 6.
Edit trapasm.S and trap.c and
implement the features described above. The macros
TRAPHANDLER
and TRAPHANDLER_NOEC
in
trapasm.S should help you, as well as the T_*
defines in inc/trap.h. You will need to add an
entry point in trapasm.S (using those macros)
for each trap defined in inc/trap.h, and
you'll have to provide _alltraps
which the
TRAPHANDLER
macros refer to. You will
also need to modify trap_idt_init()
to initialize the
IDT to point to each of these entry points
defined in trapasm.S; the SETGATE
macro will be helpful here.
Hint: your _alltraps
should:
CPU_GDT_KDATA
into %ds and %espushl %esp
to pass a pointer to the trapframe as an argument to trap()call trap
(can trap
ever return?)
Consider using the pushal
and popal
instructions; they fit nicely with the layout of the struct
trapframe
.
We have provided a function trap_check() to test your trap handling code for a variety of traps (though by no means all the exceptions the processor can generate). Make sure it reports success: You should be able to get make grade to succeed on the trap handler test at this point.
Challenge!
You probably have a lot of very similar-looking code
right now, between the lists of TRAPHANDLER
in
trapasm.S and their installations in
trap.c. Clean this up. Change the macros in
trapasm.S to automatically generate a table for
trap.c to use. Note that you can switch between
laying down code and data in the assembler by using the
directives .text
and .data
.
You will now need to implement the complement of the trap entry code: namely the code to return from a trap.
In many kernels, including xv6, the high-level trap handling code written in C returns from a trap simply by returning from the C procedure corresponding to PIOS's trap(). PIOS follows a different convention, however. You might notice the gcc_noreturn macro (see inc/gcc.h) in the definition of the trap() function, which declares that this function "never returns", at least as far as GCC is concerned. In PIOS, kernel code returns from a trap by performing another call, to the function trap_return(), which is also declared in trap.h never to return. See trap_check_recover() for an example of how it is used. Those familiar with programming language implementation will recognize this as tail-call style programming.
Exercise 7. Implement the trap_return() function in trapasm.S. (Alternatively, you may implement it in kern/trap.c using inline assembly language if you prefer.) The trap_return() function takes a pointer to a trapframe structure as an argument, and first "unwinds" the stack to the point where that trapframe was pushed, discarding any information that has been pushed onto the stack since that point. The code then restores all the saved register state from the trapframe and returns from the trap via the iret instruction.
We have provided a function trap_check() to test your trap handling code for a variety of traps (though by no means all the exceptions the processor can generate). Make sure it reports success: You should be able to get make grade to succeed on the trap handler test at this point.
For now, we will focus on the other requirement: protecting sensitive processor state from application code. If any application could run a lidt instruction to change the IDT, for example, then the application could wrest control of trap handling away from the kernel and redirect all traps to handlers of the application's choosing. The x86 processor provides multiple privilege levels or rings to enable operating systems to protect sensitive processor state. To protect the location of the sensitive IDT, for example, the x86 architecture restricts the use of the lidt instruction to code running in ring 0; the kernel can thus prevent applications from executing lidt and other sensitive instructions by running application code at a lower privilege level. To do so, however, the kernel must still have a protected way to get into and back out of lower privilege levels.
Exercise 8. Read Chapter 4, Protection, in the IA-32 System Programming Guide, if you haven't already.
In most processor architectures including the x86, the trap handling mechanism doubles as a protected control transfer mechanism, enabling the kernel to transfer control safely to and from code running at lower privilege levels. When the processor is running in user mode (ring 3 on the x86) and takes a trap or makes an explicit system call, the processor must transfer control to the kernel, but it is crucial for protection purposes that the user mode code currently running does not get to choose arbitrarily where the kernel is entered or how. Instead, the processor ensures that the kernel can be entered only under carefully controlled conditions. On the x86, two mechanisms work together to provide this protection:
The Interrupt Descriptor Table. As you have already seen in the previous part, through the IDT the processor ensures that interrupts and exceptions can only cause the kernel to be entered at a few specific, well-defined entry-points determined by the kernel itself, and not by the code running when the interrupt or exception is taken. This mechanism effectively protects the instruction pointer (EIP) address used to handle traps.
The Task State Segment. The processor needs a place to save the old processor state before the interrupt or exception occurred, such as the original values of EIP and CS before the processor invoked the exception handler, so that the exception handler can later restore that old state and resume the interrupted code from where it left off. But this save area for the old processor state must in turn be protected from unprivileged user-mode code; otherwise buggy or malicious user code could compromise the kernel: for example, one user mode thread could change the kernel state of another thread while the latter is in a system call, or user code could simply point ESP to unmapped or read-only memory, making it impossible for the processor to push the trap frame and causing an immediate reset as described above.
For this reason, when an x86 processor takes an interrupt or trap that causes a privilege level change from user to kernel mode, it also switches to a stack in the kernel's memory. A structure called the task state segment (TSS) specifies the segment selector and address where this stack lives. The processor pushes (on this new stack) SS, ESP, EFLAGS, CS, EIP, and an optional error code. Then it loads the CS and EIP from the interrupt descriptor, and sets the ESP and SS to refer to the new stack.
Although the TSS is large and can potentially serve a variety of purposes, PIOS only uses it to define the kernel stack that the processor should switch to when it transfers from user to kernel mode. Since "kernel mode" in PIOS is privilege level 0 on the x86, the processor uses the ESP0 and SS0 fields of the TSS to define the kernel stack when entering kernel mode. PIOS doesn't use any other TSS fields.
A particularly important kernel design issue is how many kernel stacks there are, and which OS abstraction they are associated with. There are basically two common models:
For now all you really need to know is that PIOS's kernel stack is part of the cpu structure, defined in kern/cpu.h: it grows downwards (like all x86 stacks) from kstackhi towards kstacklo in the structure. Thus, the ESP field for ring 0 in a given processor's TSS needs to point to kstackhi in that processor's cpu structure.
Let's put the above pieces together and trace through an example. Suppose the processor is executing code in user mode and encounters a divide instruction that attempts to divide by zero.
CPU_GDT_KDATA
and &cpu->kstackhi
,
respectively.kstackhi
:
+--------------------+ &cpu->kstackhi | 0x00000 | old SS | " - 4 | old ESP | " - 8 | old EFLAGS | " - 12 | 0x00000 | old CS | " - 16 | old EIP | " - 20 <---- ESP +--------------------+ <---- ESP
As you can see, the trap frame the processor pushes on kernel entry from user is similar to the one it pushes when it is already in the kernel, except in this case the processor also pushes the SS and ESP registers before pushing the frame discussed in the preivous part of the lab. You can see this difference reflected in the comments in the definition of trapframe in inc/trap.h, and on the definitions of the trapframe_usize() and trapframe_ksize() macros in the same header file.
For those traps for which the processor also pushes an error code when taking a trap from kernel mode, as discussed in the last part, it pushes an error code in the same fashion for traps from user mode. For these traps, therefore, a trap from user mode will leave the kernel stack in the following state:
+--------------------+ &cpu->kstackhi | 0x00000 | old SS | " - 4 | old ESP | " - 8 | old EFLAGS | " - 12 | 0x00000 | old CS | " - 16 | old EIP | " - 20 | error code | " - 24 +--------------------+ <---- ESP
Exercise 9. Modify cpu_init() to set up the tss field in the cpu struct appropriately, and to create a TSS segment descriptor in the GDT pointing to it. The SEGDESC16 macro should make the latter part easy: note that you can use this macro in an assignment statement, as in 'c->gdt[...] = SEGDESC16(...)'. A GDT entry for the TSS named CPU_GDT_TSS has already been reserved in kern/cpu.h.
Then, at the end of cpu_init(), after the processor's GDT is loaded, load your TSS into the processor using the LTR instruction. For convenience, there is an ltr() function in inc/x86.h. Make sure your code "accepts" your TSS descriptor: if it doesn't, the LTR instruction will cause a trap, which the kernel probably won't handle very well because trap_init() hasn't yet been called at this point.
What piece of register state in the processor actually defines which privilege level it is executing in at a given moment? Many architectures use a "kernel mode" flag in a control register of some kind, but x86 processors uses the low two bits of the CS register, effectively treating privilege level as a property of the currently running code segment.
To enter user mode, we might be tempted simply to take the kernel's code segment selector, CPU_GDT_KCODE, set the bottom two bits to 1, and do an ljmp to that code segment much like cpu_init() does to load CPU_GDT_KCODE as a kernel-mode code segment. Unfortunately, this doesn't work: why?
Exercise 10. Modify the definition of cpu_boot in kern/cpu.c to create code and data segments for user mode. Other than privilege level, these segments should be identical to the corresponding kernel segments: we aren't going to do any address translation or memory protection for now.
We described above how the processor leaves user mode and enters the kernel via a trap, but how does the kernel enter user in the first place? Simple: the kernel "returns" to user mode - even if the processor has never been there before!
When the processor executes an iret instruction, it pops its standard trap frame off the stack starting with the old EIP, but it doesn't actually know or care whether it actually pushed that frame on the stack or if it got there some other way. Thus, the kernel can always manufacture a trap frame representing whatever user mode state it wants to load into the processor, and "return" from it via iret to enter user mode. (There are other ways to switch to user mode on the x86, but this is the most general method.)
Exercise 11. Modify the code at the end of init() to cause the user() function to be run in user mode, instead of just calling it as a procedure like the skeleton code does, which leaves the processor in kernel mode. To do this, you will need to create and set up a trapframe struct correctly, and call trap_return() to "return" to it. The user mode code will need a stack, which must be separate from the kernel stack since the kernel stack will be used for trap handling: we have defined an array called user_stack for this purpose.
Note: You should make sure the processor enters user mode with the I/O privilege level set to 3, by setting the FL_IOPL_MASK bits in the EFLAGS register to the value FL_IOPL_3. This will allow console output, such as the cprintf() at the beginning of user(), to work even from user mode for debugging purposes. We would not want to do this for ordinary applications, since that would give them unrestricted access to the PC's I/O space: we will later provide a way for user mode code code to produce output in a controlled fashion via system calls.
You should now see user() print 'in user()' from user mode: verify that you're really in user mode here by setting a GDB breakpoint at user and looking at the CS register at that point.
Now that your kernel has basic exception handling capabilities and can enter user mode, you will refine it to handle traps that user mode code may cause deliberately for various purposes; we refer to such traps as software interrupts. There are two traps defined by the x86 processor expressly to serve as software interrupts, and PIOS defines a third one to serve as its system call mechanism:
int3
software interrupt instruction.
into
software interrupt instruction,
if the overflow flag (FL_OF) is set when the instruction executes.
The intent is for software to execute an into
after an arithmetic operation that might overflow:
if it doesn't, execution proceeds normally,
but if it does, the overflow condition is immediately caught.
This appealing idea has never really caught on
in high-level languages, however, and is rarely used.
Nevertheless, on principle we want to ensure that PIOS
can handle into instructions in applications properly.
Exercise 12.
Modify trap_init_idt()
to allow user mode code to invoke
the T_BRKPT and T_OFLOW vectors
via software interrupt instructions.
Don't worry about T_SYSCALL for now;
we'll do that in the next lab
when we start implementing useful system calls.
The call from user() to trap_check() in user mode should now succeed, and you should be able to get make grade to succeed on the user mode test.
The operating system must keep track of which parts of physical RAM are free and which are currently in use. The final part of this lab is to construct a physical memory allocator for the kernel, so that the kernel can dynamically allocate memory and later free it. Before starting this part of the lab, be sure you have read and understood the part about memory allocation in xv6 chapter 2.
Your allocator will operate in units of 4096 bytes, called pages. Pages will become highly relevant in lab 3, where you will implement virtual memory, but for now take it on faith that pages are useful units of allocation in kernels. Your task will be to maintain data structures that record which physical pages are free and which are allocated, and how many users there are of each allocated page. You will also write the routines to allocate and free pages of memory.
Dividing up physical memory into pages of exactly equal size
will make memory allocation in the PIOS kernel substantially simpler
than it is in xv6,
or any user-level implementation of malloc()
for that matter,
because you won't have to deal with the problems
of fragmentation or searching for a sufficiently large free chunk
to fill a request for a given amount of memory.
This design simplicity has its costs:
namely, the rest of the kernel will not have any way
to allocate physically contiguous memory
for structures larger than a page after the kernel boots.
As it turns out, the PIOS kernel will not need
to allocate any data structures larger than a page after it boots.
But even some much more complex kernels - Linux, for example -
impose restrictions such as this
to keep their physical memory allocators simple and fast,
and work around these limitations using virtual memory
to make physically discontiguous memory appear virtually contiguous
to the kernel code using that memory.
You'll now write the physical page allocator. It keeps track of which
pages are free with a linked list of pageinfo
structures,
each corresponding to a physical page.
Exercise 13. In the file kern/mem.c, you must implement code for the following functions.
mem_init() mem_alloc() mem_free()
We have provided a function in the same source file,
mem_check()
, which tests your physical page allocator.
You should boot PIOS and see whether mem_check()
reports success. Fix your code so that it passes. You may find it
helpful to add your own assert()
s to verify that
your assumptions are correct.
Our checking code is certainly not guaranteed
to detect every possible bug,
and any bugs your code has that it does not detect
may easily come back to bite you in a future lab!
This lab, and all the labs in this course, will require you to do a bit of detective work to figure out exactly what you need to do. This assignment does not describe all the details of the code you'll have to add to the kernel. Look for comments in the parts of the PIOS source that you have to modify; those comments often contain specifications and hints. You will also need to look at related parts of PIOS, at the Intel manuals, and perhaps at your notes from relevant courses you may have taken previously.
This completes the lab.