divzero: OK (1.0s)
softint: OK (0.7s)
badsegment: OK (1.0s)
Part A score: 30/30
faultread: OK (0.9s)
faultreadkernel: OK (1.0s)
faultwrite: OK (1.4s)
faultwritekernel: OK (0.5s)
breakpoint: OK (3.4s)
testbss: OK (4.0s)
hello: OK (1.5s)
buggyhello: OK (4.5s)
buggyhello2: OK (4.0s)
evilhello: OK (0.5s)
Part B score: 50/50
Score: 80/80
:-/
In my last post, I walked through the xv6 system call mechanism. If you understand how that works, I’d say nothing in Lab 3 is really new: it just applies the same idea to JOS and implements something similar. Anyway, let me note down the things I found interesting in this lab.
- Part A: User Environments and Exception Handling
- Part B: Page Faults, Breakpoints Exceptions, and System Calls
Part A: User Environments and Exception Handling
First, we need some more kernel data structures. Recall that in Lab 2, in mem_init(), we allocated kern_pgdir and pages using boot_alloc(). Now we need one more, envs, which is allocated the same way as pages:
    pages = (struct PageInfo *) boot_alloc(npages * sizeof(struct PageInfo));
    memset(pages, 0, npages * sizeof(struct PageInfo));
Defined in inc/env.h, a struct Env is the data structure describing a JOS environment. A JOS environment is similar to a UNIX process, though not identical.
    struct Env {
        struct Trapframe env_tf;    // Saved registers
        struct Env *env_link;       // Next free Env
        envid_t env_id;             // Unique environment identifier
        envid_t env_parent_id;      // env_id of this env's parent
        enum EnvType env_type;      // Indicates special system environments
        unsigned env_status;        // Status of the environment
        uint32_t env_runs;          // Number of times environment has run

        // Address space
        pde_t *env_pgdir;           // Kernel virtual address of page dir
    };
Notably, later in page_init():
    for (; i*PGSIZE < PADDR((char *)boot_alloc(0)); i++) {
        pages[i].pp_ref = 1;
    }
We use this for loop to mark the pages containing the kernel data structures allocated by boot_alloc() as in use. Since boot_alloc(0) always returns the address of the next free page without allocating anything, we don’t have to touch this line if we boot_alloc() more or fewer data structures before calling page_init(), which is a nice coding trick.
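Here is a toy user-space sketch of that trick, just to make the idea concrete. It is not the JOS code: arena, bump_alloc() and pages_in_use() are names I made up, and there is no bounds checking.

```c
#include <stddef.h>

#define PGSIZE 4096u

/* Hypothetical stand-in for the memory that follows the kernel image. */
static unsigned char arena[16 * PGSIZE];
static unsigned char *nextfree = arena;

/*
 * Bump allocator in the spirit of boot_alloc(): return the current
 * cursor, then advance it by n bytes rounded up to a page boundary.
 * With n == 0 nothing is allocated and the cursor does not move.
 */
static void *bump_alloc(size_t n)
{
    unsigned char *result = nextfree;
    size_t off = (size_t)(nextfree - arena) + n;

    off = (off + PGSIZE - 1) / PGSIZE * PGSIZE;
    nextfree = arena + off;
    return result;
}

/*
 * Number of pages below the cursor, i.e. the pages a page_init()-style
 * loop would have to mark as in use. It never needs to know how many
 * allocations happened before it was called.
 */
static size_t pages_in_use(void)
{
    return (size_t)((unsigned char *)bump_alloc(0) - arena) / PGSIZE;
}
```

Adding or removing a bump_alloc() call earlier on changes what pages_in_use() reports without any edit to pages_in_use() itself, which is exactly why the loop bound in page_init() never needs updating.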
After that, i386_init() calls env_init(), trap_init(), env_create() and, finally, env_run(). As you can tell from their names, we do some setup, create a user environment, and run it. Let’s take a closer look at each of these functions.
First, env_init() in kern/env.c:
    void
    env_init(void)
    {
        // Set up envs array
        // LAB 3: Your code here.
        struct Env *e;

        for (e = envs; e < envs + NENV; e++) {
            e->env_id = 0;
            e->env_status = ENV_FREE;
            // If this is the last entry in envs, set its env_link to 0.
            e->env_link = (e == envs + NENV - 1) ? 0 : e + 1;
        }
        env_free_list = envs;

        // Per-CPU part of the initialization
        env_init_percpu();
    }
It iterates through envs, sets each entry’s env_id to 0 and env_status to ENV_FREE, and chains the entries into a singly linked list, env_free_list. Finally, this function calls env_init_percpu(), which loads the GDT and clears the LDT, since JOS doesn’t use an LDT.
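A common alternative is to build the list back to front by pushing onto the head. The sketch below is a stand-alone toy, not kernel code: the trimmed struct Env, the tiny NENV, and the env_init_sketch() / env_take() helpers are all my own simplifications. Either way the list ends up in array order, so the first allocation hands out envs[0].

```c
#include <stddef.h>

#define NENV 8  /* hypothetical small value for the sketch; JOS uses 1024 */

enum { ENV_FREE = 0, ENV_RUNNABLE };  /* simplified status set */

struct Env {
    struct Env *env_link;   /* next free Env */
    int env_id;
    unsigned env_status;
};

static struct Env envs[NENV];
static struct Env *env_free_list;

/* Push entries from last to first, so envs[0] ends up at the head. */
static void env_init_sketch(void)
{
    env_free_list = NULL;
    for (int i = NENV - 1; i >= 0; i--) {
        envs[i].env_id = 0;
        envs[i].env_status = ENV_FREE;
        envs[i].env_link = env_free_list;
        env_free_list = &envs[i];
    }
}

/* Pop the head of the free list, or return NULL when it is empty. */
static struct Env *env_take(void)
{
    struct Env *e = env_free_list;

    if (e) {
        env_free_list = e->env_link;
        e->env_link = NULL;
        e->env_status = ENV_RUNNABLE;
    }
    return e;
}
```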
Next, trap_init() in kern/trap.c. Well…
    ...
    void vector0();
    SETGATE(idt[0], 1, GD_KT, vector0, 0);
    ...
trap_init() first sets up the IDT. My solution repeats the two lines above for each trap gate. 🙂 Yeah, I just don’t feel like cleaning them up. The vector*() functions are defined in trapentry.S:
    ...
    TRAPHANDLER_NOEC(vector7, T_DEVICE)
    TRAPHANDLER(vector8, T_DBLFLT)
    ...
where the TRAPHANDLER macro is defined as:
    #define TRAPHANDLER(name, num)                                      \
        .globl name;            /* define global symbol for 'name' */   \
        .type name, @function;  /* symbol type is function */           \
        .align 2;               /* align function definition */         \
        name:                   /* function starts here */              \
        pushl $(num);                                                   \
        jmp _alltraps
while TRAPHANDLER_NOEC is defined as:
    #define TRAPHANDLER_NOEC(name, num)     \
        .globl name;                        \
        .type name, @function;              \
        .align 2;                           \
        name:                               \
        pushl $0;                           \
        pushl $(num);                       \
        jmp _alltraps
The CPU does not push an error code (the tf_err field in struct Trapframe) for some exceptions, so we use TRAPHANDLER_NOEC to push an extra $0 for them. Later the trapframe is passed to our trap handler as an argument, and we definitely don’t want the trap handler to be aware of this hardware detail, so we do the “alignment” here.
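Which vectors need which macro? As a quick reference, here is a small sketch of the classic set of exceptions for which a 32-bit x86 CPU pushes an error code by itself. cpu_pushes_error_code() is a hypothetical helper of mine, not something that exists in JOS, and newer processors define a few more such vectors that are left out here.

```c
#include <stdbool.h>

/*
 * Returns true for the classic x86 exception vectors where the CPU
 * pushes an error code itself (use TRAPHANDLER), and false for the
 * rest (use TRAPHANDLER_NOEC so the trapframe layout stays uniform).
 */
static bool cpu_pushes_error_code(unsigned trapno)
{
    switch (trapno) {
    case 8:   /* double fault */
    case 10:  /* invalid TSS */
    case 11:  /* segment not present */
    case 12:  /* stack-segment fault */
    case 13:  /* general protection fault */
    case 14:  /* page fault */
    case 17:  /* alignment check */
        return true;
    default:
        return false;
    }
}
```

This matches the snippet above: vector 7 (device not available) goes through TRAPHANDLER_NOEC, while vector 8 (double fault) goes through TRAPHANDLER.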
As we can see, every vector*() function jumps to _alltraps:
    _alltraps:
        pushl %ds
        pushl %es
        pushal
        movw $0x10, %ax
        movw %ax, %ds
        movw %ax, %es
        pushl %esp
        call trap
It simply pushes more registers to complete the trapframe, loads the kernel data selector, GD_KD (0x10), into %ds and %es, then pushes the trapframe pointer to be passed to trap() as its argument. We will check out trap() in Part B.
Back to trap_init(): it finally calls trap_init_percpu(), whose job is to initialize and load the TSS, as well as load the IDT. Recall that if the privilege level (the low two bits of %cs) changes when an interrupt or exception is taken, the CPU loads a new %ss and %esp from the TSS and pushes the old ones onto the new stack before anything else.
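To make the “low two bits of %cs” part concrete, here is a tiny sketch. The selector values are the ones from inc/memlayout.h; selector_rpl() and trap_switched_stacks() are names I made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define GD_KT 0x08  /* kernel text selector */
#define GD_UT 0x18  /* user text selector */

/* The low two bits of a segment selector hold its privilege level. */
static unsigned selector_rpl(uint16_t sel)
{
    return sel & 3;
}

/*
 * Given the %cs value saved in a trapframe, tell whether the trap came
 * from a less privileged level than the kernel's ring 0. Only in that
 * case does the CPU fetch %ss:%esp from the TSS and save the old pair.
 */
static bool trap_switched_stacks(uint16_t saved_cs)
{
    return selector_rpl(saved_cs) != 0;
}
```

A user environment runs with %cs set to GD_UT | 3 (0x1b), so a trap from userland always goes through the TSS, while a trap taken inside the kernel (GD_KT) stays on the current stack.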
Now we are ready to actually create a user environment in env_create():
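    void
    env_create(uint8_t *binary, enum EnvType type)
    {
        // LAB 3: Your code here.
        struct Env *e;
        int r;

        // env_alloc() returns a negative error code on failure
        if ((r = env_alloc(&e, 0)) < 0)
            panic("env_create: env_alloc() failed: %e", r);
        load_icode(e, binary);
        e->env_type = type;    // record the type the caller asked for
    }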
It in turn calls env_alloc(), which takes a struct Env off env_free_list, sets up some of its status fields, and allocates and sets up a page directory for it. Then it calls load_icode(), which loads an ELF binary embedded in the kernel image into the new environment’s virtual address space. load_icode() also allocates one page of user stack for the environment. Let’s check out load_icode():
    static void
    load_icode(struct Env *e, uint8_t *binary)
    {
        // LAB 3: Your code here.
        struct Elf *elfhdr = (struct Elf *)binary;

        if (elfhdr->e_magic != ELF_MAGIC)
            panic("load_icode: not an ELF file!\n");

        struct Proghdr *ph = (struct Proghdr *) ((uint8_t *)elfhdr + elfhdr->e_phoff);
        struct Proghdr *eph = ph + elfhdr->e_phnum;

        // temporarily switch to e->env_pgdir, or you will crash at memcpy()
        lcr3(PADDR(e->env_pgdir));

        for (; ph < eph; ph++) {
            if (ph->p_type == ELF_PROG_LOAD) {
                // 1) alloc region for the segment
                region_alloc(e, (void *)ph->p_va, ph->p_memsz);
                // 2) copy the segment from where it's "embedded" in the kernel
                memcpy((void *)ph->p_va, (binary + ph->p_offset), ph->p_filesz);
                if (ph->p_filesz < ph->p_memsz) {
                    // 3) clear to zero the .bss section
                    memset((void *)(ph->p_va + ph->p_filesz), 0,
                           (ph->p_memsz - ph->p_filesz));
                }
            }
        }

        // switch back to kern_pgdir since you are done with it
        lcr3(PADDR(kern_pgdir));

        // I guess I also have to set tf->eip to the entrypoint?
        e->env_tf.tf_eip = elfhdr->e_entry;

        // Now map one page for the program's initial stack
        // at virtual address USTACKTOP - PGSIZE.
        // LAB 3: Your code here.
        struct PageInfo *pp = page_alloc(ALLOC_ZERO);
        if (!pp)
            panic("load_icode: page_alloc() failed!\n");
        page_insert(e->env_pgdir, pp, (void *) (USTACKTOP - PGSIZE), PTE_U | PTE_W);
    }
We did something similar in bootmain(), but this time we “load” the binary from memory instead of from the hard disk.
For each program header, we check whether it is loadable by checking its p_type field. If a segment is loadable, we do the following for it:
First: Allocate some physical pages for the segment and map them, according to the p_va field in the program header, into the page directory of this environment. My solution does this via region_alloc(), which we will also take a look at.
Second: Copy the segment from the kernel image into the newly mapped memory region. Attention! We are configuring all of this with kern_pgdir installed, but this region is not mapped in kern_pgdir! In order to access (copy into) this memory region, we need to temporarily switch to the page directory of the new environment (e->env_pgdir) first. See the two lcr3() calls around the loop.
Third: In step 2 we only copy as many bytes as the p_filesz field says. If a segment’s p_memsz is larger than p_filesz, the segment contains uninitialized global variables, a.k.a. the .bss section. We memset() these bytes to zero.
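Steps two and three can be sketched in plain user-space C. This is not the kernel code: struct SegSketch is a trimmed-down, hypothetical stand-in for struct Proghdr, load_segment() is a made-up name, and the bounds checks are ones my actual solution does not perform.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down program header: only the fields used here. */
struct SegSketch {
    size_t p_offset;  /* where the segment's bytes start in the image */
    size_t p_filesz;  /* bytes actually present in the image */
    size_t p_memsz;   /* bytes the segment occupies once loaded */
};

/*
 * Copy the file-backed part of a segment into dst and zero the rest
 * (the .bss tail). Returns 0 on success and -1 if the header does not
 * fit the image or the destination.
 */
static int load_segment(unsigned char *dst, size_t dst_len,
                        const unsigned char *image, size_t image_len,
                        const struct SegSketch *ph)
{
    if (ph->p_filesz > ph->p_memsz || ph->p_memsz > dst_len)
        return -1;
    if (ph->p_offset > image_len || ph->p_filesz > image_len - ph->p_offset)
        return -1;

    memcpy(dst, image + ph->p_offset, ph->p_filesz);
    memset(dst + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
    return 0;
}
```

Checks along these lines are the kind of thing a loader would need before it could be trusted with malformed ELF files.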
Another important thing to do is to set up the entry point.
peilin@PWN:~/6.828/2018/lab$ readelf -h obj/user/hello
ELF Header:
…
Entry point address: 0x800020
…
By looking at the e_entry field of the ELF header, we know that this program wants to start executing at virtual address 0x800020. We store this entry point address in the environment’s trapframe, e->env_tf.tf_eip, which will get loaded into %eip later when we run env_pop_tf().
Finally, load_icode() allocates and initializes one page for the environment as the user stack. Actually, this is indicated by another program header (GNU_STACK):
peilin@PWN:~/6.828/2018/lab$ readelf -l obj/user/hello
Elf file type is EXEC (Executable file)
Entry point 0x800020
There are 4 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0x00200000 0x00200000 0x0409f 0x0409f RW 0x1000
LOAD 0x006020 0x00800020 0x00800020 0x01154 0x01154 R E 0x1000
LOAD 0x008000 0x00802000 0x00802000 0x0002c 0x00030 RW 0x1000
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x10
But here we just allocate it unconditionally anyways. Also, it goes without saying that my implementation is vulnerable to malformed ELF files, but… 🙂
Oh, region_alloc().
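    static void
    region_alloc(struct Env *e, void *va, size_t len)
    {
        // LAB 3: Your code here.
        struct PageInfo *pp;
        // use char * so the pointer arithmetic below is standard C
        char *end = ROUNDUP((char *)va + len, PGSIZE);

        for (char *pva = ROUNDDOWN((char *)va, PGSIZE); pva < end; pva += PGSIZE) {
            pp = page_alloc(0);
            if (!pp)
                panic("region_alloc: page_alloc() failed!\n");
            // page_insert() can fail if no page table can be allocated
            if (page_insert(e->env_pgdir, pp, pva, PTE_U | PTE_W) < 0)
                panic("region_alloc: page_insert() failed!\n");
        }
    }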
That’s it.
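The only subtle part of region_alloc() is the rounding: va and len need not be page-aligned, so the mapped range may cover more pages than len / PGSIZE. A small stand-alone sketch of the arithmetic (round_down(), round_up() and pages_spanned() are my own names, and it assumes va + len does not overflow):

```c
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u

static uint32_t round_down(uint32_t a)
{
    return a - a % PGSIZE;
}

static uint32_t round_up(uint32_t a)
{
    return round_down(a + PGSIZE - 1);
}

/* Number of pages that get mapped for the byte range [va, va + len). */
static size_t pages_spanned(uint32_t va, uint32_t len)
{
    return (round_up(va + len) - round_down(va)) / PGSIZE;
}
```

For the text segment of user/hello shown above (VirtAddr 0x00800020, MemSiz 0x01154) this gives 2 pages, and for the data segment (VirtAddr 0x00802000, MemSiz 0x00030) it gives 1.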
Now we have an ENV_RUNNABLE user environment. What’s next? Run it! Finally, env_run():
    void
    env_run(struct Env *e)
    {
        // LAB 3: Your code here.
        if (curenv) {
            if (curenv->env_status == ENV_RUNNING)
                curenv->env_status = ENV_RUNNABLE;
        }
        curenv = e;
        e->env_status = ENV_RUNNING;
        e->env_runs++;
        lcr3(PADDR(e->env_pgdir));
        env_pop_tf(&(e->env_tf));
    }
If another environment is currently running, we change its status to ENV_RUNNABLE and switch curenv to our new environment e. We update its env_status and env_runs (the number of times this environment has been run), install its page directory, and finally call env_pop_tf() to perform the context switch.
And the user program will start executing at its entry point! Yay…
As usual, user/hello prints a line to the console. It does so by calling the sys_cputs() system call. We’ve already set up the IDT, but we haven’t implemented the trap handler yet, so the syscall won’t work.
Part B: Page Faults, Breakpoints Exceptions, and System Calls
In this part I modified the “real” trap handler, trap_dispatch():
    static void
    trap_dispatch(struct Trapframe *tf)
    {
        // Handle processor exceptions.
        // LAB 3: Your code here.
        switch (tf->tf_trapno) {
        case T_DEBUG:
            monitor(tf);
            break;
        case T_BRKPT:
            monitor(tf);
            break;
        case T_PGFLT:
            page_fault_handler(tf);
            break;
        case T_SYSCALL: {
            uint32_t ret = syscall(tf->tf_regs.reg_eax,
                                   tf->tf_regs.reg_edx,
                                   tf->tf_regs.reg_ecx,
                                   tf->tf_regs.reg_ebx,
                                   tf->tf_regs.reg_edi,
                                   tf->tf_regs.reg_esi);
            if (ret == -E_INVAL)
                panic("unknown syscall\n");
            tf->tf_regs.reg_eax = ret;
            break;
        }
        default:
            // Unexpected trap: The user process or the kernel has a bug.
            print_trapframe(tf);
            if (tf->tf_cs == GD_KT)
                panic("unhandled trap in kernel\n");
            else {
                env_destroy(curenv);
                return;
            }
        }
    }
Nothing new. The idea is to env_destroy() the environment if something bad happened in userland, and to panic() if something bad happened in the kernel. For example, when handling page fault (T_PGFLT) exceptions:
    void
    page_fault_handler(struct Trapframe *tf)
    {
        uint32_t fault_va;

        // Read processor's CR2 register to find the faulting address
        fault_va = rcr2();

        // Handle kernel-mode page faults.
        // LAB 3: Your code here.
        if ((tf->tf_cs & 3) == 0) {
            // kernel mode
            panic("trap_dispatch: page fault happened in kernel mode!\n");
        }

        // We've already handled kernel-mode exceptions, so if we get here,
        // the page fault happened in user mode.

        // Destroy the environment that caused the fault.
        cprintf("[%08x] user fault va %08x ip %08x\n",
                curenv->env_id, fault_va, tf->tf_eip);
        print_trapframe(tf);
        env_destroy(curenv);
    }
Uh huh.
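Syscall (T_SYSCALL) arguments are passed in registers in JOS: the syscall number goes in %eax, up to five arguments go in %edx, %ecx, %ebx, %edi and %esi, and the return value comes back in %eax.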
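Here is a user-space toy model of that register convention, in case it helps. Nothing below is JOS code: struct RegsSketch, syscall_sketch() and dispatch_syscall() are hypothetical names, the single fake syscall just returns its second argument, and the error value mirrors -E_INVAL.

```c
#include <stdint.h>

#define E_INVAL_SKETCH 3   /* stands in for JOS's E_INVAL */

/* Hypothetical subset of the saved registers in a trapframe. */
struct RegsSketch {
    uint32_t eax, edx, ecx, ebx, edi, esi;
};

/* Toy syscall table: number 0 pretends to print a2 bytes starting at a1. */
static int32_t syscall_sketch(uint32_t num, uint32_t a1, uint32_t a2,
                              uint32_t a3, uint32_t a4, uint32_t a5)
{
    (void)a1; (void)a3; (void)a4; (void)a5;
    switch (num) {
    case 0:
        return (int32_t)a2;
    default:
        return -E_INVAL_SKETCH;
    }
}

/* Unpack the registers, run the syscall, store the result in eax. */
static void dispatch_syscall(struct RegsSketch *r)
{
    int32_t ret = syscall_sketch(r->eax, r->edx, r->ecx,
                                 r->ebx, r->edi, r->esi);
    r->eax = (uint32_t)ret;
}
```

Note one difference from my trap_dispatch(): this sketch hands the error back to the caller instead of panicking, so a bogus syscall number from userland cannot take the whole kernel down. That is a design choice worth considering, not what my solution does.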
Breakpoint exceptions (T_BRKPT) are handled quite interestingly, though: if a user program executes int $3 (T_BRKPT), the CPU generates a breakpoint exception, which ends up in trap_dispatch(). trap_dispatch() drops into the kernel monitor for breakpoint exceptions. I modified the kernel monitor so that it now behaves like a debugger:
    int
    mon_stepi(int argc, char **argv, struct Trapframe *tf)
    {
        if (!tf) {
            cprintf("stepi: trapframe does not exist!\n");
            return 0;
        }
        switch (tf->tf_trapno) {
        case T_BRKPT:
            tf->tf_eflags |= FL_TF;
            return -1;
        case T_DEBUG:
            if (tf->tf_eflags & FL_TF) {
                return -1;
            }
            // fall through
        default:
            cprintf("stepi: monitor must be invoked via breakpoint exception!\n");
            return 0;
        }
    }
    int
    mon_continue(int argc, char **argv, struct Trapframe *tf)
    {
        if (!tf) {
            cprintf("continue: trapframe does not exist!\n");
            return 0;
        }
        if (tf->tf_trapno == T_DEBUG || tf->tf_trapno == T_BRKPT) {
            // Stop single-stepping (a no-op if FL_TF was never set) and
            // resume, so "continue" also works right after a breakpoint.
            tf->tf_eflags &= ~FL_TF;
            return -1;
        }
        cprintf("continue: monitor must be invoked via breakpoint or debug exception!\n");
        return 0;
    }
monitor() breaks out of its infinite loop if any command returns a negative value. In our case, it returns to userland if it was invoked by either a breakpoint or a debug (explained below) exception.
If the Trap Flag in %eflags is set, the CPU executes one instruction and then generates a debug (T_DEBUG) exception. We route debug exceptions to the kernel monitor, too. This way, when a user program generates a breakpoint exception:
peilin@PWN:~/6.828/2018/lab$ make run-breakpoint-nox
…
trap 0x00000003 Breakpoint
…
eip 0x00800037
K> stepi
…
trap 0x00000001 Debug
…
eip 0x00800038
K> stepi
…
trap 0x00000001 Debug
…
eip 0x0080008f
…
K> continue
…
[00001000] exiting gracefully
[00001000] free env 00001000
Destroyed the only environment – nothing more to do!
Welcome to the JOS kernel monitor!
Type ‘help’ for a list of commands.
K>
Ugh… What happened to 0x00800038…?
800038: c3 ret
Okay Okay… 🙂
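By the way, the whole stepi / continue business hinges on a single bit of %eflags. A minimal sketch of the bit twiddling (FL_TF has the value from inc/mmu.h; the three helper names are mine):

```c
#include <stdbool.h>
#include <stdint.h>

#define FL_TF 0x00000100u  /* Trap Flag: bit 8 of EFLAGS */

/* What stepi does to the saved eflags: ask for a trap after one instruction. */
static uint32_t arm_single_step(uint32_t eflags)
{
    return eflags | FL_TF;
}

/* What continue does: let the environment run freely again. */
static uint32_t disarm_single_step(uint32_t eflags)
{
    return eflags & ~FL_TF;
}

static bool single_step_armed(uint32_t eflags)
{
    return (eflags & FL_TF) != 0;
}
```

Since the flag lives in the trapframe, it only takes effect once the saved %eflags is restored on the way back to userland.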
backtrace raises a page fault, though:
K> backtrace
Stack backtrace:
ebp efffff00 eip f01014c4 args 00000001 efffff28 f01d4000 f010803d 00001000
kern/monitor.c:461: monitor+342
ebp efffff80 eip f010505d args f01d4000 efffffbc f03bc000 00000092 eebfd000
kern/trap.c:209: trap+316
ebp efffffb0 eip f0105177 args efffffbc 00000000 00000000 eebfdfc0 efffffdc
kern/syscall.c:72: syscall+0
ebp eebfdfc0 eip 0080008f args 00000000 00000000 eebfdff0 00800055 00000000
lib/libmain.c:31: libmain+86
ebp eebfdff0 eip 00800031Incoming TRAP frame at 0xeffffe74
kernel panic at kern/trap.c:294: trap_dispatch: page fault happened in kernel mode!
Welcome to the JOS kernel monitor!
Type ‘help’ for a list of commands.
K>
This is because mon_backtrace() tried to read addresses at or above USTACKTOP when printing the arguments of the last stack frame. Take a look at inc/memlayout.h:
     *                     +------------------------------+ 0xeebff000
     *                     |       Empty Memory (*)       | --/--  PGSIZE
     *    USTACKTOP  --->  +------------------------------+ 0xeebfe000
     *                     |      Normal User Stack       | RW/RW  PGSIZE
     *                     +------------------------------+ 0xeebfd000
0xeebfdff0 is only 0x10 bytes away from USTACKTOP.
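To see exactly where it goes wrong, here is the arithmetic as a small sketch. It assumes the usual x86 frame layout (saved %ebp at ebp, return address at ebp + 4, arguments from ebp + 8 upward) and that the backtrace prints five argument words per frame, as in the output above; first_unmapped_arg() is a made-up helper.

```c
#include <stdint.h>

#define USTACKTOP 0xeebfe000u

/*
 * Index of the first argument word that lies at or above USTACKTOP
 * for a frame with the given ebp, or -1 if all nargs words fit
 * below it.
 */
static int first_unmapped_arg(uint32_t ebp, int nargs)
{
    for (int i = 0; i < nargs; i++) {
        uint32_t addr = ebp + 8 + 4 * (uint32_t)i;
        if (addr >= USTACKTOP)
            return i;
    }
    return -1;
}
```

With ebp at 0xeebfdff0, the first two argument words sit at 0xeebfdff8 and 0xeebfdffc, and the third lands right on 0xeebfe000, which is the unmapped “Empty Memory” page. The previous frame (ebp 0xeebfdfc0) is fine.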
Well, this completes this lab. See you next time!