divzero: OK (1.0s)
softint: OK (0.7s)
badsegment: OK (1.0s)
Part A score: 30/30

faultread: OK (0.9s)
faultreadkernel: OK (1.0s)
faultwrite: OK (1.4s)
faultwritekernel: OK (0.5s)
breakpoint: OK (3.4s)
testbss: OK (4.0s)
hello: OK (1.5s)
buggyhello: OK (4.5s)
buggyhello2: OK (4.0s)
evilhello: OK (0.5s)
Part B score: 50/50

Score: 80/80

:-/

In my last post, I walked through the xv6 system call mechanism. If you understand how that works, nothing in Lab 3 is really new: we apply the same idea to JOS and implement something similar. Anyway, let me note down the things I found interesting in this lab.

  1. Part A: User Environments and Exception Handling
  2. Part B: Page Faults, Breakpoints Exceptions, and System Calls


Part A: User Environments and Exception Handling

First, we need some more kernel data structures. Recall that in Lab 2, in mem_init(), we allocated kern_pgdir and pages using boot_alloc(). Now we need one more, envs, allocated the same way:

	pages = (struct PageInfo *) boot_alloc(npages * sizeof(struct PageInfo));
	memset(pages, 0, npages * sizeof(struct PageInfo));

	envs = (struct Env *) boot_alloc(NENV * sizeof(struct Env));
	memset(envs, 0, NENV * sizeof(struct Env));

Defined in inc/env.h, struct Env describes a JOS environment. A JOS environment is similar to a UNIX process, but the two provide different interfaces and do not have the same semantics, which is why the lab uses a different term.

struct Env {
	struct Trapframe env_tf;	// Saved registers
	struct Env *env_link;		// Next free Env
	envid_t env_id;			// Unique environment identifier
	envid_t env_parent_id;		// env_id of this env's parent
	enum EnvType env_type;		// Indicates special system environments
	unsigned env_status;		// Status of the environment
	uint32_t env_runs;		// Number of times environment has run

	// Address space
	pde_t *env_pgdir;		// Kernel virtual address of page dir
};

Notably, later in page_init():

	for (; i*PGSIZE < PADDR((char *)boot_alloc(0)); i++) {
		pages[i].pp_ref = 1;
	}

We use this for loop to mark the pages that contain the kernel data structures allocated by boot_alloc() as in-use. Since boot_alloc(0) always returns the address of the next free byte, we don’t have to change this line by hand if we boot_alloc() more or fewer data structures before calling page_init(), which is a nice coding trick.
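To see why the trick works, here is a minimal user-space sketch of a bump allocator with the same "size 0 means just tell me where the free space starts" convention. It is not the JOS boot_alloc(): the names bump_alloc(), pages_in_use() and next_free are made up for illustration, and it hands out offsets instead of kernel virtual addresses.

```c
#include <stddef.h>

#define PGSIZE 4096u

/* Offset of the next free byte (hypothetical stand-in for the pointer
 * that the real allocator keeps just past the kernel image). */
size_t next_free;

/* Return the current free offset, then advance it by n rounded up to a
 * whole number of pages. With n == 0 nothing is allocated and the call
 * only reports where the free space begins. */
size_t bump_alloc(size_t n)
{
	size_t result = next_free;
	next_free += (n + PGSIZE - 1) / PGSIZE * PGSIZE;
	return result;
}

/* Number of pages below the first free byte, i.e. the pages that a
 * page_init()-style loop would have to mark as in-use. */
size_t pages_in_use(void)
{
	return bump_alloc(0) / PGSIZE;
}
```

However many allocations happen before pages_in_use() is called, the answer stays right, which is exactly why the loop bound in page_init() never needs editing.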

After that, i386_init() calls env_init(), trap_init(), env_create() and, finally, env_run(). As you can tell from their names, we do some set-ups, create a user environment, and run it. Let’s take a closer look at each one of these functions.

First, env_init() in kern/env.c:

void
env_init(void)
{
	// Set up envs array
	// LAB 3: Your code here.
	struct Env *e;
	
	for(e = envs; e < envs + NENV; e++) {
		e->env_id = 0;
		e->env_status = ENV_FREE;
		// If this is the last entry in envs, set its env_link to 0.
		e->env_link = (e == envs + NENV - 1) ? 0 : e + 1;
	}

	env_free_list = envs;

	// Per-CPU part of the initialization
	env_init_percpu();
}

It iterates through envs, sets each entry’s env_id to 0 and env_status to ENV_FREE, and chains the entries into a singly linked list, env_free_list, in array order. Finally it calls env_init_percpu(), which loads the GDT and clears the LDT, since JOS doesn’t use one.
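The order of the list matters: the skeleton’s comment asks that the first env_alloc() return envs[0]. Another common way to get that order is to walk the array backwards and push each entry onto the front of the list. Here is a standalone sketch of that variant; DemoEnv, NENV_DEMO and the demo_* functions are hypothetical names, not the JOS ones.

```c
#include <stddef.h>

#define NENV_DEMO 4	/* hypothetical small table; JOS uses NENV */

struct DemoEnv {
	int id;
	struct DemoEnv *link;	/* next free entry, or NULL */
};

struct DemoEnv demo_envs[NENV_DEMO];
struct DemoEnv *demo_free_list;

/* Walk the array backwards, pushing each entry on the front of the
 * list, so that the finished list is in array order. */
void demo_env_init(void)
{
	demo_free_list = NULL;
	for (int i = NENV_DEMO - 1; i >= 0; i--) {
		demo_envs[i].id = 0;
		demo_envs[i].link = demo_free_list;
		demo_free_list = &demo_envs[i];
	}
}

/* Pop the first free entry, or return NULL when the table is used up. */
struct DemoEnv *demo_env_alloc(void)
{
	struct DemoEnv *e = demo_free_list;
	if (e)
		demo_free_list = e->link;
	return e;
}
```

Both versions build the same list; the backward walk just avoids the special case for the last entry.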

Next, trap_init() in kern/trap.c. Well…

...
	void vector0();
	SETGATE(idt[0], 1, GD_KT, vector0, 0);
...

trap_init() first sets up the IDT. My solution does the above two lines for each trap gate. 🙂 Yeah, I just don’t feel like cleaning them up. vector*() functions are defined in trapentry.S:

...
TRAPHANDLER_NOEC(vector7, T_DEVICE)
TRAPHANDLER(vector8, T_DBLFLT)
...

Where the macro TRAPHANDLER expands into:

#define TRAPHANDLER(name, num)						\
	.globl name;		/* define global symbol for 'name' */	\
	.type name, @function;	/* symbol type is function */		\
	.align 2;		/* align function definition */		\
	name:			/* function starts here */		\
	pushl $(num);							\
	jmp _alltraps

While TRAPHANDLER_NOEC expands into:

#define TRAPHANDLER_NOEC(name, num)					\
	.globl name;							\
	.type name, @function;						\
	.align 2;							\
	name:								\
	pushl $0;							\
	pushl $(num);							\
	jmp _alltraps

The CPU does not push an error code (the tf_err field in struct Trapframe) for some exceptions, so for those we use TRAPHANDLER_NOEC to push a $0 in its place. The trapframe is later passed to our trap handler as an argument, and we definitely don’t want the handler to care about this hardware detail, so we even out the layout here.
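Which macro goes with which vector? As far as I know, on the classic x86 exception set the CPU pushes an error code only for vectors 8, 10, 11, 12, 13, 14 and 17 (newer CPUs define a few more). A tiny helper makes the rule explicit; cpu_pushes_error_code() is a made-up name for illustration, not something in JOS.

```c
#include <stdbool.h>

/* Classic x86 exception vectors for which the CPU pushes an error code:
 * 8 #DF, 10 #TS, 11 #NP, 12 #SS, 13 #GP, 14 #PF, 17 #AC.
 * Everything else needs the TRAPHANDLER_NOEC-style padding. */
bool cpu_pushes_error_code(unsigned trapno)
{
	switch (trapno) {
	case 8: case 10: case 11: case 12: case 13: case 14: case 17:
		return true;
	default:
		return false;
	}
}
```

This agrees with the two stubs shown above: vector 7 (T_DEVICE) uses TRAPHANDLER_NOEC and vector 8 (T_DBLFLT) uses TRAPHANDLER.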

As we see, every vector*() stub jumps to _alltraps:

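_alltraps:
	pushl %ds		/* build the rest of struct Trapframe: tf_ds, */
	pushl %es		/* tf_es, */
	pushal			/* and tf_regs */
	movw $0x10, %ax		/* 0x10 is GD_KD, the kernel data selector */
	movw %ax, %ds
	movw %ax, %es
	pushl %esp		/* argument for trap(): pointer to the trapframe */
	call trap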

It simply pushes more registers to complete the trapframe, loads the kernel data selector, GD_KD, into %ds and %es, then pushes the trapframe pointer to be passed to trap() as its argument. We will check out trap() in Part B.
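Because the stack grows downwards, whatever is pushed last ends up at the lowest address, so the push order above is the field order of the trapframe read backwards. The sketch below mirrors that layout with hypothetical Demo* names; it is my reconstruction for illustration, so check it against inc/trap.h rather than taking it as the real definition.

```c
#include <stddef.h>
#include <stdint.h>

/* What pushal leaves on the stack, lowest address first. */
struct DemoPushRegs {
	uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax;
};

struct DemoTrapframe {
	struct DemoPushRegs regs;	/* pushal in _alltraps */
	uint16_t es, pad1;		/* pushl %es */
	uint16_t ds, pad2;		/* pushl %ds */
	uint32_t trapno;		/* pushed by the vector stub */
	uint32_t err;			/* pushed by the CPU, or the stub's $0 */
	uint32_t eip;			/* from here on: pushed by the CPU */
	uint16_t cs, pad3;
	uint32_t eflags;
	uint32_t esp;			/* only when crossing privilege levels */
	uint16_t ss, pad4;
};
```

With this layout, offsetof(struct DemoTrapframe, trapno) is 40 and the whole frame is 68 bytes, and the pointer pushed by pushl %esp points at the edi slot.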

Back in trap_init(), the last step is a call to trap_init_percpu(), whose job is to initialize and load the TSS and to load the IDT. Recall that if an interrupt or exception changes the privilege level (the low two bits of %cs), as it does when a user environment traps into the kernel, the CPU loads a new %ss and %esp from the TSS and pushes the old ones onto the new stack before anything else.
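Those low two bits are also how the kernel later tells where a trap came from. A minimal sketch, assuming the usual JOS selector values (0x08 for kernel text, 0x18 for user text); the DEMO_* macros and function names are mine:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_GD_KT 0x08	/* kernel text selector (assumed JOS value) */
#define DEMO_GD_UT 0x18	/* user text selector (assumed JOS value) */

/* The low two bits of a selector hold its privilege level. */
unsigned selector_rpl(uint16_t sel)
{
	return sel & 3;
}

/* A saved %cs with privilege level 3 means the trap came from user mode. */
bool trapped_from_user(uint16_t saved_cs)
{
	return selector_rpl(saved_cs) == 3;
}
```

A user environment runs with %cs set to DEMO_GD_UT | 3, i.e. 0x1b, while the kernel runs with plain 0x08.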

Now we are ready to actually create a user environment in env_create():

void
env_create(uint8_t *binary, enum EnvType type)
{
	// LAB 3: Your code here.
	struct Env *e;
	int r;

	if ((r = env_alloc(&e, 0)) < 0)
		panic("env_create: %e", r);
	load_icode(e, binary);
	e->env_type = type;
}

It in turn calls env_alloc(), which picks up a struct Env from env_free_list, sets up some of its status fields, and allocates and sets up a page directory for it. Then it calls load_icode(), which loads an ELF binary embedded in the kernel into the new environment’s virtual address space. load_icode() also allocates one page of user stack for the environment. Let’s check out load_icode():

static void
load_icode(struct Env *e, uint8_t *binary)
{
	// LAB 3: Your code here.
	struct Elf *elfhdr = (struct Elf *)binary;
	
	if (elfhdr->e_magic != ELF_MAGIC)
		panic("load_icode: not an ELF file!\n");

	struct Proghdr *ph = (struct Proghdr *) ((uint8_t *)elfhdr + elfhdr->e_phoff);
	struct Proghdr *eph = ph + elfhdr->e_phnum;

	// temporarily switch to e->env_pgdir, or you will crash at memcpy()
	lcr3(PADDR(e->env_pgdir));

	for (; ph < eph; ph++) {
		if (ph->p_type == ELF_PROG_LOAD) {
			// 1) alloc region for the segment
			region_alloc(e, (void *)ph->p_va, ph->p_memsz);

			// 2) copy the segment from where it's "embedded" in the kernel
			memcpy((void *)ph->p_va, (binary + ph->p_offset), ph->p_filesz);

			if (ph->p_filesz < ph->p_memsz) {
				// 3) clear to zero the .bss section
				memset(((void *)ph->p_va + ph->p_filesz), 0, (ph->p_memsz - ph->p_filesz));
			}
		}
	}

	// switch back to kern_pgdir since you are done with it
	lcr3(PADDR(kern_pgdir));

	// I guess I also have to set tf->eip to the entrypoint?
	e->env_tf.tf_eip = elfhdr->e_entry;

	// Now map one page for the program's initial stack
	// at virtual address USTACKTOP - PGSIZE.

	// LAB 3: Your code here.
	struct PageInfo *pp = page_alloc(ALLOC_ZERO);
	if (!pp)
		panic("load_icode: page_alloc() failed!\n");
	if (page_insert(e->env_pgdir, pp, (void *) (USTACKTOP - PGSIZE), PTE_U | PTE_W) < 0)
		panic("load_icode: page_insert() failed!\n");
}

We’ve done a similar thing in bootmain(), but this time we “load” the binary from memory instead of from the hard disk.

For each program header, we check whether it is loadable by checking its p_type field. If a segment is loadable, we do the following things for it:

First: Allocate physical pages for the segment and map them, at the address given by the p_va field of the program header, in the page directory of this environment. My solution does this via region_alloc(), which we will also take a look at.

Second: Copy the segment from the kernel image into the newly mapped memory region. Attention! We are setting all of this up with kern_pgdir installed, but this region is not mapped in kern_pgdir! In order to access (copy into) this memory region, we need to temporarily switch to the page directory of the new environment (e->env_pgdir) first. See the two lcr3() calls around the loop.

Third: In step 2 we only copy as many bytes as the p_filesz field says. If a segment’s p_memsz is larger than its p_filesz, the segment contains uninitialized global variables, a.k.a. the .bss section. We memset() these bytes to zero.
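Steps two and three can be tried out in user space. The sketch below copies p_filesz bytes and zero-fills the rest up to p_memsz, and it also adds the sanity checks on the header fields that my load_icode() skips. load_segment() and its parameters are hypothetical; a flat buffer stands in for the freshly mapped region.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one loadable segment into dst, which stands for dst_len bytes of
 * already mapped memory. Returns 0 on success, or -1 if the header
 * fields are inconsistent with the image or the destination. */
int load_segment(uint8_t *dst, size_t dst_len,
		 const uint8_t *image, size_t image_len,
		 size_t offset, size_t filesz, size_t memsz)
{
	if (filesz > memsz || memsz > dst_len)
		return -1;	/* segment would not fit */
	if (offset > image_len || filesz > image_len - offset)
		return -1;	/* file data lies outside the image */

	memcpy(dst, image + offset, filesz);		/* initialized bytes */
	memset(dst + filesz, 0, memsz - filesz);	/* the .bss tail */
	return 0;
}
```

For the third LOAD header of user/hello shown later in this post (FileSiz 0x2c, MemSiz 0x30), this would copy 0x2c bytes and zero the remaining 4.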

Another important thing to do is to set up the entry point.

peilin@PWN:~/6.828/2018/lab$ readelf -h obj/user/hello
ELF Header:

Entry point address: 0x800020

By looking at the e_entry field of the ELF header we know that this program wants to start executing at virtual address 0x800020. We store this entry point address into this environment’s trapframe, e->env_tf.tf_eip, which will get loaded into %eip later when we run env_pop_tf().

Finally, load_icode() allocates and maps one zeroed page as the environment’s user stack. The ELF file does carry a stack-related program header, GNU_STACK:

peilin@PWN:~/6.828/2018/lab$ readelf -l obj/user/hello

Elf file type is EXEC (Executable file)
Entry point 0x800020
There are 4 program headers, starting at offset 52

Program Headers:
Type      Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
LOAD      0x001000 0x00200000 0x00200000 0x0409f 0x0409f RW  0x1000
LOAD      0x006020 0x00800020 0x00800020 0x01154 0x01154 R E 0x1000
LOAD      0x008000 0x00802000 0x00802000 0x0002c 0x00030 RW  0x1000
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x10

But here we just allocate it unconditionally anyways. Also, it goes without saying that my implementation is vulnerable to malformed ELF files, but… 🙂

Oh, region_alloc().

static void
region_alloc(struct Env *e, void *va, size_t len)
{
	// LAB 3: Your code here.
	struct PageInfo *pp;

	for (void *pva = ROUNDDOWN(va, PGSIZE); pva < ROUNDUP((va + len), PGSIZE); pva += PGSIZE) {
		pp = page_alloc(0);
		if (!pp)
			panic("region_alloc: page_alloc() failed!\n");

		page_insert(e->env_pgdir, pp, pva, PTE_U | PTE_W);
	}
}

That’s it.
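The only subtle part is the rounding, since neither va nor va + len has to be page-aligned. Here is the same arithmetic on plain integers, as a sketch with made-up helper names; it assumes va + len does not overflow 32 bits.

```c
#include <stdint.h>

#define PGSIZE 4096u

/* Round an address down / up to a page boundary. */
uint32_t round_down(uint32_t a)
{
	return a - a % PGSIZE;
}

uint32_t round_up(uint32_t a)
{
	return round_down(a + PGSIZE - 1);
}

/* Number of pages needed to cover [va, va + len). */
unsigned pages_spanned(uint32_t va, uint32_t len)
{
	return (round_up(va + len) - round_down(va)) / PGSIZE;
}
```

Fed with the LOAD headers of user/hello from the readelf output above, the segment at 0x00800020 with MemSiz 0x1154 needs 2 pages even though it is barely larger than one, because it does not start on a page boundary.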

Now we have an ENV_RUNNABLE user environment. What’s next? Run it! Finally, env_run():

void
env_run(struct Env *e)
{
	// LAB 3: Your code here.
	if (curenv) {
		if (curenv->env_status == ENV_RUNNING)
			curenv->env_status = ENV_RUNNABLE;
	}

	curenv = e;
	e->env_status = ENV_RUNNING;
	e->env_runs++;
	lcr3(PADDR(e->env_pgdir));

	env_pop_tf(&(e->env_tf));
}

If another environment is currently running, we change its status to ENV_RUNNABLE and switch curenv to our new environment e. We update its env_status and env_runs (indicating how many times this environment has been run), install its page directory and finally call env_pop_tf() to perform the context switch.

And the user program will start executing at its entry point! Yay…

As usual user/hello prints out a line to the console. It does so by calling the sys_cputs() system call. We’ve already set up the IDT, but we haven’t implemented the trap handler yet, so the syscall won’t work.



Part B: Page Faults, Breakpoints Exceptions, and System Calls

In this part I modified the “real” trap handler, trap_dispatch():

static void
trap_dispatch(struct Trapframe *tf)
{
	// Handle processor exceptions.
	// LAB 3: Your code here.

	switch (tf->tf_trapno) {
	case T_DEBUG:
		monitor(tf);
		break;
	case T_BRKPT:
		monitor(tf);
		break;
	case T_PGFLT:
		page_fault_handler(tf);
		break;
	case T_SYSCALL: {
		// Syscall number in %eax; up to five arguments in
		// %edx, %ecx, %ebx, %edi, %esi.
		int32_t ret = syscall(tf->tf_regs.reg_eax,
				      tf->tf_regs.reg_edx,
				      tf->tf_regs.reg_ecx,
				      tf->tf_regs.reg_ebx,
				      tf->tf_regs.reg_edi,
				      tf->tf_regs.reg_esi);
		if (ret == -E_INVAL)
			panic("unknown syscall\n");

		// The return value goes back to the user in %eax.
		tf->tf_regs.reg_eax = ret;
		break;
	}

	default:
		// Unexpected trap: The user process or the kernel has a bug.
		print_trapframe(tf);
		if (tf->tf_cs == GD_KT)
			panic("unhandled trap in kernel\n");
		else {
			env_destroy(curenv);
			return;
		}
	}
}

Nothing new. The idea is to env_destroy() the environment if something bad happened in userland, and to panic() if something bad happened in the kernel. For example, when handling page fault (T_PGFLT) exceptions:

void
page_fault_handler(struct Trapframe *tf)
{
	uint32_t fault_va;

	// Read processor's CR2 register to find the faulting address
	fault_va = rcr2();

	// Handle kernel-mode page faults.
	// LAB 3: Your code here.

	if ((tf->tf_cs & 3) == 0) {	// kernel mode
		panic("trap_dispatch: page fault happened in kernel mode!\n");
	}

	// We've already handled kernel-mode exceptions, so if we get here,
	// the page fault happened in user mode.
	// Destroy the environment that caused the fault.

	cprintf("[%08x] user fault va %08x ip %08x\n",
		curenv->env_id, fault_va, tf->tf_eip);
	print_trapframe(tf);
	env_destroy(curenv);
}

Uh huh.

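Syscall (T_SYSCALL) arguments are passed in registers in JOS: the syscall number in %eax, up to five arguments in %edx, %ecx, %ebx, %edi and %esi, and the return value comes back in %eax.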
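The unpacking is easy to model outside the kernel. Below is a toy dispatcher over a saved-register struct. The two call numbers, the DEMO_E_INVAL value and all demo_* names are assumptions for illustration. Unlike my trap_dispatch(), it simply hands the error back to the caller for an unknown number instead of panicking, which is arguably the friendlier design.

```c
#include <stdint.h>

#define DEMO_E_INVAL 3	/* assumed to match the JOS E_INVAL value */

enum { DEMO_SYS_add = 0, DEMO_SYS_getenvid = 1 };	/* hypothetical calls */

struct DemoRegs {
	uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax;
};

/* Toy syscall table: returns a result, or -DEMO_E_INVAL if num is unknown. */
int32_t demo_syscall(uint32_t num, uint32_t a1, uint32_t a2,
		     uint32_t a3, uint32_t a4, uint32_t a5)
{
	(void)a3; (void)a4; (void)a5;	/* unused by the toy calls */
	switch (num) {
	case DEMO_SYS_add:
		return (int32_t)(a1 + a2);
	case DEMO_SYS_getenvid:
		return 0x1000;
	default:
		return -DEMO_E_INVAL;
	}
}

/* Unpack the saved registers, run the call, and store the result where
 * the user code will find it after returning to user mode. */
void demo_dispatch(struct DemoRegs *r)
{
	int32_t ret = demo_syscall(r->eax, r->edx, r->ecx,
				   r->ebx, r->edi, r->esi);
	r->eax = (uint32_t)ret;
}
```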

Breakpoint exceptions (T_BRKPT) are handled quite interestingly, though: if a user program executes int $3 (vector T_BRKPT), the CPU generates a breakpoint exception, which ends up in trap_dispatch(). trap_dispatch() drops into the kernel monitor for breakpoint exceptions. I modified the kernel monitor so that it now behaves like a debugger:

int
mon_stepi(int argc, char **argv, struct Trapframe *tf)
{
	if (!tf) {
		cprintf("stepi: trapframe does not exist!\n");
		return 0;
	}

	switch (tf->tf_trapno) {
	case T_BRKPT:
		tf->tf_eflags |= FL_TF;
		return -1;
	case T_DEBUG:
		if (tf->tf_eflags & FL_TF) {
			return -1;
		}
		// fall through: a debug exception without FL_TF set is not ours
	default:
		cprintf("stepi: monitor must be invoked via breakpoint exception!\n");
		return 0;
	}
}

int
mon_continue(int argc, char **argv, struct Trapframe *tf)
{
	if (!tf) {
		cprintf("continue: trapframe does not exist!\n");
		return 0;
	}

	if (tf->tf_trapno == T_DEBUG || tf->tf_trapno == T_BRKPT) {
		// Clear the Trap Flag (a no-op if stepi never set it) and resume.
		tf->tf_eflags &= ~FL_TF;
		return -1;
	}

	cprintf("continue: monitor must be invoked via breakpoint or debug exception!\n");
	return 0;
}

monitor() breaks out of its infinite loop if any command returns a negative value. In our case, that means it returns to userland, provided it was invoked by a breakpoint or a debug (explained later) exception.
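The bit that mon_stepi() sets and mon_continue() clears is the Trap Flag, bit 8 of EFLAGS. The flag twiddling on its own looks like the sketch below, where DEMO_FL_TF and the function names are mine and the value 0x100 is the architectural bit position.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FL_TF 0x00000100u	/* Trap Flag, bit 8 of EFLAGS */

/* stepi: ask the CPU to trap again after the next instruction. */
uint32_t arm_single_step(uint32_t eflags)
{
	return eflags | DEMO_FL_TF;
}

/* continue: let the program run freely again. */
uint32_t disarm_single_step(uint32_t eflags)
{
	return eflags & ~DEMO_FL_TF;
}

bool single_step_armed(uint32_t eflags)
{
	return (eflags & DEMO_FL_TF) != 0;
}
```

Since the change is made to the saved tf_eflags, it only takes effect when the trapframe is popped on the way back to user mode.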

If the Trap Flag in %eflags is set, the CPU executes one instruction and then generates a debug (T_DEBUG) exception. We route debug exceptions to the kernel monitor, too. This way, when a user program generates a breakpoint exception, we can single-step it:

peilin@PWN:~/6.828/2018/lab$ make run-breakpoint-nox

trap 0x00000003 Breakpoint

eip 0x00800037
K> stepi

trap 0x00000001 Debug

eip 0x00800038
K> stepi

trap 0x00000001 Debug

eip 0x0080008f

K> continue

[00001000] exiting gracefully
[00001000] free env 00001000
Destroyed the only environment - nothing more to do!
Welcome to the JOS kernel monitor!
Type 'help' for a list of commands.
K>

Ugh…What happened to 0x00800038…?

  800038:	c3                   	ret

Okay Okay… 🙂

backtrace raises page faults, though:

K> backtrace
Stack backtrace:
  ebp efffff00  eip f01014c4  args 00000001 efffff28 f01d4000 f010803d 00001000
         kern/monitor.c:461: monitor+342
  ebp efffff80  eip f010505d  args f01d4000 efffffbc f03bc000 00000092 eebfd000
         kern/trap.c:209: trap+316
  ebp efffffb0  eip f0105177  args efffffbc 00000000 00000000 eebfdfc0 efffffdc
         kern/syscall.c:72: syscall+0
  ebp eebfdfc0  eip 0080008f  args 00000000 00000000 eebfdff0 00800055 00000000
         lib/libmain.c:31: libmain+86
  ebp eebfdff0  eip 00800031Incoming TRAP frame at 0xeffffe74
kernel panic at kern/trap.c:294: trap_dispatch: page fault happened in kernel mode!

Welcome to the JOS kernel monitor!
Type 'help' for a list of commands.
K>

This is because mon_backtrace() tried to read at and above USTACKTOP while printing the arguments of the last stack frame. Take a look at inc/memlayout.h:

 *                     +------------------------------+ 0xeebff000
 *                     |       Empty Memory (*)       | --/--  PGSIZE
 *    USTACKTOP  --->  +------------------------------+ 0xeebfe000
 *                     |      Normal User Stack       | RW/RW  PGSIZE
 *                     +------------------------------+ 0xeebfd000

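0xeebfdff0 is only 0x10 bytes away from USTACKTOP, so the five argument words at %ebp + 8 through %ebp + 0x18 run off the top of the stack: the third one already sits at 0xeebfe000, in the unmapped page.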
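The arithmetic is quick to check, and it also suggests a fix: only print the argument words that still lie below USTACKTOP. This is a sketch of such a guard, not what my mon_backtrace() does; the helper names are made up and DEMO_USTACKTOP is copied from the memlayout.h excerpt above.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_USTACKTOP 0xeebfe000u	/* from the inc/memlayout.h excerpt */

/* Address of the i-th argument word (0-based) of the frame at ebp:
 * the saved ebp is at ebp, the return address at ebp + 4, and the
 * arguments start at ebp + 8. */
uint32_t arg_addr(uint32_t ebp, unsigned i)
{
	return ebp + 8 + 4 * i;
}

/* True if the whole 4-byte word lies below the top of the user stack. */
bool arg_on_user_stack(uint32_t ebp, unsigned i)
{
	return arg_addr(ebp, i) + 4 <= DEMO_USTACKTOP;
}

/* How many of the first `wanted` arguments can be read safely. */
unsigned readable_args(uint32_t ebp, unsigned wanted)
{
	unsigned n = 0;
	while (n < wanted && arg_on_user_stack(ebp, n))
		n++;
	return n;
}
```

For the last frame in the backtrace above (ebp eebfdff0) only two arguments are readable, while the libmain frame (ebp eebfdfc0) has all five.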

Well, this completes this lab. See you next time!