internal FS tests [fs/test.c]: OK (2.2s)
  fs i/o: OK
  check_bc: OK
  check_super: OK
  check_bitmap: OK
  alloc_block: OK
  file_open: OK
  file_get_block: OK
  file_flush/file_truncate/file rewrite: OK
testfile: OK (2.5s)
  serve_open/file_stat/file_close: OK
  file_read: OK
  file_write: OK
  file_read after file_write: OK
  open: OK
  large file: OK
spawn via spawnhello: OK (0.8s)
Protection I/O space: OK (1.7s)
PTE_SHARE [testpteshare]: OK (1.0s)
PTE_SHARE [testfdsharing]: OK (1.8s)
start the shell [icode]: Timeout! OK (31.5s)
testshell: OK (1.3s)
primespipe: OK (5.7s)
Score: 150/150

Ta-dah!


JOS is an exokernel O/S, and its file system runs as a user environment server. To let the FS server execute privileged I/O instructions (the in/out instructions that fs/ide.c uses to talk to the disk), env_create() has to treat it specially and raise the I/O privilege level (IOPL) in its saved %eflags.

Exercise 1. i386_init() identifies the file system environment by passing the type ENV_TYPE_FS to your environment creation function, env_create(). Modify env_create() in env.c, so that it gives the file system environment I/O privilege, but never gives that privilege to any other environment.

Lab 5: File system, Spawn and Shell
void
env_create(uint8_t *binary, enum EnvType type)
{
	// LAB 3: Your code here.
	struct Env *e;
	int r;

	if ((r = env_alloc(&e, 0)) < 0)
		panic("env_create: env_alloc: %e", r);
	load_icode(e, binary);
	e->env_type = type;

	// If this is the file server (type == ENV_TYPE_FS) give it I/O privileges.
	// LAB 5: Your code here.
	if (type == ENV_TYPE_FS)
		e->env_tf.tf_eflags |= FL_IOPL_3;
}
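On x86, IOPL occupies bits 12-13 of %eflags, and in/out executed at CPL 3 are allowed when IOPL is 3 (setting aside the TSS I/O permission bitmap, which this sketch ignores). Because the kernel returns to the environment with iret while still at CPL 0, the IOPL stored in env_tf takes effect once the environment runs. A minimal sketch of that check, restating the JOS constant values from inc/mmu.h; the helper names eflags_iopl() and io_allowed() are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Values as defined in JOS inc/mmu.h, restated so the sketch compiles alone. */
#define FL_IF        0x00000200u  /* interrupt enable flag */
#define FL_IOPL_MASK 0x00003000u  /* I/O privilege level field (bits 12-13) */
#define FL_IOPL_3    0x00003000u  /* IOPL == 3 */

/* Hypothetical helper: extract the 2-bit IOPL field from an %eflags value. */
static unsigned eflags_iopl(uint32_t eflags)
{
	return (eflags & FL_IOPL_MASK) >> 12;
}

/* Hypothetical helper: may code at privilege level `cpl` execute in/out?
 * Simplified model: ignores the TSS I/O permission bitmap. */
static bool io_allowed(uint32_t eflags, unsigned cpl)
{
	return cpl <= eflags_iopl(eflags);
}
```

With FL_IF | FL_IOPL_3 (the FS server) io_allowed(..., 3) is true, while a regular environment with only FL_IF set gets false.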

The JOS FS server can only handle disks of 3GB or less, which makes it possible to simply map the entire disk into the FS server’s virtual address space, from DISKMAP (0x10000000) up to DISKMAP + DISKSIZE (0xD0000000). Of course, it does not actually load the entire disk into memory at boot, since that would be far too slow. Instead, its buffer cache (or block cache) layer loads a block from the disk only when it is needed, that is, when the first access to the block’s address triggers a page fault.
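Translating between a block number and its address in the FS server is plain arithmetic. The sketch below mirrors what diskaddr() computes and the inverse used at the top of bc_pgfault() and flush_block(). The constants are the JOS values (BLKSIZE = 4096, DISKMAP = 0x10000000, DISKSIZE = 0xC0000000) restated so the code compiles on its own; the function names are hypothetical. DISKSIZE is exactly 3 GiB, which is where the disk size limit comes from.

```c
#include <stdint.h>

#define BLKSIZE  4096u        /* one block == one page in JOS */
#define DISKMAP  0x10000000u  /* start of the disk map region */
#define DISKSIZE 0xC0000000u  /* 3 GiB: the largest disk that can be mapped */

/* Hypothetical: virtual address of block `blockno` (what diskaddr() computes). */
static uintptr_t block_to_va(uint32_t blockno)
{
	return DISKMAP + (uintptr_t)blockno * BLKSIZE;
}

/* Hypothetical: number of the block that contains virtual address `va`. */
static uint32_t va_to_block(uintptr_t va)
{
	return (uint32_t)((va - DISKMAP) / BLKSIZE);
}

/* How many blocks fit in the disk map region. */
static uint32_t max_blocks(void)
{
	return DISKSIZE / BLKSIZE;
}
```

For example, any faulting address inside 0x10001000-0x10001fff belongs to block 1, and the region holds 786432 blocks in total.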

Exercise 2. Implement the bc_pgfault() and flush_block() functions in fs/bc.c. bc_pgfault() is a page fault handler, just like the one you wrote in the previous lab for copy-on-write fork(), except that its job is to load pages in from the disk in response to a page fault…
The flush_block() function should write a block out to disk if necessary. flush_block() shouldn’t do anything if the block isn’t even in the block cache (that is, the page isn’t mapped) or if it’s not dirty…After writing the block to disk, flush_block() should clear the PTE_D bit using sys_page_map().

static void
bc_pgfault(struct UTrapframe *utf)
{
	void *addr = (void *) utf->utf_fault_va;
	uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;
	int r;

	// Check that the fault was within the block cache region
	if (addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
		panic("page fault in FS: eip %08x, va %08x, err %04x",
		      utf->utf_eip, addr, utf->utf_err);

	// Sanity check the block number.
	if (super && blockno >= super->s_nblocks)
		panic("reading non-existent block %08x\n", blockno);

	// Allocate a page in the disk map region, read the contents
	// of the block from the disk into that page.
	// Hint: first round addr to page boundary. fs/ide.c has code to read
	// the disk.
	//
	// LAB 5: you code here:
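	addr = ROUNDDOWN(addr, PGSIZE);
	// Use plain P|U|W: PTE_SYSCALL would also set the PTE_AVAIL bits,
	// including PTE_SHARE.
	if ((r = sys_page_alloc(0, addr, PTE_P | PTE_U | PTE_W)) < 0)
		panic("in bc_pgfault, sys_page_alloc: %e", r);
	if ((r = ide_read(blockno * BLKSECTS, addr, BLKSECTS)) < 0)
		panic("in bc_pgfault, ide_read: %e", r);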

	// Clear the dirty bit for the disk block page since we just read the
	// block from disk
	if ((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
		panic("in bc_pgfault, sys_page_map: %e", r);

	// Check that the block we read was allocated. (exercise for
	// the reader: why do we do this *after* reading the block
	// in?)
	if (bitmap && block_is_free(blockno))
		panic("reading free block %08x\n", blockno);
}

Note that ide_read() works on sectors instead of blocks. One JOS block is BLKSECTS (currently 8) sectors.
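With SECTSIZE = 512 and BLKSIZE = 4096 (both from fs/fs.h), BLKSECTS is 8, so block n starts at sector 8n. A tiny sketch of the conversion the ide_read()/ide_write() calls rely on; the helper name is hypothetical and the constants are restated for a standalone build.

```c
#include <stdint.h>

#define SECTSIZE 512u                  /* bytes per disk sector */
#define BLKSIZE  4096u                 /* bytes per file system block */
#define BLKSECTS (BLKSIZE / SECTSIZE)  /* sectors per block: 8 */

/* Hypothetical: first sector of block `blockno`, i.e. the secno
 * argument in ide_read(secno, dst, BLKSECTS). */
static uint32_t block_first_sector(uint32_t blockno)
{
	return blockno * BLKSECTS;
}
```

Block 3, for instance, occupies sectors 24 through 31.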

void
flush_block(void *addr)
{
	uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;

	if (addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
		panic("flush_block of bad va %08x", addr);

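	// LAB 5: Your code here.
	int r;

	if (!va_is_mapped(addr) || !va_is_dirty(addr))
		return;

	addr = ROUNDDOWN(addr, PGSIZE);
	if ((r = ide_write(blockno * BLKSECTS, addr, BLKSECTS)) < 0)
		panic("in flush_block, ide_write: %e", r);

	// Remap the page onto itself, keeping its permissions;
	// PTE_SYSCALL does not include PTE_D, so the dirty bit is cleared.
	if ((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
		panic("in flush_block, sys_page_map: %e", r);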
}

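If a block is not mapped, or is mapped but not dirty, there is nothing to write back, so flush_block() returns immediately. After writing the block to disk, we remap the page onto itself with sys_page_map(). The permissions passed to sys_page_map() must lie within PTE_SYSCALL, which does not include PTE_D, so the new mapping starts out clean.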
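The dirty-bit trick is just a mask. The sketch below restates the x86 PTE flag values used by JOS (inc/mmu.h) and shows that filtering a dirty, accessed PTE through PTE_SYSCALL drops the hardware-managed PTE_A and PTE_D bits while keeping P/W/U and the software-available bits; remap_perm() is a hypothetical name.

```c
#include <stdint.h>

#define PTE_P       0x001u  /* present */
#define PTE_W       0x002u  /* writeable */
#define PTE_U       0x004u  /* user */
#define PTE_A       0x020u  /* accessed (set by hardware) */
#define PTE_D       0x040u  /* dirty (set by hardware on write) */
#define PTE_AVAIL   0xE00u  /* available for software use */
#define PTE_SYSCALL (PTE_AVAIL | PTE_P | PTE_W | PTE_U)

/* Hypothetical: permissions used when remapping a page onto itself.
 * The mask also strips the physical address bits of the PTE. */
static uint32_t remap_perm(uint32_t pte)
{
	return pte & PTE_SYSCALL;
}
```

For example, a dirty PTE 0x10000067 (address 0x10000000 with D, A, U, W, P) becomes 0x007.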


The JOS FS keeps track of all free blocks on the disk with a bitmap. As opposed to the xv6 FS, where a set bit marks a block as in use, a set bit in the JOS bitmap marks a block as free.

Exercise 3. Use free_block() as a model to implement alloc_block() in fs/fs.c, which should find a free disk block in the bitmap, mark it used, and return the number of that block. When you allocate a block, you should immediately flush the changed bitmap block to disk with flush_block(), to help file system consistency.

int
alloc_block(void)
{
	// The bitmap consists of one or more blocks.  A single bitmap block
	// contains the in-use bits for BLKBITSIZE blocks.  There are
	// super->s_nblocks blocks in the disk altogether.

	// LAB 5: Your code here.
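	for (uint32_t blockno = 0; blockno < super->s_nblocks; blockno++) {
		if (block_is_free(blockno)) {
			// A set bit means "free", so clear it to mark the block in use.
			bitmap[blockno / 32] &= ~(1 << (blockno % 32));
			// Write back the bitmap block holding the bit we just changed.
			flush_block(&bitmap[blockno / 32]);
			return blockno;
		}
	}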

	return -E_NO_DISK;
}
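The bitmap convention is easy to get backwards, since allocating clears a bit and freeing sets it. Here is a standalone sketch of the same scan over an in-memory bitmap of 32-bit words, as in fs/fs.c; NBLOCKS and the function bodies are hypothetical stand-ins, and there is no disk or flush_block() involved.

```c
#include <stdint.h>
#include <stdbool.h>

#define NBLOCKS 128u  /* hypothetical disk size in blocks */

/* One bit per block; a SET bit means the block is FREE (JOS convention).
 * Zero-initialized, so every block starts out in use. */
static uint32_t bitmap[NBLOCKS / 32];

static bool block_is_free(uint32_t blockno)
{
	return (bitmap[blockno / 32] & (1u << (blockno % 32))) != 0;
}

static void free_block(uint32_t blockno)
{
	bitmap[blockno / 32] |= 1u << (blockno % 32);   /* set bit: free */
}

/* Returns the allocated block number, or -1 if no block is free. */
static int alloc_block(void)
{
	for (uint32_t blockno = 0; blockno < NBLOCKS; blockno++) {
		if (block_is_free(blockno)) {
			bitmap[blockno / 32] &= ~(1u << (blockno % 32));  /* clear bit: in use */
			return (int)blockno;
		}
	}
	return -1;
}
```

After freeing blocks 5 and 7, two allocations return 5 and then 7, and a third finds nothing.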

A File structure may contain one or more blocks of data, and under the hood these data blocks are likely scattered all over the physical disk. We obviously don’t want a high-level function like read() to even be aware of these details, so we need a function that translates “virtual block numbers” (block indices within the file) into physical disk block numbers. That is exactly the job of file_block_walk(). file_get_block() is a wrapper around file_block_walk() that also allocates a new physical block if necessary.

static int
file_block_walk(struct File *f, uint32_t filebno, uint32_t **ppdiskbno, bool alloc)
{
	// LAB 5: Your code here.
	int r;

	if (filebno >= NDIRECT + NINDIRECT)
		return -E_INVAL;

	if (filebno < NDIRECT) {
		*ppdiskbno = &f->f_direct[filebno];
		return 0;
	}

	filebno -= NDIRECT;
	if (f->f_indirect == 0) {
		if (alloc == 0)
			return -E_NOT_FOUND;
		if ((r = alloc_block()) < 0)	// -E_NO_DISK
			return r;
		memset(diskaddr(r), 0, BLKSIZE);
		f->f_indirect = r;
	}
	uint32_t *f_indirect = diskaddr(f->f_indirect);
	*ppdiskbno = &f_indirect[filebno];
	return 0;
}

uint32_t **ppdiskbno: Pointer to Pointer to a DISK Block NO. 🙂

int
file_get_block(struct File *f, uint32_t filebno, char **blk)
{
	// LAB 5: Your code here.
	uint32_t *pdiskbno;
	int r;

	if ((r = file_block_walk(f, filebno, &pdiskbno, 1)) < 0)
		return r;
	if (*pdiskbno == 0) {	// Block is not allocated yet.
		if ((r = alloc_block()) < 0)	// -E_NO_DISK
			return r;
		memset(diskaddr(r), 0, BLKSIZE);
		*pdiskbno = r;
	}
	*blk = diskaddr(*pdiskbno);
	return 0;
}
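To see the direct/indirect split in isolation, the sketch below models a File with the JOS layout parameters: NDIRECT = 10 direct pointers plus one indirect block holding BLKSIZE / 4 = 1024 more. That caps a file at (10 + 1024) * 4096 = 4,235,264 bytes. The in-memory "disk", the bump allocator standing in for alloc_block(), and the error values are hypothetical scaffolding; only the walk and lazy-allocation logic follow the code above.

```c
#include <stdint.h>
#include <string.h>

#define BLKSIZE     4096u
#define NDIRECT     10u
#define NINDIRECT   (BLKSIZE / 4u)  /* block numbers per indirect block */
#define NBLOCKS     64u             /* hypothetical tiny disk */
#define E_INVAL     (-3)            /* hypothetical error codes */
#define E_NOT_FOUND (-4)
#define E_NO_DISK   (-5)

static uint32_t disk[NBLOCKS][BLKSIZE / 4];  /* fake disk, viewed as words */
static uint32_t next_free = 2;               /* blocks 0 and 1 reserved */

struct File {
	uint32_t f_direct[NDIRECT];
	uint32_t f_indirect;
};

/* Bump allocator standing in for alloc_block(); hands out zeroed blocks. */
static int alloc_block(void)
{
	if (next_free >= NBLOCKS)
		return E_NO_DISK;
	memset(disk[next_free], 0, BLKSIZE);
	return (int)next_free++;
}

/* Same logic as file_block_walk(): find the slot holding filebno's block number. */
static int walk(struct File *f, uint32_t filebno, uint32_t **slot, int alloc)
{
	int r;

	if (filebno >= NDIRECT + NINDIRECT)
		return E_INVAL;
	if (filebno < NDIRECT) {
		*slot = &f->f_direct[filebno];
		return 0;
	}
	if (f->f_indirect == 0) {
		if (!alloc)
			return E_NOT_FOUND;
		if ((r = alloc_block()) < 0)
			return r;
		f->f_indirect = (uint32_t)r;
	}
	*slot = &disk[f->f_indirect][filebno - NDIRECT];
	return 0;
}

/* Same logic as file_get_block(): allocate the data block lazily. */
static int get_block(struct File *f, uint32_t filebno, uint32_t *diskbno)
{
	uint32_t *slot;
	int r;

	if ((r = walk(f, filebno, &slot, 1)) < 0)
		return r;
	if (*slot == 0) {
		if ((r = alloc_block()) < 0)
			return r;
		*slot = (uint32_t)r;
	}
	*diskbno = *slot;
	return 0;
}
```

On an empty File, the first access to file block 10 allocates two disk blocks: the indirect block, then the data block it points to.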

The typical interaction between a normal user environment and the JOS FS server looks like this:

      Regular env           FS env
   +---------------+   +---------------+
   |      read     |   |   file_read   |
   |   (lib/fd.c)  |   |   (fs/fs.c)   |
...|.......|.......|...|.......^.......|...............
   |       v       |   |       |       | RPC mechanism
   |  devfile_read |   |  serve_read   |
   |  (lib/file.c) |   |  (fs/serv.c)  |
   |       |       |   |       ^       |
   |       v       |   |       |       |
   |     fsipc     |   |     serve     |
   |  (lib/file.c) |   |  (fs/serv.c)  |
   |       |       |   |       ^       |
   |       v       |   |       |       |
   |   ipc_send    |   |   ipc_recv    |
   |       |       |   |       ^       |
   +-------|-------+   +-------|-------+
           |                   |
           +-------------------+

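The two sides communicate on top of JOS’s IPC mechanism: fsipc() sends the request type as the IPC value together with the shared fsipcbuf page, and serve() dispatches the request and replies with the handler’s return value.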
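On the server side, serve() essentially receives a request type plus a shared page and dispatches through a table of handlers indexed by that type. A single-process sketch of the dispatch pattern; the request codes, union layout and handler are hypothetical stand-ins for those in inc/fs.h and fs/serv.c, and the real transport is ipc_send()/ipc_recv(). As in union Fsipc, the reply overwrites the request in the same page, so the handler copies out what it needs first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request codes and shared-page layout. */
enum { REQ_READ = 1, REQ_STAT = 2, NREQ };

union Req {
	struct { int fileid; size_t n; } read;
	struct { char buf[64]; } readRet;
	char pad[128];
};

typedef int (*handler_t)(int client, union Req *req);

/* Hypothetical handler: "reads" up to n bytes of a fixed string. */
static int handle_read(int client, union Req *req)
{
	static const char data[] = "hello";
	size_t n = req->read.n;   /* copy out before the reply overwrites it */

	(void)client;
	if (n > sizeof(data) - 1)
		n = sizeof(data) - 1;
	memcpy(req->readRet.buf, data, n);
	return (int)n;
}

static handler_t handlers[NREQ] = {
	[REQ_READ] = handle_read,
};

/* One iteration of a serve()-style loop: dispatch, or reject unknown types. */
static int dispatch(int client, int type, union Req *req)
{
	if (type < 0 || type >= NREQ || handlers[type] == NULL)
		return -1;
	return handlers[type](client, req);
}
```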

Exercise 5. Implement serve_read() in fs/serv.c.

int
serve_read(envid_t envid, union Fsipc *ipc)
{
	struct Fsreq_read *req = &ipc->read;
	struct Fsret_read *ret = &ipc->readRet;

	if (debug)
		cprintf("serve_read %08x %08x %08x\n", envid, req->req_fileid, req->req_n);

	// Lab 5: Your code here:
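	int r;
	struct OpenFile *po;
	// Never read more than the reply buffer can hold, whatever req_n says.
	size_t n = MIN(req->req_n, sizeof(ret->ret_buf));

	if ((r = openfile_lookup(envid, req->req_fileid, &po)) < 0)
		return r;
	if ((r = file_read(po->o_file, ret->ret_buf, n, po->o_fd->fd_offset)) < 0)
		return r;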
	po->o_fd->fd_offset += r;
	return r;
}

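Don’t forget to update fd_offset: advance it by the number of bytes actually transferred, so the next request continues where this one stopped.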
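The offset bookkeeping follows the familiar pread-style pattern: read at the current offset, then advance by however many bytes actually came back, which can be fewer than requested near the end of the file. A self-contained sketch over an in-memory "file"; struct open_file, read_at() and serve_read_like() are hypothetical stand-ins for OpenFile, file_read() and serve_read().

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for an open file and its offset. */
struct open_file {
	const char *data;
	size_t size;
	off_t offset;   /* plays the role of o_fd->fd_offset */
};

/* Like file_read(): copy up to n bytes starting at `off`; short at EOF. */
static ssize_t read_at(const struct open_file *of, char *buf, size_t n, off_t off)
{
	if ((size_t)off >= of->size)
		return 0;
	if (n > of->size - (size_t)off)
		n = of->size - (size_t)off;
	memcpy(buf, of->data + off, n);
	return (ssize_t)n;
}

/* Like serve_read(): read at the current offset, then advance it. */
static ssize_t serve_read_like(struct open_file *of, char *buf, size_t n)
{
	ssize_t r = read_at(of, buf, n, of->offset);

	if (r > 0)
		of->offset += r;
	return r;
}
```

Reading a 7-byte file in 4-byte requests returns 4, then 3, then 0.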

Exercise 6. Implement serve_write() in fs/serv.c and devfile_write() in lib/file.c.

int
serve_write(envid_t envid, struct Fsreq_write *req)
{
	if (debug)
		cprintf("serve_write %08x %08x %08x\n", envid, req->req_fileid, req->req_n);

	// LAB 5: Your code here.
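	int r;
	struct OpenFile *po;
	// Never trust req_n: clamp it to the size of the request buffer.
	size_t n = MIN(req->req_n, sizeof(req->req_buf));

	if ((r = openfile_lookup(envid, req->req_fileid, &po)) < 0)
		return r;
	if ((r = file_write(po->o_file, req->req_buf, n, po->o_fd->fd_offset)) < 0)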
		return r;
	po->o_fd->fd_offset += r;
	return r;
}

static ssize_t
devfile_write(struct Fd *fd, const void *buf, size_t n)
{
	// Make an FSREQ_WRITE request to the file system server.  Be
	// careful: fsipcbuf.write.req_buf is only so large, but
	// remember that write is always allowed to write *fewer*
	// bytes than requested.
	// LAB 5: Your code here

	if (n > sizeof(fsipcbuf.write.req_buf))
		panic("devfile_write: invalid n\n");

	fsipcbuf.write.req_fileid = fd->fd_file.id;
	fsipcbuf.write.req_n = n;
	memmove(fsipcbuf.write.req_buf, buf, n);
	
	return fsipc(FSREQ_WRITE, NULL);
}

Like devfile_read() in the diagram above, devfile_write() runs on the client side, in the regular environment. I simply panic() if n is larger than the request buffer, fsipcbuf.write.req_buf.
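Since the skeleton comment says a write may always write fewer bytes than requested, an alternative to panicking is to clamp n to the buffer size and return the short count, leaving it to the caller to send the remainder. A self-contained sketch of that design; REQ_BUF_SIZE, the fake server and all function names are hypothetical, with send_write_request() standing in for fsipc(FSREQ_WRITE, NULL).

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define REQ_BUF_SIZE 16   /* hypothetical; really sizeof(fsipcbuf.write.req_buf) */

static char req_buf[REQ_BUF_SIZE];  /* stands in for the shared request page */
static char sink[256];              /* stands in for the file on the server */
static size_t sink_len;

/* Hypothetical fake server: appends the request buffer to `sink`. */
static ssize_t send_write_request(size_t n)
{
	if (n > sizeof(sink) - sink_len)
		n = sizeof(sink) - sink_len;
	memcpy(sink + sink_len, req_buf, n);
	sink_len += n;
	return (ssize_t)n;
}

/* devfile_write-style: clamp instead of panic, possibly writing fewer bytes. */
static ssize_t clamped_write(const void *buf, size_t n)
{
	if (n > sizeof(req_buf))
		n = sizeof(req_buf);
	memmove(req_buf, buf, n);
	return send_write_request(n);
}

/* Caller-side loop that keeps writing until everything is sent. */
static ssize_t write_all(const void *buf, size_t n)
{
	size_t done = 0;

	while (done < n) {
		ssize_t r = clamped_write((const char *)buf + done, n - done);
		if (r < 0)
			return r;
		if (r == 0)
			break;   /* server accepted nothing more */
		done += (size_t)r;
	}
	return (ssize_t)done;
}
```

A 23-byte write then goes out as a 16-byte request followed by a 7-byte one.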


The traditional UNIX fork() and exec() are great, but since JOS is an exokernel O/S, both have to be implemented in user land. That makes exec() awkward: it is pretty hard for the child to replace its own address space while running inside it, without any special help from the kernel. Instead, we simply let the parent set up everything for the child, using spawn().

Exercise 7. spawn() relies on the new syscall sys_env_set_trapframe() to initialize the state of the newly created environment. Implement sys_env_set_trapframe() in kern/syscall.c.

static int
sys_env_set_trapframe(envid_t envid, struct Trapframe *tf)
{
	// LAB 5: Your code here.
	// Remember to check whether the user has supplied us with a good
	// address!
	struct Env *e;
	int r;

	if ((r = envid2env(envid, &e, 1)) < 0)
		return r;
	user_mem_assert(curenv, (void *)tf, sizeof(struct Trapframe), 0);	// It's ok to be read-only.
	
	memmove(&e->env_tf, tf, sizeof(struct Trapframe));
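	e->env_tf.tf_cs = GD_UT | 3;	// User code segment, CPL 3.
	e->env_tf.tf_ds = GD_UD | 3;	// User data segments: a bogus selector
	e->env_tf.tf_es = GD_UD | 3;	// here could fault inside the kernel
	e->env_tf.tf_ss = GD_UD | 3;	// when env_pop_tf() restores it.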
	e->env_tf.tf_eflags |= FL_IF;
	e->env_tf.tf_eflags &= ~FL_IOPL_MASK;
	return 0;
}

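Be careful when manipulating these bits: the new environment must end up at CPL 3, with interrupts enabled (FL_IF) and IOPL cleared (FL_IOPL_MASK), so that a trapframe supplied from user space can neither disable interrupts nor grant I/O privilege.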
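These adjustments can be checked in isolation. A minimal sketch with a cut-down, hypothetical trapframe struct and JOS constant values restated (GD_UT = 0x18, GD_UD = 0x20, FL_IF = 0x200, FL_IOPL_MASK = 0x3000). It forces whole selectors to the user segments rather than only OR-ing in the RPL bits, and shows what an environment ends up with even if its parent hands in kernel selectors, interrupts off and IOPL 3.

```c
#include <stdint.h>

/* Values from JOS inc/memlayout.h and inc/mmu.h, restated for a standalone build. */
#define GD_UT        0x18u        /* user text segment selector */
#define GD_UD        0x20u        /* user data segment selector */
#define FL_IF        0x00000200u  /* interrupt enable */
#define FL_IOPL_MASK 0x00003000u  /* I/O privilege level bits */

/* Hypothetical, cut-down trapframe holding only the fields we sanitize. */
struct mini_tf {
	uint16_t cs, ds, es, ss;
	uint32_t eflags;
};

/* Force user segments with RPL 3, interrupts on, and IOPL 0. */
static void sanitize_tf(struct mini_tf *tf)
{
	tf->cs = GD_UT | 3;
	tf->ds = tf->es = tf->ss = GD_UD | 3;
	tf->eflags |= FL_IF;
	tf->eflags &= ~FL_IOPL_MASK;
}
```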


COW fork() is great, but now that we have a file system, we actually want some pages, such as the file descriptor table, to be shared between parent and child rather than copied. We use one of the PTE_AVAIL bits, called PTE_SHARE, to tell both fork() and spawn() to map these pages directly, with the same permissions, instead of copy-on-write.
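The resulting per-page policy in duppage() is a three-way decision on the parent’s PTE. Here it is as a pure function, with PTE_SHARE = 0x400 (inc/lib.h) and PTE_COW = 0x800 (lib/fork.c) as in the JOS sources; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define PTE_P       0x001u
#define PTE_W       0x002u
#define PTE_U       0x004u
#define PTE_SHARE   0x400u   /* JOS inc/lib.h */
#define PTE_COW     0x800u   /* JOS lib/fork.c */
#define PTE_SYSCALL (0xE00u | PTE_P | PTE_W | PTE_U)

enum dup_kind { DUP_SHARE, DUP_COW, DUP_READONLY };

/* Hypothetical: classify a parent PTE the way duppage() does. */
static enum dup_kind classify(uint32_t pte)
{
	if (pte & PTE_SHARE)
		return DUP_SHARE;      /* same physical page, same perms: writes are shared */
	if (pte & (PTE_W | PTE_COW))
		return DUP_COW;        /* both sides become read-only + PTE_COW */
	return DUP_READONLY;       /* already read-only: map it as is */
}

/* Hypothetical: permissions the child's mapping gets in each case. */
static uint32_t child_perm(uint32_t pte)
{
	switch (classify(pte)) {
	case DUP_SHARE: return pte & PTE_SYSCALL;
	case DUP_COW:   return PTE_COW | PTE_U | PTE_P;
	default:        return PTE_U | PTE_P;
	}
}
```

Note that PTE_SHARE is tested first, so a shared writable page is never turned into a COW page.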

Exercise 8. Change duppage() in lib/fork.c to follow the new convention…Likewise, implement copy_shared_pages() in lib/spawn.c.

static int
duppage(envid_t ceid, unsigned pn)
{
	// LAB 4: Your code here.
	int r;
	extern volatile pte_t uvpt[];
	envid_t peid = sys_getenvid();
	intptr_t va = (intptr_t)(pn * PGSIZE);

	if (uvpt[pn] & PTE_SHARE) {
		if ((r = sys_page_map(peid, (void *)va, ceid, (void *)va, uvpt[pn] & PTE_SYSCALL)) < 0)
			return r;
	} else if (uvpt[pn] & (PTE_COW|PTE_W)) {
		// Map into the child first, then mark the parent's own mapping COW.
		if ((r = sys_page_map(peid, (void *)va, ceid, (void *)va, PTE_COW | PTE_U | PTE_P)) < 0)
			return r;
		if ((r = sys_page_map(peid, (void *)va, peid, (void *)va, PTE_COW | PTE_U | PTE_P)) < 0)
			return r;
	} else {
		if ((r = sys_page_map(peid, (void *)va, ceid, (void *)va, PTE_U | PTE_P)) < 0)
			return r;
	}
	return 0;
}
static int
copy_shared_pages(envid_t child)
{
	// LAB 5: Your code here.
	extern volatile pde_t uvpd[];
	extern volatile pte_t uvpt[];
	int r;

	for (uintptr_t va = 0; va < UTOP;) {
		if ((uvpd[va >> PDXSHIFT] & PTE_P) == 0) {	// Page table not mapped.
			va += NPTENTRIES * PGSIZE;
			continue;
		}

		int perm = uvpt[va >> PTXSHIFT] & PTE_SYSCALL;
		if ((perm & PTE_P) == 0) {	// Page not mapped.
			va += PGSIZE;
			continue;
		}
		if (perm & PTE_SHARE) {
			if ((r = sys_page_map(thisenv->env_id, (void *)va, child, (void *)va, perm)) < 0)
				return r;
		}
		va += PGSIZE;
	}
	return 0;
}

In addition to the JOS monitor, we now need to handle keyboard interrupts for user environments as well:

Exercise 9. In your kern/trap.c, call kbd_intr() to handle trap IRQ_OFFSET+IRQ_KBD and serial_intr() to handle trap IRQ_OFFSET+IRQ_SERIAL.

	// (These cases go in trap_dispatch()'s switch on the trap number.)
	// Handle keyboard and serial interrupts.
	// LAB 5: Your code here.
	case IRQ_OFFSET + IRQ_KBD:
		kbd_intr();
		break;
	
	case IRQ_OFFSET + IRQ_SERIAL:
		serial_intr();
		break;

Finally we can run a shell on JOS!

Exercise 10. Add I/O redirection for < to user/sh.c.

In runcmd():

			// LAB 5: Your code here.
			if ((fd = open(t, O_RDONLY)) < 0) {
				cprintf("open %s for read: %e\n", t, fd);
				exit();
			}
			if (fd != 0) {
				dup(fd, 0);
				close(fd);
			}
			break;
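The same open/dup/close sequence is how input redirection works on a POSIX system, where dup2() plays the role of JOS’s dup(fd, 0). A minimal sketch using standard POSIX calls; the function name redirect_stdin() is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Make `path` the process's standard input, as `cmd < path` would.
 * Returns 0 on success, -1 on failure (errno set by open/dup2). */
static int redirect_stdin(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fd != 0) {
		if (dup2(fd, 0) < 0) {
			close(fd);
			return -1;
		}
		close(fd);   /* fd 0 now refers to the file; drop the extra descriptor */
	}
	return 0;
}
```

After redirect_stdin("/dev/null"), a read from fd 0 immediately returns 0 (end of file).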

Now if we run JOS:

peilin@PWN:~/6.828/2018/lab$ make run-icode-nox CPUS=4

FS is running
icode startup
FS can do I/O
icode: open /motd
Device 1 presence: 1
block cache is good
superblock is good
bitmap is good
alloc_block is good
file_open is good
file_get_block is good
file_flush is good
file_truncate is good
file rewrite is good
icode: read /motd
This is /motd, the message of the day.

Welcome to the JOS kernel, now with a file system!

icode: close /motd
icode: spawn /init
icode: exiting
init: running
init: data seems okay
init: bss seems okay
init: args: 'init' 'initarg1' 'initarg2'
init: running sh
init: starting sh
user@computer$ sh < script
This is from the script.
    1 Lorem ipsum dolor sit amet, consectetur
    2 adipisicing elit, sed do eiusmod tempor
    3 incididunt ut labore et dolore magna
    4 aliqua. Ut enim ad minim veniam, quis
    5 nostrud exercitation ullamco laboris
    6 nisi ut aliquip ex ea commodo consequat.
    7 Duis aute irure dolor in reprehenderit
    8 in voluptate velit esse cillum dolore eu
    9 fugiat nulla pariatur. Excepteur sint
   10 occaecat cupidatat non proident, sunt in
   11 culpa qui officia deserunt mollit anim
   12 id est laborum.
These are my file descriptors.
fd 0: name script isdir 0 size 132 dev file
fd 1: name <cons> isdir 0 size 0 dev cons
This is the end of the script.
$

Beautiful.

This completes the lab. See you next time!