```
internal FS tests [fs/test.c]: OK (2.2s)
  fs i/o: OK
  check_bc: OK
  check_super: OK
  check_bitmap: OK
  alloc_block: OK
  file_open: OK
  file_get_block: OK
  file_flush/file_truncate/file rewrite: OK
testfile: OK (2.5s)
  serve_open/file_stat/file_close: OK
  file_read: OK
  file_write: OK
  file_read after file_write: OK
  open: OK
  large file: OK
spawn via spawnhello: OK (0.8s)
Protection I/O space: OK (1.7s)
PTE_SHARE [testpteshare]: OK (1.0s)
PTE_SHARE [testfdsharing]: OK (1.8s)
start the shell [icode]: Timeout! OK (31.5s)
testshell: OK (1.3s)
primespipe: OK (5.7s)
Score: 150/150
```
Ta-dah!
JOS is an exokernel O/S, and its file system runs as a user environment server. To allow the FS server to execute privileged I/O instructions, we need to modify env_create() to treat it differently from other environments.
Exercise 1. i386_init() identifies the file system environment by passing the type ENV_TYPE_FS to your environment creation function, env_create(). Modify env_create() in env.c, so that it gives the file system environment I/O privilege, but never gives that privilege to any other environment.
```c
void
env_create(uint8_t *binary, enum EnvType type)
{
	// LAB 3: Your code here.
	struct Env *e;
	int r;

	if ((r = env_alloc(&e, 0)) < 0)
		panic("env_create: env_alloc: %e", r);
	load_icode(e, binary);
	e->env_type = type;

	// If this is the file server (type == ENV_TYPE_FS) give it I/O privileges.
	// LAB 5: Your code here.
	if (type == ENV_TYPE_FS)
		e->env_tf.tf_eflags |= FL_IOPL_3;   // IOPL = 3: in/out allowed at CPL 3
}
```
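Why does setting FL_IOPL_3 suffice? On x86 in protected mode, in and out are permitted when the current privilege level (CPL) is numerically at most the IOPL field of EFLAGS (bits 12-13). The host-side sketch below models only that check. The constant values are assumed to match JOS's inc/mmu.h, eflags_iopl() and io_allowed() are hypothetical helpers rather than JOS functions, and the TSS I/O permission bitmap is ignored (assuming it never grants access on its own).

```c
#include <assert.h>
#include <stdint.h>

/* EFLAGS bits, values assumed to match JOS's inc/mmu.h. */
#define FL_IF        0x00000200u  /* interrupt enable flag */
#define FL_IOPL_MASK 0x00003000u  /* I/O privilege level field, bits 12-13 */
#define FL_IOPL_3    0x00003000u  /* IOPL = 3 */

/* Hypothetical helper: extract the 2-bit IOPL field from an EFLAGS value. */
static unsigned eflags_iopl(uint32_t eflags)
{
	return (eflags & FL_IOPL_MASK) >> 12;
}

/* Hypothetical helper: simplified protected-mode rule for in/out,
 * allowed when CPL <= IOPL (TSS I/O permission bitmap ignored). */
static int io_allowed(unsigned cpl, uint32_t eflags)
{
	assert(cpl <= 3);
	return cpl <= eflags_iopl(eflags);
}
```

A user environment runs at CPL 3, so with the default IOPL of 0 every in/out traps, while the FS environment's IOPL of 3 lets it talk to the IDE controller directly.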
The JOS FS server can only handle disks of 3GB or less, so it’s safe to simply map the entire disk into the FS server’s virtual address space. Of course, it does not actually load the entire disk into memory at boot, since that would be far too slow. Instead, its buffer cache (or block cache) layer only loads blocks from the disk when it needs to.
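A minimal sketch of the address arithmetic behind this mapping, assuming JOS's values DISKMAP = 0x10000000, DISKSIZE = 0xC0000000 and BLKSIZE = PGSIZE = 4096: block n lives at DISKMAP + n * BLKSIZE (what diskaddr() returns), and a faulting address maps back to a block number the same way bc_pgfault() computes it. block_to_va() and va_to_block() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Layout constants, values assumed to match JOS's fs/fs.h and inc/fs.h. */
#define PGSIZE   4096u
#define BLKSIZE  PGSIZE        /* one block == one page */
#define DISKMAP  0x10000000u   /* start of the disk mapping */
#define DISKSIZE 0xC0000000u   /* 3GB: the largest disk that fits */

/* Hypothetical helper: block number -> virtual address (like diskaddr()). */
static uint32_t block_to_va(uint32_t blockno)
{
	assert(blockno < DISKSIZE / BLKSIZE);
	return DISKMAP + blockno * BLKSIZE;
}

/* Hypothetical helper: faulting address -> block number (as in bc_pgfault()). */
static uint32_t va_to_block(uint32_t va)
{
	assert(va >= DISKMAP && va < DISKMAP + DISKSIZE);
	return (va - DISKMAP) / BLKSIZE;
}
```

With these values the window holds 786432 blocks and ends at 0xD0000000.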
Exercise 2. Implement the bc_pgfault() and flush_block() functions in fs/bc.c. bc_pgfault() is a page fault handler, just like the one you wrote in the previous lab for copy-on-write fork(), except that its job is to load pages in from the disk in response to a page fault…
The flush_block() function should write a block out to disk if necessary. flush_block() shouldn’t do anything if the block isn’t even in the block cache (that is, the page isn’t mapped) or if it’s not dirty… After writing the block to disk, flush_block() should clear the PTE_D bit using sys_page_map().
```c
static void
bc_pgfault(struct UTrapframe *utf)
{
	void *addr = (void *) utf->utf_fault_va;
	uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;
	int r;

	// Check that the fault was within the block cache region
	if (addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
		panic("page fault in FS: eip %08x, va %08x, err %04x",
		      utf->utf_eip, addr, utf->utf_err);

	// Sanity check the block number.
	if (super && blockno >= super->s_nblocks)
		panic("reading non-existent block %08x\n", blockno);

	// Allocate a page in the disk map region, read the contents
	// of the block from the disk into that page.
	// Hint: first round addr to page boundary. fs/ide.c has code to read
	// the disk.
	//
	// LAB 5: your code here:
	addr = ROUNDDOWN(addr, PGSIZE);
	// Plain P/U/W permissions: passing all of PTE_SYSCALL would also set
	// the PTE_AVAIL bits (PTE_SHARE, PTE_COW) on block cache pages.
	if ((r = sys_page_alloc(0, addr, PTE_P | PTE_U | PTE_W)) < 0)
		panic("in bc_pgfault, sys_page_alloc: %e", r);
	// ide_read() counts in sectors: one block is BLKSECTS sectors.
	if ((r = ide_read(blockno * BLKSECTS, addr, BLKSECTS)) < 0)
		panic("in bc_pgfault, ide_read: %e", r);

	// Clear the dirty bit for the disk block page since we just read the
	// block from disk
	if ((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
		panic("in bc_pgfault, sys_page_map: %e", r);

	// Check that the block we read was allocated. (exercise for
	// the reader: why do we do this *after* reading the block
	// in?)
	if (bitmap && block_is_free(blockno))
		panic("reading free block %08x\n", blockno);
}
```
Note that ide_read() works on sectors instead of blocks: one JOS block is BLKSECTS (currently 8) sectors.
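The block-to-sector conversion is plain multiplication. This sketch assumes SECTSIZE = 512 and BLKSIZE = 4096, which gives BLKSECTS = 8; block_first_sector() and block_disk_offset() are hypothetical helpers computing the secno argument that bc_pgfault() and flush_block() pass to ide_read()/ide_write(), and the matching byte offset in the disk image.

```c
#include <assert.h>
#include <stdint.h>

#define SECTSIZE 512u                  /* bytes per IDE sector (assumed) */
#define BLKSIZE  4096u                 /* bytes per JOS block (assumed) */
#define BLKSECTS (BLKSIZE / SECTSIZE)  /* sectors per block: 8 */

/* Hypothetical helper: first sector of a block, i.e. the secno argument;
 * the sector count passed alongside it is always BLKSECTS. */
static uint32_t block_first_sector(uint32_t blockno)
{
	return blockno * BLKSECTS;
}

/* Hypothetical helper: byte offset of a block within the raw disk image. */
static uint64_t block_disk_offset(uint32_t blockno)
{
	return (uint64_t)block_first_sector(blockno) * SECTSIZE;
}
```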
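```c
void
flush_block(void *addr)
{
	uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;
	int r;

	if (addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
		panic("flush_block of bad va %08x", addr);

	// LAB 5: Your code here.
	// Nothing to do if the block is not cached or has not been modified.
	if (!va_is_mapped(addr) || !va_is_dirty(addr))
		return;

	addr = ROUNDDOWN(addr, PGSIZE);
	if ((r = ide_write(blockno * BLKSECTS, addr, BLKSECTS)) < 0)
		panic("in flush_block, ide_write: %e", r);
	// Remap the page onto itself with the PTE_SYSCALL bits of its current
	// PTE; PTE_D is not among them, so the new mapping is clean.
	if ((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
		panic("in flush_block, sys_page_map: %e", r);
}
```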
If a block is not mapped yet, or is mapped but not dirty, there is nothing to flush. After writing it to disk, we call sys_page_map() to clear its dirty bit.
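Why does remapping clear the dirty bit? PTE_D is set by the hardware and is not part of PTE_SYSCALL, so masking the current PTE with PTE_SYSCALL and handing the result to sys_page_map() installs a fresh entry without it. A host-side sketch of just the bit arithmetic, with bit values assumed from JOS's inc/mmu.h and remap_perm() as a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* x86 PTE bits plus JOS's software-available bits (values assumed). */
#define PTE_P       0x001u
#define PTE_W       0x002u
#define PTE_U       0x004u
#define PTE_A       0x020u  /* accessed: set by hardware */
#define PTE_D       0x040u  /* dirty: set by hardware on write */
#define PTE_AVAIL   0xE00u  /* available for software use */
#define PTE_SYSCALL (PTE_AVAIL | PTE_P | PTE_W | PTE_U)

/* Hypothetical helper: the permissions passed back to sys_page_map() when a
 * page is remapped onto itself. Anything outside PTE_SYSCALL, including the
 * physical address bits, PTE_A and PTE_D, is dropped. */
static uint32_t remap_perm(uint32_t pte)
{
	return pte & PTE_SYSCALL;
}
```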
Like xv6’s FS, the JOS FS keeps track of all free blocks on the disk with a bitmap, one bit per block.
Exercise 3. Use free_block() as a model to implement alloc_block() in fs/fs.c, which should find a free disk block in the bitmap, mark it used, and return the number of that block. When you allocate a block, you should immediately flush the changed bitmap block to disk with flush_block(), to help file system consistency.
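```c
int
alloc_block(void)
{
	// The bitmap consists of one or more blocks.  A single bitmap block
	// contains the in-use bits for BLKBITSIZE blocks.  There are
	// super->s_nblocks blocks in the disk altogether.

	// LAB 5: Your code here.
	uint32_t blockno;

	for (blockno = 0; blockno < super->s_nblocks; blockno++) {
		if (block_is_free(blockno)) {
			// A set bit means "free"; clear it to mark the block in use.
			bitmap[blockno / 32] &= ~(1U << (blockno % 32));
			// Write back the bitmap block that holds this bit
			// (flush_block() rounds the address down to its block).
			flush_block(&bitmap[blockno / 32]);
			return blockno;
		}
	}
	return -E_NO_DISK;
}
```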
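To see the bitmap convention in isolation, here is a host-side model of the scan above. It assumes JOS's convention that a set bit means free (as in block_is_free() and free_block()), uses a hypothetical 64-block disk, and leaves out flush_block(); alloc_block_sim() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define NBLOCKS 64u   /* hypothetical tiny disk for illustration */

/* As in JOS, a SET bit means the block is FREE; a clear bit means in use.
 * Zero-initialized, so every block starts out in use. */
static uint32_t bitmap[NBLOCKS / 32];

static int block_is_free(uint32_t blockno)
{
	return (bitmap[blockno / 32] & (1u << (blockno % 32))) != 0;
}

static void free_block(uint32_t blockno)
{
	bitmap[blockno / 32] |= 1u << (blockno % 32);
}

/* Same linear scan as alloc_block(), minus flush_block(): marks the first
 * free block used and returns its number, or -1 when the disk is full. */
static int alloc_block_sim(void)
{
	for (uint32_t i = 0; i < NBLOCKS; i++) {
		if (block_is_free(i)) {
			bitmap[i / 32] &= ~(1u << (i % 32));
			return (int)i;
		}
	}
	return -1;   /* alloc_block() returns -E_NO_DISK here */
}
```

Usage: after free_block(5) and free_block(40), successive calls to alloc_block_sim() return 5, then 40, then -1.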
A File structure may contain one or more blocks of data. Under the hood, chances are these data blocks are scattered all over the physical disk. We obviously don’t want a high-level function like read() to even be aware of these details, so we need a function to translate “virtual block numbers” (block indices within a file) into physical disk block numbers. That is exactly the job of file_block_walk(). file_get_block() is a wrapper around file_block_walk() and is also responsible for allocating a new physical block if necessary.
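The translation itself is a two-way split on the file block number. A host-side sketch, assuming JOS's NDIRECT = 10 and NINDIRECT = BLKSIZE / 4 = 1024; classify_filebno() is a hypothetical helper that reports which slot file_block_walk() would hand back, without touching any disk blocks.

```c
#include <assert.h>
#include <stdint.h>

#define BLKSIZE     4096u
#define NDIRECT     10u                 /* direct pointers in struct File (assumed) */
#define NINDIRECT   (BLKSIZE / 4u)      /* uint32_t entries in the indirect block */
#define MAXFILESIZE ((NDIRECT + NINDIRECT) * BLKSIZE)

enum slot_kind { SLOT_DIRECT, SLOT_INDIRECT, SLOT_INVALID };

/* Hypothetical helper mirroring the branches of file_block_walk():
 * *index receives the position in f_direct[] or in the indirect block. */
static enum slot_kind classify_filebno(uint32_t filebno, uint32_t *index)
{
	if (filebno >= NDIRECT + NINDIRECT)
		return SLOT_INVALID;            /* file_block_walk(): -E_INVAL */
	if (filebno < NDIRECT) {
		*index = filebno;
		return SLOT_DIRECT;
	}
	*index = filebno - NDIRECT;
	return SLOT_INDIRECT;
}
```

So blocks 0 through 9 come from f_direct[], blocks 10 through 1033 come from the indirect block, and the largest file is 1034 blocks (just over 4MB).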
```c
static int
file_block_walk(struct File *f, uint32_t filebno, uint32_t **ppdiskbno, bool alloc)
{
	// LAB 5: Your code here.
	int r;
	uint32_t *indirect;

	if (filebno >= NDIRECT + NINDIRECT)
		return -E_INVAL;

	// Direct block: the slot lives in the File structure itself.
	if (filebno < NDIRECT) {
		*ppdiskbno = &f->f_direct[filebno];
		return 0;
	}

	// Indirect block: the slot lives in the indirect block, which may
	// have to be allocated (and zeroed) first.
	filebno -= NDIRECT;
	if (f->f_indirect == 0) {
		if (!alloc)
			return -E_NOT_FOUND;
		if ((r = alloc_block()) < 0)   // -E_NO_DISK
			return r;
		memset(diskaddr(r), 0, BLKSIZE);
		f->f_indirect = r;
	}
	indirect = diskaddr(f->f_indirect);
	*ppdiskbno = &indirect[filebno];
	return 0;
}
```
uint32_t **ppdiskbno: Pointer to Pointer to a DISK Block NO. 🙂
```c
int
file_get_block(struct File *f, uint32_t filebno, char **blk)
{
	// LAB 5: Your code here.
	uint32_t *pdiskbno;
	int r;

	if ((r = file_block_walk(f, filebno, &pdiskbno, 1)) < 0)
		return r;
	if (*pdiskbno == 0) {
		// Block is not allocated yet.
		if ((r = alloc_block()) < 0)   // -E_NO_DISK
			return r;
		memset(diskaddr(r), 0, BLKSIZE);
		*pdiskbno = r;
	}
	*blk = diskaddr(*pdiskbno);
	return 0;
}
```
The typical interaction between a normal user environment and the JOS FS server looks like this:
```
      Regular env           FS env
   +---------------+   +---------------+
   |      read     |   |   file_read   |
   |   (lib/fd.c)  |   |   (fs/fs.c)   |
...|.......|.......|...|.......^.......|...............
   |       v       |   |       |       | RPC mechanism
   |  devfile_read |   |  serve_read   |
   |  (lib/file.c) |   |  (fs/serv.c)  |
   |       |       |   |       ^       |
   |       v       |   |       |       |
   |     fsipc     |   |     serve     |
   |  (lib/file.c) |   |  (fs/serv.c)  |
   |       |       |   |       ^       |
   |       v       |   |       |       |
   |   ipc_send    |   |   ipc_recv    |
   |       |       |   |       ^       |
   +-------|-------+   +-------|-------+
           |                   |
           +-------------------+
```
The communication is done on top of the IPC mechanism.
Exercise 5. Implement serve_read() in fs/serv.c.
```c
int
serve_read(envid_t envid, union Fsipc *ipc)
{
	struct Fsreq_read *req = &ipc->read;
	struct Fsret_read *ret = &ipc->readRet;

	if (debug)
		cprintf("serve_read %08x %08x %08x\n",
			envid, req->req_fileid, req->req_n);

	// Lab 5: Your code here:
	int r;
	struct OpenFile *po;
	// req and ret share the same page, so read req_n before file_read()
	// overwrites it, and never read more than ret_buf can hold.
	size_t n = MIN(req->req_n, sizeof(ret->ret_buf));

	if ((r = openfile_lookup(envid, req->req_fileid, &po)) < 0)
		return r;
	if ((r = file_read(po->o_file, ret->ret_buf, n, po->o_fd->fd_offset)) < 0)
		return r;
	po->o_fd->fd_offset += r;
	return r;
}
```
Don’t forget to update the fd_offset.
Exercise 6. Implement serve_write() in fs/serv.c and devfile_write() in lib/file.c.
```c
int
serve_write(envid_t envid, struct Fsreq_write *req)
{
	if (debug)
		cprintf("serve_write %08x %08x %08x\n",
			envid, req->req_fileid, req->req_n);

	// LAB 5: Your code here.
	int r;
	struct OpenFile *po;
	// Don't trust the client: req_buf holds fewer than PGSIZE bytes.
	size_t n = MIN(req->req_n, sizeof(req->req_buf));

	if ((r = openfile_lookup(envid, req->req_fileid, &po)) < 0)
		return r;
	if ((r = file_write(po->o_file, req->req_buf, n, po->o_fd->fd_offset)) < 0)
		return r;
	po->o_fd->fd_offset += r;
	return r;
}
```
```c
static ssize_t
devfile_write(struct Fd *fd, const void *buf, size_t n)
{
	// Make an FSREQ_WRITE request to the file system server.  Be
	// careful: fsipcbuf.write.req_buf is only so large, but
	// remember that write is always allowed to write *fewer*
	// bytes than requested.
	// LAB 5: Your code here
	if (n > sizeof(fsipcbuf.write.req_buf))
		panic("devfile_write: invalid n\n");

	fsipcbuf.write.req_fileid = fd->fd_file.id;
	fsipcbuf.write.req_n = n;
	memmove(fsipcbuf.write.req_buf, buf, n);
	return fsipc(FSREQ_WRITE, NULL);
}
```
As the diagram above suggests, devfile_write() belongs to the client side, just like devfile_read(). I simply panic() if size_t n is too large.
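Panicking is the simplest choice, but the skeleton comment points at a gentler one: since write is allowed to write fewer bytes than requested, devfile_write() could clamp n and return a short count. The host-side sketch below models that, assuming req_buf holds PGSIZE - (sizeof(int) + sizeof(size_t)) = 4088 bytes on i386. one_rpc_write() stands in for the FSREQ_WRITE round trip, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed capacity of fsipcbuf.write.req_buf on i386 JOS:
 * PGSIZE - (sizeof(int) + sizeof(size_t)) = 4096 - 8 bytes. */
#define REQ_BUF_SIZE 4088u

/* Hypothetical stand-in for the server-side file contents. */
static char fake_file[3 * 4096];
static size_t fake_off;

/* Hypothetical stand-in for one FSREQ_WRITE round trip: the "server"
 * appends at most REQ_BUF_SIZE bytes and returns how many it wrote. */
static size_t one_rpc_write(const char *buf, size_t n)
{
	assert(n <= REQ_BUF_SIZE);
	assert(fake_off + n <= sizeof(fake_file));
	memcpy(fake_file + fake_off, buf, n);
	fake_off += n;
	return n;
}

/* Clamping alternative to panicking: send at most one buffer's worth
 * and report a short write. */
static size_t devfile_write_clamped(const char *buf, size_t n)
{
	if (n > REQ_BUF_SIZE)
		n = REQ_BUF_SIZE;
	return one_rpc_write(buf, n);
}

/* Caller-side loop: keep writing until all n bytes are sent. */
static size_t write_all(const char *buf, size_t n)
{
	size_t done = 0;

	while (done < n)
		done += devfile_write_clamped(buf + done, n - done);
	return done;
}
```

The trade-off: with clamping, callers that need every byte must handle short writes (as write_all() does), whereas the panic version never returns a short count but crashes on large writes instead.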
The traditional UNIX fork() and exec() are great, but since JOS is an exokernel O/S, both fork() and exec() need to be implemented in user land, and it’s pretty hard for the child to replace its own address space using exec() while running inside it, without any special help from the kernel. Instead, we simply let the parent set up everything for the child, using spawn().
Exercise 7. spawn() relies on the new syscall sys_env_set_trapframe() to initialize the state of the newly created environment. Implement sys_env_set_trapframe() in kern/syscall.c.
```c
static int
sys_env_set_trapframe(envid_t envid, struct Trapframe *tf)
{
	// LAB 5: Your code here.
	// Remember to check whether the user has supplied us with a good
	// address!
	struct Env *e;
	int r;

	if ((r = envid2env(envid, &e, 1)) < 0)
		return r;
	user_mem_assert(curenv, (void *)tf, sizeof(struct Trapframe), 0);   // It's ok to be read-only.

	memmove(&e->env_tf, tf, sizeof(struct Trapframe));
	e->env_tf.tf_cs |= 3;                   // always return to CPL 3
	e->env_tf.tf_eflags |= FL_IF;           // keep interrupts enabled
	e->env_tf.tf_eflags &= ~FL_IOPL_MASK;   // no I/O privilege
	return 0;
}
```
Be careful when manipulating these important bits in %eflags.
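A host-side sketch of those three adjustments, using a hypothetical trimmed-down struct mini_tf in place of struct Trapframe and FL_IF/FL_IOPL_MASK values assumed from JOS's inc/mmu.h. Whatever the parent passes in, the result has the low two selector bits of cs set to 3, interrupts enabled, and IOPL 0; sanitize_tf() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define FL_IF        0x00000200u  /* interrupt enable flag (assumed value) */
#define FL_IOPL_MASK 0x00003000u  /* I/O privilege level field (assumed value) */

/* Hypothetical, trimmed-down trapframe with only the fields touched here. */
struct mini_tf {
	uint16_t tf_cs;
	uint32_t tf_eflags;
};

/* Same three adjustments as sys_env_set_trapframe(): request privilege
 * level 3, keep interrupts enabled, and strip any I/O privilege. */
static void sanitize_tf(struct mini_tf *tf)
{
	tf->tf_cs |= 3;
	tf->tf_eflags |= FL_IF;
	tf->tf_eflags &= ~FL_IOPL_MASK;
}
```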
COW fork() is great, but now that we have a file system, we actually want some pages to be shared between parent and child, for example, the file descriptor table. We use another one of the PTE_AVAIL bits, called PTE_SHARE, to tell fork() as well as spawn() to handle these pages differently.
Exercise 8. Change duppage() in lib/fork.c to follow the new convention… Likewise, implement copy_shared_pages() in lib/spawn.c.
```c
static int
duppage(envid_t ceid, unsigned pn)
{
	// LAB 4: Your code here.
	int r;
	extern volatile pte_t uvpt[];
	envid_t peid = sys_getenvid();
	void *va = (void *)(pn * PGSIZE);

	if (uvpt[pn] & PTE_SHARE) {
		// Shared page: same physical page, same permissions.
		if ((r = sys_page_map(peid, va, ceid, va, uvpt[pn] & PTE_SYSCALL)) < 0)
			return r;
	} else if (uvpt[pn] & (PTE_COW | PTE_W)) {
		// Writable or COW page: map it COW in the child first, then
		// remap the parent's own page COW as well.
		if ((r = sys_page_map(peid, va, ceid, va, PTE_P | PTE_U | PTE_COW)) < 0)
			return r;
		if ((r = sys_page_map(peid, va, peid, va, PTE_P | PTE_U | PTE_COW)) < 0)
			return r;
	} else {
		// Read-only page: share it read-only.
		if ((r = sys_page_map(peid, va, ceid, va, PTE_P | PTE_U)) < 0)
			return r;
	}
	return 0;
}
```
```c
static int
copy_shared_pages(envid_t child)
{
	// LAB 5: Your code here.
	extern volatile pde_t uvpd[];
	extern volatile pte_t uvpt[];
	int r;

	for (uintptr_t va = 0; va < UTOP;) {
		if ((uvpd[va >> PDXSHIFT] & PTE_P) == 0) {
			// Page table not mapped: skip its whole range.
			va += NPTENTRIES * PGSIZE;
			continue;
		}
		int perm = uvpt[va >> PTXSHIFT] & PTE_SYSCALL;
		if ((perm & PTE_P) == 0) {
			// Page not mapped.
			va += PGSIZE;
			continue;
		}
		if (perm & PTE_SHARE) {
			if ((r = sys_page_map(thisenv->env_id, (void *)va,
					      child, (void *)va, perm)) < 0)
				return r;
		}
		va += PGSIZE;
	}
	return 0;
}
```
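The per-page decision duppage() now makes can be checked on the host. This sketch assumes JOS's bit values (PTE_SHARE = 0x400 and PTE_COW = 0x800, both among the PTE_AVAIL bits) and models only the permissions given to the child's mapping; child_perm() is a hypothetical helper. copy_shared_pages() applies just the first branch, since spawn() builds the rest of the child's address space from the ELF file.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P       0x001u
#define PTE_W       0x002u
#define PTE_U       0x004u
#define PTE_SHARE   0x400u   /* JOS software bit (assumed value) */
#define PTE_COW     0x800u   /* JOS software bit (assumed value) */
#define PTE_AVAIL   0xE00u
#define PTE_SYSCALL (PTE_AVAIL | PTE_P | PTE_W | PTE_U)

/* Hypothetical helper: permissions for the child's mapping of a present
 * page. Shared pages keep their permissions, writable or COW pages become
 * COW, and everything else is mapped read-only. */
static uint32_t child_perm(uint32_t pte)
{
	if (pte & PTE_SHARE)
		return pte & PTE_SYSCALL;
	if (pte & (PTE_W | PTE_COW))
		return PTE_P | PTE_U | PTE_COW;
	return PTE_P | PTE_U;
}
```

Note the ordering in the COW branch of duppage(): the child is mapped before the parent's own page is remapped read-only.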
In addition to the JOS monitor, we now need to handle keyboard and serial interrupts for user environments as well:
Exercise 9. In your kern/trap.c, call kbd_intr() to handle trap IRQ_OFFSET+IRQ_KBD and serial_intr() to handle trap IRQ_OFFSET+IRQ_SERIAL.
```c
	// Handle keyboard and serial interrupts.
	// LAB 5: Your code here.
	case IRQ_OFFSET + IRQ_KBD:
		kbd_intr();
		break;
	case IRQ_OFFSET + IRQ_SERIAL:
		serial_intr();
		break;
```
Finally we can run a shell on JOS!
Exercise 10. Add I/O redirection for < to user/sh.c.
In runcmd():
```c
			// LAB 5: Your code here.
			if ((fd = open(t, O_RDONLY)) < 0) {
				cprintf("open %s for read: %e\n", t, fd);
				exit();
			}
			// Move the file onto fd 0 (stdin) unless it is already there.
			if (fd != 0) {
				dup(fd, 0);
				close(fd);
			}
			break;
```
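JOS's dup(oldfd, newfd) closes newfd and makes it a copy of oldfd, which is what POSIX dup2() does, so the same redirection idiom can be tried on a POSIX host. A sketch, with redirect_stdin() as a hypothetical helper:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make `path` the new standard input.
 * Returns 0 on success, -1 on failure. */
static int redirect_stdin(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fd != STDIN_FILENO) {
		if (dup2(fd, STDIN_FILENO) < 0) {
			close(fd);
			return -1;
		}
		close(fd);   /* stdin now refers to the file; drop the extra fd */
	}
	return 0;
}
```

The fd != STDIN_FILENO check matters for the same reason as in sh.c: if descriptor 0 was free, open() already returned 0, and closing it would undo the redirection.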
Now if we run JOS:
```
peilin@PWN:~/6.828/2018/lab$ make run-icode-nox CPUS=4
…
FS is running
icode startup
FS can do I/O
icode: open /motd
Device 1 presence: 1
block cache is good
superblock is good
bitmap is good
alloc_block is good
file_open is good
file_get_block is good
file_flush is good
file_truncate is good
file rewrite is good
icode: read /motd
This is /motd, the message of the day.
Welcome to the JOS kernel, now with a file system!
icode: close /motd
icode: spawn /init
icode: exiting
init: running
init: data seems okay
init: bss seems okay
init: args: 'init' 'initarg1' 'initarg2'
init: running sh
init: starting sh
user@computer$ sh < script
This is from the script.
    1 Lorem ipsum dolor sit amet, consectetur
    2 adipisicing elit, sed do eiusmod tempor
    3 incididunt ut labore et dolore magna
    4 aliqua. Ut enim ad minim veniam, quis
    5 nostrud exercitation ullamco laboris
    6 nisi ut aliquip ex ea commodo consequat.
    7 Duis aute irure dolor in reprehenderit
    8 in voluptate velit esse cillum dolore eu
    9 fugiat nulla pariatur. Excepteur sint
   10 occaecat cupidatat non proident, sunt in
   11 culpa qui officia deserunt mollit anim
   12 id est laborum.
These are my file descriptors.
fd 0: name script isdir 0 size 132 dev file
fd 1: name <cons> isdir 0 size 0 dev cons
This is the end of the script.
$
```
Beautiful.
This completes the lab. See you next time!