6.828/2018 Homework: User-level Threads

If a concurrent thread causes the condition to be true, that thread must either hold the lock on the condition before the sleeping thread acquired it, or after the sleeping thread released it in sleep. If before, the sleeping thread must have seen the new condition value, and decided to sleep anyway, so it doesn’t matter if it misses the wakeup. If after, then the earliest the waker could acquire the lock on the condition is after sleep acquires ptable.lock, so that wakeup’s acquisition of pt- able.lock must wait until sleep has completely finished putting the sleeper to sleep…
xv6 book

I thought I could read English. 🙂

A few days ago I tried to start working on Lab 4: Preemptive Multitasking. However, jeez, multitasking is harder than I expected… So I decided to take a look at LEC 10 and LEC 11 first. Well, sometimes when reading the “Scheduling” chapter of the xv6 book I felt like I understood every single word in a paragraph but still couldn’t figure out what it was talking about. So…I decided to go breadth-first! 🙂 I hope I will figure out how sleep() and wakeup() really work in the Lab.

Anyway let’s first take a look at this homework. This one is fairly simple.

Download and read uthread.c and uthread.asm:

#include "types.h"
#include "stat.h"
#include "user.h"

/* Possible states of a thread; */
#define FREE        0x0
#define RUNNING     0x1
#define RUNNABLE    0x2

#define STACK_SIZE  8192
#define MAX_THREAD  4

typedef struct thread thread_t, *thread_p;
typedef struct mutex mutex_t, *mutex_p;

struct thread {
  int        sp;                /* saved stack pointer */
  char stack[STACK_SIZE];       /* the thread's stack */
  int        state;             /* FREE, RUNNING, RUNNABLE */
};
static thread_t all_thread[MAX_THREAD];
thread_p  current_thread;
thread_p  next_thread;
extern void thread_switch(void);

void 
thread_init(void)
{
  // main() is thread 0, which will make the first invocation to
  // thread_schedule().  it needs a stack so that the first thread_switch() can
  // save thread 0's state.  thread_schedule() won't run the main thread ever
  // again, because its state is set to RUNNING, and thread_schedule() selects
  // a RUNNABLE thread.
  current_thread = &all_thread[0];
  current_thread->state = RUNNING;
}

static void 
thread_schedule(void)
{
  thread_p t;

  /* Find another runnable thread. */
  next_thread = 0;
  for (t = all_thread; t < all_thread + MAX_THREAD; t++) {
    if (t->state == RUNNABLE && t != current_thread) {
      next_thread = t;
      break;
    }
  }

  if (t >= all_thread + MAX_THREAD && current_thread->state == RUNNABLE) {
    /* The current thread is the only runnable thread; run it. */
    next_thread = current_thread;
  }

  if (next_thread == 0) {
    printf(2, "thread_schedule: no runnable threads\n");
    exit();
  }

  if (current_thread != next_thread) {         /* switch threads?  */
    next_thread->state = RUNNING;
    thread_switch();
  } else
    next_thread = 0;
}

void 
thread_create(void (*func)())
{
  thread_p t;

  for (t = all_thread; t < all_thread + MAX_THREAD; t++) {
    if (t->state == FREE) break;
  }
  t->sp = (int) (t->stack + STACK_SIZE);   // set sp to the top of the stack
  t->sp -= 4;                              // space for return address
  * (int *) (t->sp) = (int)func;           // push return address on stack
  t->sp -= 32;                             // space for registers that thread_switch expects
  t->state = RUNNABLE;
}

void 
thread_yield(void)
{
  current_thread->state = RUNNABLE;
  thread_schedule();
}

static void 
mythread(void)
{
  int i;
  printf(1, "my thread running\n");
  for (i = 0; i < 100; i++) {
    printf(1, "my thread 0x%x\n", (int) current_thread);
    thread_yield();
  }
  printf(1, "my thread: exit\n");
  current_thread->state = FREE;
  thread_schedule();
}


int 
main(int argc, char *argv[]) 
{
  thread_init();
  thread_create(mythread);
  thread_create(mythread);
  thread_schedule();
  return 0;
}

Doing uthread runs this C program. It initializes some data structures, creates two threads, and runs a scheduler. Each thread is described using a thread structure, including its saved stack pointer int sp, a stack (char stack[STACK_SIZE]) as well as a state (“enum” int state, one of FREE, RUNNING and RUNNABLE).

It maintains some global variables: current_thread, next_thread and a thread table, all_thread[MAX_THREAD].

thread_init() registers the current “thread” (main()) as all_thread[0] and marks it RUNNING.

thread_create() finds the next free thread_t in all_thread, sets up some registers on its stack so that later when this thread is scheduled for the first time, it will start at func(), just like a simple trapframe.

Finally the scheduler, thread_schedue(), iterates over all_thread to find the next RUNNABLE threads, updates current_thread and next_thread, change its state to RUNNING, then thread_switch to it.

If the current thread is the only RUNNABLE thread in all_thread, thread_schedule() sets next_schedule to zero, and returns back to the same thread. The next call to thread_schedule() is guaranteed to exit(), so the whole user process can terminate.

If an user thread wants to yield, it calls, well, thread_yield():

void 
thread_yield(void)
{
  current_thread->state = RUNNABLE;
  thread_schedule();
}

thread_yield() marks the caller thread as RUNNABLE so that later thread_schedule() can find it and call it again. The “main thread” created in thread_init() sets itself to RUNNING instead and will never be scheduled again, which is intended.

We need to implement thread_switch(). My solution saves the status of the old thread, switch to the new stack, set current_thread to next_thread, set next_thread to zero, then load the status of the new thread. Nothing fancy:

	.text

/* Switch from current_thread to next_thread. Make next_thread
 * the current_thread, and set next_thread to 0.
 * Use eax as a temporary register; it is caller saved.
 */
	.globl thread_switch
thread_switch:
	/* we do whatever we want to eax */
	/* save old */
	pushal
	movl current_thread, %eax
	movl %esp, (%eax)

	/* switch stack */
	movl next_thread, %eax
	movl (%eax), %esp

	/* set current_thread to next_thread */
	movl %eax, current_thread
	
	/* set next_thread to 0 */
	movl $0, next_thread

	/* load new */
	popal
	ret

We run the program with one CPU:

peilin@PWN:~/6.828/2018/xv6-public$ make CPUS=1 qemu-nox
qemu-system-i386 -nographic -drive file=fs.img,index=1,media=disk,format=raw -drive file=xv6.img,index=0,media=disk,format=raw -smp 1 -m 512
…
$ uthread
my thread running
my thread 0x2DA8
my thread running
my thread 0x4DB0
my thread 0x2DA8
my thread 0x4DB0
my thread 0x2DA8
my thread 0x4DB0
…
my thread 0x2DA8
my thread 0x4DB0
my thread: exit
my thread: exit
thread_schedule: no runnable threads
$

Good.

The only thing that bothered me for a while is: Where does thread_swtich() return to, really? We are not explicitly storing the %eip. Take the normal example of one user thread thread_switch()ing to another. From the C code we know that the calling sequence should be: (In the for loop of mythread(),) thread_yield(), thread_schedule() then thread_switch. Let’s follow this example in uthread.asm:

 1b0:	83 ec 04             	sub    $0x4,%esp
 1b3:	ff 35 cc 8d 00 00    	pushl  0x8dcc
 1b9:	68 23 0a 00 00       	push   $0xa23
 1be:	6a 01                	push   $0x1
 1c0:	e8 cb 04 00 00       	call   690 <printf>
  current_thread->state = RUNNABLE;
 1c5:	a1 cc 8d 00 00       	mov    0x8dcc,%eax
 1ca:	c7 80 04 20 00 00 02 	movl   $0x2,0x2004(%eax)
 1d1:	00 00 00 
  thread_schedule();
 1d4:	e8 27 ff ff ff       	call   100 <thread_schedule>
  for (i = 0; i < 100; i++) {
 1d9:	83 c4 10             	add    $0x10,%esp
 1dc:	83 eb 01             	sub    $0x1,%ebx
 1df:	75 cf                	jne    1b0 <mythread+0x20>

This is the for loop part. To my surprise, there’s no calls to thread_yield() at all – These highlighted lines do the same effect. Anyway, at 0x1d4 we call thread_schedule():

00000100 <thread_schedule>:
    if (t->state == RUNNABLE && t != current_thread) {
 100:	8b 15 cc 8d 00 00    	mov    0x8dcc,%edx
  next_thread = 0;
 106:	c7 05 d0 8d 00 00 00 	movl   $0x0,0x8dd0
 10d:	00 00 00 
  for (t = all_thread; t < all_thread + MAX_THREAD; t++) {
 110:	b8 a0 0d 00 00       	mov    $0xda0,%eax
    if (t->state == RUNNABLE && t != current_thread) {
 115:	83 b8 04 20 00 00 02 	cmpl   $0x2,0x2004(%eax)
 11c:	74 22                	je     140 <thread_schedule+0x40>
  for (t = all_thread; t < all_thread + MAX_THREAD; t++) {
 11e:	05 08 20 00 00       	add    $0x2008,%eax
 123:	3d c0 8d 00 00       	cmp    $0x8dc0,%eax
 128:	75 eb                	jne    115 <thread_schedule+0x15>
  if (t >= all_thread + MAX_THREAD && current_thread->state == RUNNABLE) {
 12a:	83 ba 04 20 00 00 02 	cmpl   $0x2,0x2004(%edx)
 131:	75 43                	jne    176 <thread_schedule+0x76>
    next_thread = 0;
 133:	c7 05 d0 8d 00 00 00 	movl   $0x0,0x8dd0
 13a:	00 00 00 
 13d:	c3                   	ret    
 13e:	66 90                	xchg   %ax,%ax
    if (t->state == RUNNABLE && t != current_thread) {
 140:	39 c2                	cmp    %eax,%edx
 142:	74 da                	je     11e <thread_schedule+0x1e>
  if (t >= all_thread + MAX_THREAD && current_thread->state == RUNNABLE) {
 144:	3d c0 8d 00 00       	cmp    $0x8dc0,%eax
      next_thread = t;
 149:	a3 d0 8d 00 00       	mov    %eax,0x8dd0
  if (t >= all_thread + MAX_THREAD && current_thread->state == RUNNABLE) {
 14e:	73 13                	jae    163 <thread_schedule+0x63>
  if (next_thread == 0) {
 150:	85 c0                	test   %eax,%eax
 152:	74 22                	je     176 <thread_schedule+0x76>
    next_thread->state = RUNNING;
 154:	c7 80 04 20 00 00 01 	movl   $0x1,0x2004(%eax)
 15b:	00 00 00 
    thread_switch();
 15e:	e9 65 01 00 00       	jmp    2c8 <thread_switch>
...

Highlighted lines are instructions actually executed. There’s no stack operations at all, and we jmp to thread_switch(), instead of calling it! Wow. This means next time when thread_switch() switches to this thread, it will ret from the call to thread_schedule (at 0x1d4), and resume executing at 0x1d9.

You may wonder what is my point for all of this. I’m just wondering: How did we tell the compiler to make sure that, whenever we jmp to thread_switch(), the stack top will always be the intended return address? You know, thread_schedule() has a local variable, thread_p t, but for some reason the compiler just didn’t make room for it on the stack (it used %eax instead).

I researched a little bit and learned about this is something called Tail Call Optimization (TCO), a kind of optimization performed by compilers. Indeed, in our case, thread_switch() is the last statement in thread_schedule() (in one of the if branches, of course). Well, maybe one day I’ll learn more about compilers. 🙂

See you next time!

Post Views: 2,028

6.828/2018 Homework: User-level Threads

Share this:

Submit a Comment Cancel reply