If a concurrent thread causes the condition to be true, that thread must either hold the lock on the condition before the sleeping thread acquired it, or after the sleeping thread released it in sleep. If before, the sleeping thread must have seen the new condition value, and decided to sleep anyway, so it doesn’t matter if it misses the wakeup. If after, then the earliest the waker could acquire the lock on the condition is after sleep acquires ptable.lock, so that wakeup’s acquisition of pt- able.lock must wait until sleep has completely finished putting the sleeper to sleep…
xv6 book
I thought I could read English. 🙂
A few days ago I tried to start working on Lab 4: Preemptive Multitasking. However, jeez, multitasking is harder than I expected… So I decided to take a look at LEC 10 and LEC 11 first. Well, sometimes when reading the “Scheduling” chapter of the xv6 book I felt like I understood every single word in a paragraph but still couldn’t figure out what it was talking about. So…I decided to go breadth-first! 🙂 I hope I will figure out how sleep()
and wakeup()
really work in the Lab.
Anyway let’s first take a look at this homework. This one is fairly simple.
Download and read uthread.c and uthread.asm:
#include "types.h" #include "stat.h" #include "user.h" /* Possible states of a thread; */ #define FREE 0x0 #define RUNNING 0x1 #define RUNNABLE 0x2 #define STACK_SIZE 8192 #define MAX_THREAD 4 typedef struct thread thread_t, *thread_p; typedef struct mutex mutex_t, *mutex_p; struct thread { int sp; /* saved stack pointer */ char stack[STACK_SIZE]; /* the thread's stack */ int state; /* FREE, RUNNING, RUNNABLE */ }; static thread_t all_thread[MAX_THREAD]; thread_p current_thread; thread_p next_thread; extern void thread_switch(void); void thread_init(void) { // main() is thread 0, which will make the first invocation to // thread_schedule(). it needs a stack so that the first thread_switch() can // save thread 0's state. thread_schedule() won't run the main thread ever // again, because its state is set to RUNNING, and thread_schedule() selects // a RUNNABLE thread. current_thread = &all_thread[0]; current_thread->state = RUNNING; } static void thread_schedule(void) { thread_p t; /* Find another runnable thread. */ next_thread = 0; for (t = all_thread; t < all_thread + MAX_THREAD; t++) { if (t->state == RUNNABLE && t != current_thread) { next_thread = t; break; } } if (t >= all_thread + MAX_THREAD && current_thread->state == RUNNABLE) { /* The current thread is the only runnable thread; run it. */ next_thread = current_thread; } if (next_thread == 0) { printf(2, "thread_schedule: no runnable threads\n"); exit(); } if (current_thread != next_thread) { /* switch threads? */ next_thread->state = RUNNING; thread_switch(); } else next_thread = 0; } void thread_create(void (*func)()) { thread_p t; for (t = all_thread; t < all_thread + MAX_THREAD; t++) { if (t->state == FREE) break; } t->sp = (int) (t->stack + STACK_SIZE); // set sp to the top of the stack t->sp -= 4; // space for return address * (int *) (t->sp) = (int)func; // push return address on stack t->sp -= 32; // space for registers that thread_switch expects t->state = RUNNABLE; } void thread_yield(void) { current_thread->state = RUNNABLE; thread_schedule(); } static void mythread(void) { int i; printf(1, "my thread running\n"); for (i = 0; i < 100; i++) { printf(1, "my thread 0x%x\n", (int) current_thread); thread_yield(); } printf(1, "my thread: exit\n"); current_thread->state = FREE; thread_schedule(); } int main(int argc, char *argv[]) { thread_init(); thread_create(mythread); thread_create(mythread); thread_schedule(); return 0; }
Doing uthread
runs this C program. It initializes some data structures, creates two threads, and runs a scheduler. Each thread is described using a thread
structure, including its saved stack pointer int sp
, a stack (char stack[STACK_SIZE]
) as well as a state (“enum” int state
, one of FREE
, RUNNING
and RUNNABLE
).
It maintains some global variables: current_thread
, next_thread
and a thread table, all_thread[MAX_THREAD]
.
thread_init()
registers the current “thread” (main()
) as all_thread[0]
and marks it RUNNING
.
thread_create()
finds the next free thread_t
in all_thread
, sets up some registers on its stack so that later when this thread is scheduled for the first time, it will start at func()
, just like a simple trapframe.
Finally the scheduler, thread_schedue()
, iterates over all_thread
to find the next RUNNABLE
threads, updates current_thread
and next_thread
, change its state to RUNNING
, then thread_switch
to it.
If the current thread is the only RUNNABLE
thread in all_thread
, thread_schedule()
sets next_schedule
to zero, and returns back to the same thread. The next call to thread_schedule()
is guaranteed to exit()
, so the whole user process can terminate.
If an user thread wants to yield, it calls, well, thread_yield()
:
void thread_yield(void) { current_thread->state = RUNNABLE; thread_schedule(); }
thread_yield()
marks the caller thread as RUNNABLE
so that later thread_schedule()
can find it and call it again. The “main thread” created in thread_init()
sets itself to RUNNING
instead and will never be scheduled again, which is intended.
We need to implement thread_switch()
. My solution saves the status of the old thread, switch to the new stack, set current_thread
to next_thread
, set next_thread
to zero, then load the status of the new thread. Nothing fancy:
.text /* Switch from current_thread to next_thread. Make next_thread * the current_thread, and set next_thread to 0. * Use eax as a temporary register; it is caller saved. */ .globl thread_switch thread_switch: /* we do whatever we want to eax */ /* save old */ pushal movl current_thread, %eax movl %esp, (%eax) /* switch stack */ movl next_thread, %eax movl (%eax), %esp /* set current_thread to next_thread */ movl %eax, current_thread /* set next_thread to 0 */ movl $0, next_thread /* load new */ popal ret
We run the program with one CPU:
peilin@PWN:~/6.828/2018/xv6-public$ make CPUS=1 qemu-nox
qemu-system-i386 -nographic -drive file=fs.img,index=1,media=disk,format=raw -drive file=xv6.img,index=0,media=disk,format=raw -smp 1 -m 512
…
$ uthread
my thread running
my thread 0x2DA8
my thread running
my thread 0x4DB0
my thread 0x2DA8
my thread 0x4DB0
my thread 0x2DA8
my thread 0x4DB0
…
my thread 0x2DA8
my thread 0x4DB0
my thread: exit
my thread: exit
thread_schedule: no runnable threads
$
Good.
The only thing that bothered me for a while is: Where does thread_swtich()
ret
urn to, really? We are not explicitly storing the %eip. Take the normal example of one user thread thread_switch()
ing to another. From the C code we know that the calling sequence should be: (In the for
loop of mythread()
,) thread_yield()
, thread_schedule()
then thread_switch
. Let’s follow this example in uthread.asm
:
1b0: 83 ec 04 sub $0x4,%esp 1b3: ff 35 cc 8d 00 00 pushl 0x8dcc 1b9: 68 23 0a 00 00 push $0xa23 1be: 6a 01 push $0x1 1c0: e8 cb 04 00 00 call 690 <printf> current_thread->state = RUNNABLE; 1c5: a1 cc 8d 00 00 mov 0x8dcc,%eax 1ca: c7 80 04 20 00 00 02 movl $0x2,0x2004(%eax) 1d1: 00 00 00 thread_schedule(); 1d4: e8 27 ff ff ff call 100 <thread_schedule> for (i = 0; i < 100; i++) { 1d9: 83 c4 10 add $0x10,%esp 1dc: 83 eb 01 sub $0x1,%ebx 1df: 75 cf jne 1b0 <mythread+0x20>
This is the for
loop part. To my surprise, there’s no calls to thread_yield()
at all – These highlighted lines do the same effect. Anyway, at 0x1d4
we call
thread_schedule()
:
00000100 <thread_schedule>: if (t->state == RUNNABLE && t != current_thread) { 100: 8b 15 cc 8d 00 00 mov 0x8dcc,%edx next_thread = 0; 106: c7 05 d0 8d 00 00 00 movl $0x0,0x8dd0 10d: 00 00 00 for (t = all_thread; t < all_thread + MAX_THREAD; t++) { 110: b8 a0 0d 00 00 mov $0xda0,%eax if (t->state == RUNNABLE && t != current_thread) { 115: 83 b8 04 20 00 00 02 cmpl $0x2,0x2004(%eax) 11c: 74 22 je 140 <thread_schedule+0x40> for (t = all_thread; t < all_thread + MAX_THREAD; t++) { 11e: 05 08 20 00 00 add $0x2008,%eax 123: 3d c0 8d 00 00 cmp $0x8dc0,%eax 128: 75 eb jne 115 <thread_schedule+0x15> if (t >= all_thread + MAX_THREAD && current_thread->state == RUNNABLE) { 12a: 83 ba 04 20 00 00 02 cmpl $0x2,0x2004(%edx) 131: 75 43 jne 176 <thread_schedule+0x76> next_thread = 0; 133: c7 05 d0 8d 00 00 00 movl $0x0,0x8dd0 13a: 00 00 00 13d: c3 ret 13e: 66 90 xchg %ax,%ax if (t->state == RUNNABLE && t != current_thread) { 140: 39 c2 cmp %eax,%edx 142: 74 da je 11e <thread_schedule+0x1e> if (t >= all_thread + MAX_THREAD && current_thread->state == RUNNABLE) { 144: 3d c0 8d 00 00 cmp $0x8dc0,%eax next_thread = t; 149: a3 d0 8d 00 00 mov %eax,0x8dd0 if (t >= all_thread + MAX_THREAD && current_thread->state == RUNNABLE) { 14e: 73 13 jae 163 <thread_schedule+0x63> if (next_thread == 0) { 150: 85 c0 test %eax,%eax 152: 74 22 je 176 <thread_schedule+0x76> next_thread->state = RUNNING; 154: c7 80 04 20 00 00 01 movl $0x1,0x2004(%eax) 15b: 00 00 00 thread_switch(); 15e: e9 65 01 00 00 jmp 2c8 <thread_switch> ...
Highlighted lines are instructions actually executed. There’s no stack operations at all, and we jmp
to thread_switch()
, instead of call
ing it! Wow. This means next time when thread_switch()
switches to this thread, it will ret
from the call
to thread_schedule
(at 0x1d4
), and resume executing at 0x1d9
.
You may wonder what is my point for all of this. I’m just wondering: How did we tell the compiler to make sure that, whenever we jmp
to thread_switch()
, the stack top will always be the intended return address? You know, thread_schedule()
has a local variable, thread_p t
, but for some reason the compiler just didn’t make room for it on the stack (it used %eax instead).
I researched a little bit and learned about this is something called Tail Call Optimization (TCO), a kind of optimization performed by compilers. Indeed, in our case, thread_switch()
is the last statement in thread_schedule()
(in one of the if
branches, of course). Well, maybe one day I’ll learn more about compilers. 🙂
See you next time!