6.828/2018 Lab 6: Network Driver

testtime: OK (7.8s)
pci attach: OK (1.2s)
testoutput [5 packets]: OK (1.6s)
testoutput [100 packets]: OK (13.4s)
Part A score: 35/35
testinput [5 packets]: OK (1.9s)
testinput [100 packets]: OK (1.8s)
tcp echo server [echosrv]: OK (1.6s)
web server [httpd]:
http://localhost:26002/: OK (2.3s)
http://localhost:26002/index.html: OK (1.4s)
http://localhost:26002/random_file.txt: OK (2.4s)
Part B score: 70/70
Score: 105/105

Finally!

In this lab, we are going to extend JOS to support network services. The entire system consists of several user environments, as shown below:

[Figure: Lab 6: Network Driver (default final project)]

We are going to implement the parts highlighted in green, starting with an E1000 driver (from scratch!), then the input/output helper environments of the network server, and finally an HTTP daemon. Sounds great, let’s go! 😉

Part A: Initialization and transmitting packets
Part B: Receiving packets and the web server

Part A: Initialization and transmitting packets

We don’t have to implement the timer helper ourselves, but in order to make it work, we need to modify the kernel a little bit:

Exercise 1. Add a call to time_tick() for every clock interrupt in kern/trap.c. Implement sys_time_msec() and add it to syscall in kern/syscall.c so that user space has access to the time.

	case IRQ_OFFSET + IRQ_TIMER:
		lapic_eoi();

		// Add time tick increment to clock interrupts.
		// Be careful! In multiprocessors, clock interrupts are
		// triggered on every CPU.
		if (thiscpu == bootcpu)
			time_tick();

		sched_yield();

static int
sys_time_msec(void)
{
	return time_msec();
}
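Why only the boot CPU calls time_tick(): every CPU’s LAPIC timer raises its own clock interrupt, so if each one bumped the shared counter, JOS time would run N times too fast on N CPUs. A minimal hosted sketch, assuming (as in JOS’s kern/time.c) that one tick stands for 10 ms; tick_all_cpus() and sim_time_msec() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MS_PER_TICK 10          /* assumption: one tick = 10 ms, like kern/time.c */

static unsigned int ticks;      /* shared counter, like kern/time.c's ticks */

/* Hypothetical stand-in for time_msec(). */
static unsigned int sim_time_msec(void)
{
    return ticks * MS_PER_TICK;
}

/* Simulate one timer period on a machine with ncpu CPUs.
 * With only_boot set, only CPU 0 (the boot CPU) increments the counter,
 * mirroring the `thiscpu == bootcpu` check in trap.c. */
static void tick_all_cpus(int ncpu, bool only_boot)
{
    for (int cpu = 0; cpu < ncpu; cpu++)
        if (!only_boot || cpu == 0)
            ticks++;
}
```

After 100 timer periods on 4 CPUs, the boot-CPU-only policy reports 1000 ms, while letting every CPU tick would report 4000 ms.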

The E1000 network card is a PCI device, which needs to be initialized during system boot-up. i386_init() calls pci_init() (kern/pci.c):

int
pci_init(void)
{
	static struct pci_bus root_bus;
	memset(&root_bus, 0, sizeof(root_bus));

	return pci_scan_bus(&root_bus);
}

pci_scan_bus() scans the PCI bus and calls pci_attach() for each PCI device it detects. A PCI device can be identified either by its vendor ID and device ID (as is the case for the E1000), or by its class and subclass:

static int
pci_attach(struct pci_func *f)
{
	return
		pci_attach_match(PCI_CLASS(f->dev_class),
				 PCI_SUBCLASS(f->dev_class),
				 &pci_attach_class[0], f) ||
		pci_attach_match(PCI_VENDOR(f->dev_id),
				 PCI_PRODUCT(f->dev_id),
				 &pci_attach_vendor[0], f);
}

In the case of the E1000, pci_attach_match() loops through the pci_attach_vendor table:

static int __attribute__((warn_unused_result))
pci_attach_match(uint32_t key1, uint32_t key2,
		 struct pci_driver *list, struct pci_func *pcif)
{
	uint32_t i;

	for (i = 0; list[i].attachfn; i++) {
		if (list[i].key1 == key1 && list[i].key2 == key2) {
			int r = list[i].attachfn(pcif);
			if (r > 0)
				return r;
			if (r < 0)
				cprintf("pci_attach_match: attaching "
					"%x.%x (%p): %e\n",
					key1, key2, list[i].attachfn, r);
		}
	}
	return 0;
}

Each entry in pci_attach_vendor is a pci_driver struct defined as:

// PCI driver table
struct pci_driver {
	uint32_t key1, key2;
	int (*attachfn) (struct pci_func *pcif);
};

If a detected device matches an entry (i.e. both key1 and key2 are equal), pci_attach_match() calls that entry’s attach function. The scan stops at the {0, 0, 0} sentinel, whose attachfn is NULL.
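The table-plus-sentinel pattern can be tried out on a host. The sketch below is a simplified, hypothetical model of pci_attach_match(): the struct driver type, fake_e1000_attach() and match() are illustrative names, and attach functions take a plain int device number instead of a struct pci_func.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified version of struct pci_driver. */
struct driver {
    uint32_t key1, key2;
    int (*attachfn)(int dev);
};

static int attached_dev = -1;

static int fake_e1000_attach(int dev)
{
    attached_dev = dev;
    return 1;                       /* > 0 stops the scan, as in pci_attach_match() */
}

static struct driver vendor_table[] = {
    { 0x8086, 0x100e, fake_e1000_attach },
    { 0, 0, NULL },                 /* sentinel: attachfn == NULL ends the loop */
};

/* Walk the table until the sentinel; call the first matching attach fn. */
static int match(uint32_t k1, uint32_t k2, struct driver *list, int dev)
{
    for (size_t i = 0; list[i].attachfn; i++) {
        if (list[i].key1 == k1 && list[i].key2 == k2) {
            int r = list[i].attachfn(dev);
            if (r > 0)
                return r;
        }
    }
    return 0;
}
```

A device with IDs 8086:100e triggers the attach function; any other pair falls through to the sentinel and returns 0.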

Therefore, we need to add an entry in pci_attach_vendor for E1000, as well as implement a function to initialize it:

Exercise 3. Implement an attach function to initialize the E1000. Add an entry to the pci_attach_vendor array in kern/pci.c to trigger your function if a matching PCI device is found (be sure to put it before the {0, 0, 0} entry that marks the end of the table). You can find the vendor ID and device ID of the 82540EM that QEMU emulates in section 5.2. You should also see these listed when JOS scans the PCI bus while booting.
For now, just enable the E1000 device via pci_func_enable(). We’ll add more initialization throughout the lab.

// pci_attach_vendor matches the vendor ID and device ID of a PCI device. key1
// and key2 should be the vendor ID and device ID respectively
struct pci_driver pci_attach_vendor[] = {
	{ PCI_E1000_VENDOR_ID, PCI_E1000_DEVICE_ID, &e1000_attach },
	{ 0, 0, 0 },
};

“Section 5.2” refers to Intel’s Software Developer’s Manual for the E1000. Since kern/pci.c includes kern/e1000.h, we put our declarations there (and the driver code in kern/e1000.c):

// Table 5-1. Component Identification
#define PCI_E1000_VENDOR_ID     0x8086
#define PCI_E1000_DEVICE_ID     0x100e

int
e1000_attach(struct pci_func *pcif)
{
	pci_func_enable(pcif);
	return 0;
}

Well… so what does pci_func_enable() do, exactly? The E1000 uses MMIO. pci_func_enable() negotiates the range of physical memory addresses (as well as the IRQ line) assigned to the E1000, and stores these values in its in-memory descriptor, struct pci_func:

struct pci_func {
    struct pci_bus *bus;	// Primary bus for bridges

    uint32_t dev;
    uint32_t func;

    uint32_t dev_id;
    uint32_t dev_class;

    uint32_t reg_base[6];
    uint32_t reg_size[6];
    uint8_t irq_line;
};

Since MMIO regions are assigned very high physical addresses, we can’t simply access them with something like KADDR(pci_func.reg_base[0]): KADDR() only works for physical RAM that the kernel has mapped at KERNBASE. Instead, we map the region with mmio_map_region(), just like we did for the LAPIC:

Exercise 4. In your attach function, create a virtual memory mapping for the E1000’s BAR 0 by calling mmio_map_region (which you wrote in lab 4 to support memory-mapping the LAPIC).
To test your mapping, try printing out the device status register (section 13.4.2). This is a 4 byte register that starts at byte 8 of the register space. You should get 0x80080783, which indicates a full duplex link is up at 1000 MB/s, among other things.

volatile void *e1000;

int
e1000_attach(struct pci_func *pcif)
{
    pci_func_enable(pcif);

    e1000 = mmio_map_region(pcif->reg_base[0], pcif->reg_size[0]);
    cprintf("Device Status Register: 0x%x\n", *(uint32_t *)(e1000 + DSR_OFFSET));
    return 0;
}
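One subtlety in `*(uint32_t *)(e1000 + DSR_OFFSET)`: the cast to a plain `uint32_t *` drops the `volatile` qualifier, so in principle the compiler may cache or merge register accesses. A common remedy is to route every access through a `volatile uint32_t *`. The helpers below, reg_read() and reg_write(), are hypothetical names; to make the sketch testable on a host, the register file is simulated with an ordinary array instead of the pointer returned by mmio_map_region().

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit register at byte offset `off` from `base`.
 * The volatile pointer forces the compiler to emit the load every time. */
static inline uint32_t reg_read(volatile void *base, uint32_t off)
{
    return *(volatile uint32_t *)((volatile uint8_t *)base + off);
}

/* Write a 32-bit register at byte offset `off` from `base`. */
static inline void reg_write(volatile void *base, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)((volatile uint8_t *)base + off) = val;
}

/* Simulated 64-byte register space; STATUS lives at offset 0x8. */
static uint32_t fake_regs[16];
```

In the driver, `e1000` would be passed as `base`, e.g. `reg_write(e1000, TDT_OFFSET, tdt)`.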

peilin@PWN:~/6.828/2018/lab$ make qemu-nox
…
PCI: 00:00.0: 8086:1237: class: 6.0 (Bridge device) irq: 0
PCI: 00:01.0: 8086:7000: class: 6.1 (Bridge device) irq: 0
PCI: 00:01.1: 8086:7010: class: 1.1 (Storage controller) irq: 0
PCI: 00:01.3: 8086:7113: class: 6.80 (Bridge device) irq: 9
PCI: 00:02.0: 1234:1111: class: 3.0 (Display controller) irq: 0
PCI: 00:03.0: 8086:100e: class: 2.0 (Network controller) irq: 11
PCI function 00:03.0 (8086:100e) enabled
Device Status Register: 0x80080783

Nice.
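To see why 0x80080783 means “full duplex, link up, 1000 Mb/s”, here is a small decoder for three fields of the Device Status register (section 13.4.2): FD is bit 0, LU is bit 1, and SPEED occupies bits 7:6. Only these fields are handled; the macro and function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_FD           (1u << 0)   /* full duplex */
#define STATUS_LU           (1u << 1)   /* link up */
#define STATUS_SPEED_SHIFT  6           /* SPEED is bits 7:6 */

/* Map the 2-bit SPEED field to Mb/s: 00b = 10, 01b = 100,
 * 10b and 11b = 1000. */
static int status_speed_mbps(uint32_t status)
{
    switch ((status >> STATUS_SPEED_SHIFT) & 0x3) {
    case 0:  return 10;
    case 1:  return 100;
    default: return 1000;
    }
}
```

For 0x80080783, the low byte is 0x83: bits 0 and 1 are set, and bits 7:6 are 10b, hence 1000 Mb/s.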

The E1000 uses a circular queue, called the Transmit Descriptor Ring (TDR), to hold outgoing packets. It needs to be initialized first.

Exercise 5. Perform the initialization steps described in section 14.5 (but not its subsections). Use section 13 as a reference for the registers the initialization process refers to and sections 3.3.3 and 3.4 for reference to the transmit descriptors and transmit descriptor array.

We add another function, tx_init(), to e1000_attach():

int
e1000_attach(struct pci_func *pcif)
{
    pci_func_enable(pcif);

    e1000 = mmio_map_region(pcif->reg_base[0], pcif->reg_size[0]);
    cprintf("Device Status Register: 0x%x\n", *(uint32_t *)(e1000 + DSR_OFFSET));

    tx_init();
    return 0;
}

Okay. First of all, how does TDR work?

Transmit Descriptor Ring (TDR)

Each Transmit Descriptor (TD) in TDR is defined as:

// 3.3.3 Legacy Transmit Descriptor Format
struct td
{
    uint64_t address;
    uint16_t length;
    uint8_t cso;
    uint8_t cmd;
    uint8_t status; // RSV + STA
    uint8_t css;
    uint16_t special;
};

Each TD consists of the physical address of the packet’s data buffer, the length of the packet, and a whole bunch of flags. 🙂
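Because the E1000 reads descriptors as raw 16-byte records, the C struct must match the manual’s layout exactly, with no padding. A cheap way to enforce that is a compile-time check; this sketch repeats the struct td definition from above and adds C11 `_Static_assert`s on its size and on the byte offsets of a few fields (the offsets follow from the field order in section 3.3.3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct td above (3.3.3 Legacy Transmit Descriptor). */
struct td {
    uint64_t address;   /* bytes 0-7  */
    uint16_t length;    /* bytes 8-9  */
    uint8_t cso;        /* byte 10    */
    uint8_t cmd;        /* byte 11    */
    uint8_t status;     /* byte 12    */
    uint8_t css;        /* byte 13    */
    uint16_t special;   /* bytes 14-15 */
};

/* Fail the build if the compiler ever lays the struct out differently. */
_Static_assert(sizeof(struct td) == 16, "TD must be 16 bytes");
_Static_assert(offsetof(struct td, length) == 8, "length at byte 8");
_Static_assert(offsetof(struct td, cmd) == 11, "CMD at byte 11");
_Static_assert(offsetof(struct td, status) == 12, "STA at byte 12");
```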

When transmitting a packet, the software (our driver) first fills in the descriptor (and its data buffer), then increments the E1000’s Transmit Descriptor Tail (TDT) register, telling the E1000 that the packet is ready to be sent. The E1000 keeps transmitting the packet whose TD is pointed to by the Transmit Descriptor Head (TDH) register, then increments TDH. When TDH catches up with (equals) TDT, the E1000 knows that the transmit queue is empty, and it stops transmitting.
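The head/tail rules above boil down to a little modular arithmetic. This is a hosted model, not driver code: ring_pending(), ring_empty() and ring_full() are hypothetical helpers on a ring of n descriptors, where the descriptors from head up to (but not including) tail belong to the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of descriptors queued for the hardware. */
static uint32_t ring_pending(uint32_t head, uint32_t tail, uint32_t n)
{
    return (tail + n - head) % n;
}

/* The hardware stops when head catches up with tail. */
static bool ring_empty(uint32_t head, uint32_t tail)
{
    return head == tail;
}

/* Software must not advance tail onto head, otherwise a full ring
 * would look exactly like an empty one (head == tail). */
static bool ring_full(uint32_t head, uint32_t tail, uint32_t n)
{
    return (tail + 1) % n == head;
}
```

With n = 8, head = 6 and tail = 1, three descriptors (6, 7, 0) are pending; with head = 2 and tail = 1 the ring is full, so at most n - 1 descriptors can be in flight under this rule.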

The software has to be careful never to let TDT lap TDH, i.e. never to overwrite a descriptor the E1000 has not finished with yet. We will deal with that later.

Ok good. So, how do we initialize TDR?

We allocate TDR and tell the E1000 where it is (TDBAL, Transmit Descriptor Base Address Low) and how long it is in bytes (TDLEN).
The E1000 sets TDH and TDT to zero after a power-on or software reset, but we set them to zero again, just to make sure.
Then there are other crazy flags in the Transmit Control Register (TCTL) and the Transmit IPG register (TIPG), which I don’t really understand… 🙁

Frankly I don’t know if my solution is 100% correct or not. For example, do we still have to set TDBAH to zero, even if we don’t use it since we are 32-bit? Anyway, it works for me. 🙂

__attribute__((__aligned__(16)))    // paragraph size
struct td tdr[NTDRENTRIES] = {0};
uint32_t tdt = 0;     // Index into tdr[] and tx_pktbufs[].

__attribute__((__aligned__(PGSIZE)))
char tx_pktbufs[NTDRENTRIES][DESC_BUF_SZ] = {0};

static void
tx_init(void)
{
    // initialize tdr
    for (int i = 0; i < NTDRENTRIES; i++)
        tdr[i].address = PADDR(&tx_pktbufs[i]);
    
    // 14.5 Transmit Initialization
    *(uint32_t *)(e1000 + TDBAL_OFFSET) = PADDR(tdr);

    if ((sizeof(tdr) & TDLEN_ALIGN_MASK) != 0)
        panic("TDLEN must be 128-byte aligned!\n");
    *(uint32_t *)(e1000 + TDLEN_OFFSET) = sizeof(tdr);

    *(uint32_t *)(e1000 + TDH_OFFSET) = 0;
    *(uint32_t *)(e1000 + TDT_OFFSET) = 0;

    // TCTL.CT is ignored, since we assume full-duplex operation.
    *(uint32_t *)(e1000 + TCTL_OFFSET) = (TCTL_COLD_FDX << TCTL_COLD_SHIFT) | TCTL_PSP | TCTL_EN;

    *(uint32_t *)(e1000 + TIPG_OFFSET) = (TIPG_IPGR2 << TIPG_IPGR2_SHIFT | \
                                          TIPG_IPGR1 << TIPG_IPGR1_SHIFT | \
                                          TIPG_IPGR);
}
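It can help to know the exact values tx_init() writes, for example when comparing against the manual or QEMU’s E1000_DEBUG output. This sketch copies the relevant constants from the kern/e1000.h listing in this post and computes the TCTL, TIPG and TDLEN values; the function names are hypothetical. Note that the constant named TIPG_IPGR actually fills the IPGT field (bits 9:0).

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from kern/e1000.h in this post. */
#define NTDRENTRIES       64
#define TDLEN_ALIGN_MASK  (128 - 1)
#define TCTL_EN           0x2
#define TCTL_PSP          0x8
#define TCTL_COLD_SHIFT   12
#define TCTL_COLD_FDX     0x40
#define TIPG_IPGR         10    /* IPGT field, bits 9:0 */
#define TIPG_IPGR1_SHIFT  10
#define TIPG_IPGR1        4
#define TIPG_IPGR2_SHIFT  20
#define TIPG_IPGR2        6

static uint32_t tctl_value(void)
{
    return (TCTL_COLD_FDX << TCTL_COLD_SHIFT) | TCTL_PSP | TCTL_EN;
}

static uint32_t tipg_value(void)
{
    return TIPG_IPGR2 << TIPG_IPGR2_SHIFT |
           TIPG_IPGR1 << TIPG_IPGR1_SHIFT |
           TIPG_IPGR;
}

/* TDLEN is in bytes: 64 descriptors * 16 bytes each. */
static uint32_t tdlen_value(void)
{
    return NTDRENTRIES * 16;
}
```

So the driver writes TCTL = 0x0004000A, TIPG = 0x0060100A and TDLEN = 1024, which is a multiple of 128 as section 14.5 requires.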

Now I can implement the actual function to transmit packets:

Exercise 6. Write a function to transmit a packet by checking that the next descriptor is free, copying the packet data into the next descriptor, and updating TDT. Make sure you handle the transmit queue being full.

I named the function tx_pkt():

int
tx_pkt(const char *buf, size_t nbytes)
{
    if (nbytes > DESC_BUF_SZ)
        return -E_INVAL;    // Reachable from user space, so don't panic.

    if ((tdr[tdt].cmd & TDESC_CMD_RS) != 0) {         // If this descriptor has been used before. We always set RS when transmitting a packet.
        if ((tdr[tdt].status & TDESC_STA_DD) == 0)    // Still in use!
            return -E_TX_FULL;
    }
    // Copy data into packet buffer
    memcpy(&tx_pktbufs[tdt], buf, nbytes);
    tdr[tdt].length = (uint16_t)nbytes;
    tdr[tdt].status &= ~TDESC_STA_DD;    // Clear the DD bit left over from the previous lap.
    tdr[tdt].cmd |= TDESC_CMD_RS | TDESC_CMD_EOP;

    tdt = (tdt + 1) % NTDRENTRIES;     // It's a ring
    *(uint32_t *)(e1000 + TDT_OFFSET) = tdt;
    return 0;
}

As we will see later, the E1000 expects receive data buffers to be one of a few predefined sizes. I chose 2048 bytes (defined as DESC_BUF_SZ), since a buffer needs to be larger than the maximum size of an Ethernet packet, 1518 bytes. For simplicity, I used the same size for the outgoing packet buffers here.
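The “few predefined sizes” are selected by RCTL.BSIZE (bits 17:16) together with RCTL.BSEX (bit 25), per section 13.4.22. The field holds an encoding, not a byte count: 2048 bytes is 00b. A sketch of the mapping; rctl_bufsize_bits() is a hypothetical helper, and 0xFFFFFFFF is an arbitrary “unsupported” marker.

```c
#include <assert.h>
#include <stdint.h>

#define RCTL_BSIZE_SHIFT 16
#define RCTL_BSEX        (1u << 25)   /* multiply sizes by 16 */

/* Return the RCTL bits selecting a receive buffer size,
 * or 0xFFFFFFFF if the hardware does not support that size. */
static uint32_t rctl_bufsize_bits(uint32_t size)
{
    switch (size) {
    case 2048:  return 0u << RCTL_BSIZE_SHIFT;
    case 1024:  return 1u << RCTL_BSIZE_SHIFT;
    case 512:   return 2u << RCTL_BSIZE_SHIFT;
    case 256:   return 3u << RCTL_BSIZE_SHIFT;
    case 16384: return RCTL_BSEX | (1u << RCTL_BSIZE_SHIFT);
    case 8192:  return RCTL_BSEX | (2u << RCTL_BSIZE_SHIFT);
    case 4096:  return RCTL_BSEX | (3u << RCTL_BSIZE_SHIFT);
    default:    return 0xFFFFFFFFu;
    }
}
```

Shifting the byte count itself (2048 << 16) would instead set bit 27 and leave BSIZE at 00b, which only happens to select 2048 bytes by accident.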

If we set the RS flag for a packet, the E1000 sets its DD flag once the packet has been sent. This is how the driver tells whether it’s safe to reuse a TD.

If a packet spans more than one TD, we set the EOP (End Of Packet) bit only in its last TD. I don’t want to deal with long packets at all, so I simply set EOP on every packet I transmit.

Update TDT only when everything else is done.
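The RS/DD bookkeeping can be exercised without hardware. Below is a hosted model of the transmit path on a 4-entry ring; model_tx() plays the driver, model_hw_done() plays the NIC by setting DD, and E_TX_FULL is a stand-in value, all hypothetical. The model also clears DD whenever it reuses a descriptor: without that, a DD bit left over from an earlier lap would make a still-in-flight descriptor look free.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NTD        4          /* tiny ring for the sketch */
#define BUF_SZ     2048
#define CMD_EOP    0x01
#define CMD_RS     0x08
#define STA_DD     0x01
#define E_TX_FULL  1          /* hypothetical error code */

struct td {
    uint64_t address;
    uint16_t length;
    uint8_t cso, cmd, status, css;
    uint16_t special;
};

static struct td ring[NTD];
static char bufs[NTD][BUF_SZ];
static uint32_t tail;         /* software copy of TDT */

/* Refuse to reuse a descriptor that was handed to the "hardware"
 * (RS set) but not yet reported done (DD clear). */
static int model_tx(const char *data, size_t n)
{
    struct td *d = &ring[tail];
    if ((d->cmd & CMD_RS) && !(d->status & STA_DD))
        return -E_TX_FULL;
    memcpy(bufs[tail], data, n);
    d->length = (uint16_t)n;
    d->status &= ~STA_DD;     /* forget the previous lap's completion */
    d->cmd |= CMD_RS | CMD_EOP;
    tail = (tail + 1) % NTD;  /* in a real driver: write TDT last */
    return 0;
}

/* Stand-in for the NIC finishing descriptor i. */
static void model_hw_done(uint32_t i)
{
    ring[i].status |= STA_DD;
}
```

Four sends fill the ring; the fifth fails until the “hardware” completes descriptor 0, and after one more send the ring is full again.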

That completes the transmitting part of the driver. Now it’s time to implement the output helper environment. The helper calls tx_pkt() through a syscall:

Exercise 7. Add a system call that lets you transmit packets from user space. The exact interface is up to you. Don’t forget to check any pointers passed to the kernel from user space.

// Send a packet.
static int
sys_tx_pkt(const char *buf, size_t nbytes)
{
	user_mem_assert(curenv, (void *)buf, nbytes, 0);
	return tx_pkt(buf, nbytes);
}

The output helper loops forever. During each iteration, it reads a packet from the network server through page-sharing IPC implemented in Lab 4, then sends the packet to the driver using the syscall sys_tx_pkt() we just added.

#include "ns.h"
#include <inc/lib.h>

extern union Nsipc nsipcbuf;

void
output(envid_t ns_envid)
{
	binaryname = "ns_output";

	// LAB 6: Your code here:
	for (;;) {
		int r;
		envid_t from_env_store;

		// 	- read a packet from the network server
		if ((r = ipc_recv(&from_env_store, (void *)&nsipcbuf, 0)) < 0)
			panic("output: ipc_recv() failed!\n");
		if (from_env_store != ns_envid)
			panic("output: unexpected IPC sender!\n");
		if (r != NSREQ_OUTPUT)
			panic("output: unexpected IPC type!\n");

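		//	- send the packet to the device driver
		// Retry (yielding the CPU) while the transmit ring is full.
		while ((r = sys_tx_pkt(nsipcbuf.pkt.jp_data, nsipcbuf.pkt.jp_len)) == -E_TX_FULL)
			sys_yield();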
	}
}

The low_level_output() function (see net/lwip/jos/jif/jif.c) in the core network server sends outgoing packets to our output helper.

This completes the transmitting part of the system! Let’s do a test:

peilin@PWN:~/6.828/2018/lab$ make E1000_DEBUG=TXERR,TX run-net_testoutput-nox
Transmitting packet 0
Transmitting packet 1
e1000: index 0: 0x2f8000 : 9000009 0
Transmitting packet 2
e1000: index 1: 0x2f8800 : 9000009 0
block cache is good
superblock is good
Transmitting packet 3
e1000: index 2: 0x2f9000 : 9000009 0
bitmap is good
Transmitting packet 4
e1000: index 3: 0x2f9800 : 9000009 0
Transmitting packet 5
e1000: index 4: 0x2fa000 : 9000009 0
Transmitting packet 6
e1000: index 5: 0x2fa800 : 9000009 0
Transmitting packet 7
e1000: index 6: 0x2fb000 : 9000009 0
Transmitting packet 8
e1000: index 7: 0x2fb800 : 9000009 0
Transmitting packet 9
e1000: index 8: 0x2fc000 : 9000009 0
e1000: index 9: 0x2fc800 : 9000009 0

It’s kind of funny to see these file system logs in between: our preemptive multitasking is working fine! 😉

Part B: Receiving packets and the web server

Similar to tx_init(), we also need to initialize the Receive Descriptor Ring (RDR). I call the function rx_init():

Exercise 10. Set up the receive queue and configure the E1000 by following the process in section 14.4. You don’t have to support “long packets” or multicast. For now, don’t configure the card to use interrupts; you can change that later if you decide to use receive interrupts. Also, configure the E1000 to strip the Ethernet CRC, since the grade script expects it to be stripped.

// 3.2.3 Receive Descriptor Format
struct rd
{
    uint64_t address;
    uint16_t length;
    uint16_t pkt_cks;
    uint8_t status;
    uint8_t errors;
    uint16_t special;
};

// An __attribute__ only applies to a single declaration, so each ring
// and each buffer array gets its own.
__attribute__((__aligned__(16)))    // paragraph size
struct td tdr[NTDRENTRIES] = {0};
__attribute__((__aligned__(16)))
struct rd rdr[NRDRENTRIES] = {0};
uint32_t tdt = 0;     // Index into tdr[] and tx_pktbufs[].
uint32_t rdt = NRDRENTRIES - 1;

__attribute__((__aligned__(PGSIZE)))
char tx_pktbufs[NTDRENTRIES][DESC_BUF_SZ] = {0};
__attribute__((__aligned__(PGSIZE)))
char rx_pktbufs[NRDRENTRIES][DESC_BUF_SZ] = {0};

static void
rx_init(void)
{
    // initialize rdr
    for (int i = 0; i < NRDRENTRIES; i++)
        rdr[i].address = PADDR(&rx_pktbufs[i]);

    // 14.4 Receive Initialization
    *(uint32_t *)(e1000 + RAL_0_OFFSET) = 0x12005452;       // QEMU's default MAC address:
    *(uint32_t *)(e1000 + RAH_0_OFFSET) = 0x5634 | RAH_AV;  // 52:54:00:12:34:56

    for (int i = 0; i < NMTAENTRIES; i++)
        ((uint32_t *)(e1000 + MTA_OFFSET))[i] = 0;

    *(uint32_t *)(e1000 + RDBAL_OFFSET) = PADDR(rdr);

    if ((sizeof(rdr) & RDLEN_ALIGN_MASK) != 0)
        panic("RDLEN must be 128-byte aligned!\n");
    *(uint32_t *)(e1000 + RDLEN_OFFSET) = sizeof(rdr);

    *(uint32_t *)(e1000 + RDH_OFFSET) = 0;
    *(uint32_t *)(e1000 + RDT_OFFSET) = NRDRENTRIES - 1;

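    // BSIZE = 00b (with BSEX = 0) selects 2048-byte buffers; the field
    // holds an encoding, not the byte count.
    uint32_t rctl = RCTL_SECRC | (0 << RCTL_BSIZE_SHIFT) | RCTL_BAM | RCTL_EN;
    rctl &= ~RCTL_LPE;
    *(uint32_t *)(e1000 + RCTL_OFFSET) = rctl;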
}

Don’t forget to turn on the AV (Address Valid) bit of RAH (Receive Address High) register.
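The magic numbers 0x12005452 and 0x5634 are just QEMU’s MAC address 52:54:00:12:34:56 packed into the Receive Address register pair: the first four bytes go into RAL with byte 0 in the least significant position, and the last two into RAH bits 15:0, next to the AV bit. A sketch with a hypothetical helper, mac_to_ral_rah():

```c
#include <assert.h>
#include <stdint.h>

#define RAH_AV (1u << 31)   /* Address Valid */

/* Pack a 6-byte MAC address into RAL/RAH values (little-endian order). */
static void mac_to_ral_rah(const uint8_t mac[6], uint32_t *ral, uint32_t *rah)
{
    *ral = (uint32_t)mac[0] | (uint32_t)mac[1] << 8 |
           (uint32_t)mac[2] << 16 | (uint32_t)mac[3] << 24;
    *rah = ((uint32_t)mac[4] | (uint32_t)mac[5] << 8) | RAH_AV;
}
```

For 52:54:00:12:34:56 this yields RAL = 0x12005452 and RAH = 0x80005634, matching the constants in rx_init().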

Again, this step requires referencing the manual a lot, and I’m not sure my solution is 100% correct: It just works.

Exercise 11. Write a function to receive a packet from the E1000 and expose it to user space by adding a system call. Make sure you handle the receive queue being empty.

int
rx_pkt(char *buf)
{
    uint32_t next = (rdt + 1) % NRDRENTRIES;
    if ((rdr[next].status & RDESC_STA_DD) == 0)
        return -E_RX_EMPTY;
    
    rdt = next;
    pkt_count++;
    // Copy data out of packet buffer
    uint16_t length = rdr[rdt].length;
    memcpy(buf, &rx_pktbufs[rdt], length);
    
    rdr[next].status &= ~RDESC_STA_DD;
    *(uint32_t *)(e1000 + RDT_OFFSET) = rdt;
    return length;
}

Unlike tx_init(), where we set both TDH and TDT to zero, in rx_init() we initialize RDH to zero but RDT to NRDRENTRIES - 1. Otherwise the E1000 wouldn’t receive any packets in the first place: if RDH equals RDT, the E1000 considers that it has no free descriptors, and simply drops every packet it receives.

We clear the DD bit of an RD when we are done with it. As always, update RDT only after we’ve done everything else.
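The receive side can be modeled the same way. In this hosted sketch, model_hw_receive() stands in for the NIC (it refuses to store a packet when RDH equals RDT) and model_rx() mirrors rx_pkt() by consuming the descriptor after RDT; the 4-entry ring and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NRD     4
#define STA_DD  0x01

struct rd {
    uint64_t address;
    uint16_t length;
    uint16_t pkt_cks;
    uint8_t status;
    uint8_t errors;
    uint16_t special;
};

static struct rd rring[NRD];
static char rbufs[NRD][2048];
static uint32_t rdh = 0;          /* hardware head */
static uint32_t rdt = NRD - 1;    /* software tail, as in rx_init() */

/* NIC stand-in: store at RDH unless RDH == RDT (no free descriptor). */
static int model_hw_receive(const char *data, uint16_t n)
{
    if (rdh == rdt)
        return -1;                /* packet dropped */
    memcpy(rbufs[rdh], data, n);
    rring[rdh].length = n;
    rring[rdh].status |= STA_DD;
    rdh = (rdh + 1) % NRD;
    return 0;
}

/* Driver side: take the descriptor after RDT if the NIC marked it done. */
static int model_rx(char *out)
{
    uint32_t next = (rdt + 1) % NRD;
    if (!(rring[next].status & STA_DD))
        return -1;                /* queue empty */
    uint16_t n = rring[next].length;
    memcpy(out, rbufs[next], n);
    rring[next].status &= ~STA_DD;
    rdt = next;                   /* hand the descriptor back last */
    return n;
}
```

Starting from RDH = 0 and RDT = 3, the NIC can fill three buffers; the fourth packet is dropped until the driver consumes one and advances RDT.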

// Receive a packet.
static int
sys_rx_pkt(char *buf)
{
	user_mem_assert(curenv, (void *)buf, DESC_BUF_SZ, PTE_W);	// The kernel writes into buf, so it must be writable.
	return rx_pkt(buf);
}

More precisely, this syscall should be named sys_try_rx_pkt(), just like sys_ipc_try_send(). We can’t loop inside the syscall, because that could block the entire system: remember that JOS runs kernel code with interrupts disabled, under a single big kernel lock.
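The retry loop therefore belongs in user space: call the non-blocking syscall, and give up the CPU between attempts. A generic sketch of that try-then-yield pattern; retry_until_ready(), fake_try() and fake_yield() are hypothetical, and in JOS the two callbacks would correspond to sys_rx_pkt() and sys_yield().

```c
#include <assert.h>

typedef int (*try_fn)(char *buf);
typedef void (*yield_fn)(void);

/* Keep calling a non-blocking "try" operation, yielding between
 * attempts, until it returns a non-negative result. */
static int retry_until_ready(try_fn try_once, yield_fn yield, char *buf)
{
    int r;
    while ((r = try_once(buf)) < 0)
        yield();
    return r;
}

/* Test doubles: the "device" has data on the third attempt. */
static int attempts, yields;
static int fake_try(char *buf) { (void)buf; return ++attempts < 3 ? -1 : 42; }
static void fake_yield(void) { yields++; }
```

Here the first two attempts fail, so the caller yields twice before receiving the 42-byte “packet”.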

Exercise 12. Implement net/input.c.

#include "ns.h"

uint32_t pkt_count = 0;

__attribute__((__aligned__(PGSIZE)))
union Nsipc nsipcbufs[2];

void
input(envid_t ns_envid)
{
	binaryname = "ns_input";
	int r;
	int8_t s = 0; // nsipcbufs selector
	char tmpbuf[NS_BUFSIZE] = {0};

	// LAB 6: Your code here:
	for (;;) {
		// 	- read a packet from the device driver
		while ((r = sys_rx_pkt(tmpbuf)) < 0)
			sys_yield();	// Let other environments run instead of spinning.
		pkt_count++;

		//	- send it to the network server
		// Hint: When you IPC a page to the network server, it will be
		// reading from it for a while, so don't immediately receive
		// another packet in to the same physical page.
		nsipcbufs[s].pkt.jp_len = r;
		memcpy(nsipcbufs[s].pkt.jp_data, tmpbuf, r);
		
		ipc_send(ns_envid, NSREQ_INPUT, (void *)&nsipcbufs[s], PTE_U|PTE_P);
		s ^= 1;
	}
}

This exercise took me several hours of debugging. As the hint in the comment says, we shouldn’t immediately receive a new packet into the same physical page while the core server may still be reading from it. My solution simply alternates between two buffers instead of using one, but this approach may fail with a scheduling policy other than round-robin. Maybe a better solution is to use a lock?

Finally we complete the HTTP daemon.

Exercise 13. The web server is missing the code that deals with sending the contents of a file back to the client. Finish the web server by implementing send_file() and send_data().

static int
send_data(struct http_request *req, int fd, off_t size)
{
	// LAB 6: Your code here.
	// Stream the file through a fixed-size buffer, so memory use does
	// not depend on the file size (and nothing needs to be freed).
	char buf[512];
	int n;

	while (size > 0) {
		if ((n = read(fd, buf, MIN(size, (off_t)sizeof(buf)))) <= 0)
			return -1;
		if (write(req->sock, buf, n) != n)
			return -1;
		size -= n;
	}
	return 0;
}

static int
send_file(struct http_request *req)
{
	int r;
	off_t file_size = -1;
	int fd;
	struct Stat stat;

	// open the requested url for reading
	// if the file does not exist, send a 404 error using send_error
	// if the file is a directory, send a 404 error using send_error
	// set file_size to the size of the file

	// LAB 6: Your code here.
	if ((fd = open(req->url, O_RDONLY)) < 0)
		return send_error(req, 404);	// Nothing to close yet.
	if ((r = fstat(fd, &stat)) < 0)
		goto end;
	if (stat.st_isdir) {
		r = send_error(req, 404);
		goto end;
	}
	file_size = stat.st_size;
	
	if ((r = send_header(req, 200)) < 0)
		goto end;
	if ((r = send_size(req, file_size)) < 0)
		goto end;
	if ((r = send_content_type(req)) < 0)
		goto end;
	if ((r = send_header_fin(req)) < 0)
		goto end;

	r = send_data(req, fd, file_size);
end:
	close(fd);
	return r;
}
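The idea behind a chunked send_data() can be checked on an ordinary POSIX system. This hosted sketch, copy_chunked(), is hypothetical and uses POSIX read()/write() rather than JOS’s user library (whose calls have the same shape); it moves a known number of bytes through a fixed 512-byte buffer and also handles short writes, which a socket may produce.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy exactly `size` bytes from `in` to `out` through a fixed buffer.
 * Returns 0 on success, -1 on error or unexpected EOF. */
static int copy_chunked(int in, int out, off_t size)
{
    char buf[512];
    while (size > 0) {
        size_t want = size < (off_t)sizeof(buf) ? (size_t)size : sizeof(buf);
        ssize_t n = read(in, buf, want);
        if (n <= 0)
            return -1;
        for (ssize_t done = 0; done < n; ) {   /* retry short writes */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w <= 0)
                return -1;
            done += w;
        }
        size -= n;
    }
    return 0;
}
```

Memory use stays at 512 bytes regardless of the file size, at the cost of one read()/write() pair per chunk.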

Time to test it!

peilin@PWN:~/6.828/2018/lab$ make run-httpd-nox
ns: 52:54:00:12:34:56 bound to static IP 10.0.2.15
ns: TCP/IP initialized.
Waiting for http connections...

peilin@PWN:~/6.828/2018/lab$ make which-ports
Local port 26001 forwards to JOS port 7 (echo server)
Local port 26002 forwards to JOS port 80 (web server)
peilin@PWN:~/6.828/2018/lab$ curl localhost:26002/index.html
<html>
<head>
       <title>jhttpd on JOS</title>
</head>
<body>
       <center>
             <h2>This file came from JOS.</h2>
             <marquee>Cheesy web page!</marquee>
       </center>
</body>
</html>
peilin@PWN:~/6.828/2018/lab$ curl localhost:26002/../../../etc/passwd
<html><body><p>404 – Not Found</p></body></html>

Unfortunately my VM does not have a GUI… Let me forward that port to my host machine and point “my favorite browser” to localhost:8848/index.html:

Yes!!

Finally here’s my complete kern/e1000.h:

#ifndef JOS_KERN_E1000_H
#define JOS_KERN_E1000_H

#include <kern/pci.h>

int e1000_attach(struct pci_func *pcif);
int tx_pkt(const char *buf, size_t nbytes);
int rx_pkt(char *buf);

// 3.2.3 Receive Descriptor Format
struct rd
{
    uint64_t address;
    uint16_t length;
    uint16_t pkt_cks;
    uint8_t status;
    uint8_t errors;
    uint16_t special;
};

// 3.3.3 Legacy Transmit Descriptor Format
struct td
{
    uint64_t address;
    uint16_t length;
    uint8_t cso;
    uint8_t cmd;
    uint8_t status; // RSV + STA
    uint8_t css;
    uint16_t special;
};

#define NTDRENTRIES     64      // Must be multiple of 8 (TDLEN is a multiple of 128 bytes)
#define NRDRENTRIES     128     // Must be multiple of 8 (RDLEN is a multiple of 128 bytes)
#define DESC_BUF_SZ     2048    // 3.2.2 Receive Data Storage

// 3.2.3.1 Receive Descriptor Status Field
#define RDESC_STA_DD    0x01

// 3.3.3.1 Transmit Descriptor Command Field Format
#define TDESC_CMD_EOP   0x01
#define TDESC_CMD_RS    0x08

// 3.3.3.2 Transmit Descriptor Status Field Format
#define TDESC_STA_DD    0x01

// Table 5-1. Component Identification
#define PCI_E1000_VENDOR_ID     0x8086
#define PCI_E1000_DEVICE_ID     0x100e

// Table 13-2. Ethernet Controller Register Summary
#define RCTL_OFFSET     0x100

#define TCTL_OFFSET     0x400
#define TIPG_OFFSET     0x410

#define RDBAL_OFFSET    0x2800
#define RDBAH_OFFSET    0x2804
#define RDLEN_OFFSET    0x2808
#define RDH_OFFSET      0x2810
#define RDT_OFFSET      0x2818

#define TDBAL_OFFSET    0x3800
#define TDLEN_OFFSET    0x3808
#define TDH_OFFSET      0x3810
#define TDT_OFFSET      0x3818

#define MTA_OFFSET      0x5200
#define NMTAENTRIES     ((0x53FC - 0x5200) / 4 + 1)     // 128 32-bit entries

#define RAL_0_OFFSET    0x5400
#define RAH_0_OFFSET    0x5404

// 13.4.2 Device Status Register
#define DSR_OFFSET      0x8

// 13.4.22 Receive Control Register
#define RCTL_EN             (1 << 1)
#define RCTL_LPE            (1 << 5)
#define RCTL_LBM_MASK       (0b11 << 6)
#define RCTL_BAM            (1 << 15)
#define RCTL_BSIZE_SHIFT    16      // 2-bit encoding: 00b = 2048 bytes when BSEX = 0
#define RCTL_SECRC          (1 << 26)

// 13.4.33 Transmit Control Register
#define TCTL_EN         0x2
#define TCTL_PSP        0x8

#define TCTL_COLD_SHIFT     12
#define TCTL_COLD_FDX       0x40

// 13.5.3 Receive Address High
#define RAH_AV      (1U << 31)

// Table 13-77. TIPG Register Bit Description
#define TIPG_IPGR           10      // IPGT field (bits 9:0), IEEE 802.3

#define TIPG_IPGR1_SHIFT    10
#define TIPG_IPGR1          4       // IEEE 802.3

#define TIPG_IPGR2_SHIFT    20
#define TIPG_IPGR2          6       // IEEE 802.3

// 14.4 Receive Initialization
#define RDLEN_ALIGN_MASK    (128 - 1)

// 14.5 Transmit Initialization
#define TDLEN_ALIGN_MASK    (128 - 1)

// Misc
#define MAX_ETH_SZ      1518

#endif  // !JOS_KERN_E1000_H

As well as kern/e1000.c:

#include <inc/assert.h>
#include <inc/error.h>
#include <inc/stdio.h>
#include <inc/string.h>

#include <kern/e1000.h>
#include <kern/pmap.h>

volatile void *e1000;   // uint32_t

static void rx_init(void);
static void tx_init(void);

int pkt_count = 0;

// 3.4 Transmit Descriptor Ring Structure
// An __attribute__ only applies to a single declaration, so each ring
// and each buffer array gets its own.
__attribute__((__aligned__(16)))    // paragraph size
struct td tdr[NTDRENTRIES] = {0};
__attribute__((__aligned__(16)))
struct rd rdr[NRDRENTRIES] = {0};
uint32_t tdt = 0;     // Index into tdr[] and tx_pktbufs[].
uint32_t rdt = NRDRENTRIES - 1;

__attribute__((__aligned__(PGSIZE)))
char tx_pktbufs[NTDRENTRIES][DESC_BUF_SZ] = {0};
__attribute__((__aligned__(PGSIZE)))
char rx_pktbufs[NRDRENTRIES][DESC_BUF_SZ] = {0};

static void
rx_init(void)
{
    // initialize rdr
    for (int i = 0; i < NRDRENTRIES; i++)
        rdr[i].address = PADDR(&rx_pktbufs[i]);

    // 14.4 Receive Initialization
    *(uint32_t *)(e1000 + RAL_0_OFFSET) = 0x12005452;       // QEMU's default MAC address:
    *(uint32_t *)(e1000 + RAH_0_OFFSET) = 0x5634 | RAH_AV;  // 52:54:00:12:34:56

    for (int i = 0; i < NMTAENTRIES; i++)
        ((uint32_t *)(e1000 + MTA_OFFSET))[i] = 0;

    *(uint32_t *)(e1000 + RDBAL_OFFSET) = PADDR(rdr);

    if ((sizeof(rdr) & RDLEN_ALIGN_MASK) != 0)
        panic("RDLEN must be 128-byte aligned!\n");
    *(uint32_t *)(e1000 + RDLEN_OFFSET) = sizeof(rdr);

    *(uint32_t *)(e1000 + RDH_OFFSET) = 0;
    *(uint32_t *)(e1000 + RDT_OFFSET) = NRDRENTRIES - 1;

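    // BSIZE = 00b (with BSEX = 0) selects 2048-byte buffers; the field
    // holds an encoding, not the byte count.
    uint32_t rctl = RCTL_SECRC | (0 << RCTL_BSIZE_SHIFT) | RCTL_BAM | RCTL_EN;
    rctl &= ~RCTL_LPE;
    *(uint32_t *)(e1000 + RCTL_OFFSET) = rctl;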
}

static void
tx_init(void)
{
    // initialize tdr
    for (int i = 0; i < NTDRENTRIES; i++)
        tdr[i].address = PADDR(&tx_pktbufs[i]);
    
    // 14.5 Transmit Initialization
    *(uint32_t *)(e1000 + TDBAL_OFFSET) = PADDR(tdr);

    if ((sizeof(tdr) & TDLEN_ALIGN_MASK) != 0)
        panic("TDLEN must be 128-byte aligned!\n");
    *(uint32_t *)(e1000 + TDLEN_OFFSET) = sizeof(tdr);

    *(uint32_t *)(e1000 + TDH_OFFSET) = 0;
    *(uint32_t *)(e1000 + TDT_OFFSET) = 0;

    // TCTL.CT is ignored, since we assume full-duplex operation.
    *(uint32_t *)(e1000 + TCTL_OFFSET) = (TCTL_COLD_FDX << TCTL_COLD_SHIFT) | TCTL_PSP | TCTL_EN;

    *(uint32_t *)(e1000 + TIPG_OFFSET) = (TIPG_IPGR2 << TIPG_IPGR2_SHIFT | \
                                          TIPG_IPGR1 << TIPG_IPGR1_SHIFT | \
                                          TIPG_IPGR);
}

int
e1000_attach(struct pci_func *pcif)
{
    pci_func_enable(pcif);

    e1000 = mmio_map_region(pcif->reg_base[0], pcif->reg_size[0]);
    cprintf("Device Status Register: 0x%x\n", *(uint32_t *)(e1000 + DSR_OFFSET));

    rx_init();
    tx_init();
    return 0;
}

int
tx_pkt(const char *buf, size_t nbytes)
{
    if (nbytes > DESC_BUF_SZ)
        return -E_INVAL;    // Reachable from user space, so don't panic.

    if ((tdr[tdt].cmd & TDESC_CMD_RS) != 0) {         // If this descriptor has been used before. We always set RS when transmitting a packet.
        if ((tdr[tdt].status & TDESC_STA_DD) == 0)    // Still in use!
            return -E_TX_FULL;
    }
    // Copy data into packet buffer
    memcpy(&tx_pktbufs[tdt], buf, nbytes);
    tdr[tdt].length = (uint16_t)nbytes;
    tdr[tdt].status &= ~TDESC_STA_DD;    // Clear the DD bit left over from the previous lap.
    tdr[tdt].cmd |= TDESC_CMD_RS | TDESC_CMD_EOP;

    tdt = (tdt + 1) % NTDRENTRIES;     // It's a ring
    *(uint32_t *)(e1000 + TDT_OFFSET) = tdt;
    return 0;
}

// Returns the packet length, or -E_RX_EMPTY if no packet is waiting.
int
rx_pkt(char *buf)
{
    uint32_t next = (rdt + 1) % NRDRENTRIES;
    if ((rdr[next].status & RDESC_STA_DD) == 0)
        return -E_RX_EMPTY;
    
    rdt = next;
    pkt_count++;
    // Copy data out of packet buffer
    uint16_t length = rdr[rdt].length;
    memcpy(buf, &rx_pktbufs[rdt], length);
    
    rdr[next].status &= ~RDESC_STA_DD;
    *(uint32_t *)(e1000 + RDT_OFFSET) = rdt;
    return length;
}

That’s it! From virtual memory management, exception handling, preemptive multitasking, file system to network server, it has been a long journey! Now it’s time to explore more, and more!
