The famous zombie researcher “Donn Beach” almost created an immunization
against the dipsomanie virus. This severe disease leads to the inability to
defend against Zombies, later causes a complete loss of memory and finally
turns you into one of them. Inexplicably Donn forgot where he put the
license key for his centrifuge. Provide him a new one and humanity will owe
you a debt of gratitude for fighting one of the most wicked illnesses
today.

https://ctf.fluxfingers.net/challenges/donn_beach.exe
ctf.fluxfingers.net tcp/2055

credits: 500 +3 (1st), +2 (2nd), +1 (3rd)

This Win32 executable starts by asking a name, hashing it and comparing it to a constant, then asks a key, does several computations on it using a VM obfuscated with SSE3 instructions, and compares the result of these computations to four integer constants.

We can safely patch the name hashing and make sure that the name hash value is the constant we want - using hardware breakpoints, we can see that the name itself isn’t used later, but the name hash is. The key is composed of three 32 bits integers, read from stdin like this (in hex): AAAAAAAA-BBBBBBBB-CCCCCCCC.

Before running code in the VM, the executable initializes the VM state with all the inputs to the algorithm:

  • Name hash
  • First part of the key (key1)
  • Second part of the key (key2)
  • Third part of the key (key3)
  • Pointer to the current instruction
  • Pointer to a constant 256 bytes array
  • Stack pointer (points to freshly allocated memory)

After running the VM code, it unpacks these values from the state and stores them back to stack variables. They are then compared to the constant values.

Looking inside the VM code a bit closer for 1 hour, and stepping into it with a debugger, we can notice several interesting things:

  • The bytecode is interlaced with the VM code in the binary. The x86 code regularly contains long multi-byte NOPs, in which the VM code is placed. The VM simply ignores any instruction it does not know and skips to the next byte, so it will only execute the instructions from inside the NOPs.
  • The VM state is contained in MMX registers mm0 to mm3, scrambled. Bytes of each of the VM 8 32 bits registers are shuffled to fill these 4 64 bits registers.
  • The instruction pointer always goes forward, and there does not seem to be anything that increments it with a non constant increment. This means the VM does not support jumps of any sort, so the logic inside of the VM is very reduced.

The 8 VM registers initially contain the following values:

  • Reg 0: Constant 256 bytes array ptr
  • Reg 1: Name hash
  • Reg 2: key1
  • Reg 3: key2
  • Reg 4: key3
  • Reg 5: 0
  • Reg 6: Stack pointer
  • Reg 7: EIP

Something also makes our life a lot easier: inside the instruction handlers, to read a register value, the code does not inline the SSE instructions to unshuffle and unpack the register. Instead, it gets a function pointer from a table which contains 8 register read functions (one for each register), and calls that function to get the register value in mm4. The same can be observed for register writes. This allows us to very easily notice the instructions reading and writing to registers.

Using all of these infos, I started to statically reverse engineer all the instruction handlers present in the binary. After one additional hour of work and a lot of laughs after I was rickrolled by an instruction handler, a disassembler was ready:

code = map(ord, open('donn_beach.exe').read()[0x2400:])

OPCODES = {
    0x11: ("ABORT", 1),
    0x09: ("EXIT", 1),
    0x3E: ("ADD", 2),
    0x0D: ("PUSH", 2),
    0x2A: ("PUSH8", 2),
    0x26: ("MOV", 2),
    0x4C: ("POP", 2),
    0x17: ("XOR", 2),
    0x54: ("MOV", 2),
    0x7D: ("SLL", 2),
    0x2C: ("LOAD8", 2),
    0x3B: ("WRITE8", 2),
    0x1B: ("SLL", 2),
    0x5D: ("SRL", 2),
    0x34: ("MOV", 2),
    0x31: ("AND", 2),
}

i = 0
while i < len(code):
    op = code[i]
    if op in OPCODES:
        print OPCODES[code[i]][0],
        if OPCODES[code[i]][1] == 2: print "%02x" % code[i + 1]
        else: print
        i += OPCODES[code[i]][1]
    else:
        i += 1

Running it on the binary gives us the following output:

PUSH 00
PUSH 04
PUSH 03
PUSH 02
PUSH8 ff
POP 02
PUSH8 08
POP 04
MOV 31
AND 32
ADD 30
LOAD8 53
MOV 31
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
SRL 34
ADD 30
LOAD8 33
SLL 34
SLL 34
SLL 34
XOR 53
POP 01
PUSH 05
PUSH 05
MOV 31
AND 32
ADD 30
LOAD8 53
MOV 31
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
SRL 34
ADD 30
LOAD8 33
SLL 34
SLL 34
SLL 34
XOR 53
POP 04
POP 03
POP 01
PUSH 03
PUSH 05
PUSH 04
PUSH8 08
POP 04
MOV 31
AND 32
ADD 30
LOAD8 53
MOV 31
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
SRL 34
ADD 30
LOAD8 33
SLL 34
SLL 34
SLL 34
XOR 53
POP 04
POP 03
POP 02
POP 01
PUSH 05
PUSH 05
PUSH 03
PUSH 04
PUSH8 ff
POP 02
PUSH8 08
POP 04
MOV 31
AND 32
ADD 30
LOAD8 53
MOV 31
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
SRL 34
ADD 30
LOAD8 33
SLL 34
SLL 34
SLL 34
XOR 53
POP 01
POP 02
POP 03
MOV 45
PUSH8 08
POP 00
MOV 52
SLL 50
SRL 20
SRL 20
SRL 20
XOR 25
MOV 54
SRL 50
SLL 40
SLL 40
SLL 40
XOR 45
PUSH8 10
POP 00
MOV 53
SRL 50
SLL 30
XOR 35
MOV 01
XOR 12
XOR 23
XOR 34
XOR 40
POP 00
POP 00
PUSH 04
PUSH 03
PUSH 02
PUSH8 ff
POP 02
PUSH8 08
POP 04
MOV 31
AND 32
ADD 30
LOAD8 53
MOV 31
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
SRL 34
ADD 30
LOAD8 33
SLL 34
SLL 34
SLL 34
XOR 53
POP 01
PUSH 05
PUSH 05
MOV 31
AND 32
ADD 30
LOAD8 53
MOV 31
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
SRL 34
ADD 30
LOAD8 33
SLL 34
SLL 34
SLL 34
XOR 53
POP 04
POP 03
POP 01
PUSH 03
PUSH 05
PUSH 04
PUSH8 08
POP 04
MOV 31
AND 32
ADD 30
LOAD8 53
MOV 31
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
SRL 34
ADD 30
LOAD8 33
SLL 34
SLL 34
SLL 34
XOR 53
POP 04
POP 03
POP 02
POP 01
PUSH 05
PUSH 05
PUSH 03
PUSH 04
PUSH8 ff
POP 02
PUSH8 08
POP 04
MOV 31
AND 32
ADD 30
LOAD8 53
MOV 31
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
AND 32
ADD 30
LOAD8 33
SLL 34
SLL 34
XOR 53
MOV 31
SRL 34
SRL 34
SRL 34
ADD 30
LOAD8 33
SLL 34
SLL 34
SLL 34
XOR 53
POP 01
POP 02
POP 03
MOV 45
PUSH8 08
POP 00
MOV 52
SLL 50
SRL 20
SRL 20
SRL 20
XOR 25
MOV 54
SRL 50
SLL 40
SLL 40
SLL 40
XOR 45
PUSH8 10
POP 00
MOV 53
SRL 50
SLL 30
XOR 35
MOV 01
XOR 12
XOR 23
XOR 34
XOR 40
EXIT

After a lot of boring reverse on this code, this gives us the following algorithm (the mapping table is the constant array mentioned earlier in the VM state):

static const unsigned char mapping[] = {
    0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
    0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0, 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
    0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
    0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a, 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
    0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0, 0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
    0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b, 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
    0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
    0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5, 0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
    0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x17, 0x44, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
    0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
    0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c, 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
    0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
    0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
    0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e, 0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
    0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
    0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16
};

static unsigned int transpose(unsigned int n)
{
    unsigned int new_n = 0;

    new_n |= mapping[(n >> 0) & 0xFF] << 0;
    new_n |= mapping[(n >> 8) & 0xFF] << 8;
    new_n |= mapping[(n >> 16) & 0xFF] << 16;
    new_n |= mapping[(n >> 24) & 0xFF] << 24;

    return new_n;
}

static unsigned int rotl(unsigned int n, unsigned int sa)
{
    return (n << sa) | (n >> (32 - sa));
}

static void round(unsigned int* a, unsigned int* b, unsigned int* c, unsigned int* d)
{
    unsigned int at;

    *a = transpose(*a);
    *b = transpose(*b);
    *c = transpose(*c);
    *d = transpose(*d);

    at = *a;
    *b = rotl(*b, 8);
    *c = rotl(*c, 16);
    *d = rotl(*d, 24);

    *a ^= *b;
    *b ^= *c;
    *c ^= *d;
    *d ^= at;
}

int main(void)
{
    unsigned int nh = 0x4b17e245;   // name hash, constant
    unsigned int k1, k2, k3;

    scanf("%x-%x-%x", &k1, &k2, &k3);

    round(&nh, &k1, &k2, &k3);
    round(&nh, &k1, &k2, &k3);

    if (nh != 0x01020304 || k1 != 0x05060708 || k2 != 0x09101112 || k3 != 0x0d14151e)
        puts("FAIL :(");
    else
        puts("SUCCESS :)");

    return 0;
}

Now that we have the algorithm, we still need to generate a key that will result in valid values in the end. As I was lazy and it was getting late in the evening, I implemented the algorithm with Z3Py and asked it to solve the problem for me. Unfortunately I failed several times to implement the algorithm, and the iteration time was quite long because Z3 needed 20-30 minutes to get me a key matching my description of the problem, so we only got the answer in the morning.

from z3 import *

s = Solver()

mapping = Array('mapping', BitVecSort(8), BitVecSort(8))
for l in open('mapping.txt'):
    l = l.strip()
    a, b = l.split()
    s.add(mapping[int(b, 16)] == int(a, 16))

nh = list(BitVecs('nh1 nh2 nh3 nh4', 8))
k1 = list(BitVecs('k11 k12 k13 k14', 8))
k2 = list(BitVecs('k21 k22 k23 k24', 8))
k3 = list(BitVecs('k31 k32 k33 k34', 8))

s.add(nh[0] == 0x4b, nh[1] == 0x17, nh[2] == 0xe2, nh[3] == 0x45)

def transpose(n):
    return [mapping[n[0]], mapping[n[1]], mapping[n[2]], mapping[n[3]]]

def rotl8(n):
    return [n[1], n[2], n[3], n[0]]

def rotl16(n):
    return [n[2], n[3], n[0], n[1]]

def rotl24(n):
    return [n[3], n[0], n[1], n[2]]

def xor(a, b):
    return [a[0] ^ b[0], a[1] ^ b[1], a[2] ^ b[2], a[3] ^ b[3]]

def hash(a, b, c, d, transp=True):
    if transp:
        at = transpose(a)
        bt = transpose(b)
        ct = transpose(c)
        dt = transpose(d)
    else:
        at, bt, ct, dt = a, b, c, d

    r1 = xor(at, rotl8(bt))
    r2 = xor(rotl8(bt), rotl16(ct))
    r3 = xor(rotl16(ct), rotl24(dt))
    r4 = xor(rotl24(dt), at)

    return r1, r2, r3, r4

r1, r2, r3, r4 = hash(nh, k1, k2, k3)
r1, r2, r3, r4 = hash(r1, r2, r3, r4)

s.add(r1[0] == 0x01, r1[1] == 0x02, r1[2] == 0x03, r1[3] == 0x04)
s.add(r2[0] == 0x05, r2[1] == 0x06, r2[2] == 0x07, r2[3] == 0x08)
s.add(r3[0] == 0x09, r3[1] == 0x10, r3[2] == 0x11, r3[3] == 0x12)
s.add(r4[0] == 0x0d, r4[1] == 0x14, r4[2] == 0x15, r4[3] == 0x1e)

print s.check()
print s.model()

After running for 20 minutes, this gave me the following valid key: e5304760-47b7c45f-f59a8f29.

Later, a friend tried to find a better way to solve this problem, and noticed that it was reductible to a 32 bits bruteforce. Using this method, we found the previous key, but also a second valid key: b6b09bf0-f23daa06-ac4ee747.