For this exercise with two parts (400 and 500 points), we were given too files: a binary named archiv and some data named FluxArchiv.arc. The two parts involved the same binary.

When running the binary with no options, it displays an usage message containing the different options possible. We have:

  • An option to list the files contained in the archive.
  • An option to add a file to the archive.
  • An option to extract a file from the archive.
  • An option to delete a file in the archive.

Every command takes at least the archive name and a password. The last three also take a filename.

If you want to try it, it was dumped here, thanks to Jonathan Salwan.

Part 1: Find the password

Sooo, the first part of the exercise requires us to find the password of the archive FluxArchiv.arc given. We started reversing the binary and noticed a first thing: Awesome, the symbols were not stripped! … Well actually they were shuffled, which is not that good, but it is not a real problem either. In this write-up, We will always keep the wrong names, but explain what they actually do.

We started following the path in main that lists the files and followed the code to understand what is done to the password. This can be easily done by following parsing of the command line arguments.

The first function called on the password argument is incorrectly named checkHashOfPassword. It will initialize a global buffer of length 0x14 named hash_of_password (correctly) with the SHA-1 digest of the given password. This function is simple.

If we continue to follow the listing option, it then checks that it can access the archive file given, fopens it and then calls encryptDecryptData, that really only checks the magic number of the archive format, at position 0x0: FluxArhiv13.

If this went OK, it will then call verifyArchiv. This function will do the interesting thing for this part. It will check that our password is correct.

It first fseeks to offset 0xC, and then reads 0x14 from the archive: another SHA-1 digest. Then it will fill an internal buffer with a re-ordered version of hash_of_password. It will then take this buffer and calculate the SHA-1 digest of it. This digest is compared to the one read from the archive. If it matches, the password is good.

So, in summary, the password is good if sha1(reorder(sha1(password))) equals to the 20 bytes at offset 0xC in the archive.

The subject says that the humans who created the archive were drunk and decided to use a 6 character, upper-case or digit password. That is 2.176.782.336 passwords possible. That looks brute-force worthy.

We first wrote the reordering part (the one that calculates the source index) in python to compute them all. Once done, we decided to write something to brute-force the algorithm. The source code of the brute-forcer can be found here. With 8 threads, it takes 2 minutes and 30-something seconds to go through the whole password space on my i7, and outputs one password: PWF41L.

Part 1 solved. For those interested, the archive contains 3 images and one mp3 file. They are not really useful.

Part 2: Find more!

OK so now that we have the password we can decrypt the data. Yes, indeed, the data is encrypted with RC4, using hash_of_password as the key. The decrypt part is in the function sanitizeFilename. First interesting thing: it is called a lot, and it always resets RC4. So you can’t decipher the whole archive in one shot. Damn, we must understand the format then.

The code is quite simple, but I am honestly bad at reverse engineering, so I decided to take this opportunity to try another approach for once: rewrite the program in C.

The complete source code can be downloaded here. It doesn’t contain the whole program but only the parts I needed to understand what the program was doing and how to finish this part.

I started by scrolling the functions randomly and trying to understand the simple ones. One that was really useful was listAllFilesInArchiv.

First, we can see in it a pattern we will find a lot: read 8 bytes, decrypt it and reverse it in a value byte per byte. I called this function read_int in my C code, it reads a 64-bit integer and switches its endianness.

So the function reads two integers (a andb) and then starts to do the interesting thing: It will clear both with zeros. Then it clears a field of size 0x10, and then a field of size 0x60.

Another pattern we will find often is a loop for i from 0 to b excluded, seek to a, read the integer at that position and use it as next a, then clear it and continue. In short, a is the offset of the next block in a linked list of blocks, and the first block contains 4 fields, with the second one being the number of blocks. Later we discovered that this is necessary because the last block doesn’t begin with an offset set to 0, but to some value to permit calculating its actual size. Here is the C:

void listAllFilesInArchiv(FILE* stream, unsigned int off) { // delete_file
    char ptr[0x8];
    uint64_t counter;
    uint64_t curpos;
    uint64_t nbblocks;
    uint64_t nextblock;

    fseek(stream, off, SEEK_SET);
    nextblock = read_int(stream);
    nbblocks = read_int(stream);
    fseek(stream, off, SEEK_SET);
    clear_data(stream, 8);
    clear_data(stream, 8);
    clear_data(stream, MD5_DIGEST_LENGTH);
    clear_data(stream, FILENAME_SZ);
    for (counter = 0; counter < nbblocks; ++counter) {
        fseek(stream, nextblock * 1040 + 0x20, SEEK_SET);
        curpos = ftell(stream);
        nextblock = read_int(stream);
        fseek(stream, curpos, SEEK_SET);
        clear_data(stream, 8);
    }
}

The second interesting thing about this function is that it is called on delete (we can see it from command parsing). So an interesting thing rises: if a file was added and then deleted, its data is still present in the archive. It is only deleted from the listing and its blocks are considered “free”.

The offset given to it comes from extractFileFromArchiv. This function starts by seeking to offset 0x20, so just after the global magic + the SHA-1 for the password. It checks a magic (“FluXL1sT”), then reads an integer and then checks for 8 structures of 128 bytes. This is the index! The integer read, if not null, is a link to the next list of 8 files (still beginning by the magic).

Now we have enough to use my technique to find the unused blocks, but I actually rewrote the complete file listing and extraction to make sure I did it correctly. I then basically logged every block used: all blocks used are 1040 bytes long (this is why we have 8 entries of 128 bytes). I then compared it to the possible list of blocks and just decrypted these blocks. The key was in block at address 0x28a20 + 0x8:

$ python hacklu2013-fluxarchiv-unused-blocks.py logs
Found unused block: 0x28200
Found unused block: 0x28610
Found unused block: 0x28a20
[...]
$ python hacklu2013-fluxarchiv-decrypt.py 0x28a28 0x410
b"[...] alike.\n\n+++The Mentor+++\n\nFlag: D3letinG-1nd3x_F4iL\n\n[...]"

Example logs here.

Conclusion

I didn’t finish the second part in time to have the points. I actually used techniques that took a lot of time, and I was quite slow anyway. My goal was not productivity. I took the first part as an opportunity to check that I remembered how to use pthread and the second part as a good example to try another technique for reverse engineering I never used before. Although it was a “slow” technique, it really helped me organise my thoughts and test/fetch data (like the offsets of used blocks, even though it was possible without).

It was interesting to see. Next time will be for speed!