hack.lu 2013: FluxArchiv Write-up (both parts)
For this exercise with two parts (400 and 500 points), we were given too files:
a binary named archiv
and some data named FluxArchiv.arc
. The two parts
involved the same binary.
When running the binary with no options, it displays an usage message containing the different options possible. We have:
- An option to list the files contained in the archive.
- An option to add a file to the archive.
- An option to extract a file from the archive.
- An option to delete a file in the archive.
Every command takes at least the archive name and a password. The last three also take a filename.
If you want to try it, it was dumped here, thanks to Jonathan Salwan.
Part 1: Find the password
Sooo, the first part of the exercise requires us to find the password of the
archive FluxArchiv.arc
given. We started reversing the binary and noticed a
first thing: Awesome, the symbols were not stripped! … Well actually they were
shuffled, which is not that good, but it is not a real problem either. In this
write-up, We will always keep the wrong names, but explain what they actually
do.
We started following the path in main
that lists the files and followed the
code to understand what is done to the password. This can be easily done by
following parsing of the command line arguments.
The first function called on the password argument is incorrectly named
checkHashOfPassword
. It will initialize a global buffer of length 0x14
named
hash_of_password
(correctly) with the SHA-1 digest of the given password. This
function is simple.
If we continue to follow the listing option, it then checks that it can access
the archive file given, fopen
s it and then calls encryptDecryptData
, that
really only checks the magic number of the archive format, at position 0x0
:
FluxArhiv13
.
If this went OK, it will then call verifyArchiv
. This function will do the
interesting thing for this part. It will check that our password is correct.
It first fseek
s to offset 0xC
, and then reads 0x14
from the archive:
another SHA-1 digest. Then it will fill an internal buffer with a re-ordered
version of hash_of_password
. It will then take this buffer and calculate the
SHA-1 digest of it. This digest is compared to the one read from the archive. If
it matches, the password is good.
So, in summary, the password is good if sha1(reorder(sha1(password)))
equals
to the 20
bytes at offset 0xC
in the archive.
The subject says that the humans who created the archive were drunk and decided
to use a 6 character, upper-case or digit password. That is 2.176.782.336
passwords possible. That looks brute-force worthy.
We first wrote the reordering part (the one that calculates the source
index) in python to compute them all. Once done, we decided to write something to
brute-force the algorithm. The source code of the brute-forcer can be found
here. With 8 threads, it takes 2 minutes and 30-something seconds to go
through the whole password space on my i7, and outputs one password: PWF41L
.
Part 1 solved. For those interested, the archive contains 3 images and one mp3 file. They are not really useful.
Part 2: Find more!
OK so now that we have the password we can decrypt the data. Yes, indeed, the
data is encrypted with RC4, using hash_of_password
as the key. The decrypt
part is in the function sanitizeFilename
. First interesting thing: it is
called a lot, and it always resets RC4. So you can’t decipher the whole archive
in one shot. Damn, we must understand the format then.
The code is quite simple, but I am honestly bad at reverse engineering, so I decided to take this opportunity to try another approach for once: rewrite the program in C.
The complete source code can be downloaded here. It doesn’t contain the whole program but only the parts I needed to understand what the program was doing and how to finish this part.
I started by scrolling the functions randomly and trying to understand the
simple ones. One that was really useful was listAllFilesInArchiv
.
First, we can see in it a pattern we will find a lot: read 8 bytes, decrypt it
and reverse it in a value byte per byte. I called this function read_int
in my
C code, it reads a 64-bit integer and switches its endianness.
So the function reads two integers (a
andb
) and then starts to do the
interesting thing: It will clear both with zeros. Then it clears a field of size
0x10
, and then a field of size 0x60
.
Another pattern we will find often is a loop for i
from 0
to b
excluded,
seek to a
, read the integer at that position and use it as next a
, then
clear it and continue. In short, a
is the offset of the next block in a linked
list of blocks, and the first block contains 4 fields, with the second one being
the number of blocks. Later we discovered that this is necessary because the
last block doesn’t begin with an offset set to 0, but to some value to permit
calculating its actual size. Here is the C:
void listAllFilesInArchiv(FILE* stream, unsigned int off) { // delete_file
char ptr[0x8];
uint64_t counter;
uint64_t curpos;
uint64_t nbblocks;
uint64_t nextblock;
fseek(stream, off, SEEK_SET);
nextblock = read_int(stream);
nbblocks = read_int(stream);
fseek(stream, off, SEEK_SET);
clear_data(stream, 8);
clear_data(stream, 8);
clear_data(stream, MD5_DIGEST_LENGTH);
clear_data(stream, FILENAME_SZ);
for (counter = 0; counter < nbblocks; ++counter) {
fseek(stream, nextblock * 1040 + 0x20, SEEK_SET);
curpos = ftell(stream);
nextblock = read_int(stream);
fseek(stream, curpos, SEEK_SET);
clear_data(stream, 8);
}
}
The second interesting thing about this function is that it is called on delete (we can see it from command parsing). So an interesting thing rises: if a file was added and then deleted, its data is still present in the archive. It is only deleted from the listing and its blocks are considered “free”.
The offset given to it comes from extractFileFromArchiv
. This function starts
by seeking to offset 0x20
, so just after the global magic + the SHA-1 for the
password. It checks a magic (“FluXL1sT”), then reads an integer and then checks
for 8 structures of 128 bytes. This is the index! The integer read, if not null,
is a link to the next list of 8 files (still beginning by the magic).
Now we have enough to use my technique to find the unused blocks, but I actually
rewrote the complete file listing and extraction to make sure I did it
correctly. I then basically logged every block used: all blocks used are 1040
bytes long (this is why we have 8 entries of 128 bytes). I then compared it to
the possible list of blocks and just decrypted these blocks. The key
was in block at address 0x28a20 + 0x8
:
$ python hacklu2013-fluxarchiv-unused-blocks.py logs
Found unused block: 0x28200
Found unused block: 0x28610
Found unused block: 0x28a20
[...]
$ python hacklu2013-fluxarchiv-decrypt.py 0x28a28 0x410
b"[...] alike.\n\n+++The Mentor+++\n\nFlag: D3letinG-1nd3x_F4iL\n\n[...]"
Example logs here.
Conclusion
I didn’t finish the second part in time to have the points. I actually used
techniques that took a lot of time, and I was quite slow anyway. My goal was not
productivity. I took the first part as an opportunity to check that I remembered
how to use pthread
and the second part as a good example to try another
technique for reverse engineering I never used before. Although it was a “slow”
technique, it really helped me organise my thoughts and test/fetch data (like
the offsets of used blocks, even though it was possible without).
It was interesting to see. Next time will be for speed!