
0xCAFEBABE ?  java class file format, an overview
Written by Bruno Pujos and Nassim Eddequiouaq
20140428 23:51:31Lately, we’ve been having a look into java. First, we tried to understand the fileformat. A java application is often presented in a .jar, which is basically a zip archive (you can also find .war files which are also zip archive). Inside this archive you'll find several files, especially some .class files which are the one containing the java bytecode. Those files are the one we'll look into.
The file begins with a header including the magic number (
0xCAFEBABE
), the minor version which is 0 and the major version for Java SE 7:0x0033
(51.00). Every number in the class file are stored in bigendian. Right after that header, we can find the Constant Pool count which is the number of entries in the constant pool table plus one and then the array. There are several entries representing several items in the constant pool like constants, classes, etc..After that, there is the access flag of the class, the
this_class
andsuper_class
identifiers which are indices in the constant pool in order to refer to the current class and the super class. This is followed by the interface table and its size, the table contains all the interfaces from which the current class inherits. Then we find the field table and size, followed by the methods and the attributes of the class.Here is mainly an overview of the class file.
Constant Pool
The constant pool is probably the most important part of the Class file. It contains all the information that will be needed on the other part of the file. The constant pool is an array containing several entries, the index of the array starts at 1, not 0. The different structures in the table do not have the same size, and so the constant pool may have a variable size. Each entry begins with a tag on one byte, indicating the type of entry:

CONSTANT_Utf8
: indicating an utf8 modified entry. Java uses a particular type of utf8 for representing the constant string values. 
CONSTANT_Integer
: representing a constant integer on 4 bytes, just like everything in the class file format the integer is a bigendian. 
CONSTANT_Float
: representing a float on 4 bytes, it follows the IEEE 754 floating point format, with possibility of representing both infinity and NaN. 
CONSTANT_Long
: same asCONSTANT_Integer
but represents the integer on 8 bytes. Something particular about this entry is that it is counting twice in the constant pool’s number of entries. 
CONSTANT_Double
: asCONSTANT_Float
it follows the IEEE 754 for the double format, likeCONSTANT_Long
it stores the number on 8 bytes and also counts twice in the constant pool. 
CONSTANT_Class
: this one is used to represent a class or an interface, it has only one caracteristic which is an index in the constant pool to aCONSTANT_Utf8
indicating the name of the class. 
CONSTANT_String
: its goal is to represent constant object of string type. likeCONSTANT_Class
, it only contains one information which is the index of aCONSTANT_Utf8
in the constant pool to represent the string’svalue. 
CONSTANT_Fieldref
: this represents a reference to a field. it contains the index ofCONSTANT_Class
to represent the class or interface in which the field is and the index of aCONSTANT_NameAndType
(see below) for representing the name and the field’s type. (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms4.html#jvms4.3.2). 
CONSTANT_Methodref
: likeCONSTANT_Fieldref
, it contains aCONSTANT_Class
index and aCONSTANT_NameAndType
. TheCONSTANT_Class
must represent a class and not an interface. TheCONSTANT_NameAndType
must represent a method descriptor (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms4.html#jvms4.3.3). 
CONSTANT_InterfaceMethodref
: it is similar to theCONSTANT_Methodref
type except that theCONSTANT_Class
entry must represent an interface. 
CONSTANT_NameAndType
: this structure is used to represent a field or method without indicating the class or interface it belongs to, it contains two indices in the constant pool which must have the typeCONSTANT_Utf8
, the first represents the name and the other one represents a valid descriptor of the field or the method. 
CONSTANT_MethodHandle
: this field is used to resolve symbolic reference to a method handle. The way of resolving a method depends on something called the bytecode behavior which is indicated by a kind indicator (from 1 to 9). It also contains a reference on two bytes which is an index in the constant pool pointing on aCONSTANT_Fieldref
,CONSTANT_Methodref
orCONSTANT_InterfaceMethodref
depending on the kind. 
CONSTANT_MethodType
: this field is used to resolve the method’s type, it contains an index to aCONSTANT_Utf8
which should represent the method’s type. 
CONSTANT_InvokeDynamic
: this structure is used by the invokedynamic instruction to specify a bootstrap method. It contains an index into the bootstrap method table (see attributes below) and an index into the constant pool to aCONSTANT_NameAndType
representing the method name and method descriptor.
Here is global overview of each of those structures:
General and Interfaces
After the Constant Pool, we can find several information about the current class, there is an information about the class name and the superclass. There are also general information about the class in the access flag. There are several access flag types for classes, fields and methods. The different kind of access flags are:
After the general information field, there is an information field about the interfaces. All the interfaces the class implements are represented in the interface table. Each entry in that table is a constant pool index representing a
CONSTANT_Class
which must be an interface.Attributes
Each field, method and class have others characteristics and informations. These information are contained inside attributes. There are several attribute types, each one of them can be applied to one or several fields, methods, classes and codes. The attributes are used to represent :
 Code
 Local variables, constant value, information about the stack and exceptions
 Inner Classes, Bootstrap Methods, Enclosing methods
 Annotations
 Information for debug/decompilation
 Complementary information (Deprecated, Signature...)
Each attribute begins by an index into the constant pool, it must point to a
CONSTANT_Utf8
entry telling which type of attribute this is. Afterward, since the different types of attributes have different structures, the attribute length is indicated. An implementation of the Java Virtual Machine is not necessary in order to handle each kind of attribute because knowing the length allows to pass an unhandle attribute and execute correctly the file.The most important attribute is probably the Code:
It begins with the common header, the attribute name index should point to a
CONSTANT_Utf8
representing the string"Code"
.It is followed by two variables: max stack and max locals which represent the stack size and the size of the local variables including the one used for passing arguments to methods. Then there is the code length and the code which is the bytecode that will be executed by the JVM when the method is called.
Right after that, you’ll find a table representing the exception handlers inside the functions, it indicates the start and the end of the zone where the exception should be catched, the start of the entry if the exception is raised and the catch type which is an index to a
CONSTANT_Class
into the constant pool. The catch type can also be 0, in this case it will be called with every exceptions, this is in generally used for the finally statement.After the exceptions section, it is possible to add some attributes for the code especially about the stack and the local variable. The Code is an attribute that may contain other attribute.
Fields & Methods
The fields and methods are added in two tables which contain the same elements. The access flags are different for the fields and methods since they are represented above.
After the flag section, we find the name which is an index to a
CONSTANT_Utf8
in the constant pool representing the name of the method/field. The descriptor index is also an index to aCONSTANT_Utf8
which represents a descriptor defining the method or field type.Finally, the method and field can have attributes, moreover a method will contain a code attribute which will contain itself the method code.
Conclusion
The class file is really important for the JVM and having a look at the file format explains a lot of things about the way the JVM work internally.
Recently Java SE 8 has been released, there are several small differences with Java SE 7 even though the major part of the class file has not changed. In particular, it defines new attributes :
RuntimeVisibleTypeAnnotations
,RuntimeInvisibleTypeAnnotations
andMethodParameters
.There are also several modifications in different sections changing the default behaviour of the JVM. It also adds precision and constraints to parts of the class file. The version number for Java SE 7 is 51.00 and 52.00 for Java SE 8.
We’ve written a parser of the class file format in Python3 that you can find here : java.py.
It uses the srddl module for python, available here : https://bitbucket.org/kushou/srddl
TweetPermalink & comments 

Issue 54 in Java
Written by Bruno Pujos, Fabien Goncalves and Louis Feuvrier
20140429 09:05:00Introduction
One of the quite recent (at least, not too old) and amusing things to look at when you are beginning to study security in java is the issue 54 from Security Exploitation. This issue is quite interesting, because it is a low level trick and is, so far, not patched.
Security in java
Before talking about this particular issue, let's see some basics about security in Java in general.
The first thing to know is what and why are we attacking java? Java is designed to run code from untrusted sources securely. This is a well known property and you can find it "everyday" in your browser with the java applets. When an applet is downloaded from a website the browser will run it and you don't want a potentially malicious attacker to have full permissions under your machine.
Java implements a system of permissions to limit possibilities for the code executed with unprivileged rights (the applet). The goal for an attacker will be to acquire full privileges from an unprivileged application, allowing full jeopardy of the computer. The traditionnal attacks (overflow, useafterfree...) are still working but there is an additional type which is less common : the sandbox bypass (which can itself be divided in several parts: unsafe reflection, least privilege violation...).
The security of Java is based on several things, the first is the gestion of the memory which is handled by the JVM (Java Virtual Machine) and not by the user. It first avoids most of the stupid errors developers can make, but it is also mandatory for running code safely (if we can do what we want with the memory we already have the same privilege that the program).
The second part of the security is handled at the loading of a class. This loading process is divided into two parts: the class loader and the bytecode verifier.
The class loader has a similar goal as the dynamic linker in Unix systems. There are several implementations (classes) of the class loader, like the applet class loader which can load code over the internet from a website. All class loaders inherit from
java.lang.ClassLoader
. Of course, a class loader has to take some precautions not to execute malicious code. In particular, it will have to check that we are not trying to spoof a System class which would allow us to bypass all security protections.During the validation step by the class loader, the bytecode will be checked by the bytecode verifier. It is called from the class loader through the method
defineClass
. It will not perform any check of logic but only check that the bytecode is valid and other various things, for example that it is not overflowing the stack. Once the bytecode verifier has done his work, if the class loader validates the class, the code is considered to be of no harm to the JVM (this doesn't mean you have all the privileges).The last important part in java security is the security manager, it's the part which will check all the permissions during runtime. If unprivileged code tries to do something forbidden, it will raise an exception. The basic class for the security manager is
java.lang.SecurityManager
Usually, the security manager will be retrieved by a call togetSecurityManager
(java.lang.System
).If the security manager is set to
null
, no check is performed and the code runs with full privileges. Therefore, the goal of a lot of exploits will be to rewrite the security manager tonull
. Some permissions allow to change the security manager and to set it tonull
(AllPermission
,setSecurityManager
,createClassLoader
,accessClassInPackage.sun
...). Typically, a permission check looks like this:1 2 3 4
// From AppletClassLoader.java SecurityManager sm = System.getSecurityManager(); if (sm != null) sm.checkPackageAccess(name.substring(0, i));
The method
checkPackageAccess
and all the other check functions will throw an error if the code doesn't have the rights to perform the action desired. The check looks into the stackcall and if it finds an unprivileged function, it throws an exception. To go from unprivileged code to privileged code, Java uses theActionController.doPrivileged
method:1 2 3 4 5
AccessController.doPrivileged(new PrivilegedAction() { public Object run() { // insert priviledge code here } });
This check is performed by the security manager and will stop at the first doPrivileged it finds.
The MethodHandle resolution mecanism
In the constant pool of a class file it is possible to define a
MethodHandle
. This entry in the constant pool contains two elements: the reference kind and the reference index. The reference kind characterizes the bytecode behavior of the methodhandle, there are 9 possible kinds, as follow:1 2 3 4 5 6 7 8 9
REF_getField REF_getStatic REF_putField REF_putStatic REF_invokeVirtual REF_invokeStatic REF_invokeSpecial REF_newInvokeSpecial REF_invokeInterface
All 9 kinds are used to get a
MethodHandle
. This object can reference not only methods but also fields, constructors "and similar lowlevel operations". The kinds 1 to 4 are used to create aMethodHandle
on a field and the reference index must point to aCONSTANT_Fieldref
. For kinds 5 to 8 the reference index must point to aCONSTANT_Methodref
. It is used to get aMethodHandle
on a method.The last kind (
REF_invokeInterface
) is used forCONSTANT_InterfaceMethodref
and returns aMethodHandle
for an interface method. The interesting part about the use of aCONSTANT_MethodHandle
into the constant pool is that the creation of theMethodHandle
is done at the loading of the class file. Theoretically, it should make no difference between retrieving theMethodHandle
at the loading of the class and after the loading. We will see that it's not the case.Issue 54: the vulnerability
The issue 54 has been found by Security Exploitation and is well documented (http://www.securityexplorations.com/materials/se20120154.pdf). The usual way to get a Method Handler of a function in a class is to call the public method
findVirtual
from theMethodHandles.Lookup
module. The code of this method is the following:1 2 3 4 5 6
public MethodHandle findVirtual(Class<?> refc, String name, MethodType type) throws NoSuchMethodException, IllegalAccessException { MemberName method = resolveOrFail(refc, name, type, false); checkSecurityManager(refc, method); return accessVirtual(refc, method); }
In this code we can see the call to the method
checkSecurityManager
which checks whether the calling code has the right to get theMethodHandle
. In particular, it will forbid to get aMethodHandle
on a private method of a superclass. On the other hand, when getting a MethodHandle at class loading with aREF_invokeVirtual
, the method called isresolveVirtual
:1 2 3 4 5
private MethodHandle resolveVirtual(Class<?> refc, String name, MethodType type) throws NoSuchMethodException, IllegalAccessException { MemberName method = resolveOrFail(refc, name, type, false); return accessVirtual(refc, method); }
We can see that the only difference between these two functions is the call to the
checkSecurityManager
function which is not done in theresolveVirtual
method. TheresolveVirtual
function is of course private but during loading it is called by the class loader. That means that a specially crafted class can get virtual and static methods (the same issue exists with findStatic and resolveStatic) from a class, allowing to have a validMethodHandle
on something we shouldn't have had access to.This issue is also present in most of the different kinds of
CONSTANT_MethodHandle
entries in the constant pool of a class file. Still, this issue alone does not allow to execute code from an untrusted source as privileged. When Security Exploitation reported that vulnerability, they used a second one (Issue 55 http://www.securityexplorations.com/materials/SE201201ORACLE10.pdf) to get the execution of code as privileged. The issue 55 allows to bind aMethodHandle
to an object instance of incompatible type. This could allow to set the securitymanager to null, bypassing all the permission protection.Issue 54: the exploitation
With this issue, Security Exploitation has released a demonstration of the the issues 54 and 55 (http://www.securityexplorations.com/materials/se2012015060.zip). In particular, it contains a class
MyCL.class
which is "hand made", and contains the exploitation of issue 54. Here is the constant pool of this class:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
CONSTANT_MethodRef(10) 5, 16 CONSTANT_MethodRef(10) 5, 17 CONSTANT_String(8) 10 CONSTANT_Class(7) 18 CONSTANT_Class(7) 19 CONSTANT_Utf8(1) 6 : <init> CONSTANT_Utf8(1) 3 : ()V CONSTANT_Utf8(1) 4 : Code CONSTANT_Utf8(1) 15 : LineNumberTable CONSTANT_Utf8(1) 5 : dummy CONSTANT_Utf8(1) 57 : (Ljava/lang/String;[BIILjava/security/ProtectionDomain;)V CONSTANT_Utf8(1) 18 : get_defineClass_mh CONSTANT_Utf8(1) 20 : ()Ljava/lang/Object; CONSTANT_Utf8(1) 10 : SourceFile CONSTANT_Utf8(1) 9 : MyCL.java CONSTANT_NameAndType(12) 6, 7 CONSTANT_NameAndType(12) 20, 21 CONSTANT_Utf8(1) 4 : MyCL CONSTANT_Utf8(1) 21 : java/lang/ClassLoader CONSTANT_Utf8(1) 11 : defineClass CONSTANT_Utf8(1) 73 : (Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class; CONSTANT_MethodHandle(15) REF_invokeVirtual(5), 2
We can see that the entry 22 is a
CONSTANT_MethodHandle1
with the kindREF_invokeVirtual
for exploiting the vulnerability and refer to theCONSTANT_MethodRef
at the entry 2, which is in the class java.lang.ClassLoader the methoddefineClass
.The
defineClass
method injava.lang.ClassLoader
is a protected final method and it should be impossible to have an handle on this method and obviously to call it.In
MyCL.class
we have three methods :<init>
which is for the initialisationdummy
get_defineClass_mh
which returns theCONSTANT_MethodHandle
at the entry 22 of the constant pool.
The other part of the exploit is for the issue 55 which will allow to bind the method handle to another class and get it called with privileged rights, allowing a sandbox bypass.
Conclusion
This issue, even if a little old, is really interesting because it puts in light some internals of the class loading process which are often unclear. It is also really disturbing because Oracle doesn't seem to consider this issue as a problem, indicating that this was an "allowed behavior". Still not patched, this issue can be used for developing exploits, enlarging the possibility for finding vulnerabilities. Even if this issue was basically focused on a MethodHandle pointing to a method, the same problem exists with MethodHandle pointing on a field, allowing to gain even more access.

Using SAT and SMT to defeat simple hashing algorithms
Written by Pierre Bourdon
20120731 18:30:00Note: this article uses MathJax to display formulas written in TeX. Please enable Javascript in order to see the formulas correctly.
One week has passed since the end of LSE Week 2012 and I have received several partial solutions for the crackme that was released at the start of LSE Week for people to play with. Most people who bothered writing partial solutions were able to break the packing and anti debugging parts of it, but stopped at the very end when they faced a simple hashing algorithm they had to reverse to generate a valid key for the crackme. In pseudocode, the algorithm was the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
a, b, c, d are the four 32 bits integers given as input (key) Compute a simple checksum in order to avoid having several good solutions to the problem: if ((ROTL((a ^ b)  (c ^ d), 17) ^ (a + b + c + d)) != 0xa6779036) return 0; Kind of useless step just to make things a bit harder a = a XOR c b = b XOR d Then, 128 times in a row, for each integer: Shuffle the bits of the number (using a predefined table) XOR the number with a predefined constant Rotate left the number by N bits (N being another constant) Check if: a == 0x8e2c4c74 b == 0xa6c27e2a c == 0xf5e15d3d d == 0x7bebc2ba
Clever people might notice that all of the operations done by that "hashing" algorithm are actually non destructive and completely reversible. That means our hashing function is bijective (no collisions) and that it is very easy to get the input from the output: just run it in reverse (rotate left becomes rotate right, XOR stays the same, shuffle uses a slightly modified table). It was meant to keep the crackme easy to crack once the code has been recovered and understood. Unfortunately, some last minute bugs cropped up in the implementation of the algorithm (never try to fix bugs at 4AM without automated tests...) and made the algorithm completely different:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
a, b, c, d are the four 32 bits integers given as input (key) Compute a simple checksum in order to avoid having several good solutions to the problem: if ((ROTL((a ^ b)  (c ^ d), 17) ^ (a + b + c + d)) != 0xa6779036) return 0; a = a XOR c b = b XOR d Then, 128 times in a row, for each integer: Shuffle the bits of the number using a table that might map some bits two times, and some other bits zero times (DESTRUCTIVE) XOR the number with a predefined constant Rotate left the number by N bits (N being another constant), except if this is the last number  in this case, rotate the third integer and use it as the new value for the last integer Check if: a == 0x8e2c4c74 b == 0xa6c27e2a c == 0xf5e15d3d d == 0x7bebc2ba
The first error (in the shuffling part) comes from an indexing error in my bits position table. The table was defined like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
// Maps the input bit position to the output bit position static const char mapping[128] = { // First integer 2, 14, 4, 24, 7, 31, 16, 18, 30, 17, 12, 27, 6, 26, 9, 22, 1, 28, 5, 3, 11, 23, 13, 25, 19, 20, 10, 29, 8, 15, 21, 0, // Second integer 26, 20, 15, 27, 28, 14, 21, 7, 17, 22, 31, 12, 4, 13, 8, 10, 23, 19, 18, 25, 9, 2, 5, 11, 6, 3, 24, 1, 0, 30, 29, 16, // Third integer 4, 26, 20, 13, 21, 29, 3, 14, 5, 22, 18, 6, 28, 23, 16, 10, 15, 27, 25, 1, 17, 0, 30, 2, 8, 24, 7, 9, 31, 19, 12, 11, // Fourth integer 8, 22, 26, 1, 20, 2, 30, 23, 6, 9, 0, 14, 18, 31, 3, 21, 4, 29, 24, 7, 12, 28, 16, 25, 11, 17, 19, 27, 5, 10, 15, 13 };
But the indexing was done like this:
1 2 3
// *pn points to the current integer, i is the index of this integer (0, 1, // 2 or 3), j is the current bit. newn = ((*pn >> scramble[(i << 2) + j]) & 1) << j;
That
i << 2
should actually be ai << 5
in order to use the whole mapping table. This bug makes the algorithm destructive because some bits from the input will not be used to generate the output. That means you can't get the input of the step from its output: the destroyed bits could have been 0 or 1.The second bug is actually a stupid typo:
1 2 3 4
a = ROTL(a, 7); b = ROTL(b, 13); c = ROTL(c, 17); d = ROTL(c, 25);
I don't think this requires much explanation.
Now that the context of this article has been explained, the real question for me was the following: do these errors make the crackme unsolvable or can it still be solved easily using either bruteforce or more complex analysis techniques?
SAT and its applications to cryptography
I started writing a bruteforcer for this hash using backtracking for each destroyed bit and only exploring the branches that would be valid later on by predicting as much as possible. Unfortunately, while that worked for a small number of iterations of the hash, the original algorithm used 128 iterations and the number of possible combinations increased a lot too fast to use such a simple technique.
Two days later I got reminded by a friend of a talk presented by Mate Soos at Hackito Ergo Sum 2011 about SAT solvers and their application to cryptography for breaking weak ciphers and hashes. Mate is the author of CryptoMiniSat, a very fast implementation of SAT with a few tweaks that can be used to increase efficiency for crypto usages.
Before going into the details of how to use SAT to break ciphers, let's talk a little bit about SAT solvers. SAT solvers are programs designed to solve the Boolean Satisfiability Problem, which can be expressed very simply like this: For this boolean formula, can I find values for the variables that make the formula true. This is an NPcomplete problem (which means you can't solve the general case of that problem in polynomial time, only exponential time or slower) and is actually kind of the canonical NPcomplete problem: it is a very common technique to reduce a problem to show that it is equivalent to SAT in order to prove that it is an NP complete problem.
Most SAT solvers take their input in a format called DIMACS, which is an easytoparse representation of boolean formulas in CNF (Conjunctive Normal Form). A CNF formula is a special case of boolean formula which is always written like this:
$$(X_i \vee \neg X_j) \wedge (\neg X_k \vee X_l \vee \neg X_m \vee X_n) \wedge \ldots$$Basically, CNF is a logical product (aka. conjunction) of sums of variables or negated variables (\(\neg A\)). Every boolean formula can be converted into an equivalent CNF formula, either manually (distribute the \(\vee\) over products) or through an automated process (there are some conversion tables between simple boolean equations and their CNF equivalent).
SAT solvers have a lot of applications and tend to be very optimized in order to have extremely good performances in most cases. It is common to try to solve SAT problems with several hundreds of thousands of clauses (a clause is a single sum of variable, like \(A \vee B \vee \neg C \vee D\)) and tens of thousands of variables.
In his talk last year, Mate Soos told us about how HiTag2 (a cryptosystem used in car locks) was reverse engineered, then translated to mathematical formulas and finally converted to CNF formulas describing the relations between input bits and output bits. If you are interested by that talk, it is available on Youtube. I thought that this technique might be of some use in breaking my hash algorithm and started translating the hash algorithm to an equivalent CNF representation.
Breaking the hash with SAT
First of all, the definition of the algorithm would most likely use several thousands of clauses and about as much variables, so writing it by hand is out of the question. I started by writing a very simple library to generate DIMACS files, which exposes the following Python API:
CNFGenerator.new_var()
generates a new SAT variable instance, which has only one operation: logical negation (writtenx
)CNFGenerator.add(v1, v2, ..., vN)
adds a clause to the output DIMACS fileCNFGenerator.output()
outputs the DIMACS representation
CryptoMiniSAT also provides a very useful extension to DIMACS for cryptographical uses: the ability to use XOR clauses which are \(A \oplus B \oplus C \oplus \ldots\). These prove very useful in order to write equivalences (\(A \Leftrightarrow B \equiv \neg (A \oplus B) \equiv \neg A \oplus B\)) or simply XOR relations.
CNFGenerator.add_xor
handles the generation of such clauses.Let's start by defining our input variables. They are four vectors of 32 bits, so 128 boolean variables:
1 2 3 4 5 6 7 8
def cnf_int(gen, bits): return [gen.new_var() for i in range(bits)] # Input variables a = cnf_int(gen, 32) b = cnf_int(gen, 32) c = cnf_int(gen, 32) d = cnf_int(gen, 32)
If you follow the pseudocode above, the next step would normally be the checksum. This is actually the hardest part of this algorithm to formalize because of the arithmetic operations (additions and substractions of 32 bits numbers). We'll do that last. The following step is
a ^= c; b ^= d
. This is quite easy to formalize. Let's do it for the general case , i.e.a = b ^ c
. What this does is "make each bit ofa
equal to the same bit ofb
XOR the same bit ofc
". To formalize it, we can introduce a new variable \(A\) which is equivalent to \(B \oplus C\), which means the clause can only be true iff \(A\) has the same value as \(B \oplus C\). We just need to write that in a form that CryptoMiniSat can understand:$$\begin{aligned} A \Leftrightarrow (B \oplus C) & \equiv \neg (A \oplus (B \oplus C)) \\\\ & \equiv \neg (A \oplus B \oplus C) \\\\ & \equiv \neg A \oplus B \oplus C \end{aligned}$$Applied to the 32 bits of the variables and converted to Python, this gives us the following code:
1 2 3 4 5 6 7 8 9
def cnf_xor(gen, a, b): out = [gen.new_var() for i in range(len(a))] for (a, b, o) in zip(a, b, out): gen.add_xor(a, b, o) return out # a ^= c, b ^= d a = cnf_xor(gen, a, c) b = cnf_xor(gen, b, d)
Next comes the core of the hashing algorithm: the iterated loop shuffling the bits, XORing with a constant and rotating the number. The interesting part here is that shuffling and rotating bits does not require any clause or additional variables for the SAT representation of the algorithm: For example, if you have a 4 bit integer represented as the vector \(A_3 A_2 A_1 A_0\), rotating it to the left by 2 bits transforms it to the vector \(A_1 A_0 A_3 A_2\). You just need to swap the elements in the list representing your variables. This gives us the following Python code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
def cnf_rotl(gen, n, b): """Performs a left rotation of n by b bits""" return n[b:] + n[:b] def cnf_hash(gen, a, b, c, d): """Hashes a, b, c, d, returns new a, new b, new c, new d""" out = [] for i, n in enumerate((a, b, c, d)): scrambled = [n[SCRAMBLE_TABLE[i][j]] for j in range(len(n))] xored = cnf_xor_const(gen, scrambled, XOR_TABLE[i]) out.append(xored) out[0] = cnf_rotl(gen, out[0], ROT_TABLE[0]) out[1] = cnf_rotl(gen, out[1], ROT_TABLE[1]) out[2] = cnf_rotl(gen, out[2], ROT_TABLE[2]) out[3] = cnf_rotl(gen, out[2], ROT_TABLE[3]) return out # Iterate the hash 128 times for a, b, c and d for i in range(128): a, b, c, d = cnf_hash(gen, a, b, c, d)
cnf_xor_const
works the same ascnf_xor
but "optimized" in order to XOR with a constant number instead of a variable number.Now that we computed the hashed values, we just need to put some clauses to make sure they are equal to the hash we are looking for. In the crackme, the hash value was
8e2c4c74a6c27e2af5e15d3d7bebc2ba
. To make sure one of our boolean vectors is equal to a constant value, we add one clause per bit of the vector which forces it to True if the corresponding bit in the constant value is 1, and False if the bit is 0:1 2 3 4 5 6 7 8 9 10 11 12 13 14
def cnf_equal(gen, n, c): for i in range(len(n)): b = c & 1 c >>= 1 if b: gen.add(n[i]) else: gen.add(n[i]) # Check for equality cnf_equal(gen, a, 0x8e2c4c74) cnf_equal(gen, b, 0xa6c27e2a) cnf_equal(gen, c, 0xf5e15d3d) cnf_equal(gen, d, 0x7bebc2ba)
With only this code, the SAT solver will generate us values for a, b, c and d that compute to the hash we are looking for. However, we still have to defeat the checksum. Let's look at its code again:
1 2
if ((ROTL((a ^ b)  (c ^ d), 17) ^ (a + b + c + d)) != 0xa6779036) return 0;
We already now how to compute XORs, rotations and how to check for number equality, so the remaining part is additions and substractions on 32 bit numbers. As you may already know, substraction is actually very easy to implement in terms of addition and two's complement, which is itself very easy to implement in terms of binary inversion and addition:
$$x  y \equiv x + COMPL2(y) \equiv x + INVERT(y) + 1$$1 2 3 4 5 6 7 8 9 10 11
def cnf_invert(gen, n): inv = [gen.new_var() for b in n] for (b, i) in zip(n, inv): gen.add(b, i) gen.add(b, i) return inv def cnf_sub(gen, a, b): invb = cnf_invert(gen, b) complb = cnf_add(gen, invb, cnf_const32(gen, 1)) return cnf_add(gen, a, complb)
Addition on 32 bit integers is however a lot harder to define. If you did a bit of electrical engineering or if you have implemented an ALU (in HDL, with wires and logic gates, or even in Minecraft) you may know a very common way to define addition using two half adders to make a 1 bit full adder. Here is what a full adder looks like (image courtesy of Wikipedia):
It takes two bits, A and B, as well as a carry from a previous adder (
Cin
), and outputs the sumA + B + Cin
and the carry resulting from that sumCout
. You can then chain these 1 bit full adders to make a 32 bit adder (again, image from Wikipedia):Writing the truth table of a 1 bit full adder and simplifying the equations a bit, you get the following equations for
S
andCout
fromA
,B
andCin
:$$\begin{aligned} S & \equiv \overline{A} B \overline{C_{in}} \vee A \overline{B C_{in}} \vee A \overline{B} C_{in} \vee A B C_{in} \\\\ C_{out} & \equiv A B \vee A C_{in} \vee B C_{in} \end{aligned}$$You can then translate these formulas to CNF to describe a 1 bit full adder for the SAT solver. However doing that manually is a lot of work (especially if you're like me and never had proper formation on CNF and how to convert formulas to that form), so we're just going to use the boolean algebra package from Sage to do it automatically:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
sage: import sage.logic.propcalc as propcalc sage: f = propcalc.formula("d <> (~a&b&~c  a&~b&~c  ~a&~b&c  a&b&c)") sage: f.convert_cnf_table() sage: f (dab~c)&(da~bc)&(d~abc)&(d~a~b~c)&(~dabc)&(~da~b~c)&(~d~ab~c)&(~d~a~bc) sage: f = propcalc.formula("d <> (a&b  a&c  b&c)") sage: f.convert_cnf_table() sage: f (da~b~c)&(d~ab~c)&(d~a~bc)&(d~a~b~c)&(~dabc)&(~dab~c)&(~da~bc)&(~d~abc) sage: import sage.logic.propcalc as propcalc sage: f = propcalc.formula("d <> (~a&b&~c  a&~b&~c  ~a&~b&c  a&b&c)") sage: f.convert_cnf_table() sage: f (dab~c)&(da~bc)&(d~abc)&(d~a~b~c)&(~dabc)&(~da~b~c)&(~d~ab~c)&(~d~a~bc) sage: f = propcalc.formula("d <> (a&b  a&c  b&c)") sage: f.convert_cnf_table() sage: f (da~b~c)&(d~ab~c)&(d~a~bc)&(d~a~b~c)&(~dabc)&(~dab~c)&(~da~bc)&(~d~abc)
We can then convert the CNF clauses Sage gives us directly to Python:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
def cnf_1bitadder(gen, a, b, c): res = gen.new_var() res_carry = gen.new_var() # (da~b~c)&(d~ab~c)&(d~a~bc)&(d~a~b~c)&(~dabc)&(~dab~c)&(~da~bc)&(~d~abc) gen.add(res_carry, a, b, c) gen.add(res_carry, a, b, c) gen.add(res_carry, a, b, c) gen.add(res_carry, a, b, c) gen.add(res_carry, a, b, c) gen.add(res_carry, a, b, c) gen.add(res_carry, a, b, c) gen.add(res_carry, a, b, c) # (dab~c)&(da~bc)&(d~abc)&(d~a~b~c)&(~dabc)&(~da~b~c)&(~d~ab~c)&(~d~a~bc) gen.add(res, a, b, c) gen.add(res, a, b, c) gen.add(res, a, b, c) gen.add(res, a, b, c) gen.add(res, a, b, c) gen.add(res, a, b, c) gen.add(res, a, b, c) gen.add(res, a, b, c) return res, res_carry
Probably not the nicest way to do it, but most likely one of the simplest way. We can then use that one bit adder to make a 32 bit adder:
1 2 3 4 5 6 7 8 9 10
def cnf_add(gen, a, b): carry = gen.new_var() gen.add(carry) # The first carry is always 0 out = [] for (a, b) in zip(a, b): res, carry = cnf_1bitadder(gen, a, b, carry) out.append(res) return out
With this we can finally implement our checksum!
1 2 3 4
sum = cnf_add(gen, a, cnf_add(gen, b, cnf_add(gen, c, d))) sub = cnf_sub(gen, cnf_xor(gen, a, b), cnf_xor(gen, c, d)) cksum = cnf_xor(gen, cnf_rotl(gen, sub, 17), sum) cnf_equal(gen, cksum, 0xa6779036)
Running our Python program generates a DIMACS file with 17061 variables and 19365 clauses. CryptoMiniSat can find a set of values that satisfy the clauses in less than 0.05s on my Sandy Bridge based laptop. For example,
a = 0xe9e708e1, b = 0xf7e4c55a, c = 0x85e77db9 and d = 0x5467bd3c
pass both the checksum and the hash and are considered a valid solution.Using Z3 to make things easier
At first I planned to stop there: I had a proof that the crackme was still doable even with that broken hash algorithm. However when I explained what I was doing, a friend of mine told me about SMT solvers. One of their characteristics is that they can work on boolean algebra, but also functions and linear combinations of integer and real variables. For example, you can use an SMT solver for this kind of problem:
$$x^2 + y^2 < 1, 2x + y > 1$$I looked a bit at recent SMT solvers to see if it could make cracking my hash easier. I used the Z3 theorem prover from Microsoft Research, which is not open source but has Linux binaries and nice interfaces for programming languages like Python and OCaml. Z3 can work on real numbers, integers, functions but also bit vectors, and has a nice API to do so.
As expected, things are a lot easier when your solver has native support for your native problem representation (here, bit vectors and unsigned integers). The code cracking the hash using Z3 is a fair bit slower (still less than 5s) but also much shorter and easier to understand:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
def rotl32(n, sa): return (n << sa)  LShR(n, 32  sa) def hash(a, b, c, d): out = [] for i, n in enumerate((a, b, c, d)): nn = BitVecVal(0, 32) for j in range(32): nn = (LShR(n, SCRAMBLE_TABLE[i][j]) & 1) << j nn ^= XOR_TABLE[i] out.append(nn) out[0] = rotl32(out[0], ROT_TABLE[0]) out[1] = rotl32(out[1], ROT_TABLE[1]) out[2] = rotl32(out[2], ROT_TABLE[2]) out[3] = rotl32(out[2], ROT_TABLE[3]) return out if __name__ == '__main__': s = Solver() a = BitVec('a', 32) b = BitVec('b', 32) c = BitVec('c', 32) d = BitVec('d', 32) checksum = rotl32((a ^ b)  (c ^ d), 17) ^ (a + b + c + d) a ^= c b ^= d for i in range(128): a, b, c, d = hash(a, b, c, d) solve(checksum == 0xa6779036, a == 0x8e2c4c74, b == 0xa6c27e2a, c == 0xf5e15d3d, d == 0x7bebc2ba)
Here the shorter code is mostly due to the fact I did not have any nice API to use CryptoMiniSat and to translate arithmetic operations to CNF. SMT solvers do not provide that much of an edge over SAT solvers for these kind of problems: they shine a lot more as soon as you introduce functions or real numbers that can't easily be expressed as a bit vector.
Conclusion
Sometimes when simple bruteforce does not work you have to go a bit further to reverse a hash algorithm, and using a SAT solver enables you to do just that. The problem is not always easy to formalize, especially when you start using complex operations that can't easily be translated, but using SAT solvers for cryptography is a very interesting technique that has already proven itself a lot of times in the past, and will probably become more and more useful in the future as SAT solvers and ways to formalize hard problems (like AES) evolve.

Getting back determinism in the Low Fragmentation Heap
Written by Bruno Pujos
20141102 22:07:00Introduction
The Low Fragmentation Heap is the Front End allocator for the userland in modern Windows OS. It has been introduced in 2001 in Windows XP and is used by default since Windows Vista. Microsoft has introduce a lot of new mitigations in response to
generic
attack against the LFH since its first disclosure in Windows XP. One of them, introduced in Windows 8, is the nondeterminism of the allocation. This mitigation has quite a lot of consequences because it will break most of the overflows exploits but also useafterfree exploits.Low Fragmentation Heap
The goal of the Low Fragmentation Heap is, as its names says, to reduce the fragmentation. It is not a different heap but a different policy. Here is a global overview of how the allocation is made by Windows:
The HeapAlloc and HeapFree functions will make some tests and decide if it should call the backend or the frontend, which means the LFH since the removal of the LookAsideList. There are some conditions for an allocation to be made by the LFH: The size must be inferior to 0x4000 bytes The LFH must be activated for this heap. It is possible to deactivate the LFH in a heap by setting HEAP_NO_SERIALIZE when creating a heap with HeapCreate. The LFH must be activated for this allocation. It is also possible to deactivate the LFH for a particular allocation (still with HEAP_NO_SERIALIZE). There must be enough allocation for activating the LFH for this size.
The function charge of the allocation for the LFH is
RtlpLowFragHeapAllocFromContext
and the one for the backend isRtlpAllocateHeap
which will use the VirtualAlloc from the system.If you want more details about how the LFH works, you can look at my presentation in french about this subject during the lse summer week 2014, or read the excellent paper Windows 8 Heap Internals by Chris Valasek and Tarjei Mandt.
Mitigation
Microsoft has introduced two factors of randomization with Windows 8. The first one is to add a random offset for each virtual allocation, its primary goal is removing the predictability of heap metadata for preventing their corruption. The second is for randomizing the UserBlock returned by an allocation.
Before those changes, a simple code like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
#include <Windows.h> #include <stdio.h> #include <iostream> int main() { HANDLE hHeap = GetProcessHeap(); int i = 0; int size = 0x40; LPVOID chunk; // lfh activation while (i < 0x10) { chunk = HeapAlloc(hHeap, 0, size); i++; } chunk = HeapAlloc(hHeap, 0, size); printf("\n\nchunk1: %p\n", chunk); HeapFree(hHeap, 0, chunk); chunk = HeapAlloc(hHeap, 0, size); printf("\n\nchunk2: %p\n", chunk); HeapFree(hHeap, 0, chunk); }
would give us an output which is predictable and will allow us to trigger a simple useafterfree:
1 2
chunk1: 011D8CC0 chunk2: 011D8CC0
The same code since Windows 8 will give us this result:
1 2
chunk1: 011D8CC0 chunk2: 011D8DE0
We can clearly see that the result is not easily predictable anymore. The next question is how is this implemented ? The implementation is pretty simple, in
RtlpCreateLowFragHeap
there is a call toRtlpInitializeLfhRandomDataArray
:It will basically fill the
RtlpLowFragHeapRandomData
array with 0x100 random values. This array will then be used with a value calledLowFragHeapDataSlot
from the TEB (Thread Environment Block) which is use for storing value by thread.1 2 3 4 5 6
mov ecx, large fs:18h ; getting the current TEB movzx esi, word ptr [ecx+0x0FAA] ; getting LowFragHeapDataSlot lea eax, [esi+1] ; adding 1 to LowFragHeapDataSlot and eax, 0xff ; checking we don't go past the 0x100 mov [ecx + 0x0faa], ax ; rewrite LowFragHeapDataSlot + 1 movzx eax, byte ptr RtlpLowFragHeapRandomData[esi] ; getting the value at offset of LowFragHeapDataSlot
This value is then used with a ror on the bitmap to determine the position where to start, and from this position it will look for the first free block in the bitmap which will allow to make two consecutive allocations to not be at predictable place.
Attacks
In 2008 at ruxcon, Ben Hawkes presented Attacking the Vista Heap in its slides he made several claims and more precisely :
Application specific attacks are the future
. This is probably more true than ever with a heap implementation like the one we get since Windows 8 and the nondeterminism of the allocation is a big problem even for application specific attacks. This mitigations has been well documented but I haven't found any documentation about how to bypass it.First attack
The first basic attack we can think of is the idea to fill all the slots except one with the object we want to overwrite. We then fill the last slot remaining with the object in which we can overwrite.
This technic will work perfectly for useafterfree because we don't care about the data locate right before or right after it. On the contrary for a classic overflow we really care about what is after (I don't consider underflow but it is basically the same principle). If you control the whole userblock you have fair chances of success: the only case for which it will not work is if we get the last chunk.
But if there is some allocations made for the user blocks the probability of success will decrease a lot because you have more position which will end with a failure.
It also means that you need to be able to allocate enough time the data you want to overflow which could not be the case and in the end you will not be able to know which one you overflowed so you will need a way to test this.
Second attack
In the mitigation part we saw that for getting a random value the
RtlpLowFragHeapAllocFromContext
function take a value from the prepopulatedRtlpLowFragHeapRandomData
array. This array is changed in only two occasions: in theRtlpInitializaLfhRandomDataArray
function when creating the LFH in theRtlpSubSegmentInitialize
function which is called byRtlpLowFragHeapAllocFromContext
when a subsegment needs initialization So basically this array is not repopulated often, and we can make allocation without the array being changed. The counter is stored in the TEB and is incremented after each use of the array. Since the counter must stay inferior to array's size, a modulo 0x100 is made. That means that we can just make 0xff allocations and going back to the same position in the array. The following code will allow us to get twice the same chunk:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
#include <Windows.h> #include <stdio.h> #include <iostream> int main() { HANDLE hHeap = GetProcessHeap(); char c; int i = 0; int size = 0x40; LPVOID chunk; // activating the LFH for the size we choose while (i < 0x10) { chunk = HeapAlloc(hHeap, 0, size); i++; } // making the allocation we want chunk = HeapAlloc(hHeap, 0, size); printf("chunk: %p\n", chunk); HeapFree(hHeap, 0, chunk); // making 0xff allocation for getting back // to the same point in the RtlpLowFragHeapRandomData i = 0; while (i < 0x100  1) { chunk = HeapAlloc(hHeap, 0, size); HeapFree(hHeap, 0, chunk); i++; } // reallocating : we get the same chunk chunk = HeapAlloc(hHeap, 0, size); printf("chunk: %p\n", chunk); }
So even if we don't know where the allocation took place we get back to our first result with twice the same chunk:
1 2
chunk: 012A9A40 chunk: 012A9A40
This is really good for a useafterfree and we don't have to fill all the space in the userblock. What about the overflow ? If we don't free the chunk, after 0xff allocations, we will get back to the same point in the array, but this chunk will already be in use. At that point the allocation algorithm doesn't take an other random number and keep trying, it will just take the first next free chunk which follow: First we activate the LFH Then we allocate the vulnerable chunk We allocate 0xff chunks We allocate the chunk we want to overflow
It will give us the following code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
#include <Windows.h> #include <stdio.h> #include <iostream> int main() { HANDLE hHeap = GetProcessHeap(); char c; int i = 0; int size = 0x40; LPVOID chunk; // activating the LFH for the size we choose while (i < 0x10) { chunk = HeapAlloc(hHeap, 0, size); i++; } // making the allocation which is vulnerable chunk = HeapAlloc(hHeap, 0, size); printf("vulnerable chunk: %p\n", chunk); // making 0xff allocation for getting back // to the same point in the RtlpLowFragHeapRandomData i = 0; while (i < 0x100  1) { chunk = HeapAlloc(hHeap, 0, size); HeapFree(hHeap, 0, chunk); i++; } // allocation we want to overwrite chunk = HeapAlloc(hHeap, 0, size); printf("chunk to overwrite: %p\n", chunk); }
Which will give us the result:
1 2
vulnerable chunk: 00559B20 chunk to overwrite: 00559B68
As you can see we have a gap of 8 bytes, those bytes are the size of the _HEAP_ENTRY struct which contains the metadata of our block.
One of the best advantages of this technic is that we need only one chunk to overwrite and we know which one it is.
Conclusion
We have seen two ways of getting some determinism from the LFH. None of those solutions are perfect. Exploiting a useafterfree with this technic will not be much harder than before if we can have some heapspray. Overflow is not that easy, even with the second solution we still have a chance to get the last chunk and our two blocks can be separated by an other allocated block. The best way of decreasing the risks is to append is to first fill a subsegment and then use the new one to trigger the overflow.