PythonGDB tutorial for reverse engineering - part 1

Reverse engineering a program is usually done with two tools:

A disassembler, to get a good overview of the global program structure (or, more locally, of a function's control flow). Most people use Hex-Rays’ IDA for this: it runs on all three major operating systems, can handle almost anything you throw at it, and is very extensible through IDAPython scripting. It also has great support for analysis, allowing the reverse engineer to rename symbols or add comments to the disassembly.
A debugger, to trace program execution, break on specific code locations or conditions, show the program state at any point, etc. Debuggers are usually specific to a system. Windows users mostly use OllyDbg (which has a lot of community support and plugins), Immunity Debugger (a fork of an old OllyDbg version that went its own way; it supports Python plugins and has a lot of very interesting community contributions such as mona) or WinDbg (mostly for remote kernel and driver debugging). On Linux or Mac OS X most people use the very simple GDB, which is the subject of this article.

Until very recently GDB did not have any support for scripting and plugins. The GDB command language was very simple and didn’t provide many facilities for writing complex functions: no way to interface with external libraries, very limited control flow, and no advanced variable types such as lists or hash tables. This is fine when you only use your debugger to break at some point and display a variable, but when you want to write complex plugins such as mona (which is used to find sequences of instructions in memory that can be used to bypass NX/DEP), the GDB command language is really not enough.

In October 2009 GDB 7.0 was released, including changes by Tom Tromey of Red Hat that added Python scripting support to GDB. Tromey also wrote a long series of articles explaining why Python scripting was a major feature, with many examples, including using PythonGDB to write pretty printers for custom data types. I recently used PythonGDB to reverse engineer some Linux binaries I did not have the source for, so I wanted to show some of the useful things PythonGDB can do for your reverse engineering work.

Basic PythonGDB usage

First, check that the GDB version you have installed does have Python support:

(gdb) help python
Evaluate a Python command.

The command can be given as an argument, for instance:

    python print 23

If no argument is given, the following lines are read and used
as the Python commands.  Type a line containing "end" to indicate
the end of the command.
(gdb) python print("Hello, World!")
Hello, World!

There are three ways to run Python code from inside GDB:

Using the python command as shown above: good for one-shot Python instructions that you are unlikely to reuse. Also good for testing the PythonGDB API directly from inside GDB.
Using the source command to load a Python script from the filesystem. This is the best way to load scripts manually.
Using the autoloading mechanisms to load specific scripts for a binary. If a script file named <binary>-gdb.py exists in the same directory as the binary, GDB will automatically load it (recent GDB versions also require that location to be allowed by the auto-load safe-path setting). The binary itself can also specify GDB scripts to load when it is being debugged, by listing them in the .debug_gdb_scripts ELF section.

GDB exports a Python module called gdb which provides all of the required functions to interface with the debugger: reading memory from the process, getting the list of all breakpoints, registering commands and pretty printers, as well as a lot of additional features.
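
As a rough illustration of the command-registration side of that API, here is a minimal sketch of a custom command that dumps memory from the debugged process. The command name hexdump-py and the helper format_hexdump are made up for this example; gdb.Command, gdb.string_to_argv, gdb.parse_and_eval and Inferior.read_memory come from the gdb module. The import is guarded only so that the formatting helper can be tried outside the debugger.

```python
try:
    import gdb  # only importable when the script runs inside GDB
except ImportError:
    gdb = None


def format_hexdump(data, base=0, width=16):
    """Format a bytes-like object as hexdump lines (hypothetical helper)."""
    data = bytearray(data)
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join("%02x" % b for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%016x  %-*s  %s" % (base + off, width * 3 - 1, hexpart, text))
    return lines


if gdb is not None:
    class HexdumpCommand(gdb.Command):
        """hexdump-py ADDR [LEN]: dump LEN bytes (default 64) starting at ADDR."""

        def __init__(self):
            # Register the command under the (made-up) name "hexdump-py"
            super(HexdumpCommand, self).__init__("hexdump-py", gdb.COMMAND_DATA)

        def invoke(self, arg, from_tty):
            argv = gdb.string_to_argv(arg)
            if not argv:
                raise gdb.GdbError("usage: hexdump-py ADDR [LEN]")
            # Let GDB evaluate the arguments, so "$rsp" or "main" work as ADDR
            addr = int(gdb.parse_and_eval(argv[0]))
            length = int(gdb.parse_and_eval(argv[1])) if len(argv) > 1 else 64
            mem = gdb.selected_inferior().read_memory(addr, length)
            for line in format_hexdump(bytes(mem), addr):
                print(line)

    HexdumpCommand()
```

Once the file has been loaded with the source command and the program is running, something like hexdump-py $rsp 32 should dump the top of the stack.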

Writing our first PythonGDB plugin

Format string bugs can be tricky to detect: unless you try sending format specifiers like %n everywhere the program accepts a string, you will most likely never notice that one of your format strings can be user-controlled, giving malicious users a way to inject code remotely into your process.

Our first PythonGDB plugin will help detect these bugs by dynamically analyzing calls to the *printf family of functions and breaking execution if one of those functions is called with a format string that is not in read-only memory (like .rodata).

PythonGDB allows us to define custom breakpoints which execute Python code when they are triggered and can either stop the program or allow it to continue. We are going to use this to break on the printf functions, parse /proc/<pid>/maps to check if the memory location used as the format string is in read-write memory, and if it is, display an error and break execution. If it is not, just continue executing the program without interruption.
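
The permission lookup itself does not depend on GDB, so it can be tried on its own first. The sketch below parses maps-style text and reports whether an address falls in a writable mapping. The sample lines (addresses, inode numbers and paths) are invented for illustration and only follow the general layout of /proc/<pid>/maps; parse_maps and is_writable are names chosen for this example.

```python
# Invented sample in the layout of /proc/<pid>/maps:
# address range, permissions, offset, device, inode, path
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 1234 /home/user/a.out
00600000-00601000 rw-p 00000000 08:01 1234 /home/user/a.out
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text):
    """Turn the contents of a maps file into (start, end, perms) tuples."""
    zones = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Only the first two fields matter here: "start-end" and "rwxp"
        memrange, perms = line.split()[:2]
        start, end = memrange.split("-")
        zones.append((int(start, 16), int(end, 16), perms))
    return zones


def is_writable(zones, addr):
    """Return True or False for a mapped address, None if it is unmapped."""
    for start, end, perms in zones:
        if start <= addr < end:
            return "w" in perms
    return None


zones = parse_maps(SAMPLE_MAPS)
print(is_writable(zones, 0x400500))        # address inside the r-xp mapping
print(is_writable(zones, 0x7ffffffe0000))  # address inside the stack mapping
```

A string literal compiled into the binary would land in a read-only mapping like the first one, while anything copied from argv or read from the network ends up on the stack or the heap, which are writable.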

Here is how you would implement this with PythonGDB:

import gdb

class CheckFmtBreakpoint(gdb.Breakpoint):
    """Breakpoint checking whether the format string argument of a function
    call is in read-only memory, stopping program execution if it is not."""

    def __init__(self, spec, fmt_idx):
        # spec: specifies where to break
        #
        # gdb.BP_BREAKPOINT: specifies that we are a breakpoint and not a
        # watchpoint.
        #
        # internal=True: the breakpoint won't show up in "info breakpoints" and
        # similar commands.
        super(CheckFmtBreakpoint, self).__init__(
            spec, gdb.BP_BREAKPOINT, internal=True
        )

        # Zero-based argument index of the format string
        # (printf = 0, sprintf = 1, snprintf = 2)
        self.fmt_idx = fmt_idx

    def stop(self):
        """Method called by GDB when the breakpoint is triggered."""

        # Read the i-th argument of an x86_64 function call (System V calling
        # convention: the first integer/pointer arguments are in these registers)
        args = ["$rdi", "$rsi", "$rdx", "$rcx"]
        fmt_addr = int(gdb.parse_and_eval(args[self.fmt_idx]))

        # Parse /proc/<pid>/maps for this process
        proc_map = []
        with open("/proc/%d/maps" % gdb.selected_inferior().pid) as fp:
            proc_map = self._parse_map(fp.read())

        # Find the memory range which contains our format address
        for mapping in proc_map:
            if mapping["start"] <= fmt_addr < mapping["end"]:
                break
        else:
            print("%016x belongs to an unknown memory range" % fmt_addr)
            return True

        # Check the memory permissions
        if "w" in mapping["perms"]:
            print("Format string in writable memory!")
            return True

        return False

    def _parse_map(self, file_contents):
        """Parse a /proc/<pid>/maps file to a list of dictionaries containing
        these fields:
          - start: the start address of the range
          - end: the end address of the range
          - perms: the permissions string"""

        zones = []
        for line in file_contents.split('\n'):
            if not line:
                continue
            memrange, perms, _ = line.split(None, 2)
            start, end = memrange.split('-')
            zones.append({
                'start': int(start, 16),
                'end': int(end, 16),
                'perms': perms
            })
        return zones

# Set breakpoints on all *printf functions
CheckFmtBreakpoint("printf", 0)
CheckFmtBreakpoint("fprintf", 1)
CheckFmtBreakpoint("sprintf", 1)
CheckFmtBreakpoint("snprintf", 2)
CheckFmtBreakpoint("vprintf", 0)
CheckFmtBreakpoint("vfprintf", 1)
CheckFmtBreakpoint("vsprintf", 1)
CheckFmtBreakpoint("vsnprintf", 2)

We can then test this with a very simple program that lets the user control a format string:

#include <stdio.h>

int main(int argc, char** argv)
{
    printf(argv[1]);
    return 0;
}

Using our simple plugin on this executable gives the following output:

(gdb) source checkfmt.py
Function "fprintf" not defined.
Function "sprintf" not defined.
Function "snprintf" not defined.
Function "vprintf" not defined.
Function "vfprintf" not defined.
Function "vsprintf" not defined.
Function "vsnprintf" not defined.
(gdb) r "Test %x"
Starting program: /home/delroth/test/fstring/a.out "Test %x"
Format string in writable memory!

Breakpoint -1, 0x00007ffff7a88b00 in printf () from /lib/libc.so.6
(gdb) bt
#0  0x00007ffff7a88b00 in printf () from /lib/libc.so.6
#1  0x0000000000400503 in main ()
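
Once execution has stopped, it can be handy to look at the string that triggered the check. The following is a small sketch of one way to do that from Python, reading the inferior's memory one byte at a time until a NUL byte is found. read_c_string and show_format_string are hypothetical helpers written for this example, and the code assumes, like the plugin above, that the format string pointer is still in $rdi (printf on x86_64).

```python
try:
    import gdb  # only importable when the script runs inside GDB
except ImportError:
    gdb = None


def read_c_string(read_memory, addr, max_len=256):
    """Read a NUL-terminated string using a read_memory(addr, length) callable.

    Reading byte by byte is slow, but it never asks for memory past the
    terminating NUL, which could belong to an unmapped page.
    """
    out = bytearray()
    while len(out) < max_len:
        byte = bytearray(bytes(read_memory(addr + len(out), 1)))
        if not byte or byte[0] == 0:
            break
        out += byte
    # latin-1 maps every byte to a character, so decoding cannot fail
    return out.decode("latin-1")


if gdb is not None:
    def show_format_string():
        """Print the string pointed to by $rdi in the selected inferior."""
        addr = int(gdb.parse_and_eval("$rdi"))
        reader = gdb.selected_inferior().read_memory
        print(repr(read_c_string(reader, addr)))
```

After loading this with the source command, running python show_format_string() at the breakpoint should print the user-supplied format string. GDB's own x/s $rdi gives the same information interactively; the Python version is mainly useful if you want to log the string from inside a breakpoint's stop method.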

This ends the first part of my PythonGDB for reverse engineering tutorial. The next part will be about execution tracing using GDB: performing timing attacks on binaries by counting the number of executed instructions, as well as counting the number of times a breakpoint was triggered to guess what a virtual machine is doing (much like what I did when cracking the PlaidCTF “simple” binary).