Monday, April 7, 2014

Dynamically Unpacking Malware With Pin

A common approach that malware takes to hide itself is packing. Traditionally, packing was a means to compress an executable, then unpack and execute it at run time. Packing can also be used as an obfuscation technique by those who wish to hide their executable code. For a while I have been mulling over how to write a generic unpacker. The general rule I came up with is that unpacked code must first be written to memory, and that memory must then be executed. Since I was looking at a sample that did exactly this, I wrote a Pintool to retrieve the unpacked memory regions.

It is a fairly tedious task to follow execution in a debugger in order to retrieve unpacked code. You need to skim thousands of instructions, set breakpoints, watch calls to functions, unset breakpoints, accidentally allow the malware to execute, revert your VM, get back to where you were, read more disassembly, then finally dump memory and analyze that when you get to something interesting. This can take hours, sometimes days or more.

The Dropper

The dropper (MD5: 2E57C0CA7553263E7B6010B850FF2E48) is covered by an NDB signature, Win.Trojan.Zbot-30983. This signature targets bytes from the first stage’s unpacking loop as these bytes were seen to be consistent among all similar samples.


Win.Trojan.Zbot-30983:1:*:8b95a0f6ffff33c08a8415a7f6ffff83f00233858cf6ffff8b8da0f6ffff88840da7f6ffff{-20}410f95c0ff75203bc68d8d6cfdffff59e815ffffff{-75}8b95a0f6ffff33c08a8415a7f6ffff83f0028b8da0f6ffff88840da7f6ffff

This initial unpacking function opens the binary (itself), determines its size with a seek and an ftell, mallocs a buffer, then reads the file's bytes into the buffer. Beginning at offset 0x4FD8, the function searches for the byte pattern:

   NN ?? (NN+1) ?? (NN+2) ?? (NN+3) ?? (NN+4)

Reimplementing the same search in Python, we find the pattern at offset 0x51A9C, which is 0x89D bytes from the end of the file. The matching pattern:

   9C 54 9D 91 9E FB 9F 69 A0
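The search described above can be sketched in a few lines of C++. This is an illustration, not the dropper's code or the Python script mentioned above: the names findMarker and kNotFound are hypothetical, and the sketch assumes the counter bytes wrap modulo 256.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel returned when no match is found (hypothetical name).
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Finds the first offset >= start where the bytes look like
//   NN ?? (NN+1) ?? (NN+2) ?? (NN+3) ?? (NN+4)
// The odd positions are wildcards. Wrap-around at 0xFF is an assumption.
std::size_t findMarker(const std::vector<std::uint8_t>& data, std::size_t start) {
    const std::size_t kPatternLen = 9;
    for (std::size_t i = start; i + kPatternLen <= data.size(); ++i) {
        bool match = true;
        for (std::size_t k = 1; k <= 4; ++k) {
            // Every second byte must equal the first byte plus k.
            if (data[i + 2 * k] != static_cast<std::uint8_t>(data[i] + k)) {
                match = false;
                break;
            }
        }
        if (match) {
            return i;
        }
    }
    return kNotFound;
}
```

Called as findMarker(fileBytes, 0x4FD8) on the dropper's contents, a search like this would be expected to land on the 9C 54 9D 91 9E FB 9F 69 A0 sequence shown above.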

There is then a loop that copies the 0x956 bytes immediately following that pattern to a local buffer. It then XOR-decodes the first 0x84A bytes of that buffer with the 6th of the 9 bytes matched above, 0xFB. This is the variable labeled xor_byte in the above screenshot. Once this memory is decoded, it is executed.
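As a rough sketch of that copy-and-decode step, the following C++ function takes the file contents and the offset of the 9-byte pattern, copies the bytes that follow it, and XORs a prefix of the copy with the 6th pattern byte. The function name and parameters are hypothetical. The lengths are passed in rather than hard-coded, and the copy is clamped to the bytes actually available, which is a defensive choice of this sketch and not something the dropper is known to do.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies up to copyLen bytes that follow the 9-byte marker at markerOffset,
// then XOR-decodes the first decodeLen bytes of the copy with the marker's
// 6th byte (index 5). Returns an empty vector if the marker does not fit.
std::vector<std::uint8_t> extractAndDecode(const std::vector<std::uint8_t>& file,
                                           std::size_t markerOffset,
                                           std::size_t copyLen,
                                           std::size_t decodeLen) {
    const std::size_t kMarkerLen = 9;
    const std::size_t kKeyIndex = 5;  // 6th byte of the marker
    std::vector<std::uint8_t> out;
    if (markerOffset > file.size() || file.size() - markerOffset < kMarkerLen) {
        return out;
    }
    const std::uint8_t key = file[markerOffset + kKeyIndex];
    const std::size_t payload = markerOffset + kMarkerLen;
    const std::size_t n = std::min(copyLen, file.size() - payload);
    out.assign(file.begin() + static_cast<std::ptrdiff_t>(payload),
               file.begin() + static_cast<std::ptrdiff_t>(payload + n));
    const std::size_t d = std::min(decodeLen, out.size());
    for (std::size_t i = 0; i < d; ++i) {
        out[i] ^= key;  // only the prefix is encoded; the rest is left as-is
    }
    return out;
}
```

With the values from this sample, the call would be extractAndDecode(fileBytes, 0x51A9C, 0x956, 0x84A), and the key picked up from the marker would be 0xFB.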

The Pintool

Pin enables you to instrument binaries. That is, you can write code that executes between each instruction, basic block, or routine. You can also instrument threads, and there is a lot more functionality, such as hooking system calls, that would be difficult to list here. The goal of this Pintool was simply to execute this malware and retrieve the unpacked code.

To achieve this, I started with one of Pin’s examples which records memory reads and writes. I only cared about the writes, so I cut out the code for handling reads.

Any time an opcode for writing memory is detected, the program retrieves the write's target address. It then takes that write address and scans memory regions using VirtualQuery() in order to find the base address of the page that the write address belongs to. Once the owning page is found, that page's info is returned. The page's start and end addresses are stored in a map. Rather than storing every single address that was written to, we store ranges of memory, which saves a significant amount of space.

// Records a memory write
VOID RecordMemWrite(VOID * ip, VOID * addr) {
    map<VOID *, VOID *>::iterator i;

    for(i=writtenMap.begin(); i != writtenMap.end(); ++i) {
        if(addr >= i->first && addr < i->second) {
            return;
        }
    }

    WINDOWS::MEMORY_BASIC_INFORMATION *info = getAddrInfo(addr);
    if(info == NULL)
        return;

    writtenMap[info->BaseAddress] = ((UINT8 *)info->BaseAddress) + info->RegionSize;

    return;
}
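The linear scan above runs on every instrumented write. One possible refinement, shown here as a standalone sketch and not as part of the actual tool, is to use the map's ordering: because regions are keyed by start address and do not overlap, upper_bound() can locate the only candidate region in logarithmic time. The names RegionMap, isTracked, and recordWrite are hypothetical, and plain integers stand in for Pin's pointer types and for the VirtualQuery() result.

```cpp
#include <cstdint>
#include <map>

// Region start -> region end (exclusive), mirroring writtenMap above.
typedef std::map<std::uintptr_t, std::uintptr_t> RegionMap;

// Returns true if addr falls inside a region that is already recorded.
bool isTracked(const RegionMap& regions, std::uintptr_t addr) {
    // First region that starts strictly after addr.
    RegionMap::const_iterator it = regions.upper_bound(addr);
    if (it == regions.begin()) {
        return false;  // every recorded region starts after addr
    }
    --it;  // last region starting at or before addr
    return addr < it->second;
}

// Records the region containing addr unless it is already tracked.
// regionBase and regionSize stand in for the VirtualQuery() result.
bool recordWrite(RegionMap& regions, std::uintptr_t addr,
                 std::uintptr_t regionBase, std::uintptr_t regionSize) {
    if (isTracked(regions, addr)) {
        return false;
    }
    regions[regionBase] = regionBase + regionSize;
    return true;
}
```

Whether this matters in practice depends on how many regions a sample writes to. With only a handful of entries, the original loop is likely just as fast.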

In addition to recording what memory is written to, the tool checks the address of every basic block executed. If this address falls within one of the memory regions that was previously written to, that memory is dumped to file. The tool then removes the record of that write so that the memory will not be dumped to file again unless it is subsequently written to then executed. This avoids writing to the disk as every single basic block inside a memory region is executed.

VOID checkBBL(ADDRINT addr) {
    map<VOID *, VOID *>::iterator i;
    FILE *memdump;
    char fname[30];

    // Check if basic block (eip / rip) is in memory that was written to
    for(i=writtenMap.begin(); i != writtenMap.end(); ++i) {
        if(addr >= (ADDRINT)i->first && addr < (ADDRINT)i->second) {      
            // Dump memory to file
            sprintf(fname, "dumps\\%p.dump", i->first);

            memdump = fopen(fname, "wb");

            // Skip the dump if the file cannot be opened (e.g. dumps\ is missing)
            if(memdump != NULL) {
                fwrite(i->first, sizeof(char), (size_t)((ADDRINT)i->second - (ADDRINT)i->first), memdump);
                fclose(memdump);
            }

            // Remove write record so we don't dump at every bb
            writtenMap.erase(i->first);

            break;
        }
    }

    return;
}
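The dump-once behaviour described above can be modelled without Pin. In this sketch, a region is handed back for dumping the first time execution lands in it and is then forgotten until it is written to again. WrittenRegions, DumpRange, and takeRegionForExecution are hypothetical names, and the file I/O is left out so that only the bookkeeping is shown.

```cpp
#include <cstdint>
#include <map>

// Region start -> region end (exclusive) for regions written but not yet run.
typedef std::map<std::uintptr_t, std::uintptr_t> WrittenRegions;

// Describes the region to dump; valid is false when nothing should be dumped.
struct DumpRange {
    bool valid;
    std::uintptr_t start;
    std::uintptr_t end;
};

// If pc lies in a written region, removes that record and returns its bounds
// so the caller can dump it exactly once.
DumpRange takeRegionForExecution(WrittenRegions& written, std::uintptr_t pc) {
    DumpRange result = { false, 0, 0 };
    WrittenRegions::iterator it = written.upper_bound(pc);
    if (it == written.begin()) {
        return result;  // no region starts at or before pc
    }
    --it;
    if (pc >= it->second) {
        return result;  // pc is past the end of the closest region
    }
    result.valid = true;
    result.start = it->first;
    result.end = it->second;
    written.erase(it);  // forget the write so later blocks do not re-dump
    return result;
}
```

A second basic block in the same region returns an invalid range, so nothing is written to disk. A later write to the region re-adds the record and makes the next execution dump it again.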

The Result

Running the Pintool on the sample 2E57C0CA7553263E7B6010B850FF2E48, we get a total of 12 memory files.


Of the memory dumps highlighted above, the smallest two (0018D000 and 0018E000) contain the second stage of unpacking (first stage discussed above), and the two larger files are the third unpacking stage. In the third stage, there is one rather lengthy, hideous function. This function calls itself recursively in order to run through different stages. We see some anti-analysis from the strings vmtoolsd.exe, VBoxService.exe, and SbieDll.dll (Sandboxie). The first two are checked when the function is called with 6 as the first argument. Sandboxie is checked when it is called with a 5.


Eventually, the function calls itself with 9, which leads to the last stage of unpacking. The final stage uses the RunPE method. It calls CreateProcess on Internet Explorer. It then calls WriteProcessMemory a few times in order to replace code in the newly created process. Finally, it calls ResumeThread to begin execution.

The final Zbot payload is detected by a signature dating back to 2011.

Trojan.Spy.Zbot-142:1:*:4973576f77363450726f6365737300002200250073002200000000002200250073002200200025007300000075736572656e762e646c6c00437265617465456e7669726f6e6d656e74426c6f636b000044657374726f79456e7669726f6e6d656e74426c6f636b003a640d0a64656c20222573220d0a6966206578697374202225732220676f746f206400006200610074000000406563686f206f66660d0a25730d0a64656c202f4620222573220d0a000000002f006300200022002500730022

Conclusion

This Pintool was able to get me all of the stages of the unpacker. However, since the sample used RunPE as the final stage, I had to dump that stage manually. The memory dumps did allow me to quickly identify where to break and reach the right functions. Jurriaan Bremer has done some work on unpacking RunPE malware with Pin by hooking the system calls that are used during this process. Another useful addition to this tool would be dumping the call stack when the unpacked code is called. This would allow rapid identification of the unpacking functions at each stage.

Pin is a powerful tool for dynamic malware analysis. This Pintool acts as a good proof of concept to justify further work in this area. Setting up an unpacking environment with a powerful, generic unpacker will speed up analysis and classification of malware samples.


3 comments:

Attila Suszter said...

Additional usage of this method is to get just-in-time compiled code from the process's address space.

Jurriaan Bremer said...

Good work Douglas! Happy to see such approaches giving actual results :)

Do you plan on making this tool even more generic later on? E.g., Linux support, sharing the code perhaps, etc? I'd be happy to hear if you do :-)

Douglas Goddard said...

+Jurriaan

It would be fun to implement on Linux, but I'm not sure there would be many uses besides CTF.

I'll keep you posted on any updates!