Thursday, April 10, 2014

Performing the Heartbleed Attack After the TLS Handshake

Over the past several days, many IPS rules for detecting the Heartbleed attack have been suggested that attempt to compare the TLS message size to the heartbeat message size.  This method works with most of the proof-of-concept attacks out there, which perform the Heartbleed attack before the TLS handshake has occurred.  Performing the attack before the handshake results in both the attack and response data being sent in plaintext.  However, if a TLS handshake is performed first, all heartbeat data is encrypted, so this type of detection, which compares ciphertext (encrypted data) with the unencrypted TLS message size, will not work.  It will almost always produce a false positive, since the encrypted data is very likely to decode to a value larger than the TLS message size.  Adding to the challenge is the fact that nothing explicit within the heartbeat request or the heartbeat response indicates that the heartbeat data is encrypted.

Our detection has from the beginning ignored the heartbeat message data itself, to avoid the false positives that arise from treating ciphertext as if it were readable on the wire.  Instead, we only use the unencrypted values within the TLS header.

Monday night, before Heartbleed really hit the news and public exploit code became available, the VRT created a proof of concept to demonstrate the Heartbleed bug by analyzing the openssl-1.0.1f code and modifying it to send malicious heartbeats and dump out the response to view the exposed data.  With this approach, the heartbeat request is sent after the TLS handshake, resulting in encrypted payloads.  By using our own exploit as the basis for detection, we avoided the mistakes made by some others that result in false positives against legitimate traffic, since we never assumed that we could read the heartbeat message size.

t1_lib.c.diff is a patch to the openssl-1.0.1f source tree that implements the Heartbleed attack after the TLS handshake has occurred.  Steps to create the PoC are as follows:

$ wget https://labs.snort.org/files/t1_lib.c.diff
$ wget http://www.openssl.org/source/openssl-1.0.1f.tar.gz
$ tar -zxf openssl-1.0.1f.tar.gz
$ cd openssl-1.0.1f
$ patch -p0 < ../t1_lib.c.diff
$ ./config no-shared no-idea no-mdc2 no-rc5 zlib enable-tlsext no-ssl2 && make depend && make
$ apps/openssl s_client -tlsextdebug -connect <victim_server>:443


Once you connect, type 'B' to trigger a heartbeat, then 'Q' to quit.  You can send a few heartbeats per session if you want.  At this point, many servers out there have disabled heartbeat support, so don't be alarmed if you receive "peer does not accept heartbeats."  This is a good thing!

We detect Heartbleed attacks whether they are encrypted or not.  We use detection_filter ("threshold") rules to catch too many heartbeat requests in a short amount of time, as an attacker tries to gather memory dumps, and we inspect the TLS size in heartbeat responses for a value that is greater than the normal heartbeat response size.
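Since the heartbeat payload is opaque once encrypted, a size check can only use the cleartext TLS record header. A minimal sketch of that idea in Python (the 200-byte threshold here is illustrative, not the value from our rules):

```python
import struct

# TLS record header: content type (1 byte), version (2 bytes), length (2 bytes).
# Content type 0x18 marks a heartbeat record; the record length is the only
# size field still visible in the clear after the handshake.
HEARTBEAT_CONTENT_TYPE = 0x18
NORMAL_HEARTBEAT_RECORD_MAX = 200  # illustrative threshold, not Snort's actual value

def suspicious_heartbeat_response(record):
    """Flag a heartbeat record whose cleartext TLS length is abnormally large."""
    if len(record) < 5:
        return False
    content_type, _version, length = struct.unpack(">BHH", record[:5])
    return content_type == HEARTBEAT_CONTENT_TYPE and length > NORMAL_HEARTBEAT_RECORD_MAX

# A 16KB heartbeat response record is a strong indicator of a Heartbleed leak.
print(suspicious_heartbeat_response(b"\x18\x03\x02\x40\x00" + b"\x00" * 0x4000))  # True
print(suspicious_heartbeat_response(b"\x18\x03\x02\x00\x29" + b"\x00" * 0x29))    # False
```

Note that nothing here attempts to read the (encrypted) heartbeat payload length inside the record.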

More information about how the exploit works and our detection for it can be read at our original blog post on this subject, http://vrt-blog.snort.org/2014/04/heartbleed-memory-disclosure-upgrade.html

Heartbleed Continued - OpenSSL Client Memory Exposed

The Heartbleed vulnerability is bad. Not only does it pose a risk to servers running the vulnerable version of OpenSSL (1.0.1 through 1.0.1f) with heartbeats enabled, it also poses a serious risk to clients running the vulnerable versions.

OpenSSL clients process heartbeats using the same vulnerable functions: tls1_process_heartbeat() and dtls1_process_heartbeat(). The same memcpy() overread detailed in our previous blog post allows malicious servers to read blocks of client memory. In internal testing we were able to extract memory from several client programs, such as curl and wget, that link against the vulnerable OpenSSL versions.  It is important to note that the versions of these programs do not necessarily matter if they link against the vulnerable OpenSSL versions.

Research into other clients that link against the vulnerable versions of OpenSSL continues. Again, it is strongly recommended that you upgrade to OpenSSL version 1.0.1g or install a version of OpenSSL with heartbeats disabled.

We have released detection for the client-side attack in SIDs 30520 through 30523, we have expanded detection port ranges to cover more vulnerable clients and servers, and, last but not least, all Heartbleed rules have been added to the community ruleset - because we care.

Tuesday, April 8, 2014

Heartbleed Memory Disclosure - Upgrade OpenSSL Now!

Heartbleed is a serious vulnerability in OpenSSL 1.0.1 through 1.0.1f.  If you have not upgraded to OpenSSL 1.0.1g or installed a version of OpenSSL built with -DOPENSSL_NO_HEARTBEATS, it is strongly recommended that you do so immediately.

This vulnerability allows the attacker to read up to 64KB of heap memory from the victim without any privileged information or credentials. How is this possible? In short, OpenSSL's heartbeat processing functions use an attacker controlled length for copying data into heartbeat responses. Both DTLS and TLS heartbeat implementations are vulnerable.

The vulnerable functions are tls1_process_heartbeat() in ssl/t1_lib.c (for TLS) and dtls1_process_heartbeat() in ssl/d1_both.c (for DTLS). Looking at these functions you can see that OpenSSL first reads the heartbeat type and length:

hbtype = *p++;
n2s(p, payload);
pl = p;

n2s is a macro that takes two bytes from "p" and copies them to "payload". This is the length indicated by the SSL client for the heartbeat payload.  Note: The actual length of the SSL record is not checked. The variable "pl" is a pointer to the heartbeat data sent by the client.
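The macro's effect can be sketched in a couple of lines of Python (an illustration of the two-byte big-endian read, not OpenSSL code):

```python
def n2s(data, offset):
    """Python equivalent of OpenSSL's n2s macro: read a 16-bit
    big-endian ("network to short") value from data at offset."""
    return (data[offset] << 8) | data[offset + 1]

# A heartbeat record body: type 0x01 (request), then the attacker-controlled
# two-byte payload length, then the (possibly much shorter) actual payload.
record = bytes([0x01, 0xFF, 0xFF]) + b"A"
payload = n2s(record, 1)
print(payload)  # 65535, even though only 1 byte of payload follows
```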

OpenSSL allocates as much memory as the client asked for (two byte length up to 65535 bytes) plus 1 byte for heartbeat type, 2 bytes for payload length, and 16 bytes for padding:

buffer = OPENSSL_malloc(1 + 2 + payload + padding);
bp = buffer;

Then it builds the heartbeat response by copying the payload size sent in the request to the response using the macro s2n (opposite of n2s).  Finally (and here's the critical part), using the size supplied by the attacker rather than its actual length, it copies the request payload bytes to the response buffer.

*bp++ = TLS1_HB_RESPONSE;
s2n(payload, bp);
memcpy(bp, pl, payload);

If the specified heartbeat request length is larger than its actual length, this memcpy() will read memory past the request buffer and store it in the response buffer which is sent to the attacker. In internal testing we were able to successfully retrieve usernames, passwords, and SSL certificates.
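The overread can be modeled in a few lines of Python (a toy simulation with made-up heap contents, not the actual exploit):

```python
# Toy simulation of the overread: the request payload is 1 byte, but the
# attacker claims 64, and the unchecked memcpy()-style copy pulls adjacent
# heap data into the response. The "heap" contents are invented for the demo.
heap = b"A" + b"user=admin;pass=hunter2;" + b"\x00" * 40
claimed_payload_len = 64  # attacker-supplied length from the heartbeat header
actual_payload_len = 1    # what the attacker actually sent

# Equivalent of memcpy(bp, pl, payload) with no bounds check:
response = heap[:claimed_payload_len]
print(b"hunter2" in response)  # True: secrets adjacent to the request leak out
```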

To detect this vulnerability we use detection_filter ("threshold") rules to detect too many inbound heartbeat requests, which would be indicative of someone trying to read arbitrary blocks of data. Since OpenSSL uses hardcoded values that normally result in a 61 byte heartbeat message size, we also use rules to detect outbound heartbeat responses that are significantly above this size. Note: you can't simply compare the TLS record size with the heartbeat payload size since the heartbeat message (including the indicated payload size) is encrypted.
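The detection_filter idea amounts to a sliding-window rate check per source. A minimal sketch, with illustrative thresholds rather than the values used in the shipped rules:

```python
from collections import defaultdict, deque

# detection_filter-style rate check: alert when one source sends more than
# `count` heartbeat requests within `seconds`. The thresholds here are
# illustrative, not the ones in the actual Snort rules.
class HeartbeatThreshold:
    def __init__(self, count=5, seconds=60):
        self.count, self.seconds = count, seconds
        self.seen = defaultdict(deque)

    def heartbeat(self, src_ip, now):
        times = self.seen[src_ip]
        times.append(now)
        # Drop events that have aged out of the window.
        while times and now - times[0] > self.seconds:
            times.popleft()
        return len(times) > self.count  # True -> raise an alert

t = HeartbeatThreshold()
alerts = [t.heartbeat("10.0.0.1", s) for s in range(8)]
print(alerts)  # the first five are False; the rest trip the filter
```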

We have released detection in SIDs 30510 through 30517 to detect attacks targeting this vulnerability.

To keep people updated, Heartbleed rules have been added to the community ruleset.

Microsoft Update Tuesday: April 2014, two final XP and Office 2003 fixes



It’s the last Microsoft Update Tuesday before the end-of-life of both Windows XP and Office 2003 and Microsoft is patching two vulnerabilities that also impact XP and two that also impact Office 2003 this month. All-in-all it’s a relatively light month this time around with only four bulletins covering eleven CVEs.

The first bulletin this month, MS14-017, deals with Word and covers three CVEs. One fix is for a 0-day vulnerability, CVE-2014-1761, that Microsoft previously addressed in advisory 2953095 and a “Fix it” that disables support for RTF completely in Word. The vulnerability results from an incorrect “listoverridecount” value in an “overridetable” structure in the RTF file.  This value is not properly checked by Word and setting it to an invalid value causes a type confusion bug, which can be exploited by an attacker to gain remote code execution.  The vulnerabilities addressed in this bulletin also cover Word 2003.

The requisite Internet Explorer bulletin, MS14-018, only covers six CVEs this month. As usual most of the issues are the result of use-after-free vulnerabilities. This time, none of the vulnerabilities that are being patched were publicly known. Given that IE runs on XP as well, this is one of the two bulletins that covers XP.

MS14-019 fixes a vulnerability (CVE-2014-0315) in the way that Windows handles files that can result in remote code execution. This is the second bulletin that also covers XP.

The final bulletin this month is MS14-020 and deals with Publisher, where a maliciously crafted file can result in remote code execution due to an arbitrary pointer dereference (CVE-2014-1759). As with the Word bulletin, this one also covers 2003.

Rules SID 24974-24975, 30497-30502, 30508-30509 address these vulnerabilities.


CVE-2014-1761, Oh did you mean CVE-2012-2539?

When the VRT first received word of a new Microsoft Word 0-day I anxiously awaited details and the ever important hash of the in-the-wild exploit to be able to research it and provide coverage through Snort, ClamAV and the FireAmp suite of products. I was especially interested when word came that it was an RTF vulnerability, as I have spent a lot of time looking at high profile RTF vulnerabilities such as the ever popular CVE-2012-0158.

When the in-the-wild sample finally arrived I thought someone was playing an early April Fools' joke on us: I knew this vulnerability already. More than that, I had written the coverage for it almost a year and a half ago! The vulnerability appeared to be CVE-2012-2539, which was released December 11th, 2012 as Microsoft Security Bulletin MS12-079. I checked blogs and looked for any mistakes in the hash I had gotten, but no, this WAS the dreaded vulnerability that prompted Yahoo Finance to tell everyone not to open any RTF files. So I did some searching in my old research and found that I had written Snort rules 24974 and 24975 way back in December of 2012 for this vulnerability. The release posts on Snort.org's blog confirmed this (blog|rule changes). The rule even specifies the vulnerable element of the RTF specification, listoverridecount, in the message.

I enjoyed this hilarious state of affairs, and we kept it to ourselves until someone else found it out, for dramatic effect if you will. Lo and behold, this week's blog posts by other security vendors popped up, pointing to listoverridecount as the exploitation vector. This confirmed what we already knew: this vulnerability centers on the listoverridecount value. The blog posts rightly deduced that the only legal values for this element are 0, 1, or 9 and that other values could cause a crash. Our detection on both Snort and ClamAV already covered that. Interestingly, though, there seem to be some RTF generators out there that produce listoverridecount values that are not 0, 1, or 9, as we found out when someone submitted a sample to ClamAV with the SHA256 hash:

3fbffe29252df6a87f37962afe72576ea2a7a5540d6c7993cbbff265fcd2734d

as a potential false positive for a signature we have to detect attacks leveraging CVE-2012-2539.



ClamAV was the only vendor to detect it before we decided it was prudent to turn the signature into a PUA (Potentially Unwanted Application) signature, since no one seemed to be exploiting it actively. The Snort rules have now been updated with new references, and a non-PUA ClamAV signature that references CVE-2014-1761 has gone out (I can only hope that alternate RTF generators stop using invalid values in their listoverridecounts).
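For illustration, the core of the check — flagging listoverridecount values outside the legal 0, 1, or 9 — can be sketched in Python (a toy heuristic invented for this post; the real Snort and ClamAV signatures match far more context than this):

```python
import re

# Toy heuristic: find \listoverridecount control words in RTF data and flag
# any value other than the legal 0, 1, or 9. The real signatures anchor on
# much more surrounding structure to avoid false positives.
def bad_listoverridecount(rtf_bytes):
    for m in re.finditer(rb"\\listoverridecount(-?\d+)", rtf_bytes):
        if int(m.group(1)) not in (0, 1, 9):
            return True
    return False

print(bad_listoverridecount(rb"{\rtf1{\*\listoverridetable\listoverridecount25}}"))  # True
print(bad_listoverridecount(rb"{\rtf1{\*\listoverridetable\listoverridecount1}}"))   # False
```

As the false-positive submission above shows, even a "correct" check like this trips over RTF generators that emit out-of-spec values.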

All in all, this 0-day has been a little bit disappointing since it was a rehash of a known vulnerability we already covered, but I can console myself with the fact that someone, somewhere is probably majorly annoyed because the exploit they built or bought is not working against Sourcefire/Cisco customers!

Monday, April 7, 2014

Dynamically Unpacking Malware With Pin

A common approach that malware takes to hide itself is packing. Traditionally, packing was a means to compress an executable, which then unpacks and executes itself at run time. Packing can also be used as an obfuscation technique by those who wish to hide their executable code. For a while I have been mulling over how to write a generic unpacker. A general rule I came up with is that the unpacked code has to be written to memory, and then that memory has to be executed. Since I was looking at a sample that did exactly this, I wrote a Pintool to retrieve the unpacked memory regions.

It is a fairly tedious task to follow execution in a debugger in order to retrieve unpacked code. You need to skim thousands of instructions, set breakpoints, watch calls to functions, unset breakpoints, accidentally allow the malware to execute, revert your VM, get back to where you were, read more disassembly, then finally dump memory and analyze that when you get to something interesting. This can take hours, sometimes days or more.

The Dropper

The dropper (MD5: 2E57C0CA7553263E7B6010B850FF2E48) is covered by an NDB signature, Win.Trojan.Zbot-30983. This signature targets bytes from the first stage’s unpacking loop as these bytes were seen to be consistent among all similar samples.


Win.Trojan.Zbot-30983:1:*:8b95a0f6ffff33c08a8415a7f6ffff83f00233858cf6ffff8b8da0f6ffff88840da7f6ffff{-20}410f95c0ff75203bc68d8d6cfdffff59e815ffffff{-75}8b95a0f6ffff33c08a8415a7f6ffff83f0028b8da0f6ffff88840da7f6ffff

This initial unpacking function opens the binary (itself), seeks and ftells for the size, mallocs a buffer, then reads its bytes into the buffer. Beginning at offset 0x4FD8 the function searches for the byte pattern:

   NN ?? (NN+1) ?? (NN+2) ?? (NN+3) ?? (NN+4)

Writing the same in Python we can identify the offset 0x51A9C, which places us 0x89D bytes from the end of the file. The matching pattern:

   9C 54 9D 91 9E FB 9F 69 A0

There is then a loop that copies the 0x956 bytes immediately following that pattern to a local buffer. It then xor decodes the first 0x84A bytes of that buffer with the 6th byte of the 9 bytes extracted above, 0xFB. That is the variable labeled as xor_byte in the above screenshot. Once this memory is decoded, it is executed.
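The first-stage logic described above can be sketched in Python (offsets and buffer contents here are illustrative, and find_marker / xor_decode are names invented for the sketch):

```python
# Sketch of the first stage: scan for five ascending marker bytes at every
# other position (NN ?? NN+1 ?? NN+2 ?? NN+3 ?? NN+4), then XOR-decode the
# bytes following the 9-byte match using the 6th byte of that match.
def find_marker(buf, start=0):
    for i in range(start, len(buf) - 8):
        if all(buf[i + 2 * k] == (buf[i] + k) & 0xFF for k in range(5)):
            return i
    return -1

def xor_decode(data, key):
    return bytes(b ^ key for b in data)

# Build a toy "file": filler, the 9-byte pattern from the sample, then an
# XOR-encoded payload (real offsets are 0x4FD8 / 0x51A9C in the dropper).
marker = bytes([0x9C, 0x54, 0x9D, 0x91, 0x9E, 0xFB, 0x9F, 0x69, 0xA0])
buf = b"\x00" * 16 + marker + xor_decode(b"payload", 0xFB)

off = find_marker(buf)
print(off)                                       # 16
print(xor_decode(buf[off + 9:], buf[off + 5]))   # b'payload'
```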

The Pintool

Pin enables you to instrument binaries. That is, you can write code to execute between each instruction, basic block, or routine; you can instrument threads; and there is a lot more functionality that would be difficult to list here, like hooking system calls. The goal of this Pintool was simply to execute this malware and retrieve the unpacked code.

To achieve this, I started with one of Pin’s examples which records memory reads and writes. I only cared about the writes, so I cut out the code for handling reads.

Any time an opcode for writing memory is detected, the program retrieves the write's target address. It then takes that write address and scans memory regions using VirtualQuery() in order to find the base address of the page that the write address belongs to. Once the owning page is found, that page's info is returned. The page's start and end addresses are stored in a map. Rather than storing every single address that was written to, we store ranges of memory, which saves a significant amount of space.

// Records a memory write
VOID RecordMemWrite(VOID * ip, VOID * addr) {
    map<VOID *, VOID *>::iterator i;

    for(i=writtenMap.begin(); i != writtenMap.end(); ++i) {
        if(addr >= i->first && addr < i->second) {
            return;
        }
    }

    WINDOWS::MEMORY_BASIC_INFORMATION *info = getAddrInfo(addr);
    if(info == NULL)
        return;

    writtenMap[info->BaseAddress] = ((UINT8 *)info->BaseAddress) + info->RegionSize;

    return;
}

In addition to recording what memory is written to, the tool checks the address of every basic block executed. If this address falls within one of the memory regions that was previously written to, that memory is dumped to file. The tool then removes the record of that write so that the memory will not be dumped to file again unless it is subsequently written to then executed. This avoids writing to the disk as every single basic block inside a memory region is executed.

VOID checkBBL(ADDRINT addr) {
    map<VOID *, VOID *>::iterator i;
    FILE *memdump;
    char fname[30];

    // Check if basic block (eip / rip) is in memory that was written to
    for(i=writtenMap.begin(); i != writtenMap.end(); ++i) {
        if(addr >= (ADDRINT)i->first && addr < (ADDRINT)i->second) {      
            // Dump memory to file
            sprintf(fname, "dumps\\%p.dump", i->first);

            memdump = fopen(fname, "wb");

            fwrite(i->first, sizeof(char), (size_t)((ADDRINT)i->second - (ADDRINT)i->first), memdump);

            fclose(memdump);

            // Remove write record so we don't dump at every bb
            writtenMap.erase(i->first);

            break;
        }
    }

    return;
}

The Result

Running the Pintool on the sample 2E57C0CA7553263E7B6010B850FF2E48, we get a total of 12 memory files.


Of the memory dumps highlighted above, the smallest two (0018D000 and 0018E000) contain the second stage of unpacking (first stage discussed above), and the two larger files are the third unpacking stage. In the third stage, there is one rather lengthy, hideous function. This function calls itself recursively in order to run through different stages. We see some anti-analysis from the strings vmtoolsd.exe, VBoxService.exe, and SbieDll.dll (Sandboxie). The first two are checked when the function is called with 6 as the first argument. Sandboxie is checked when it is called with a 5.


Eventually, the function calls itself with 9 and that leads to the last stage of unpacking. The final stage uses the RunPE method. It calls CreateProcess on Internet Explorer. It then calls WriteProcessMemory a few times in order to replace code in the newly created process. Finally, it calls ResumeThread to begin execution.

The final Zbot payload is detected by a signature dating back to 2011.

Trojan.Spy.Zbot-142:1:*:4973576f77363450726f6365737300002200250073002200000000002200250073002200200025007300000075736572656e762e646c6c00437265617465456e7669726f6e6d656e74426c6f636b000044657374726f79456e7669726f6e6d656e74426c6f636b003a640d0a64656c20222573220d0a6966206578697374202225732220676f746f206400006200610074000000406563686f206f66660d0a25730d0a64656c202f4620222573220d0a000000002f006300200022002500730022

Conclusion

This Pintool was able to get me all of the stages of the unpacker; however, since the sample used RunPE as the final stage, I had to dump that manually. The memory dumps did allow me to quickly identify where to break and reach the right functions. Jurriaan Bremer has done some work on unpacking RunPE malware with Pin by hooking the system calls that are used during this process. Another useful addition to this tool would be dumping the call stack when the unpacked code is called. This would allow rapid identification of the unpacking functions at each stage.

Pin is a powerful tool for dynamic malware analysis. This Pintool acts as good proof of concept to justify further work in this area. Setting up an unpacking environment with a powerful, generic unpacker will speed up analysis and classification of malware samples.


Wednesday, April 2, 2014

Using the Immunity Debugger API to Automate Analysis

While analyzing malware samples I came across many simple but annoying problems that should be solved through automation. This post will cover how to automate a solution to a common problem that comes up when analyzing malware.


The application uses GetProcAddress() to get the address to a function located within a library. That address is stored in a variable and saved for later use. This becomes an issue while analyzing the application and coming across a call instruction that references a generic memory address. There is barely any information to indicate which function is being called. Although I could make some intelligent guesses about what is being called, it would be better to know the exact function.


The tool that I am going to use to automate this problem is Immunity Debugger. This debugger, like a lot of others, provides the capability to automate analysis through scripting.


Solving GetProcAddress()

To restate the problem, there are a number of functions being called that cannot be traced back to an actual function. To set this up, a call is made to GetProcAddress() with both the library and the function name passed as parameters. The return value is the address of the provided function, and it is stored in a variable. Figures 1, 2, and 3 are pulled directly from the disassembly in IDA.

 

Figure 1: Unknown Function Call
Figure 1 is the unknown function call. As the reader can tell, there is currently no way to know what is being called. Some details surrounding the function can be pulled together to help explain the purpose of the function call. However, that isn’t reliable. 

Figure 2: 4266FC Memory Address
Before going to Immunity, I first want to locate the memory address in IDA. I accomplished this by double-clicking the “dword_4266FC” XREF link in IDA to show the memory address where the function address is stored. Figure 2 shows us the details at .data:004266FC.


Okay, now I need to track down where this variable is set. There are a multitude of ways to get the answer. I just opened up the XREFs in IDA and found the function where the variable is set after a GetProcAddress() call. The function, sub_410610, is the culprit. This function contains multiple GetProcAddress() calls. Each call has a return value that is stored in a separate variable.



Figure 3: GetProcAddress()


In Figure 3, there are two of several calls that are made to GetProcAddress(). The return value of GetProcAddress() is the memory address of the specified function, and it is stored in the EAX register. Looking at the first instruction in Figure 3, the address for GetProcAddress() is moved into EDI. A few instructions below that is the call to GetProcAddress() (CALL EDI). The next thing to figure out is what happens to EAX after the function call. Within five instructions EAX is moved into dword_4266FC.

Since the function name is a parameter to GetProcAddress(), why can’t I just grab the name of the function from another spot in memory? Well, the malware author has come up with a method to obfuscate the function names.
 

Take a look at Figure 4. This is the beginning of a very long series of moving bytes around to construct a hex string that is an encoded form of the function name. Once everything is in order, these hex strings are run through a decoding routine (sub_401610). Once the decoding is complete, the name is stored in a variable that is used for the GetProcAddress() call. In Figure 3, that variable is part of the ‘lea edx, [esp+0E4h+var_74]’ instruction.



Figure 4: Obfuscated Function Names
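The actual decoding routine (sub_401610) is not reproduced in this post. Purely as a hypothetical illustration of the general pattern — API names stored encoded and decoded just before the GetProcAddress() call — a rolling-XOR scheme might look like this (decode_name and encode_name are invented for the example and are not the sample's real algorithm):

```python
# Hypothetical illustration only: the sample's real decoder is not shown here.
# Many droppers use something in this spirit to keep API names out of the
# binary's plain strings until just before GetProcAddress() is called.
def decode_name(enc, key):
    out = bytearray()
    for b in enc:
        out.append(b ^ key)
        key = (key + 1) & 0xFF  # rolling key
    return out.decode("ascii")

def encode_name(name, key):
    out = bytearray()
    for b in name.encode("ascii"):
        out.append(b ^ key)
        key = (key + 1) & 0xFF
    return bytes(out)

enc = encode_name("LoadLibraryA", 0x3C)
print(decode_name(enc, 0x3C))  # LoadLibraryA
```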


To get the decoded strings, initially I started with the manual process of stepping through the debugger and recording the decoded strings. As soon as I started stepping, I stopped, and decided to write a script to complete this process.


Since it’s always a good idea to have a list of what needs to be accomplished:


  1. Hook the function that does the GetProcAddress() calls
  2. Get a list of where the GetProcAddress() calls are being made
  3. Look for where EAX is being stored in a variable
  4. Record the address of the variable
  5. Record the function name
  6. Dump this info into a file
  7. Use IDA Python to read the file and load the data into IDA


I used this list as a guideline for creating the script. One more thing I wanted to make sure didn’t happen: I didn’t want the script to slowly step through the application while reading each instruction.


I created a PyCommand to solve this problem. PyCommands are plugins for Immunity Debugger that help automate various tasks. These commands are launched using the debugger’s command box at the bottom of the window (Figure 5). PyCommands are saved in the Immunity Debugger\PyCommands path located in the application’s install directory. These commands are called using an “!” followed by the name of the file.


Figure 5: Command Box
One last thing: the documentation isn’t the best. Anybody using the API will need to lean on the following sources: the source code, the PyCommands deployed with the application, or one of several resources on the web. I’ve added reference links at the end of this post.


Hooking the Function

To start, I decided to use one of the hook classes provided by Immunity. These can be found in the libhook.py file. I went with the LogBpHook() class. This will hook a function of my choice and pause execution inside the function.


To set up a hook, I need to create a main method, and a hook class with __init__ and run methods. Here is a skeleton of the hook I created.


class HookFunc(LogBpHook):
    def __init__(self):
        LogBpHook.__init__(self)
        return

    def run(self, regs):
        pass  # left blank for now

def main(args):
    if not args:
        return "No arguments provided."

    imm = immlib.Debugger()

    hookAddr = int(args[0], 16)
    funcName = imm.getFunction(hookAddr).getName()
    hook = HookFunc()
    hook.add(funcName, hookAddr)
    return funcName + " Hooked."

The HookFunc class isn’t a provided class, but one that I created. It inherits from the LogBpHook class. The __init__ and run methods are required. The run method is what will happen once the hook is triggered.

The main method accepts an address as an argument. This is the address of the function I am going to hook. After the args are checked, the first thing that needs to be done is instantiate a Debugger object. This is stored in imm. Next, is the code used to add the hook.

The string argument needs to be converted from a hex string to an integer address. This is accomplished with the int() method. Next, I got the name of the function. After the HookFunc object was created, the hook needs to be added. The hook.add(funcName,hookAddr) call adds the hook at the appropriate address.

Figure 6 shows us where execution pauses after the hook is triggered. The execution is paused inside the function that I wanted to hook.



Figure 6: Hook Breakpoint in Function
Other than adding code to the run() method, that is all it takes to create a hook.



Getting a List of the Calls Made Inside of the Function

Because of the similarity between all of the calls to GetProcAddress(), scripting a solution to get the address of the calls was easily accomplished. It doesn’t do much for making the script work for multiple situations, but it solves this problem. Figure 3 shows the CALL EDI instruction. This is used for every GetProcAddress() call within this function, and it is only used for the GetProcAddress() call. In addition, this function is just one long basic block. Based on this, I felt the easiest way to grab what I needed was to parse a list of the call instructions being executed and then grab the instructions following those calls.


funcAddr = imm.getCurrentAddress()
curFunc = imm.getFunction(funcAddr)

basicBlocks = curFunc.getBasicBlocks()
calls = basicBlocks[0].getCalls()

It’s not the best solution, but I use the first two lines to grab a function object of the current function. The function object contains all of the function data and is explained in the libanalyze.py file in the lib directory. Functions are typically split up into several basic blocks. Since this function is just one long basic block, I decided to just grab all of the function’s basic blocks with getBasicBlocks(). Once I get that data, I can grab a list of call instruction addresses.

The getCalls() method returns a list of call instructions over which I can iterate.

Before starting the next section, there is a problem that is easily overlooked (I overlooked it myself). The program execution is paused at the start of the function. If I attempt to read the memory addresses where the function addresses are stored, they will not contain the correct values; the instructions that populate those memory addresses have not been executed yet. In order to continue execution until the end of the function, I use the following method:

imm.runTillRet()

Figure 7 shows that the execution has stopped at the end of the function. I also had Immunity log the results of the above four method calls (Figure 8).


Figure 7: Breakpoint at End of Function

Figure 8: Current Address, Basic Blocks, Call Instructions


Now that the variables will be populated when the script reads the memory addresses, I can proceed.


Getting Address where EAX is Stored and Saving it to a File

Once again, Figure 3 shows that the value in EAX is stored within at most seven instructions of the initial CALL EDI. In order to get the disassembly, I’ll need to iterate over the list of calls.

for c in calls:
    oc = imm.disasm(c)
    call = oc.getDisasm()

Each call will be disassembled and checked to see if EDI is in the instruction.

if 'EDI' in call:
    flag = 7
    i = 0
    while i <= flag:
        instr = imm.disasmForward(c, i).getDisasm()
        if ',EAX' in instr:
            pass  # <add code>
        i += 1

If EDI is in the instruction, then I’ll need to set up a loop that iterates over the next several instructions to locate an ‘,EAX’ instruction. Once located, I know that I have found the MOV instruction. This is accomplished with a while loop and the imm.disasmForward(address, number_of_lines) method. This method is described in the immlib.py file. I’ve attached .getDisasm() to the end of the disasmForward(c,i) call to get the disassembly of that line. See Figure 9.

funcStrSaveAddr = '0x' + instr[instr.index('[')+1:instr.index(']')]
funcSaveAddr = int(funcStrSaveAddr,16)
calledFuncName = imm.getFunction(imm.readLong(funcSaveAddr)).getName()
imm.log("* " + funcStrSaveAddr + "," + calledFuncName + "\n")
f.write(funcStrSaveAddr + "," + calledFuncName+"\n")
break

The first instruction grabs the address that is embedded in the string. This can be accomplished with Python string manipulation. Since the getDisasm() method returns a string, this address needs to be converted to an integer. Once again, the int(<string>,16) method handles the conversion.


Figure 9: CALL EDI and MOV Instructions
The third line pulls the function name that was called by GetProcAddress(). On the first run through, I used the following code to get the name of the function:


calledFuncName = imm.getFunction(funcSaveAddr).getName()


This returned the following values:



Figure 10: Function Name Return Values



Figure 10 shows the stored address along with the supposed name of the function at that address. That is obviously not the value that I need. Virus-20.00426718 is just a reference to the memory address where the function address I am looking for is stored.


Because funcSaveAddr is just the address of the variable and not the value, I need to read the value stored at that memory location. This is accomplished using the imm.readLong(funcSaveAddr) method:


calledFuncName = imm.getFunction(imm.readLong(funcSaveAddr)).getName()


This is a simple problem with a simple solution. I was tired and spent a little too long troubleshooting the issue.


The next two lines write both the address of the variable and the function name to Immunity’s log window (imm.log()) and to a file. Figure 10 shows us the output to the log window. The file uses the following CSV format: address, function name.


Figure 11: Memory Address and Function Name
Figure 11 shows a series of calls that set up network functionality later on in the application. Now it is ready to be read into IDA.


Using IDA Script to Rename Variables
The IDC script is going to read in the lines of the file. On each read, it is going to split up the CSV data, and use that to rename the variable in IDA. Here is the script:

#include <idc.idc>
static main() {
    auto fh, line, addr, name, actAddr;

    // Open file
    fh = fopen("getprocaddr.txt", "r");

    // Loop through file using readstr()
    while ((line = readstr(fh)) != -1) {
        // Split CSV values
        addr = line[0:strstr(line, ',')];
        name = line[strstr(line, ',')+1:];

        // Convert hex string to long
        actAddr = xtol(addr);

        // Change the name of the variable
        MakeNameEx(actAddr, name, 0);
    }
    fclose(fh);
}


The script itself should be self-explanatory. I've commented the relevant sections. It's fairly simple.


Figures 13, 14, and 15 show the renamed sections and calls in IDA. Compare these with Figures 1, 2, and 3. The code is now much easier to understand.

Figure 13: Location of Function Call
 
Figure 14: Location of Memory Address
Figure 15: EAX being Stored at Memory Address

From here I can continue to analyze the application without wasting time manually stepping through with Immunity to identify the functions being called.

Conclusion
There are multiple ways to solve the problem outlined in this blog post. I went with what was easiest for me. The Immunity script is nothing fancy. Hopefully, this post helps others out there looking for ways to automate various mundane tasks.


Additional Resources:

