Thursday, March 6, 2014

Decoding Domain Generation Algorithms (DGAs) Part III - ZeusBot DGA Reproduction

Note:  The blog post is part 3 of 3.  The first two blog posts can be found:
Decoding Domain Generation Algorithms (DGAs): Part I
Decoding Domain Generation Algorithms (DGAs) Part II - Catching ZeusBot Injection into Explorer.exe

At this point, you can go ahead and close the two parent processes (since we are not interested in their functionality, for the sake of simply finding the DGA). So we know that we are interested in discovering how this traffic is generated. So let’s try to find out where it originates.
Earlier, using API Monitor, we saw that explorer was using several functions within WinINet.dll:



So let’s set a breakpoint (F2) on a function that deals with Internet traffic in WinINet.dll. For example, InternetConnect():

HINTERNET InternetConnect(
  _In_  HINTERNET hInternet,
  _In_  LPCTSTR lpszServerName,
  _In_  INTERNET_PORT nServerPort,
  _In_  LPCTSTR lpszUsername,
  _In_  LPCTSTR lpszPassword,
  _In_  DWORD dwService,
  _In_  DWORD dwFlags,
  _In_  DWORD_PTR dwContext
);

Hit ctrl+g to get this handy search window. Then type InternetConnectA (A for ANSI as opposed to Wide) to find the matching function and its address:



Click "Follow expression" and breakpoint, hit play and see what happens in Wireshark until your breakpoint is hit.

We see several attempts to send UDP traffic (again, outside of the scope of this post), but just wait about 45 seconds until you hit your breakpoint. It looks like it is querying Google. This is a common thing for malware to do in order to test connectivity before functioning.


Hit CTRL+F9 to return to the caller in our main executable.

Look at this address in IDA to figure out what’s going on around this function. Also you may want to rebase your program in IDA to the offset injected into Explorer:




















Note that this offset will be different each time this program is injected. Let’s examine the calling function address to see what’s going on.


In order to see what called this function,  hit ‘x’ on the function name to bring up a list of callers. Hit Enter on the first function and again on this function. I am doing this for the sake of some brevity, but this segment really just calls InternetConnect() and the HttpOpenandSendRequest() function, so move on up to the caller. I named this function Check_Connectivity_to_Google because that’s exactly what it does (Note: I’ve also seen it query Bing.com, the whole point is that it is a connectivity check). It actually will try twice before failing out:


Following that function, another function that is immediately following references a SystemTime structure, this is indicative of a DGA function. It actually is our DGA function, hence the name I gave it (DGA in Yellow below):


I annotated the register values and some instructions above the call to help shorten the post. Note the push of a value I named “remainder”. This value is very important and it is used throughout the DGA function. Let’s go ahead and see how that value is generated.
The instruction ‘div’ (executes EDX:EAX/ECX. EAX) in this function is “randomly” generated by a function that uses QueryPerformanceCounter() or GetTickCount() as a value passed to a function that performs a few shift and xor calls of its own:




Interestingly enough, if we Google the hex value 6C078965h, we come across a post where someone describes a “random” number generator they wrote that is very similar (just different registers) to this function, used to generate dice rolls for a DND game. (Notice in the loop above how var_PerformanceCount is at 31E3748. The DWORD 31E4104 is used as a maximum value for the loop counter. It is exactly 9BCh or 2492 (623*4) bytes off). Which aligns with this post as well:


I also see similar algorithms used for various other functionalities in other programs. Just Google it if you are curious.

This is likely based on an algorithm called Mersenne Twister (http://en.wikipedia.org/wiki/Mersenne_twister), used to generate random values:

public static function initializeRandGenerator(seed:uint):void{
    //the seed can be any positive integer
    MT = new Array();
    indexMT = 624;
    MT[0] = seed;
   
    var i:int;
    for(i = 1; i < 624; i++){
        MT[i] = uint(0xFFFFFFFF & (0x6C078965 * (MT[i-1] ^ (MT[i-1] >>> 30)) +i));
    }          
}

Either way, that value is reasonably unpredictable if you aren’t on the machine that is generating the number. This value is then sent through an XOR function:


In which the result is divided by 1000 (or mod 1000, eventually meaning 1000 possible seed values) and stored in EDX as we call the main DGA function:


There are two parts to this DGA, one where the domain name is generated and another where the Top Level Domain or TLD (for short) is generated. Before any of this occurs though, 7 bytes of data are generated to be passed to an MD5 function. The division by 1000 earlier is really just mod 1000, leaving 1000 possible domains.

These 7 bytes are stored as follows:
Last byte of year + 30h
Month
Day/7 *7
First Random Byte
Second Random Byte
00
00



For Example:

byte_array[] { 0x0E, 0x01, 0x0E, 0xAC, 0x00, 0x00, 0x00 };

First_Byte: Last Byte of year + 30h
2014 = 07DE -> DE + 30h = 0E

Second_Byte: Month
January = 01

Third_Byte: ( Day / 7 ) * 7
([ 17 / 7 ] * 7 = 14 -> 0E)

Fourth_Byte and Fifth_Byte: “Random” seed (0-1000) that is incremented by one
The low byte value is used for the TLD as well

Sixth_Byte and Seventh_Byte: 00 00

C Version:



int genDate(unsigned char *dateData){
    unsigned char year;
    unsigned char month;
    unsigned char day;
   
    time_t currTime = time(NULL);
    struct tm *localTime;
    localTime = localtime(&currTime);
    year = ((localTime->tm_year + 1900) & 0xff) + 0x30; // Add 30h to last byte of year
    month = localTime->tm_mon + 1; // Just plain month - Number of months since January + 1
    day = (localTime->tm_mday / 7) * 7;

    unsigned char combined[] = { year, month, day };
    memcpy(dateData, &combined, 4);

    return 1;
}


That data is then sent to an MD5 function at 0x31C54A7. That result is used to generate the domain name and the lower byte of the random number is used to generate the TLD:

 


I have annotated the instructions and written it in C:

#define HASHLEN 16

int genURL(unsigned char *hashValue, char *URL) {
        // Grab each byte of hash
        unsigned int j;
        unsigned int currentIndex = 0; // Pointer to current index in URL array

        for(j = 0 ; j < HASHLEN ; j++) {
            unsigned char cl = hashValue[j];
            unsigned char dl = cl;
            dl = (dl & 0x1F) + 0x61;
            cl = (cl >> 3) + 0x61;
            if (cl != dl) {
                if (dl <= 0x7A){
                    URL[currentIndex] = dl;
                    currentIndex++;
                }
                if (cl <= 0x7A){
                    URL[currentIndex] = cl;
                    currentIndex++;
                }
            }
        }
    return 1;
}

So now that we have the domain name, it’s time to grab the TLD, if we look just below the domain-name code, we can see the TLD-generation section. It is really simple actually, it just checks if the last byte of the random number against a few math functions. Depending on the result, we get one of six possible TLDs. Note the “shape” of the the overall function, this is usually indicative of a case statement or if/then sections:



C Version:
int genTLD (unsigned short rand, char *TLD) {

    unsigned char lobyte = (rand & 0xff);

    if ((rand % 6) == 0) {
        strncpy(TLD,".ru",4);
    }
    else if ((rand % 5) == 0) {
        strncpy(TLD,".biz",5);
    }
    else if ((lobyte & 3) == 00) {
        strncpy(TLD,".info",6);
    }
    else if ((rand % 3) == 0) {
        strncpy(TLD,".org",5);
    }
    else if ((lobyte & 1) == 0) {
        strncpy(TLD,".net",5);
    }
    else {
        strncpy(TLD,".com",5);
    }
   
    return 1;
}


All of this is done over and over again, generating 1000 possible domains. That “random” seed value is really just a starting point and all of the URLs are called eventually at a rate of just about 1 per second.

Putting it all together

After figuring out how this DGA is assembled, it is clear that it only changes once every 7 days or the beginning of a new year or month. There are still 1000 possible URLs over this period since two bytes are unknown and incremented/used one-by-one. That’s not bad if you are the attacker. If we do the math at just 1 domain per second, it takes roughly 16 minutes to call each domain. This means that the attacker needs to generate a list of 1000 domains for a particular day and pick one, then the attacker has to wait at most 17 minutes for a call out back to his/her server. Not bad.

In the end, a list can be generated in C for any date we want pretty easily:

/*
gcc -Wall dga.c -o dga -lcrypto -lssl

Use without args for todays date.
To generate domains for another date, simply run with:

./dga mmddyyyy

*/

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <openssl/md5.h>

#define HASHLEN 16

int GetMD5Hash (unsigned char *pbData, int size, unsigned char *hashValue);
int genURL (unsigned char *hashValue, char *URL);
int genTLD (unsigned short rand, char *TLD);
int genDate(unsigned char *dateData);

int genSuppliedDate(char *argv[], unsigned char *dateData) {
    int year;
    unsigned char month;
    unsigned char day;

    long int a = strtol(argv[1], NULL, 10); // grab int without atoi
    month = a / 1000000;
    day = a / 10000 % 100;
    year = a % 10000;

    year = (year & 0xff) + 0x30; // Add 30h to last byte of year
    day = (day / 7) * 7;
  
    unsigned char combined[] = { year, month, day };
    memcpy(dateData, &combined, 4);

    return 1;
}

int genDate(unsigned char *dateData){
    unsigned char year;
    unsigned char month;
    unsigned char day;
   
    time_t currTime = time(NULL);
    struct tm *localTime;
    localTime = localtime(&currTime);
    year = ((localTime->tm_year + 1900) & 0xff) + 0x30; // Add 30h to last byte of year
    month = localTime->tm_mon + 1; // Just plain ol' month - Number of months since January + 1
    day = (localTime->tm_mday / 7) * 7;

    unsigned char combined[] = { year, month, day };
    memcpy(dateData, &combined, 4);

    return 1;
}
// Generate TLD based on random seed based on low-byte
// of random seed
int genTLD (unsigned short rand, char *TLD) {

    unsigned char lobyte = (rand & 0xff);

    if ((rand % 6) == 0) {
        strncpy(TLD,".ru",4);
    }
    else if ((rand % 5) == 0) {
        strncpy(TLD,".biz",5);
    }
    else if ((lobyte & 3) == 00) {
        strncpy(TLD,".info",6);
    }
    else if ((rand % 3) == 0) {
        strncpy(TLD,".org",5);
    }
    else if ((lobyte & 1) == 0) {
        strncpy(TLD,".net",5);
    }
    else {
        strncpy(TLD,".com",5);
    }
   
    return 1;
}

// Generate URL based on MD5 hash
int genURL(unsigned char *hashValue, char *URL) {
        // Grab each byte of hash
        unsigned int j;
        unsigned int currentIndex = 0; // Pointer to current index in URL array

        for(j = 0 ; j < HASHLEN ; j++) {
            unsigned char cl = hashValue[j];
            unsigned char dl = cl;
            dl = (dl & 0x1F) + 0x61;
            cl = (cl >> 3) + 0x61;
            if (cl != dl) {
                if (dl <= 0x7A){
                    URL[currentIndex] = dl;
                    currentIndex++;
                }
                if (cl <= 0x7A){
                    URL[currentIndex] = cl;
                    currentIndex++;
                }
            }
        }
    return 1;
}

//  Compute the MD5 checksum for generated bytes
int GetMD5Hash(unsigned char *pbData, int size, unsigned char *hashValue) {
    unsigned char digest[HASHLEN];
   
    MD5_CTX ctx;
    MD5_Init(&ctx);
    MD5_Update(&ctx, pbData, 7);
    MD5_Final(digest, &ctx);
   
    memcpy(hashValue, digest, HASHLEN);

    return 1;
}  

int main( int argc, char *argv[] )
{
    unsigned char dateData[3]; // Store Date data after modification
    char URL[33] = {}; // Store resulting generated URL, 32 bytes is max possible length + \0
    char TLD[6]; // Store generated TLD
   
    if ( argc == 2 ) {
        if (!(genSuppliedDate(argv, dateData))){
            return 0; // if the date fails, error out
        }
    }
    // Generate first 3 bytes for hash
    else if (!(genDate(dateData))) { // if the date fails, error out
            printf("Failed to generate date hash data, exiting");
            return 0;
            // Failure
    }
   
   
    unsigned short i;
    for (i = 0; i < 1000; i++){ // need to +1 because it ranges from 0000-1000
        unsigned char hashValue[HASHLEN]; // MD5 Hash of given bytes
        memset(URL, 0, sizeof(URL)); // clear arrays
        memset(TLD, 0, sizeof(TLD)); // clear arrays

        // Reverse order of bytes when generating URLs,
        // This will match the order in how it URLs are
        // incremented in the malware itself
        unsigned char hibyte = (i >> 8) & 0xff;
        unsigned char lobyte = (i & 0xff);
        // unsigned short combined = lobyte << 8 | hibyte;
           
        unsigned char pbData[7] = { dateData[0], dateData[1], dateData[2], lobyte, hibyte, 0x00, 0x00}; // Given Bytes
           
        if (GetMD5Hash(pbData, sizeof(pbData), hashValue)) { // If MD5 Successful
            if (genURL(hashValue, URL)) {
                unsigned short k;
                // Print URL
                for(k = 0; k < strlen(URL); k++){
                        printf("%c", URL[k]);
                }
                if (genTLD(i, TLD)) { // random value (Before flipping it around for the hash)
                    // Print TLD
                    for(k = 0; k < strlen(TLD); k++){
                            printf("%c", TLD[k]);
                    }
                    printf("\n");
                }
            }
        }
    } // count loop
   return 0;
}


Add to Technorati Favorites Digg! This

6 comments:

Constantin Ionel Milos said...

Excellent analysis..

Travis Volk said...

Great dissection of this malware.

Dave said...

Thank you

Golden said...

Very Detailed ! thanks . btw , Could you provide us with the sample ?

AaLl86 said...

Hi Dave!
This is Andrea Allievi (alias AaLl86). It's a pleasure to meet you. Very good analysis... My compliments!

I will contribute in this blog as soon as possible... :-)

Andrea

Dave said...

Thanks a ton Andrea! I can't wait to see more of your work :)