
Reverse engineering REALbasic applications

15 Jul, 2017 15:00
I was fiddling around with the application challenges of HackThisSite.org, and it turned out the first few are compiled with REALbasic.
Throwing them into IDA didn't help a lot, so here's an article about dealing with the blast from the past that REALbasic is.

REALbasic was claimed to be the predecessor of Visual Basic 6, and like VB6 it ended up in the garbage can.
Now, before I get booed and corrected, I know REALbasic is now Xojo, but the whole concept of a Basic framework that is sold for literally $100 as a yearly license/subscription?
That's just ridiculous.

Ok, enough rant, let's get started.
I've already said I'm playing with HackThisSite's app challenges, so this analysis is based on the Application challenge #1.

The sample I've got has (roughly) the following layout, and the same probably holds for other apps written in REALbasic:
section                                            | description
MZ-PE                                              | Standard PE executable header
.text                                              | Holding the OVERLAY loader and core framework API
.rsrc                                              | Holding the framework resources and the "PICKLE" resource
.rdata, .exc, .CRT, .data, .idata, .bss, .reloc    | Unimportant for this analysis
OVERLAY                                            | Piggybacked user code and resources

The OVERLAY loader code loads a resource named "PICKLE" that has the following format:
PICKLE resource structure:
typedef struct PICKLE {
    DWORD    dwOverlayBeginOffset;    // Physical offset of the OVERLAY data
    DWORD    dwPOKE;                  // Hardcoded to "POKE"
};

dwOverlayBeginOffset can hold either zero, the physical offset of the OVERLAY data, or the hardcoded value 0x4B434950 ("PICK").
PICKLE resource loader:
0046609E | 68 2B 8D 51 00           | push <app1win.Type>                         | 518D2B:"PICKLE"
004660A3 | 6A 65                    | push 65                                     |
004660A5 | FF 35 1C 92 52 00        | push dword ptr ds:[<hInstance>]             |
004660AB | FF 15 9C 58 54 00        | call dword ptr ds:[<&FindResourceA>]        |
004660B1 | 89 85 CC FB FF FF        | mov dword ptr ss:[ebp-434],eax              |
004660B7 | C7 45 EC 00 00 00 00     | mov dword ptr ss:[ebp-14],0                 |
004660BE | C7 45 E8 00 00 00 00     | mov dword ptr ss:[ebp-18],0                 |
004660C5 | 83 BD CC FB FF FF 00     | cmp dword ptr ss:[ebp-434],0                |
004660CC | 0F 84 95 00 00 00        | je app1win.466167                           |
004660D2 | FF B5 CC FB FF FF        | push dword ptr ss:[ebp-434]                 |
004660D8 | FF 35 1C 92 52 00        | push dword ptr ds:[<hInstance>]             |
004660DE | FF 15 A0 58 54 00        | call dword ptr ds:[<&LoadResource>]         |
004660E4 | 89 85 C8 FB FF FF        | mov dword ptr ss:[ebp-438],eax              |
004660EA | 83 BD C8 FB FF FF 00     | cmp dword ptr ss:[ebp-438],0                |
004660F1 | 74 74                    | je app1win.466167                           |
004660F3 | FF B5 C8 FB FF FF        | push dword ptr ss:[ebp-438]                 |
004660F9 | FF 15 A4 58 54 00        | call dword ptr ds:[<&LockResource>]         |
004660FF | 89 85 C4 FB FF FF        | mov dword ptr ss:[ebp-43C],eax              |
00466105 | 6A 04                    | push 4                                      |
00466107 | FF B5 C4 FB FF FF        | push dword ptr ss:[ebp-43C]                 |
0046610D | 8D 45 F0                 | lea eax,dword ptr ss:[ebp-10]               |
00466110 | 50                       | push eax                                    |
00466111 | E8 1A 82 FA FF           | call <app1win.?GetBytes>                    |

When dwOverlayBeginOffset holds an offset value, a standard CreateFile->SetFilePointer->ReadFile routine is executed to read the OVERLAY for further parsing.

If dwOverlayBeginOffset is set to zero or "PICK" (0x4B434950), the application will try to read the OVERLAY sections from external files:
Loading the OVERLAY from external files:
.text:004663D0        lea     eax, [ebp+dwSize_IMPORT]
.text:004663D3        push    eax                     ; int
.text:004663D4        push    offset szImport_dat     ; "import.dat"
.text:004663D9        call    ?ReadEntireFile
.text:004663DE        pop     ecx
.text:004663DF        pop     ecx
.text:004663E0        mov     [ebp+lpBuffer_IMPORT], eax
.text:004663E6        lea     eax, [ebp+dwSize_CODE]
.text:004663E9        push    eax                     ; int
.text:004663EA        push    offset szCode_dat       ; "code.dat"
.text:004663EF        call    ?ReadEntireFile
.text:004663F4        pop     ecx
.text:004663F5        pop     ecx
.text:004663F6        mov     lpBuffer_CODE, eax
.text:004663FB        lea     eax, [ebp+dwSize_DATA]
.text:004663FE        push    eax                     ; int
.text:004663FF        push    offset szData_dat       ; "data.dat"
.text:00466404        call    ?ReadEntireFile
.text:00466409        pop     ecx
.text:0046640A        pop     ecx
.text:0046640B        mov     lpBuffer_DATA, eax
.text:00466410        lea     eax, [ebp+dwSize_RSRC]
.text:00466413        push    eax                     ; int
.text:00466414        push    offset szRsrc_dat       ; "rsrc.dat"
.text:00466419        call    ?ReadEntireFile
.text:0046641E        pop     ecx
.text:0046641F        pop     ecx
.text:00466420        mov     [ebp+lpBuffer_RSRC], eax
.text:00466426        lea     eax, [ebp+dwSize_OPTIONS]
.text:00466429        push    eax                     ; int
.text:0046642A        push    offset szOptions_dat    ; "options.dat"
.text:0046642F        call    ?ReadEntireFile
.text:00466434        pop     ecx
.text:00466435        pop     ecx
.text:00466436        mov     [ebp+lpBuffer_OPTIONS], eax
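
The two loading paths above can be summarized in a few lines of Python. This is only a sketch under the assumptions stated in this article: the resource is 8 bytes, both DWORDs are little-endian, and zero or 0x4B434950 ("PICK") in the first DWORD means "use the external .dat files". The function name classify_pickle is made up for illustration.

```python
import struct

PICK = 0x4B434950  # b"PICK" read as a little-endian DWORD


def classify_pickle(raw: bytes):
    """Interpret the 8-byte PICKLE resource.

    Returns ("external", None) when the OVERLAY is expected in the
    import.dat/code.dat/... files, or ("embedded", offset) when it is
    appended to the executable at the given physical offset.
    """
    if len(raw) < 8:
        raise ValueError("PICKLE resource must be at least 8 bytes")
    # Second DWORD is the hardcoded "POKE" marker; it is not needed here.
    overlay_offset, _poke = struct.unpack_from("<II", raw, 0)
    if overlay_offset in (0, PICK):
        return ("external", None)
    return ("embedded", overlay_offset)
```

For the embedded case, the returned offset is where a tool would seek in the EXE file before reading the OVERLAY.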

The OVERLAY consists of 5 data blocks, that are always stored in this order:
OVERLAY structure:
typedef struct _OVERLAY {
    DWORD    size_CODE;                     //
    byte     buff_CODE[size_CODE];          // The raw user code is stored here
    DWORD    size_DATA;                     //
    byte     buff_DATA[size_DATA];          // User defined data like strings, constants and variables is stored in here
    DWORD    size_IMPORT;                   //
    byte     buff_IMPORT[size_IMPORT];      // The framework API relocations are here
    DWORD    size_RSRC;                     //
    byte     buff_RSRC[size_RSRC];          // User defined resources are stored here
    DWORD    size_OPTIONS;                  //
    byte     buff_OPTIONS[size_OPTIONS];    // Unknown
}
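
Since every block is just a DWORD length followed by that many bytes, splitting a dumped OVERLAY outside of IDA is straightforward. The sketch below assumes little-endian sizes and the block order given in the structure above; split_overlay and BLOCK_NAMES are names I made up for the example.

```python
import struct

# Block order as listed in the OVERLAY structure above.
BLOCK_NAMES = ("CODE", "DATA", "IMPORT", "RSRC", "OPTIONS")


def split_overlay(overlay: bytes, names=BLOCK_NAMES):
    """Split a raw OVERLAY dump into its length-prefixed blocks."""
    blocks = {}
    pos = 0
    for name in names:
        if pos + 4 > len(overlay):
            raise ValueError("truncated before the size of %s" % name)
        (size,) = struct.unpack_from("<I", overlay, pos)
        pos += 4
        if pos + size > len(overlay):
            raise ValueError("%s block runs past the end of the data" % name)
        blocks[name] = overlay[pos:pos + size]
        pos += size
    return blocks
```

A zero size simply yields an empty block, which matches samples that carry no OPTIONS data.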

The CODE block is basically where the user code is stored. However, the code itself is in "raw" form.
Because the framework's API calls are not initialized, in IDA they look like this:
IDA disassembly of the CODE's entry point:
OVERLAY:0055E004        push    ebp
OVERLAY:0055E005        mov     ebp, esp
OVERLAY:0055E007        push    ebx
OVERLAY:0055E008        call    $+5    ; Uninitialized REALBasic function
OVERLAY:0055E00D        jmp     short loc_55E026

The same applies to variables, function arguments, and probably other user-defined data.

Something worth mentioning is that the user code is interleaved with 0x17-byte chunks of data, like this:
IDA disassembly of the CODE's entry point:
OVERLAY:0055E004 EntryPoint      proc near
OVERLAY:0055E004                 push    ebp
OVERLAY:0055E005                 mov     ebp, esp
OVERLAY:0055E007                 push    ebx
OVERLAY:0055E008                 call    $+5
OVERLAY:0055E00D                 jmp     short loc_55E026
OVERLAY:0055E00F                 db 11h, 4, 0, 0, 0FCh, 0FFh, 0, 0, 0, 0, 0, 0, 81h, 0C4h, 0, 0, 0, 0, 0E9h, 0C0h, 0, 0, 0
OVERLAY:0055E026 loc_55E026:
OVERLAY:0055E026                 call    sub_570C9C
OVERLAY:0055E02B                 jmp     short loc_55E044
OVERLAY:0055E02D                 db 11h, 4, 0, 0, 0FCh, 0FFh, 0, 0, 0, 0, 0, 0, 81h, 0C4h, 0, 0, 0, 0, 0E9h, 0A2h, 0, 0, 0
OVERLAY:0055E044 loc_55E044:
OVERLAY:0055E044                 mov     eax, 0
OVERLAY:0055E049                 push    eax
OVERLAY:0055E04A                 mov     ecx, 0
OVERLAY:0055E04F                 push    ecx
OVERLAY:0055E050                 mov     edx, 1Ch
OVERLAY:0055E055                 push    edx
OVERLAY:0055E056                 mov     ebx, 38h
OVERLAY:0055E05B                 push    ebx
OVERLAY:0055E05C                 call    $+5
OVERLAY:0055E061                 add     esp, 10h
OVERLAY:0055E067                 jmp     short loc_55E080
OVERLAY:0055E069                 db 11h, 4, 0, 0, 0FCh, 0FFh, 0, 0, 0, 0, 0, 0, 81h, 0C4h, 10h, 0, 0, 0, 0E9h, 66h, 0, 0, 0
OVERLAY:0055E080 loc_55E080:
OVERLAY:0055E080                 call    sub_57083F
OVERLAY:0055E085                 jmp     short loc_55E09E
OVERLAY:0055E087                 db 11h, 4, 0, 0, 0FCh, 0FFh, 0, 0, 0, 0, 0, 0, 81h, 0C4h, 0, 0, 0, 0, 0E9h, 48h, 0, 0, 0
OVERLAY:0055E09E loc_55E09E:
OVERLAY:0055E09E                 call    sub_570B19
OVERLAY:0055E0A3                 jmp     short loc_55E0BC
OVERLAY:0055E0A5                 db 11h, 4, 0, 0, 0FCh, 0FFh, 0, 0, 0, 0, 0, 0, 81h, 0C4h, 0, 0, 0, 0, 0E9h, 2Ah, 0, 0, 0
OVERLAY:0055E0BC loc_55E0BC:
OVERLAY:0055E0BC                 call    $+5
OVERLAY:0055E0C1                 jmp     short loc_55E0DA
OVERLAY:0055E0C3                 db 11h, 4, 0, 0, 0FCh, 0FFh, 0, 0, 0, 0, 0, 0, 81h, 0C4h, 0, 0, 0, 0, 0E9h, 0Ch, 0, 0, 0
OVERLAY:0055E0DA loc_55E0DA:
OVERLAY:0055E0DA                 mov     eax, 0
OVERLAY:0055E0DF                 mov     ecx, 1105h
OVERLAY:0055E0E4                 mov     [ecx], eax
OVERLAY:0055E0E6                 mov     ecx, eax
OVERLAY:0055E0E8                 cmp     eax, ecx
OVERLAY:0055E0EA                 jz      loc_55E115
OVERLAY:0055E0F0                 push    eax
OVERLAY:0055E0F1                 call    $+5
OVERLAY:0055E0F6                 add     esp, 4
OVERLAY:0055E0FC                 jmp     short loc_55E115
OVERLAY:0055E0FE                 db 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81h, 0C4h, 4, 0, 0, 0, 0E9h, 0, 0, 0, 0
OVERLAY:0055E115 loc_55E115:
OVERLAY:0055E115                 mov     eax, large ds:1105h
OVERLAY:0055E11B                 push    eax
OVERLAY:0055E11C                 call    near ptr unk_55E7D3
OVERLAY:0055E121                 add     esp, 4
OVERLAY:0055E127                 jmp     short loc_55E140
OVERLAY:0055E129                 db 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81h, 0C4h, 4, 0, 0, 0, 0E9h, 0, 0, 0, 0
OVERLAY:0055E140 loc_55E140:
OVERLAY:0055E140                 call    $+5
OVERLAY:0055E145                 jmp     short loc_55E15E
OVERLAY:0055E147                 db 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81h, 0C4h, 0, 0, 0, 0, 0E9h, 0, 0, 0, 0
OVERLAY:0055E15E loc_55E15E:
OVERLAY:0055E15E                 mov     eax, 0
OVERLAY:0055E163                 push    eax
OVERLAY:0055E164                 pop     eax
OVERLAY:0055E165                 pop     ebx
OVERLAY:0055E166                 pop     ebp
OVERLAY:0055E167                 retn

These blocks of data between the code chunks don't seem to be used by the framework or the user code.
I'm not sure if this is some kind of obfuscation or something else, but since REALbasic is a proprietary language, it might be.
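
To get a feel for how much of the CODE block is filler, the chunks can be located with a naive byte scan. The sketch assumes every filler chunk is preceded by a short jump encoded as EB 17 (which is how "jmp short" over 0x17 bytes shows up in this sample's bytes) and it does no real disassembly, so a false match inside an instruction operand is possible. find_filler_blocks is a hypothetical helper name.

```python
JMP_OVER_FILLER = b"\xEB\x17"  # jmp short +0x17
FILLER_SIZE = 0x17


def find_filler_blocks(code: bytes):
    """Return the offsets of the 0x17-byte chunks that follow 'EB 17'."""
    offsets = []
    i = 0
    while i + 2 + FILLER_SIZE <= len(code):
        if code[i:i + 2] == JMP_OVER_FILLER:
            offsets.append(i + 2)
            # Skip the filler so its bytes are never scanned for matches.
            i += 2 + FILLER_SIZE
        else:
            i += 1
    return offsets
```

Each returned offset marks 0x17 bytes that could be turned into data (or hidden) to make the listing easier to read.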

The DATA block holds the user-defined strings and what appear to be global variables.
The string storage has its own structured format:
REALbasic string structure:
typedef struct rbString {
    DWORD    unknown1;
    char*    pszStringAddr;
    DWORD    unknown3;
    DWORD    unknown4;
    DWORD    unknown5;
}
followed by the string itself, stored as a zero-terminated Pascal-style string (length byte at the beginning) and padded to a multiple of 4 bytes.

The pszStringAddr is updated at runtime.
A typical DATA block looks like this:
DATA block:
OVERLAY:00570CB5 prm_0000        rbString <1, offset szP_0000, 6, 4, 8000100h>
OVERLAY:00570CC9 szP_0000        db 4,'TEXT',0,0,0
OVERLAY:00570CD1 prm_0001        rbString <1, offset szP_0001, 6, 4, 8000100h>
OVERLAY:00570CE5 szP_0001        db 4,'R*ch',0,0,0
OVERLAY:00570CED prm_0002        rbString <1, offset szP_0002, 6, 4, 8000100h>
OVERLAY:00570D01 szP_0002        db 4,'text',0,0,0
;...                             
OVERLAY:00571DBA var_0000        dd 0
OVERLAY:00571DBE var_0001        dd 0
OVERLAY:00571DC2 var_0002        dd 0
;...
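
A single string entry from a dumped DATA block can be decoded like this. It is a sketch that assumes the 0x14-byte header shown above, little-endian fields, and that the stored string (length byte, characters, terminator) is rounded up to a multiple of 4 bytes, as in the listing. read_rb_string is an illustrative name, and the Latin-1 decoding is my assumption.

```python
import struct

# unknown1, pszStringAddr, unknown3, unknown4, unknown5
HEADER = struct.Struct("<5I")


def read_rb_string(data: bytes, off: int):
    """Decode one rbString entry; returns (header fields, text, next offset)."""
    fields = HEADER.unpack_from(data, off)
    str_off = off + HEADER.size
    length = data[str_off]  # Pascal-style length byte
    text = data[str_off + 1:str_off + 1 + length].decode("latin-1")
    # length byte + characters + NUL terminator, rounded up to 4 bytes
    stored = (1 + length + 1 + 3) & ~3
    return fields, text, str_off + stored
```

The returned next offset is where the following entry would start if the entries are packed back to back, as they are in the listing above.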

Before executing the CODE block, a series of initialization routines are executed.

First, the OPTIONS block is parsed here:
OPTIONS block parser routine:
.text:0046643C        push    [ebp+dwSize_OPTIONS]
.text:0046643F        push    [ebp+lpBuffer_OPTIONS]
.text:00466445        call    ?ParseOPTIONS

I wasn't able to find an executable holding any OPTIONS data, but after some reverse engineering, I found out that the OPTIONS block has the following structure:
OPTIONS block structure:
typedef struct _OPTIONS {
    // row 0
    byte     bType;                // Must be always 0x00
    DWORD    lenData_0;            // Length of the upcoming string data
    char     Data_0[lenData_0];    // Just a regular string
    // ...
    // row N
    // ...
}

What do all these values do? I have no idea.
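
For completeness, here is how the rows could be walked if a sample with OPTIONS data ever shows up. This is untested against real data (I have none) and assumes exactly the layout above: a type byte that must be zero, a little-endian DWORD length, then that many bytes. parse_options is a made-up name.

```python
import struct


def parse_options(options: bytes):
    """Return the raw data of every row in an OPTIONS block."""
    rows = []
    pos = 0
    while pos < len(options):
        b_type = options[pos]
        if b_type != 0:
            raise ValueError("unexpected bType %#x at offset %d" % (b_type, pos))
        (length,) = struct.unpack_from("<I", options, pos + 1)
        start = pos + 5
        if start + length > len(options):
            raise ValueError("row at offset %d is truncated" % pos)
        rows.append(options[start:start + length])
        pos = start + length
    return rows
```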

Next, the IMPORT data is parsed, and this is where the actual code rebuilding is done:
IMPORT block parser routine:
.text:00466450        push    [ebp+dwSize_IMPORT]
.text:00466453        push    [ebp+lpBuffer_IMPORT]
.text:00466459        call    ?ParseIMPORT

The IMPORT block has its own structure:
IMPORT block structure:
typedef struct _IMPORT {
    // row 0
    byte     bType;           // Type of the following data
    DWORD    dwOffset;        // Offset inside the CODE block
    {                         // if bType is equal to 1, there's additional data for parsing
        char    szpString;    // NULL terminated Pascal-like string
    }
    // row 1
    byte     bType;
    DWORD    dwOffset;
    // ...
    // row N
    // ...
}

bType can hold a value between 1 and 5, and it determines how the following dwOffset and szpString will be treated.

bType | has szpString | description
1     | YES           | The szpString is used to resolve a framework API call at dwOffset inside the CODE block
2     | NO            | The DWORD value stored at CODE block's dwOffset is used as offset to a variable inside the DATA block
3     | NO            | The DWORD value stored at CODE block's dwOffset is used to resolve a call inside the CODE block
4     | NO            | Unknown
5     | NO            | The DWORD value stored at CODE block's dwOffset is used as offset to a parameter inside the DATA block

Except for bType == 1, the rest are pretty straightforward, so let's see how the code rebuilding works.

CODE hex dump:
0026F1D0  55 8B EC 53 E8 00 00 00 00 EB 17 11 04 00 00 FC  U.ìSè....ë.....ü  
0026F1E0  FF 00 00 00 00 00 00 81 C4 00 00 00 00 E9 C0 00  ÿ.......Ä....éÀ.  
0026F1F0  00 00 E8 71 2C 01 00 EB 17 11 04 00 00 FC FF 00  ..èq,..ë.....üÿ.  
0026F200  00 00 00 00 00 81 C4 00 00 00 00 E9 A2 00 00 00  ......Ä....é¢...  
0026F210  B8 00 00 00 00 50 B9 00 00 00 00 51 BA 1C 00 00  ¸....P¹....Qº...

CODE assembly:
0026F1D0 | 55                       | push ebp
0026F1D1 | 8B EC                    | mov ebp,esp
0026F1D3 | 53                       | push ebx
0026F1D4 | E8 00 00 00 00           | call 26F1D9    ; call $0
0026F1D9 | EB 17                    | jmp 26F1F2

IMPORT hex dump:
0026C700  01 05 00 00 00 0B 52 75 6E 74 69 6D 65 49 6E 69  ......RuntimeIni  
0026C710  74 00 05 47 00 00 00 05 4D 00 00 00 05 53 00 00  t..G....M....S..  
0026C720  00 01 59 00 00 00 17 52 75 6E 74 69 6D 65 52 65  ..Y....RuntimeRe  
0026C730  67 69 73 74 65 72 46 69 6C 65 54 79 70 65 00 01  gisterFileType..  
0026C740  B9 00 00 00 0A 52 75 6E 74 69 6D 65 52 75 6E 00  ¹....RuntimeRun.  
0026C750  02 DC 00 00 00 01 EE 00 00 00 19 52 75 6E 74 69  .Ü....î....Runti  
0026C760  6D 65 55 6E 68 61 6E 64 6C 65 64 45 78 63 65 70  meUnhandledExcep  
0026C770  74 69 6F 6E 00 02 13 01 00 00 01 19 01 00 00 13  tion............

At the start of the IMPORT block, 0026C700 (offset 0), we have bType == 1, so the data to resolve is a framework API call.

Next, we have dwOffset == 0x00000005, so the API call is at CODE offset 5, where the operand of the call $0 is located.

Then there's the name of that API function, formatted as a zero-terminated, Pascal-style string:
- 0x0B - defining the length of the string
- "RuntimeInit" - The API name
- 0x00 - string terminator byte
* Note that the string length byte doesn't include the string terminator NULL byte.
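
The row format walked through above is easy to decode outside of IDA as well. A minimal sketch, assuming only bType == 1 rows carry a name and that dwOffset is little-endian; parse_import is a name I picked for the example:

```python
import struct


def parse_import(blob: bytes):
    """Decode IMPORT rows into (bType, dwOffset, name-or-None) tuples."""
    rows = []
    pos = 0
    while pos + 5 <= len(blob):
        b_type, dw_offset = struct.unpack_from("<BI", blob, pos)
        pos += 5
        name = None
        if b_type == 1:
            length = blob[pos]  # does not count the terminating NUL
            name = blob[pos + 1:pos + 1 + length].decode("ascii")
            pos += 1 + length + 1  # length byte + name + NUL
        rows.append((b_type, dw_offset, name))
    return rows
```

Feeding it the first 0x21 bytes of the hex dump above yields the RuntimeInit row followed by three bType == 5 rows.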

Now, here's where things get a bit more complicated.
The API name is stored in the IMPORT block, but the entry point of that API function is stored in a relocation table inside the .data section of the main binary.
And that makes perfect sense, because after all, the design of REALbasic goes, as pointed out above: framework code in the front - user code in the back.

Because of that, finding the framework's API table inside the .data section can be challenging for automated analysis tools.

The place we are looking for has the following structure:
Framework Import Address Table structure:
typedef struct _FWIAT {
    // API 0
    char*     pszAPIName_0;
    PVOID*    pAPIEntryPoint_0;
    byte      alignment_0[0x08];    // padding up to the next 0x10 boundary
    // API 1
    char*     pszAPIName_1;
    PVOID*    pAPIEntryPoint_1;
    byte      alignment_1[0x08];    // padding up to the next 0x10 boundary
    // ...
    // API N
    // ...
}
and right after that we should have a string table with the API names that pszAPIName_X refers to.

In the current case this data is stored here:
Framework API table:
.data:00519580 _FWAT              dd offset aRuntimeinit                ; "RuntimeInit"
.data:00519584                    dd offset @RuntimeInit
.data:00519588                    align 10h
.data:00519590                    dd offset aDebugruntimein             ; "DebugRuntimeInit"
.data:00519594                    dd offset @DebugRuntimeInit
.data:00519598                    align 10h
.data:005195A0                    dd offset aRuntimeexit                ; "RuntimeExit"
.data:005195A4                    dd offset @RuntimeExit
.data:005195A8                    align 10h
.data:005195B0                    dd offset aRuntimeregiste             ; "RuntimeRegisterAppObject"
.data:005195B4                    dd offset @RuntimeRegisterAppObject
.data:005195B8                    align 10h
;...
.data:0051F350 aRuntimeinit       db 'RuntimeInit',0
.data:0051F35C aDebugruntimein    db 'DebugRuntimeInit',0
.data:0051F36D aRuntimeexit       db 'RuntimeExit',0
.data:0051F379 aRuntimeregiste    db 'RuntimeRegisterAppObject',0

The correct API entry point is obtained by comparing the API name from the IMPORT block with the API names in the framework API table.
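
The lookup can be modelled on a raw memory image as a plain name-to-address dictionary. The sketch assumes 0x10-byte entries (name pointer, entry point pointer, padding) and, like the IDA script further down, that the table ends where the name strings begin. read_fwat and its parameters are hypothetical, and the image is just a bytes object standing in for the mapped .data section.

```python
import struct


def read_fwat(image: bytes, image_base: int, table_va: int):
    """Build {API name: entry point VA} from the framework API table."""

    def dword(va):
        return struct.unpack_from("<I", image, va - image_base)[0]

    def cstr(va):
        start = va - image_base
        return image[start:image.index(b"\x00", start)].decode("ascii")

    # Assumption: the first name pointer marks the end of the table,
    # because the string table is stored right after it.
    names_va = dword(table_va)
    table = {}
    va = table_va
    while va < names_va:
        table[cstr(dword(va))] = dword(va + 4)
        va += 0x10
    return table
```

With such a dictionary, resolving a bType == 1 row is a single lookup of the name taken from the IMPORT block.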

Because a single framework API function can be used in multiple places, the API resolving is designed to be recursive.
Therefore, the place in the CODE block where the address of the API function must be written can hold either 0 or another offset, pointing to the next CALL that uses the same API function.

Let's take "RuntimeUnlockObject" as a live example:
IMPORT block chunk:
0026C770  74 69 6F 6E 00 02 13 01 00 00 01 19 01 00 00 13  tion............  
0026C780  52 75 6E 74 69 6D 65 55 6E 6C 6F 63 6B 4F 62 6A  RuntimeUnlockObj  
0026C790  65 63 74 00 01 3D 01 00 00 0B 52 75 6E 74 69 6D  ect..=....Runtim  
0026C7A0  65 45 78 69 74 00 01 CC 01 00 00 11 52 75 6E 74  eExit..Ì....Runt

The location in the CODE block is 0x00000119, so we go there:
CODE:
002EF2E1 | 8B 05 05 11 00 00        | mov eax,dword ptr ds:[1105]
002EF2E7 | 50                       | push eax
002EF2E8 | E8 B2 06 00 00           | call 2EF99F    ; here
002EF2ED | 81 C4 04 00 00 00        | add esp,4
002EF2F3 | EB 17                    | jmp 2EF30C

Here, the DWORD where the entry point to the API call should be placed already contains the value of 0x000006B2.
So we can update this call, and after that, use the DWORD 0x000006B2 as offset to move here:
CODE:
002EF86F | B8 00 00 00 00           | mov eax,0
002EF874 | 89 85 F4 FF FF FF        | mov dword ptr ss:[ebp-C],eax
002EF87A | 8B 8D FC FF FF FF        | mov ecx,dword ptr ss:[ebp-4]
002EF880 | 51                       | push ecx
002EF881 | E8 DD 06 00 00           | call 2EFF63    ; here
002EF886 | 81 C4 04 00 00 00        | add esp,4
002EF88C | EB 17                    | jmp 2EF8A5

And continue further up to the point where the CALL address holds a zero.
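
The walk described above boils down to following a linked list of offsets and overwriting each link with a relative call displacement. This sketch mirrors what the IDA script below does with PatchDword, but works on a plain bytearray; patch_call_chain and its parameter names are made up, and code_base_va is wherever you assume the CODE block to be mapped.

```python
import struct


def patch_call_chain(code: bytearray, first_offset: int,
                     target_va: int, code_base_va: int):
    """Patch every CALL operand in the chain to point at target_va."""
    patched = []
    offset = first_offset
    while offset != 0:
        # The operand currently holds the offset of the next CALL (or 0).
        (next_offset,) = struct.unpack_from("<I", code, offset)
        operand_va = code_base_va + offset
        # rel32 is relative to the end of the 4-byte operand.
        rel32 = (target_va - (operand_va + 4)) & 0xFFFFFFFF
        struct.pack_into("<I", code, offset, rel32)
        patched.append(offset)
        offset = next_offset
    return patched
```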

There are also special cases, when the API name inside the IMPORT block starts with a special symbol.

When the API name starts with "!X", the API is treated as an imported function from a separate library.
The actual format in that case is:
!X<library_name_or_path>!<library_export_name>

library_name_or_path can contain a full or relative file path, or, for system libraries, just "kernel32" or "kernel32.dll".
library_export_name is just the name of a function exported by that library, which is later passed to GetProcAddress to obtain an entry point.

A live example of this case:
Example of an imported function from an external library:
01 05 00 00 00 10 21 58 6B 65 72 6E 65 6C 33 32 21 53 6C 65 65 70 00
^  ^           ^  ^
|  |           |  "!Xkernel32!Sleep\x00"
|  |           Length of the upcoming string
|  dwOffset
bType
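
Splitting such a name into its two parts takes only a few lines. The sketch assumes the export name never contains "!" and therefore splits at the last one; I haven't verified how the framework itself handles a path that contains that character. split_external_import is an illustrative name.

```python
def split_external_import(name: str):
    """Split "!X<library>!<export>" into (library, export).

    Returns None for names that do not use the "!X" prefix.
    """
    if not name.startswith("!X"):
        return None
    library, sep, export = name[2:].rpartition("!")
    if not sep or not library or not export:
        raise ValueError("malformed external import: %r" % name)
    return library, export
```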

There's another special case for bType == 1, where the API name starts with only the "!" symbol, but I didn't bother analysing this one in depth.

Finally the RSRC block is parsed here:
RSRC block parser routine:
.text:004664BB        push    [ebp+var_44C]
.text:004664C1        push    [ebp+dwSize_RSRC]
.text:004664C4        push    [ebp+lpBuffer_RSRC]
.text:004664CA        call    ?ParseRSRC

After that, the CODE block is executed directly, like shellcode:
CODE block execution:
.text:004664D2        mov     eax, lpBuffer_CODE
.text:004664D7        mov     [ebp+EntryPoint], eax
.text:004664DD        call    [ebp+EntryPoint]    ; Execute Overlay code

And that's all.
Now, I could use all that information to manually analyse REALbasic apps, or write a python script for IDA to do the code rebuilding for me.
It's a bit hacky, but whatever. Here it is:
realbasic_resolve.py:
# REALbasic OVERLAY resolver
# XpoZed @ http://nullsecurity.org
# 16, Jul, 2017

import idaapi, idc

def ResolveFrameworkProcs(addrFWATEntryPoint):
    i = 0
    o = addrFWATEntryPoint
    while(o < Dword(addrFWATEntryPoint)):  # the table ends where the name strings begin
        MakeFunction(Dword(o+4), BADADDR)
        MakeNameEx(Dword(o+4), GetString(Dword(o), -1, ASCSTR_C), SN_NOWARN)

        o += 0x10
        i += 1

    return i

def GetProcEntryPoint(szProcName, addrFWATEntryPoint):
    o = addrFWATEntryPoint
    while(o < Dword(addrFWATEntryPoint)):  # the table ends where the name strings begin
        if (GetString(Dword(o), -1, ASCSTR_C) == szProcName):
            return Dword(o+4)

        o += 0x10

    return BADADDR

def SegmentLocator(szSegmentName):
    ea = FirstSeg()
    while (ea != BADADDR):
        if (szSegmentName == SegName(ea)):
            return ea
        ea = NextSeg(ea)

    return BADADDR

def calcPadding(initSize, pad):
    return (initSize+pad)&(0xFFFFFFFF-pad+1);

def assignBlock(o, name):
    MakeDword(o);
    MakeNameEx(o, ("_%s_DataLength" % name), SN_NOWARN)
    size = Dword(o)
    data = BADADDR
    if (size > 0):
        data = o+4
        MakeUnknown(data, size, DOUNK_SIMPLE)
        MakeByte(data)
        MakeArray(data, size)
        MakeNameEx(data, ("_%s" % name), SN_NOWARN)
    o += (4 + size)

    print "%s_DataLength: %08X" % (name, size)
    print "%s: %08X" % (name, data)

    return (o, size, data)

def GetFWATEntryPoint():
    addrDATA_start = SegmentLocator(".data")
    addrDATA_end = NextSeg(addrDATA_start) 

    token = BADADDR    
    addrAPITable = addrDATA_start
    while(addrAPITable < addrDATA_end):
        if (GetString(addrAPITable, -1, ASCSTR_C) == "RuntimeInit"):
            token = addrAPITable
            break;
        addrAPITable += 1

    if (token == BADADDR):
        return BADADDR

    while(addrAPITable > addrDATA_start):
        if (Dword(addrAPITable) == token):
            return addrAPITable
            break
        addrAPITable -= 4

    return BADADDR

def parseOverlay(offset, IAT):
    offset, mCodeSection_DataLength, mCodeSection = assignBlock(offset, "mCodeSection")

    offset, mDataSection_DataLength, mDataSection = assignBlock(offset, "mDataSection")

    offset, mImportSection_DataLength, mImportSection = assignBlock(offset, "mImportSection")

    offset, mSymbolTableSection_DataLength, mSymbolTableSection = assignBlock(offset, "mSymbolTableSection")

    offset, mResourceSection_DataLength, mResourceSection = assignBlock(offset, "mResourceSection")

    id = AddStrucEx(-1, "struc_PRM", 0);
    AddStrucMember(id, "u1", 0x00, 0x20000400, -1, 4);
    AddStrucMember(id, "u2", 0x04, 0x20500400, 0, 4);
    AddStrucMember(id, "u3", 0x08, 0x20000400, -1, 4);
    AddStrucMember(id, "u4", 0x0C, 0x20000400, -1, 4);
    AddStrucMember(id, "u5", 0x10, 0x20000400, -1, 4);

    i = count_prm = count_var = 0

    while(i < mImportSection_DataLength):
        type = Byte(mImportSection+i);
        i += 1;

        offset = Dword(mImportSection+i);
        i += 4;

        # API procedures
        if type == 1:
            dwAPINameLen = Byte(mImportSection+i)
            i += 1

            szAPIName = GetString(mImportSection+i, dwAPINameLen+1, ASCSTR_C)
            i += (dwAPINameLen + 1)

            ProcEntryPoint = GetProcEntryPoint(szAPIName, IAT)
            if (ProcEntryPoint == BADADDR):
                print "Unable to locate", szAPIName
                continue

            next = offset
            while(next != 0):
                next = Dword(mCodeSection+next)
                PatchDword(mCodeSection+offset, ProcEntryPoint-(mCodeSection+offset+4))
                offset = next

        # Variables
        elif type == 2:
            MakeDword(mDataSection+Dword(mCodeSection+offset));
            MakeNameEx(mDataSection+Dword(mCodeSection+offset), ("var_%04d" % count_var), SN_NOWARN);
            PatchDword(mCodeSection+offset, mDataSection+Dword(mCodeSection+offset));
            count_var += 1

        # Local procedures
        elif type == 3:
            PatchDword(mCodeSection+offset, mCodeSection+Dword(mCodeSection+offset));

        # Parameters
        elif type == 5:
            strAddr = mDataSection+Dword(mCodeSection+offset)+0x14;

            MakeStructEx(mDataSection+Dword(mCodeSection+offset), -1, "struc_PRM");
            MakeNameEx(mDataSection+Dword(mCodeSection+offset), ("prm_%04d" % count_prm), SN_NOWARN);

            idaapi.make_ascii_string(strAddr, calcPadding(Byte(strAddr)+1, 4), ASCSTR_PASCAL)
            MakeNameEx(strAddr, ("str_%04d" % count_prm), SN_NOWARN)
            
            PatchDword(mDataSection+Dword(mCodeSection+offset)+4, strAddr);
            PatchDword(mCodeSection+offset, mDataSection+Dword(mCodeSection+offset));
			
            count_prm += 1

        else:
            print "type: %08X; offset: %08X" % (type, offset)
            break

    MakeUnknown(mCodeSection, mCodeSection_DataLength, DOUNK_SIMPLE)
    MakeCode(mCodeSection)

    MakeFunction(mCodeSection, BADADDR)
    MakeNameEx(mCodeSection, "_main_overlay", SN_NOWARN)

    return

def main():
    # Locate the FWAT
    addrFWATEntryPoint = GetFWATEntryPoint();
    if (addrFWATEntryPoint == BADADDR):
        print "Unable to locate Framework API table. Guess you'll have to hardcode it."	
        return

    # Locate the OVERLAY segment
    addrOverlay_Begin = SegmentLocator("OVERLAY")
    if (addrOverlay_Begin == BADADDR):
        print "Unable to locate OVERLAY segment. Did you load it?"
        return

    # Resolve Framework procedures
    NumberOfFrameworkProcs = ResolveFrameworkProcs(addrFWATEntryPoint)
    print "Procedures resolved: %d" % NumberOfFrameworkProcs

    # Parse the OVERLAY
    parseOverlay(addrOverlay_Begin, addrFWATEntryPoint)

    print "Done."
	
    return

if __name__ == "__main__":
    main()

Here's a before and after of the entry point in IDA's decompiler:
IDA decompiled before:
int main_overlay()
{
  sub_570C9C(5627917);
  sub_57083F(0);
  sub_570B19();
  v1105 = 0;
  ((void (__cdecl *)(_DWORD))unk_55E7D3)(0);
  return 0;
}

IDA decompiled after:
int main_overlay()
{
  RuntimeInit();
  sub_570C9C();
  RuntimeRegisterFileType(&prm_0002, &prm_0001, &prm_0000, 0);
  sub_57083F();
  sub_570B19();
  RuntimeRun();
  var_0001 = 0;
  RuntimeUnlockObject(0);
  RuntimeExit();
  return 0;
}

Comments


Guest 16 Jun, 2022 00:36
Pretty nice, but I hope you can provide a bit more info around this.
XpoZed 05 Mar, 2021 11:56
If I remember correctly, IDA 7 and above had a major API update.
You can port my code to the new format, by following this reference: https://www.hex-rays.com/products/ida/support/ida74_idapython_no_bc695_porting_guide.shtml
Guest 04 Mar, 2021 15:37
Hello. First, many thanks for the tutorial. But the script is not working in my IDA 7.2.
XpoZed 07 Nov, 2020 22:37
I'm assuming people know how to use IDA
Guest 04 Nov, 2020 18:47
The script wants the OVERLAY section to be loaded by the user but I don't think you cover how to do this?
Guest 03 Mar, 2019 16:28
Pretty nice description, I bow to you sir. Bartosz from PELock.
© nullsecurity.org 2011-2024