The SonicWall Capture Labs Research team recently observed a new variant of GuLoader (a.k.a. CloudEye). GuLoader is a shellcode-based downloader known for its numerous anti-analysis techniques and control flow obfuscation. The latest variant introduces new ways of raising exceptions that hamper both the analysis process and execution in a controlled environment.
In this blog post, we will discuss:
- Unpacking of GuLoader’s shellcodes.
- Understanding a new anti-debug technique deployed by GuLoader.
- Deep dive into GuLoader’s custom Vectored Exception Handler.
- Writing an IDAPython script to deobfuscate the control flow of shellcode and to make GuLoader’s analysis easy and fast.
GuLoader is an advanced downloader first discovered in 2019. It has kept evolving since then, adding new anti-debugging techniques with every new variant. It downloads malicious payloads such as AgentTesla, Azorult and Remcos RAT. GuLoader is currently spreading through malspam campaigns, packed in an NSIS installer.
UNPACKING GULOADER’S SHELLCODE
GuLoader’s main shellcode is reached through three layers.
The recent variant of GuLoader is distributed as an NSIS installer consisting of an NSIS script, a DLL plugin and a file containing encrypted shellcode. An NSIS-capable build of 7-Zip is needed to extract the NSIS script, as the typical 7-Zip build cannot extract it.
Fig 1. Extracted files from NSIS installer.
The file “Hangarer.Man” contains the Layer 2 shellcode and the encrypted Layer 3 shellcode, which is the main GuLoader shellcode.
System.dll is a DLL file that exports multiple functions. The exported function named “Call” is called by the NSIS script. This function is responsible for allocating and executing the Layer 2 shellcode.
Fig 2. NSIS script calling Exported function Call.
The Call function allocates memory and copies the content of the file Hangarer.Man from offset 0x409 to the last byte. It then calls the CallWindowProcW API. The first parameter of CallWindowProcW, lpPrevWndFunc, is a callback function; it is set to the address of the allocated memory, which results in indirect execution of the Layer 2 shellcode.
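The copy step can be reproduced offline when extracting the shellcode for static analysis. The following is a minimal sketch, assuming the offset 0x409 seen in the analyzed sample; the function names and output handling are illustrative and not part of the malware or the original tooling.

```python
from pathlib import Path

# Offset observed in the analyzed sample; other samples may use a different value.
LAYER2_OFFSET = 0x409


def carve_layer2(blob: bytes, offset: int = LAYER2_OFFSET) -> bytes:
    """Return the bytes from `offset` to the end of the blob.

    For the analyzed sample this is the Layer 2 shellcode followed by the
    still-encrypted Layer 3 shellcode.
    """
    if offset >= len(blob):
        raise ValueError("blob is shorter than the expected shellcode offset")
    return blob[offset:]


def carve_layer2_file(src: str, dst: str) -> int:
    """Carve Layer 2 from the file `src`, write it to `dst`, return bytes written."""
    shellcode = carve_layer2(Path(src).read_bytes())
    Path(dst).write_bytes(shellcode)
    return len(shellcode)
```

The carved buffer can then be loaded into a disassembler as raw 32-bit code.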
The malware immediately decrypts the third layer, which is located at offset 0x1c9 in Layer 2, and starts executing it.
Fig 3. Decryption of layer 3.
This is the final GuLoader shellcode. It is heavily obfuscated, with junk code, indirect function calls, dynamic API resolution, obfuscated arithmetic value calculations, stack-based string decryption and fake instructions, along with anti-debug, anti-VM, anti-analysis, anti-dump, anti-API-hook and anti-emulation techniques.
- During analysis of this variant, we identified a significant enhancement to one of GuLoader’s most effective anti-debug techniques: its custom Vectored Exception Handler.
- The malware raises exceptions by executing cleverly crafted series of instructions. It also repeats the same instructions many times in the shellcode, making static and dynamic analysis harder and more time-consuming for the reverser.
- GuLoader’s ultimate goal in using this anti-debug technique is runtime control flow obfuscation.
GuLoader incorporates various evasion techniques. They are listed below in the order in which they are executed.
- Scans the virtual memory for strings related to analysis tools.
- Uses the Heaven’s Gate technique to redirect its execution under an x64 OS.
- Checks for QEMU emulator related strings.
- Patches the DbgBreakPoint and DbgUiRemoteBreakin APIs used by debuggers.
- Uses the EnumWindows API to enumerate windows.
- Uses the NtSetInformationThread API with ThreadHideFromDebugger (0x11).
- Uses the EnumDeviceDrivers and GetDeviceDriverBaseNameA APIs.
- Uses the MsiEnumProductsA and MsiGetProductInfoA APIs.
- Uses the OpenSCManagerA and EnumServicesStatusA APIs.
- Uses the NtQueryInformationProcess API with ProcessDebugPort (0x7).
An Overview of the Payload’s Execution Sequence
- Creates a suspended child process of itself.
- In the newly created process, it creates a section from a genuine file to avoid drawing suspicion from AV scanning. In this case it used mshtml.dll.
- Injects the complete main shellcode into the child process.
- Repeats the evasion techniques mentioned above one more time.
- After a successful bypass, it decrypts the C2 URL and downloads the encrypted payload from the C2.
- Generates the payload decryption key. In the analyzed sample, the key length was 0x303 bytes.
- Allocates approximately 60 MB of memory for a payload that is only a few KB in size. It decrypts the encrypted payload, uses process hollowing to inject the decrypted payload into the child process, and resolves its Import Address Table.
- Lastly, it starts payload execution using the ZwCreateThreadEx API.
Fig 4. Snippet of Payload decryption function.
All of the above-mentioned evasion techniques and payload execution sequences are already explained in detail on the SonicWall Capture Labs Research team’s blog.
NEW ENHANCED ANTI-DEBUG TECHNIQUE
In the section below, we will discuss:
- Exception types and implementation.
- Decoding the Vectored Exception Handler function.
- Writing an IDAPython script to restore (deobfuscate) the control flow.
EXCEPTION TYPES & IMPLEMENTATION
This variant of GuLoader adds two new exceptions, EXCEPTION_ACCESS_VIOLATION and EXCEPTION_SINGLE_STEP, whereas the previous variant used only EXCEPTION_BREAKPOINT. We will discuss each exception and understand its instruction pattern in order to write a script.
EXCEPTION_ACCESS_VIOLATION (code 0xC0000005)
When the malware intentionally tries to write to an inaccessible memory address, EXCEPTION_ACCESS_VIOLATION is raised.
Here, the malware constructs zero through a series of arithmetic calculations. It then tries to access the memory address that value points to, which raises the exception because address zero is inaccessible.
Fig 5. EXCEPTION_ACCESS_VIOLATION instructions pattern.
As we can see, the constant values and operations (mov, xor, sub) vary for each exception raised.
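Because the constants change at every site, a script has to fold the arithmetic rather than match fixed bytes. The sketch below illustrates the idea in plain Python, assuming the operand chain has already been extracted from the disassembly; the constants shown are made up for illustration and do not come from the sample.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide in this shellcode


def fold_register(ops):
    """Evaluate (mnemonic, immediate) pairs applied to a single 32-bit register.

    Only the operations named in the text (mov, xor, sub) are handled.
    """
    value = 0
    for mnemonic, imm in ops:
        if mnemonic == "mov":
            value = imm
        elif mnemonic == "xor":
            value ^= imm
        elif mnemonic == "sub":
            value -= imm
        else:
            raise ValueError(f"unsupported operation: {mnemonic}")
        value &= MASK32  # emulate 32-bit wraparound
    return value


# Hypothetical constants chosen so that the chain folds to zero.
sequence = [("mov", 0x5A7C3E19), ("xor", 0x11F0A2C4), ("sub", 0x4B8C9CDD)]
is_null_pointer_site = fold_register(sequence) == 0
```

A site whose chain folds to zero and is followed by a memory access through that register is a candidate for the access violation pattern.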
EXCEPTION_SINGLE_STEP (code 0x80000004)
The FLAGS register is the status register that holds the current state of an x86 CPU. The trap flag is bit 8 of the FLAGS register (mask 0x100). When the trap flag is set, the processor single-steps: it executes one instruction and then raises a single-step exception. The contents of registers and memory can then be examined by the Vectored Exception Handler; if they are as expected, execution continues with the next instruction.
The x86 processor has no instruction that directly sets or clears the trap flag. The malware uses a combination of PUSHFD and POPFD instructions to set it.
These operations are done by:
- Pushing the flags register onto the stack (PUSHFD).
- Setting the trap flag bit in the saved value (mask 0x100).
- Popping the value back into the flags register (POPFD).
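The bit manipulation in the middle step can be illustrated on a plain integer. This is a small sketch of the arithmetic only, not of the shellcode itself; the helper names and the sample EFLAGS value are illustrative.

```python
TRAP_FLAG = 0x100  # bit 8 of EFLAGS


def set_trap_flag(eflags: int) -> int:
    """Return the saved EFLAGS value with the trap flag set."""
    return (eflags | TRAP_FLAG) & 0xFFFFFFFF


def trap_flag_is_set(eflags: int) -> bool:
    """Report whether the trap flag is set in an EFLAGS value."""
    return bool(eflags & TRAP_FLAG)


saved = 0x202                    # example value as it might sit on the stack after PUSHFD
modified = set_trap_flag(saved)  # value that POPFD would load back into EFLAGS
```

When scanning the disassembly, an OR of the stack slot with 0x100 between PUSHFD and POPFD is the part of the pattern that stays constant.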
When the malware runs without a debugger, the SINGLE_STEP exception is raised and handled by the Vectored Exception Handler.
Fig 6. SINGLE_STEP exception instructions pattern.
However, under a debugger no exception is seen being raised, as the trap flag is always reset after each debugger event is delivered.
EXCEPTION_BREAKPOINT (code 0x80000003)
The INT3 instruction (opcode 0xCC) is used by debuggers as a software breakpoint, which is why control stays with the debugger when a program running under it encounters INT3.
When the malware runs without a debugger, EXCEPTION_BREAKPOINT is raised and control is transferred to the Vectored Exception Handler.
Fig 7. BREAKPOINT exception instruction patterns.
We have now seen how the malware raises exceptions. Next, we will see how it uses these exceptions to change the control flow at runtime through its custom Vectored Exception Handler.
DECODING VECTORED EXCEPTION HANDLER
An application can register a function to handle all exceptions for the application. Vectored handlers are called in the order that they were added.
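GuLoader calls the RtlAddVectoredExceptionHandler API to add its custom Vectored Exception Handler. RtlAddVectoredExceptionHandler accepts two parameters: a flag that controls whether the handler is called first or last, and a pointer to the handler function.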
Fig 8. Structures of EXCEPTION RECORD & CONTEXT.
As the image below shows, a pointer to an EXCEPTION_POINTERS structure is passed as the argument to the Vectored Exception Handler (VEH). Through EXCEPTION_POINTERS, the VEH can access all the information about the raised exception and read the values of all processor registers through the CONTEXT structure.
Fig 9. Pseudocode of Custom Vectored Exception Handler.
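To make the structure access concrete, the following sketch models the 32-bit layout of EXCEPTION_POINTERS and EXCEPTION_RECORD with ctypes. The field order follows the public Windows SDK definitions; pointers are modeled as 32-bit integers because the shellcode is 32-bit, and the class and helper names are illustrative. For an access violation, ExceptionInformation[0] holds the access type and ExceptionInformation[1] the accessed address.

```python
import ctypes

EXCEPTION_ACCESS_VIOLATION = 0xC0000005
EXCEPTION_MAXIMUM_PARAMETERS = 15


class ExceptionRecord32(ctypes.LittleEndianStructure):
    """32-bit EXCEPTION_RECORD; pointer fields are stored as raw 32-bit values."""
    _fields_ = [
        ("ExceptionCode", ctypes.c_uint32),
        ("ExceptionFlags", ctypes.c_uint32),
        ("ExceptionRecord", ctypes.c_uint32),    # pointer to a chained record
        ("ExceptionAddress", ctypes.c_uint32),   # address of the faulting instruction
        ("NumberParameters", ctypes.c_uint32),
        ("ExceptionInformation", ctypes.c_uint32 * EXCEPTION_MAXIMUM_PARAMETERS),
    ]


class ExceptionPointers32(ctypes.LittleEndianStructure):
    """32-bit EXCEPTION_POINTERS, the single argument passed to a VEH."""
    _fields_ = [
        ("ExceptionRecord", ctypes.c_uint32),    # pointer to EXCEPTION_RECORD
        ("ContextRecord", ctypes.c_uint32),      # pointer to CONTEXT
    ]


def accessed_address(record: ExceptionRecord32) -> int:
    """Return the address whose access faulted (access violations only)."""
    if record.ExceptionCode != EXCEPTION_ACCESS_VIOLATION or record.NumberParameters < 2:
        raise ValueError("record does not describe an access violation")
    return record.ExceptionInformation[1]
```

Such definitions are mainly useful when parsing exception records from memory dumps or when annotating the handler in a disassembler.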
When EXCEPTION_ACCESS_VIOLATION and EXCEPTION_SINGLE_STEP exceptions are raised, the handler performs the following steps:
- It checks whether the memory address being accessed is zero. If it is not zero, the handler returns 0 (EXCEPTION_CONTINUE_SEARCH) and the process ultimately crashes. The handler obtains the address of the accessed memory location from the ExceptionInformation member of the exception record, which carries this additional information about the exception.
- It checks whether any hardware breakpoints have been set by checking the status of the debug registers (DR0 to DR7). If one is found, it sets ContextRecord to 0, which leads the malware to crash.
- If the checks pass, it transforms EIP to a new address using the logic Context->Eip += ByteAt(Eip + 2) ^ 0x6A, where
- The value 2 is the size of the instruction (mov, jg, jne etc.) at which the exception is raised.
- 0x6A is the byte key used to transform EIP. (It differs from sample to sample.)
Fig 10. Debug register check.
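The steps for these two exception types can be modeled in a few lines of Python. This is a simplified sketch, not a reimplementation of the handler: it treats any non-zero debug register value as a detected hardware breakpoint (an assumption), collapses both crash paths into returning None, and uses hypothetical parameter names.

```python
from typing import Optional, Sequence

EXCEPTION_ACCESS_VIOLATION = 0xC0000005
EXCEPTION_SINGLE_STEP = 0x80000004


def model_veh(
    code: bytes,
    base: int,
    eip: int,
    exception_code: int,
    accessed_address: int,
    debug_registers: Sequence[int],
    key: int = 0x6A,      # sample-specific byte key
    insn_size: int = 2,   # size of the instruction that raised the exception
) -> Optional[int]:
    """Return the new EIP, or None where the real handler would let the process crash."""
    if exception_code not in (EXCEPTION_ACCESS_VIOLATION, EXCEPTION_SINGLE_STEP):
        return None  # other exception codes are outside this sketch
    if accessed_address != 0:
        return None  # accessed address is not zero
    if any(debug_registers):
        return None  # assumption: any non-zero debug register counts as a breakpoint
    encoded = code[eip - base + insn_size]  # byte right after the faulting instruction
    return (eip + (encoded ^ key)) & 0xFFFFFFFF
```

For example, with the key 0x6A and an encoded byte of 0x7A after a two-byte instruction, execution resumes 0x10 bytes past the address of the faulting instruction.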
When the EXCEPTION_BREAKPOINT exception is raised, the handler performs the following steps:
- It checks whether hardware breakpoints have been set by checking the status of the debug registers (DR0 to DR7). If one is found, it sets ContextRecord to 0, which leads the malware to crash.
- It scans in a loop for applied software breakpoints, i.e. the CC byte.
- If the checks pass, it transforms EIP to a new address using the logic Context->Eip += ByteAt(Eip + 1) ^ 0x6A, where
- The value 1 is the size of the instruction (CC) at which the exception is raised.
- 0x6A is the byte key used to transform EIP.
WRITING IDA PYTHON SCRIPT
IDAPython is an IDA Pro plugin that integrates the Python programming language, allowing scripts to run in IDA Pro. IDA provides different modules for working with the disassembly of instructions.
The Python script finds the instruction patterns that raise an exception and patches each of them with a jump instruction whose target is the transformed EIP offset.
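The patching step can be sketched independently of IDA on a raw byte buffer. The example below is a minimal illustration, not the script described in this post: it assumes the exception sites have already been located, that the key is 0x6A as in the analyzed sample, and that the bytes between the site and the jump target are junk that may be overwritten. The function name is hypothetical.

```python
import struct


def patch_exception_site(code: bytearray, offset: int, insn_size: int, key: int = 0x6A) -> int:
    """Replace the exception-raising instruction at `offset` with a jmp.

    `insn_size` is 1 for the int3 pattern and 2 for the access violation and
    single-step patterns. Returns the offset the jump lands on.
    """
    delta = code[offset + insn_size] ^ key  # distance the handler would add to EIP
    if delta < 2:
        raise ValueError("jump distance too small to hold a jmp instruction")
    if delta - 2 <= 0x7F:
        # jmp short: EB rel8, relative to the end of the 2-byte instruction
        code[offset:offset + 2] = bytes([0xEB, delta - 2])
    else:
        # jmp near: E9 rel32, relative to the end of the 5-byte instruction
        code[offset:offset + 5] = b"\xE9" + struct.pack("<i", delta - 5)
    return offset + delta
```

Inside IDA, the same byte edits would be applied to the database through IDAPython's patching functions, followed by reanalysis of the patched range so the graph view reflects the restored control flow.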
After running the Python script in IDA, we get clean, easy-to-analyze, deobfuscated code for GuLoader’s shellcode. The script also shows that GuLoader’s VEH is invoked more than 1100 times.
Fig 11. Obfuscated code(A), deobfuscated code(B), GuLoader’s entire shellcode graph view(C).
GuLoader introduces new techniques very often, and fully analyzing them costs malware analysts considerable time and effort. We have completely analyzed GuLoader’s custom Vectored Exception Handler and understood how it works.
We have written a Python script that defeats GuLoader’s shellcode control flow obfuscation, saving malware analysts time and effort.
We expect further development of GuLoader’s anti-analysis and anti-debug techniques in the coming days.
SonicWall Capture Labs provides protection against this threat via the SonicWall Capture ATP w/RTDMI.
SHA256 : 55130719554a0b3dcbf971c646e6e668b663b796f4be09816d405cc15a16d7d6
C2 URL : hxxp[:]//lena[.]utf[.]by/wp-content/plugins/f8eb81f6deba45169c3b41c05c4590ad/y/mm/mmd/kdRrHFMqRUIujuOy126[.]bin
Final Payload (Azorult stealer): d5af42b118d0597c6b71831f2b2ebc8294eca907481d53939563fce7c0f14767