reversing you-know-what #2 - more threads, page encryption

this is a continuation of the previous post i made about the AT.

LdrInitializeThunk continued

now let’s see how their hook actually works.

to find their hook address, you can either look at the instrumentation callback or follow their jmp inside KiUserExceptionDispatcher, since they still hook that function.

if you don’t know what an instrumentation callback is: every time the kernel is about to hand control back to a userspace program (after a syscall, for example), windows checks whether the process has an instrumentation callback (IC) registered. if it does, the kernel sets rip to the callback address instead of the original return address, which it leaves in r10 so the callback can jump back. as you can imagine this is very powerful, since you can basically hook every syscall. (shoutout crowdstrike falcon)

there are quite a few ways to trace what is happening inside it. logically, we can safely assume they will read the start address of the thread that triggered the hook; based on that, we can easily find the main logic of the function.

what we can do, then, is use a vmm/hypervisor to step through the function and break on any instruction that accesses the start address of the target thread. this is the trace from the vmm:

+ found: KERNEL32.LoadLibraryA
+ found: KERNELBASE.LoadLibraryA
+ found: KERNEL32.LoadLibraryW
+ found: KERNELBASE.LoadlibraryW
+ found: KERNEL32.LoadLibraryExA
....
+ found: ntdll.TpReleaseCleanupGroupMembers
- // hmm..
+ found: KERNELBASE.CtrlRoutine

as shown above, they first check whether the start address is any of the loadlibrary functions, before checking it against 2 weird ones: TpReleaseCleanupGroupMembers and CtrlRoutine.

technically, this AT can’t actually block every single thread, since windows itself creates threads in the process from the outside in some places. that’s why they allow a couple of system thread starts: TpReleaseCleanupGroupMembers from the threadpool api, and CtrlRoutine, which is where windows starts the thread that delivers console control events like ctrl+c.

if we go to where it’s found, we can see a cmp instruction comparing a value on the stack with the r13 register, which holds our thread start address:

cmp [rbp+00000560], r13   ; [rbp+0x560] = freshly decrypted address, r13 = thread start address
jne base+28C964B          ; no match
jmp base+28C9636          ; match

this value they are comparing with r13 is actually decrypted just above that line, with the result stored at [rbp+0x560]; interestingly, it holds the address of TppWorkerThread. the function isn’t exported, though, so you don’t get to see its name in the hv trace.

so - if the thread start address is TppWorkerThread, the AT whitelists the thread and lets it continue instead of killing it. interesting!
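putting the traced checks together, the start-address logic can be summarized as a small decision function. this is a sketch, not the AT’s code: the verdict for each list (kill on LoadLibrary*, allow on the system starts) and the order of the checks are inferred from the trace, and `known_starts`, `classify_start` and the verdict names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* possible outcomes of the start-address check (names are made up) */
enum verdict {
    VERDICT_KILL,    /* start is a LoadLibrary* export: classic injection */
    VERDICT_ALLOW,   /* start is a known system-created thread */
    VERDICT_UNKNOWN  /* not decided here; later, untraced checks take over */
};

/* resolved addresses the check compares against; how the AT obtains them
   (e.g. decrypting TppWorkerThread into [rbp+0x560]) is not modeled */
struct known_starts {
    const uintptr_t *loadlibrary;        /* LoadLibraryA/W/ExA/... (kernel32 + kernelbase) */
    size_t           loadlibrary_count;
    uintptr_t        tp_release_cleanup_group_members;
    uintptr_t        ctrl_routine;
    uintptr_t        tpp_worker_thread;
};

static enum verdict classify_start(const struct known_starts *k, uintptr_t start)
{
    /* 1. LoadLibrary* as a thread start = CreateRemoteThread-style injection */
    for (size_t i = 0; i < k->loadlibrary_count; i++)
        if (k->loadlibrary[i] == start)
            return VERDICT_KILL;

    /* 2. threads windows creates on its own behalf */
    if (start == k->tp_release_cleanup_group_members || start == k->ctrl_routine)
        return VERDICT_ALLOW;

    /* 3. the cmp [rbp+0x560], r13 check: threadpool worker threads */
    if (start == k->tpp_worker_thread)
        return VERDICT_ALLOW;

    return VERDICT_UNKNOWN;
}
```

returning an explicit "unknown" instead of defaulting to kill keeps the sketch honest: the trace only shows these comparisons, not what happens to threads that match none of them.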

page encryption

now, let’s look at how the AT handles encryption. as i said previously, if you open the AT binary in IDA, you’ll notice that the entire .text section is encrypted and only gets decrypted when necessary. how do they do this, the commoner might ask?

well, they do this by protecting the entire region with the NO_ACCESS flag (PAGE_NOACCESS). when code in .text runs, it triggers an exception (an access violation), which is then handled by their exception routine.

their routine then checks the exception type and exception address to figure out what to do and whether they are being toyed with. if the exception is legitimate, they run a few checks and begin decrypting .text. the AT developers have also placed a few troll traps in specific pages of the binary that aren’t actual code but are designed to catch analysis.
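to make the mechanism concrete, here’s a rough linux analogue, not the AT’s code: `mprotect(PROT_NONE)` stands in for PAGE_NOACCESS, a SIGSEGV handler stands in for their exception routine, and a one-byte XOR stands in for whatever cipher they actually use. it decrypts on data reads rather than on execution to keep things simple, and it skips all the legitimacy and trap checks. `protect_encrypted`, `on_fault` and `DEMO_KEY` are made-up names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEMO_KEY 0x5A  /* hypothetical XOR key; the AT's real cipher is unknown */

static uint8_t *g_region;                      /* stand-in for encrypted .text */
static size_t   g_region_len;
static size_t   g_page;
static volatile sig_atomic_t g_decrypt_count;  /* pages decrypted so far */

/* plays the role of the AT's exception routine */
static void on_fault(int sig, siginfo_t *si, void *uctx)
{
    (void)sig;
    (void)uctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)g_region;
    if (addr < base || addr >= base + g_region_len) {
        signal(SIGSEGV, SIG_DFL);  /* not our region: restore default, re-fault, crash */
        return;
    }
    uint8_t *page = g_region + ((addr - base) / g_page) * g_page;
    mprotect(page, g_page, PROT_READ | PROT_WRITE);  /* open the faulting page */
    for (size_t i = 0; i < g_page; i++)
        page[i] ^= DEMO_KEY;                          /* decrypt it in place */
    mprotect(page, g_page, PROT_READ);                /* leave it readable */
    g_decrypt_count++;
}

/* copies `len` bytes of `plain` into fresh pages, encrypts them and revokes
   all access, so the first touch of each page faults into on_fault */
static uint8_t *protect_encrypted(const uint8_t *plain, size_t len)
{
    g_page = (size_t)sysconf(_SC_PAGESIZE);
    g_region_len = (len + g_page - 1) / g_page * g_page;
    void *p = mmap(NULL, g_region_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    g_region = p;
    for (size_t i = 0; i < g_region_len; i++)
        g_region[i] = (uint8_t)((i < len ? plain[i] : 0) ^ DEMO_KEY);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;

    mprotect(g_region, g_region_len, PROT_NONE);  /* the PAGE_NOACCESS equivalent */
    return g_region;
}
```

after the handler returns, the faulting instruction re-executes and now sees plaintext, which is the same "fault, decrypt, resume" loop described above. note that mprotect isn’t on POSIX’s async-signal-safe list; calling it from a SIGSEGV handler is a common linux trick, not a guarantee.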

trying to screw with it and triggering a trap gets your account flagged and banned within 10 minutes (yes, it has been weaponized before).

to prevent someone from abusing a race condition during decryption, they use 2 separate views of the memory. the first view, exec, is read-only and contains the encrypted page. the second view is writable, meaning they can write the decrypted pages through it without ever making the first view writable, effectively closing the race window.
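a linux sketch of the two-view idea, assuming both views alias the same physical memory (on windows this would likely be a section mapped twice; here `memfd_create` plus two `mmap` calls stand in). `dual_view`, `dual_view_create` and `decrypt_via_rw` are hypothetical names, and the XOR is a placeholder cipher.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* two mappings of the same pages: `exec` is never writable,
   `rw` is where decryption happens */
struct dual_view {
    const uint8_t *exec;  /* read-only view (would be RX for real code) */
    uint8_t       *rw;    /* writable alias of the same memory */
    size_t         len;
};

static int dual_view_create(struct dual_view *v, size_t len)
{
    int fd = memfd_create("dual-view-demo", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mappings keep the memory alive */
    if (ro == MAP_FAILED || rw == MAP_FAILED) {
        if (ro != MAP_FAILED) munmap(ro, len);
        if (rw != MAP_FAILED) munmap(rw, len);
        return -1;
    }
    v->exec = ro;
    v->rw = rw;
    v->len = len;
    return 0;
}

/* decrypts through the writable view only; the exec view's protection
   never changes, so there is no moment where it is writable */
static void decrypt_via_rw(struct dual_view *v, size_t off, size_t n, uint8_t key)
{
    for (size_t i = off; i < off + n && i < v->len; i++)
        v->rw[i] ^= key;
}
```

a write through `rw` shows up immediately in `exec`, because both are windows onto the same pages; that aliasing is what lets the protection of the executable view stay fixed.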

so how could we try to analyze this? the obvious entry point is the exception hook. they will obviously access the exception RIP, so we simply log everything that accesses the instruction pointer. pulling this tiny trick off shows a bunch of compares against hardcoded values:

49 81 FC 06010000 cmp r12,00000106
4C 89 85 680F0000 mov [rbp+00000F68],r8
0F84 AFA80300 je base+2323DF3
C7 85 50040000 14058043 mov dword ptr [rbp+00000450],43800514
8B 85 50040000 mov eax,dword ptr [rbp+00000450]
49 81 FC 08010000 cmp r12,00000108
0F84 92A80300 je base+2323DF3
C7 85 50040000 14058043 mov dword ptr [rbp+00000450],43800514
8B 85 50040000 mov eax,dword ptr [rbp+00000450]
49 81 FC 77040000 cmp r12,00000477
0F84 75A80300 je base+2323DF3
C7 85 50040000 14058043 mov dword ptr [rbp+00000450],43800514
8B 85 50040000 mov eax,dword ptr [rbp+00000450]
49 81 FC 79040000 cmp r12,00000479
0F84 58A80300 je base+2323DF3
C7 85 50040000 14058043 mov dword ptr [rbp+00000450],43800514
8B 85 50040000 mov eax,dword ptr [rbp+00000450]
49 81 FC 1C2C0000 cmp r12,00002C1C
0F84 3BA80300 je base+2323DF3
C7 85 50040000 14058043 mov dword ptr [rbp+00000450],43800514
8B 85 50040000 mov eax,dword ptr [rbp+00000450]
49 81 FC 1E2C0000 cmp r12,00002C1E
0F84 1EA80300 je base+2323DF3
C7 85 50040000 14058043 mov dword ptr [rbp+00000450],43800514
8B 85 50040000 mov eax,dword ptr [rbp+00000450]
49 81 FC E52D0000 cmp r12,00002DE5
0F84 01A80300 je base+2323DF3

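as you can see above, r12 is compared against a series of hardcoded values. r12 most likely holds the relative position of the faulting page, i.e. its page index from the image base (the values aren’t page-aligned, so they can’t be raw byte offsets).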
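the cmp/je chain boils down to a set-membership test on r12. a minimal sketch, assuming r12 is a 4 KiB page index relative to the image base; only the constants come from the trace (which may not show the full list), and `is_trap_page` / `k_trap_pages` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12  /* assumes 4 KiB pages */

/* the seven values visible in the trace; the real list may be longer */
static const uint32_t k_trap_pages[] = {
    0x106, 0x108, 0x477, 0x479, 0x2C1C, 0x2C1E, 0x2DE5,
};

/* mirrors the `cmp r12, imm / je` chain, assuming r12 = page index of the
   faulting address relative to the image base */
static bool is_trap_page(uintptr_t image_base, uintptr_t fault_addr)
{
    uint32_t index = (uint32_t)((fault_addr - image_base) >> PAGE_SHIFT_4K);
    for (size_t i = 0; i < sizeof k_trap_pages / sizeof k_trap_pages[0]; i++)
        if (k_trap_pages[i] == index)
            return true;
    return false;
}
```

any address inside a trap page maps to the same index, so a fault anywhere in e.g. page 0x2DE5 hits the check, not just its first byte.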

this “something” is actually the set of trap pages, whose offsets are calculated like so:

now that we know the basics and which traps to avoid, let’s try to attack it.

page decryption attack

now, obviously the most convenient way is to do it at runtime/dynamically. however, they restrict reading from the same location across a bunch of pages, which makes this a bit uglier.

another way is to create a thread at every encrypted page, which, per the analysis above, forces the AT to decrypt it. however, once the AT finishes decrypting and the thread continues, it’ll crash, since it’s now executing whatever code happens to sit at the start of that page with garbage state (UB).
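the page walk itself is simple; the important part is skipping the known traps. a sketch under stated assumptions: 4 KiB pages, .text given as a page-index range, and a caller-supplied `spawn` hook that on windows would be something like a CreateRemoteThread call at the page address. `force_decrypt_text`, `spawn_fn` and `dry_run_spawn` are hypothetical, and the trap list is just the values from the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u  /* assumes 4 KiB pages */

static const uint32_t k_trap_pages[] = {
    0x106, 0x108, 0x477, 0x479, 0x2C1C, 0x2C1E, 0x2DE5,  /* from the trace */
};

static bool is_trap_index(uint32_t index)
{
    for (size_t i = 0; i < sizeof k_trap_pages / sizeof k_trap_pages[0]; i++)
        if (k_trap_pages[i] == index)
            return true;
    return false;
}

/* hypothetical hook: starts one thread at page_addr, returns true on success */
typedef bool (*spawn_fn)(uintptr_t page_addr, void *user);

/* starts one thread per .text page to force decryption, skipping known traps;
   returns how many pages a thread was successfully started on */
static size_t force_decrypt_text(uintptr_t image_base, uint32_t first_page,
                                 uint32_t page_count, spawn_fn spawn, void *user)
{
    size_t spawned = 0;
    for (uint64_t idx = first_page; idx < (uint64_t)first_page + page_count; idx++) {
        if (is_trap_index((uint32_t)idx))
            continue;  /* touching a trap gets the account flagged */
        if (spawn(image_base + (uintptr_t)idx * PAGE_SIZE_4K, user))
            spawned++;
    }
    return spawned;
}

/* stand-in hook for dry runs: counts pages instead of creating threads */
static bool dry_run_spawn(uintptr_t page_addr, void *user)
{
    (void)page_addr;
    (*(size_t *)user)++;
    return true;
}
```

passing the spawner in as a callback keeps the walk testable without a target process; swapping `dry_run_spawn` for a real thread-creation hook is the only platform-specific part.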

interestingly, back then the AT used NtContinue to recover from the decryption procedure, resuming the thread with a context they supplied. obviously we can always hook NtContinue and follow through, so to prevent that, they switched to iret (interrupt return) and set up their own context.
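for context on why iret works as a NtContinue replacement: in 64-bit mode `iretq` pops rip, cs, rflags, rsp and ss off the stack in that order, so whoever builds that 5-qword frame decides where the thread resumes, with no syscall to hook. a sketch of the frame layout; 0x33/0x2B are the standard windows x64 user-mode selectors, and `make_resume_frame` is a hypothetical helper, not something taken from the AT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* stack frame consumed by iretq in 64-bit mode, lowest address first */
struct iretq_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* windows x64 user-mode selectors (RPL 3) */
#define USER_CS 0x33u
#define USER_SS 0x2Bu

/* builds the frame a user-mode iretq would pop to resume at (rip, rsp) with
   the given flags; general-purpose registers have to be restored separately
   before executing iretq */
static struct iretq_frame make_resume_frame(uint64_t rip, uint64_t rsp, uint64_t rflags)
{
    struct iretq_frame f = { rip, USER_CS, rflags, rsp, USER_SS };
    return f;
}
```

this is also why patching the instruction works: whatever sits in rip/rsp of that frame when iretq executes is where the thread ends up.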

this instruction is still in the init dispatcher, so we can technically patch it. simply replacing it with a jmp, plus a few other minor patches to control the context used for thread resumption, could likely do it.

to avoid a crash, simply forcing rip to a ret instruction will do, since execution then flows back to the call that originally ran the code.

to keep this from kicking in on legitimate code, we can set rcx/rdx to some arbitrary magic value by passing it as the thread parameter to CreateRemoteThread, so only our threads get redirected and everything else goes through the original path.

in pseudocode, it would be:


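// ctx: the x64 CONTEXT the dispatcher is about to resume (after our patch)
if (ctx->Rcx == 0xCAFEBABE) // arbitrary magic number passed as the thread parameter
{
    DWORD64 return_addr = *(DWORD64 *)ctx->Rsp; // the return address is stored AT [rsp], not rsp itself
    ctx->Rip = return_addr;                      // resume at the caller, exactly like a ret would
    ctx->Rsp += 8;                               // pop the return address off the stack
}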

writing a simple decryptor on top of this would be slightly difficult, but not impossible:

lea     rdx, aTextchatmessag ; "textChatMessage"
mov     rcx, rax
call    sub_143829140
mov     rdi, rax
...

next time i’ll use a hypervisor to truly try and intercept everything via hv tracing.