This implementation, along with my [ShellcodeFluctuation](https://github.com/mgeeky/ShellcodeFluctuation), gives the Offensive Security community sample implementations that catch up with what commercial C2 products offer, so that our Red Team tooling does no worse. 💪
+
### Implementation has changed
+
The current implementation differs heavily from what was originally published. I realised that there is a far simpler way to terminate the thread's call stack and hide the shellcode-related frames, namely by writing `0` to the return address of our handler:
The previous implementation, utilising `StackWalk64`, can be accessed in [commit c250724](https://github.com/mgeeky/ThreadStackSpoofer/tree/c2507248723d167fb2feddf50d35435a17fd61a2).
## How it works?
This program performs shellcode self-injection (roughly via the classic `VirtualAlloc` + `memcpy` + `CreateThread`).
skipped 8 lines
3. Hook `kernel32!Sleep` pointing back to our callback.
4. Inject and launch shellcode via `VirtualAlloc` + `memcpy` + `CreateThread`. A slight twist here is that our thread starts from a legitimate `ntdll!RtlUserThreadStart+0x21` address to mimic other threads.
5. As soon as Beacon attempts to sleep, our `MySleep` callback gets invoked.
-
6. Stack Spoofing begins.
-
7. First, we walk the call stack of our current thread, utilising `ntdll!RtlCaptureContext` and `dbghelp!StackWalk64`.
-
8. We save all of the stack frames that match our `seems-to-be-beacon-frame` criteria (such as the return address pointing back to memory that is `MEM_PRIVATE` or has `Type = 0`, or the memory's protection flags not being `R/RX/RWX`).
-
9. We iterate over the collected frames (the gathered function frame pointers `RBP/EBP`, in `frame.frameAddr`) and overwrite _on-stack_ return addresses with a fake `::CreateFileW` address.
-
10. Finally, a call to `::SleepEx` is made to let the Beacon sleep while waiting for further communication.
-
11. After the sleep is finished, we restore the previously saved original function return addresses and execution is resumed.
+
6. Overwrite the last return address on the stack with `0`, which should effectively terminate the call stack.
+
7. Finally, a call to `::SleepEx` is made to let the Beacon sleep while waiting for further communication.
+
8. After the sleep is finished, we restore the previously saved original function return addresses and execution is resumed.
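
The save, zero, sleep, restore sequence from the steps above can be illustrated with a small portable model. This is only a sketch under stated assumptions, not the PoC's code: `ReturnAddresses`, `visibleFrames` and `sleepWithTerminatedStack` are hypothetical names, the "stack" is a plain vector of return addresses, and `sleepFn` stands in for `::SleepEx`. In a real MSVC build the handler's return address slot could presumably be reached with the `_AddressOfReturnAddress()` intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical model of a thread stack: each entry is a return address.
// Entry 0 is where our sleep handler returns to (i.e. into the shellcode).
using ReturnAddresses = std::vector<std::uintptr_t>;

// Hypothetical stack walker: follows return addresses until it meets 0,
// which it treats as the end of the call stack.
std::size_t visibleFrames(const ReturnAddresses& stack) {
    std::size_t count = 0;
    for (std::uintptr_t address : stack) {
        if (address == 0) break;
        ++count;
    }
    return count;
}

// Zero the handler's return address, sleep, then restore it.
// `sleepFn` is an assumption standing in for ::SleepEx.
void sleepWithTerminatedStack(ReturnAddresses& stack,
                              const std::function<void()>& sleepFn) {
    assert(!stack.empty());
    const std::uintptr_t original = stack[0]; // save the real return address
    stack[0] = 0;                             // the call stack now ends at the handler
    sleepFn();                                // anything inspecting the stack now sees no callers
    stack[0] = original;                      // restore so that execution can resume
}
```

The restore step matters as much as the overwrite: if the original value is not written back before the handler returns, the thread returns to address `0` and crashes.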
Function return addresses are scattered across the thread's stack memory area and are located through the `RBP/EBP` frame pointer register. In order to find them on the stack, we first need to collect frame pointers, then dereference them for overwriting:
skipped 16 lines
This, in turn, is the call stack when thread stack spoofing is enabled:
-
![spoofed](images/spoofed.png)
-
Above we can see a sequence of `kernel32!CreateFileW` addresses implanted as return addresses. That's merely an example proving that we can manipulate return addresses.
-
To improve the quality of this call stack, one could prepare a list of addresses and then use them while picking subsequent frames for overwriting.
-
For example, the following chain of addresses could be used:
+
![spoofed](images/spoofed2.png)
+
Above we can see that the last frame on our call stack is our `MySleep` callback. That immediately opens up opportunities for IOCs that hunt for threads whose call stacks do not unwind into the following two commonly expected system entry points:
```
-
KernelBase.dll!WaitForSingleObjectEx+0x8e
-
KernelBase.dll!WaitForSingleObject+0x52
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21
```
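
A check of the kind such an IOC would perform can be sketched as follows. This is a minimal illustration, assuming the frames have already been symbolized into `module!function+offset` strings ordered innermost first; `unwindsToThreadEntry` is a hypothetical name and not part of any real scanner.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true when the two outermost frames are the commonly expected
// thread entry points. `frames` is ordered innermost first.
bool unwindsToThreadEntry(const std::vector<std::string>& frames) {
    if (frames.size() < 2) return false;
    const std::string& beforeLast = frames[frames.size() - 2];
    const std::string& last = frames[frames.size() - 1];
    // Compare by prefix so that differing offsets across builds still match.
    return beforeLast.rfind("kernel32!BaseThreadInitThunk", 0) == 0 &&
           last.rfind("ntdll!RtlUserThreadStart", 0) == 0;
}
```

A thread whose call stack was terminated at `MySleep` would fail this check, which is exactly the detection opportunity described above.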
-
When thinking about AVs, EDRs and other automated scanners, we don't need to care about how legitimate our thread's call stack looks, since these scanners only care whether a frame points back to `SEC_IMAGE` memory pages, meaning it was a legitimate DLL/EXE call (and whether these DLLs are trusted/signed themselves). Thus, we don't need to bother that much about this chain of `CreateFileW` frames.
+
However, a brief examination of my system showed that there are plenty of threads whose call stacks do not unwind to the above entry points:
+
![legit call stack](images/legit-call-stack.png)
+
The above screenshot shows an unmodified, unhooked thread of Total Commander x64.
+
Why should we care about carefully faking our call stack when there are processes exhibiting traits that we can simply mimic?
## How do I use it?
skipped 30 lines
4. Create a new user stack with `RtlCreateUserStack` / `RtlFreeUserStack` and switch the Beacon's thread from its own stack to that newly created one.
+
## Implementing a true Thread Stack Spoofer
+
An hours-long conversation with [namazso](https://twitter.com/namazso) taught me that in order to aim for a proper thread stack spoofer we would need to reverse the x64 call stack unwinding process.
+
First, one needs to carefully study the stack unwinding process explained in (a) linked below. When the system traverses a thread's call stack on the x64 architecture, it does not simply rely on return addresses scattered around the thread's stack, but rather it:
+
1. Takes the return address.
+
2. Attempts to identify the function containing that address (with [RtlLookupFunctionEntry](https://docs.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-rtllookupfunctionentry)).
+
3. That function returns a `RUNTIME_FUNCTION` entry, which leads to the `UNWIND_INFO` and `UNWIND_CODE` structures. These structures describe the function's beginning address, its ending address, and all the code sequences that modify `RBP` or `RSP`.
+
4. The system needs to know about all stack and frame pointer modifications that happened in each function across the call stack, so that it can virtually _roll back_ these changes and restore the call stack pointers to their values at the moment the processed frame was called (this is implemented in [RtlVirtualUnwind](https://docs.microsoft.com/ru-ru/windows/win32/api/winnt/nf-winnt-rtlvirtualunwind)).
+
5. The system processes all `UNWIND_CODE`s that the examined function exhibits to precisely compute the location of that frame's return address and stack pointer value.
+
6. Through this emulation, the system is able to walk down the call stack chain and effectively "unwind" the call stack.
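
To make step 5 more concrete, here is a simplified, portable sketch of how the stack-related unwind codes of one function add up. It is not `RtlVirtualUnwind`: it assumes the function is past its prologue, uses no frame pointer and has no chained unwind info, and it gives up on anything else. `makeSlot` and `prologueStackBytes` are hypothetical helpers; the opcode values and slot layout follow the x64 exception handling documentation linked in (a).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unwind operation codes as documented for x64 exception handling.
enum UnwindOp : unsigned {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9, UWOP_PUSH_MACHFRAME = 10
};

// Hypothetical helper: packs one 16-bit UNWIND_CODE slot
// (low byte = prologue offset, then 4 bits of opcode, then 4 bits of op info).
inline std::uint16_t makeSlot(unsigned codeOffset, unsigned op, unsigned info) {
    assert(codeOffset <= 0xFF && op <= 0xF && info <= 0xF);
    return static_cast<std::uint16_t>(codeOffset | (op << 8) | (info << 12));
}

// Sums the bytes the prologue took from the stack. After undoing them,
// the frame's return address sits at RSP + bytes.
// Returns false for codes this sketch does not model.
bool prologueStackBytes(const std::vector<std::uint16_t>& slots, std::uint64_t& bytes) {
    std::uint64_t total = 0;
    std::size_t i = 0;
    while (i < slots.size()) {
        const unsigned op = (slots[i] >> 8) & 0xF;
        const unsigned info = (slots[i] >> 12) & 0xF;
        switch (op) {
        case UWOP_PUSH_NONVOL: total += 8; i += 1; break;            // push reg
        case UWOP_ALLOC_SMALL: total += info * 8 + 8; i += 1; break; // sub rsp, 8..128
        case UWOP_ALLOC_LARGE:
            if (info == 0 && i + 1 < slots.size()) {                 // size / 8 in next slot
                total += static_cast<std::uint64_t>(slots[i + 1]) * 8; i += 2;
            } else if (info == 1 && i + 2 < slots.size()) {          // 32-bit size in next two slots
                total += slots[i + 1] | (static_cast<std::uint64_t>(slots[i + 2]) << 16); i += 3;
            } else {
                return false;
            }
            break;
        case UWOP_SAVE_NONVOL: case UWOP_SAVE_XMM128: i += 2; break;         // no RSP change
        case UWOP_SAVE_NONVOL_FAR: case UWOP_SAVE_XMM128_FAR: i += 3; break; // no RSP change
        default: return false; // UWOP_SET_FPREG, UWOP_PUSH_MACHFRAME, unknown codes
        }
    }
    if (i != slots.size()) return false; // a multi-slot code ran past the end
    bytes = total;
    return true;
}
```

For a prologue of `push rbx` followed by `sub rsp, 0x20`, the two codes add up to 40 bytes, so the return address would be found 40 bytes above the current `RSP`.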
+
In order to interfere with this process we would need to _revert it_ by having our own reversed form of `RtlVirtualUnwind`. We would need to iterate over the functions defined in a module (say, `kernel32`), scan each function's `UNWIND_CODE` codes and closely emulate them backwards (as compared to `RtlVirtualUnwind` and, more precisely, `RtlpUnwindPrologue`) in order to find the locations on the stack where we should put our fake return addresses.
+
[namazso](https://twitter.com/namazso) mentions the necessity to introduce 3 fake stack frames to nicely stitch the call stack:
+
1. A "desync" frame (consider it a _gadget-frame_) that unwinds differently from the caller of our `MySleep` (having a different `UWOP`, the Unwind Operation code). We find it by looking through all functions of a module, inspecting their UWOPs and calculating how big the fake frame should be. This frame must have UWOPs **different** from those of our `MySleep`'s caller.
+
2. The next frame that we want to find is a function that unwinds by popping into `RBP` from the stack, basically through the `UWOP_PUSH_NONVOL` code.
+
3. For the third frame we need a function that restores `RSP` from `RBP` through the `UWOP_SET_FPREG` code.
+
The restored `RSP` must be set to the `RSP` value taken from wherever control flow entered our `MySleep`, so that all our frames become hidden as a result of the third gadget unwinding there.
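
The search for the second and third gadget frames can be sketched as a scan over each candidate function's unwind codes. This is a simplified model under stated assumptions, not a working spoofer: `FunctionUnwind`, `makeSlot`, `containsOp`, `popsIntoRbp` and `restoresRspFromRbp` are hypothetical names, and the struct holds only the two `UNWIND_INFO` pieces the check needs. Register number 5 is `RBP` in the x64 register encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum UnwindOp : unsigned {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9
};
constexpr unsigned REG_RBP = 5;

// Hypothetical, reduced view of one function's UNWIND_INFO.
struct FunctionUnwind {
    unsigned frameRegister;           // UNWIND_INFO.FrameRegister, 0 = none
    std::vector<std::uint16_t> slots; // raw 16-bit UNWIND_CODE slots
};

// Hypothetical helper: packs offset, opcode and op info into one slot.
inline std::uint16_t makeSlot(unsigned codeOffset, unsigned op, unsigned info) {
    assert(codeOffset <= 0xFF && op <= 0xF && info <= 0xF);
    return static_cast<std::uint16_t>(codeOffset | (op << 8) | (info << 12));
}

// True if the function has `wantedOp`; a negative `wantedInfo` matches any op info.
bool containsOp(const FunctionUnwind& fn, unsigned wantedOp, int wantedInfo) {
    std::size_t i = 0;
    while (i < fn.slots.size()) {
        const unsigned op = (fn.slots[i] >> 8) & 0xF;
        const unsigned info = (fn.slots[i] >> 12) & 0xF;
        if (op == wantedOp &&
            (wantedInfo < 0 || info == static_cast<unsigned>(wantedInfo))) {
            return true;
        }
        // Skip the extra data slots of multi-slot codes so they are not misread as opcodes.
        if (op == UWOP_ALLOC_LARGE) i += (info == 0) ? 2 : 3;
        else if (op == UWOP_SAVE_NONVOL || op == UWOP_SAVE_XMM128) i += 2;
        else if (op == UWOP_SAVE_NONVOL_FAR || op == UWOP_SAVE_XMM128_FAR) i += 3;
        else i += 1;
    }
    return false;
}

// Candidate for the second frame: unwinding pops a stack value into RBP.
bool popsIntoRbp(const FunctionUnwind& fn) {
    return containsOp(fn, UWOP_PUSH_NONVOL, static_cast<int>(REG_RBP));
}

// Candidate for the third frame: unwinding restores RSP from RBP.
bool restoresRspFromRbp(const FunctionUnwind& fn) {
    return fn.frameRegister == REG_RBP && containsOp(fn, UWOP_SET_FPREG, -1);
}
```

Selecting the first, "desync" frame additionally requires comparing a candidate's codes against those of the `MySleep` caller and computing the fake frame size, which this sketch leaves out.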
+
In order to begin the process, one can iterate over the executable's `.pdata` section by dereferencing the `IMAGE_DIRECTORY_ENTRY_EXCEPTION` data directory entry.
```
if (frameUwop.UnwindOpcode != myFrameUwop.UnwindOpcode)
+
{
+
// Found candidate function for a desynch gadget frame
+
}
+
}
+
```
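
The function table that `.pdata` holds can be modelled portably to show the lookup side of this process. The sketch below assumes the entries are sorted by begin address, as the x64 function table is expected to be, and works on relative virtual addresses only; `RuntimeFunction` and `lookupFunctionEntry` are hypothetical names that mirror, but do not replace, `RUNTIME_FUNCTION` and `RtlLookupFunctionEntry`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the three RVA fields of an x64 RUNTIME_FUNCTION entry.
struct RuntimeFunction {
    std::uint32_t beginAddress;      // RVA of the function start
    std::uint32_t endAddress;        // RVA one past the function end
    std::uint32_t unwindInfoAddress; // RVA of the UNWIND_INFO
};

// Finds the entry whose [beginAddress, endAddress) range covers `rva`,
// or returns nullptr. Assumes `table` is sorted by beginAddress.
const RuntimeFunction* lookupFunctionEntry(const std::vector<RuntimeFunction>& table,
                                           std::uint32_t rva) {
    assert(std::is_sorted(table.begin(), table.end(),
        [](const RuntimeFunction& a, const RuntimeFunction& b) {
            return a.beginAddress < b.beginAddress;
        }));
    // First entry that starts after `rva`; the candidate is the one before it.
    auto it = std::upper_bound(table.begin(), table.end(), rva,
        [](std::uint32_t value, const RuntimeFunction& fn) {
            return value < fn.beginAddress;
        });
    if (it == table.begin()) return nullptr;
    --it;
    return rva < it->endAddress ? &*it : nullptr;
}
```

Iterating the same table from start to end, instead of searching it, gives the enumeration of candidate functions that the gadget search needs.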
+
The process is a bit convoluted, yet boils down to reversing the thread's call stack unwinding process by substituting arbitrary stack frames with other, carefully selected ones, in a ROP-like approach.
+
This PoC does not follow or replicate this algorithm, because my current understanding allows me to accept the call stack finishing on an `EXE`-based stack frame and I don't want to overcomplicate either my shellcode loaders or this PoC. I leave the exercise of implementing this and sharing it publicly to a keen reader. Or maybe I'll sit down and try doing this myself given some more spare time :)
+
**More information**:
+
a) [x64 exception handling - Stack Unwinding process explained](https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-160)
+
b) [Sample implementation of `RtlpUnwindPrologue` and `RtlVirtualUnwind`](https://github.com/mic101/windows/blob/master/WRK-v1.2/base/ntos/rtl/amd64/exdsptch.c)
+
c) [`.pdata` section](https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#the-pdata-section)
+
d) [another sample implementation of `RtlpUnwindPrologue`](https://github.com/hzqst/unicorn_pe/blob/master/unicorn_pe/except.cpp#L773)
## Example run
Use case:
skipped 10 lines
Example run that spoofs beacon's thread call stack: