STRLCPY/ThreadStackSpoofer

readme
Mariusz B. / mgeeky committed 3 years ago

18575189

1 parent 36c73681


Total 1 files


README.md

		skipped 30 lines
31	31		The previous implementation, utilising `StackWalk64`, can be accessed in this [commit c250724](https://github.com/mgeeky/ThreadStackSpoofer/tree/c2507248723d167fb2feddf50d35435a17fd61a2).
32	32
33	33
	34	+	## Demo
	35	+
	36	+	This is what a call stack may look like when it is NOT spoofed:
	37	+
	38	+	![not-spoofed](images/not-spoofed.png)
	39	+
	40	+	And this is what it looks like when thread stack spoofing is enabled:
	41	+
	42	+	![spoofed](images/spoofed2.png)
	43	+
	44	+	Above we can see that the last frame on our call stack is our `MySleep` callback. That immediately opens up an opportunity for IOC hunting: looking for threads whose call stacks do not unwind into the following two commonly expected system entry points:
	45	+	```
	46	+	kernel32!BaseThreadInitThunk+0x14
	47	+	ntdll!RtlUserThreadStart+0x21
	48	+	```
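
The hunting idea above can be illustrated with a minimal, hedged sketch. It assumes the call stack has already been symbolized into `module!Symbol+0xNN` strings ordered innermost first; the helper names `stripOffset` and `unwindsToExpectedEntryPoints` are hypothetical and not part of this project. Matching is exact and case-sensitive, which is also an assumption.

```cpp
#include <string>
#include <vector>

// Strips the "+0xNN" displacement from a "module!Symbol+0xNN" frame name.
// When there is no '+', the whole string is returned unchanged.
static std::string stripOffset(const std::string& frame) {
    return frame.substr(0, frame.find('+'));
}

// Hypothetical check: returns true when the two outermost frames are the
// commonly expected thread entry points.
// `frames` is ordered innermost first, outermost last.
bool unwindsToExpectedEntryPoints(const std::vector<std::string>& frames) {
    if (frames.size() < 2) {
        return false;  // too short to contain both entry points
    }
    const std::string& outermost = frames[frames.size() - 1];
    const std::string& previous = frames[frames.size() - 2];
    return stripOffset(previous) == "kernel32!BaseThreadInitThunk" &&
           stripOffset(outermost) == "ntdll!RtlUserThreadStart";
}
```

A thread whose stack ends in `MySleep` would fail this check, and so would any legitimate thread that does not unwind to these two frames.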
	49	+
	50	+	However, a brief examination of my system showed that there are plenty of threads whose call stacks do not unwind to the above handlers:
	51	+
	52	+	![legit call stack](images/legit-call-stack.png)
	53	+
	54	+	The above screenshot shows an unmodified, unhooked thread of Total Commander x64.
	55	+
	56	+	Why should we care about carefully faking our call stack when there are processes exhibiting traits that we can simply mimic?
	57	+
	58	+
34	59		## How it works?
35	60
36	61		This program performs shellcode self-injection (roughly via the classic `VirtualAlloc` + `memcpy` + `CreateThread`).
		skipped 25 lines
62	87		This precise logic is provided by `walkCallStack` and `spoofCallStack` functions in `main.cpp`.
63	88
64	89
65		-	## Demo
	90	+	## Example run
	91	+
	92	+	Use case:
66	93
67		-	This is how a call stack may look like when it is NOT spoofed:
	94	+	```
	95	+	C:\> ThreadStackSpoofer.exe <shellcode> <spoof>
	96	+	```
68	97
69		-	![not-spoofed](images/not-spoofed.png)
	98	+	Where:
	99	+	- `<shellcode>` is a path to the shellcode file
	100	+	- `<spoof>` enables thread stack spoofing when set to `1` or `true`; any other value disables it.
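
As a rough illustration of how the `<spoof>` argument described above could be interpreted, here is a tiny sketch. The helper name `parseSpoofFlag` is hypothetical, and treating the comparison as exact and case-sensitive is an assumption; the actual parsing in `main.cpp` may differ.

```cpp
#include <string>

// Hypothetical helper: interprets the <spoof> command-line argument.
// Only the exact strings "1" and "true" enable spoofing (assumed to be
// case-sensitive); every other value disables it.
bool parseSpoofFlag(const std::string& arg) {
    return arg == "1" || arg == "true";
}
```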
70	101
71		-	This in turn, when thread stack spoofing is enabled:
72	102
73		-	![spoofed](images/spoofed2.png)
	103	+	Example run that spoofs beacon's thread call stack:
74	104
75		-	Above we can see that the last frame on our call stack is our `MySleep` callback. That immediately brings opportunities for IOCs hunting for threads having call stacks not unwinding into following two commonly expected system entry points:
76	105		```
77		-	kernel32!BaseThreadInitThunk+0x14
78		-	ntdll!RtlUserThreadStart+0x21
79		-	```
	106	+	PS D:\dev2\ThreadStackSpoofer> .\x64\Release\ThreadStackSpoofer.exe .\tests\beacon64.bin 1
	107	+	[.] Reading shellcode bytes...
	108	+	[.] Hooking kernel32!Sleep...
	109	+	[.] Injecting shellcode...
	110	+	[+] Shellcode is now running.
	111	+	[>] Original return address: 0x1926747bd51. Finishing call stack...
80	112
81		-	However a brief examination of my system shown, that there are plenty of threads having call stacks not unwinding to the above handlers:
	113	+	===> MySleep(5000)
82	114
83		-	![legit call stack](images/legit-call-stack.png)
	115	+	[<] Restoring original return address...
	116	+	[>] Original return address: 0x1926747bd51. Finishing call stack...
84	117
85		-	The above screenshot shows unmodified, unhooked, thread of Total Commander x64.
	118	+	===> MySleep(5000)
86	119
87		-	Why should we care about carefully faking our call stack when there are processes exhibiting traits that we can simply mimic?
88		-
	120	+	[<] Restoring original return address...
	121	+	[>] Original return address: 0x1926747bd51. Finishing call stack...
	122	+	```
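
The log above shows a repeating cycle: remember the original return address, overwrite it for the duration of the sleep, then restore it. Below is a minimal sketch of that cycle as an RAII guard. It works on a plain variable standing in for the stack slot, the class name `ReturnAddressGuard` is hypothetical, and using `0` as the replacement value is an assumption inferred from the "Finishing call stack" message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical RAII guard: remembers the value stored in a return-address
// slot, replaces it with `fake` for the guard's lifetime, and writes the
// original value back when the guard goes out of scope.
class ReturnAddressGuard {
public:
    explicit ReturnAddressGuard(std::uintptr_t* slot, std::uintptr_t fake = 0)
        : slot_(slot), original_(0) {
        assert(slot != nullptr);
        original_ = *slot_;  // [>] remember the original return address
        *slot_ = fake;       // overwrite it so a stack walk stops here
    }

    ~ReturnAddressGuard() {
        *slot_ = original_;  // [<] restore the original return address
    }

    ReturnAddressGuard(const ReturnAddressGuard&) = delete;
    ReturnAddressGuard& operator=(const ReturnAddressGuard&) = delete;

    std::uintptr_t original() const { return original_; }

private:
    std::uintptr_t* slot_;
    std::uintptr_t original_;
};
```

In a real hook the slot would presumably be obtained with something like MSVC's `_AddressOfReturnAddress()` intrinsic, and the sleep itself would happen while the guard is alive; that wiring is not shown here.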
89	123
	124	+	---
90	125
91	126		## How do I use it?
92	127
		skipped 7 lines
100	135		- Clear out any leftovers from Reflective Loader to avoid in-memory signature-based detections
101	136		- Unhook everything you might have hooked (such as AMSI, ETW, WLDP) before sleeping and then re-hook afterwards.
102	137
	138	+
	139	+	---
103	140
104	141		## Actually this is not (yet) a true stack spoofing
105	142
106		-	As it's been pointed out to me, the technique here is not _yet_ truly holding up to its name for being a _stack spoofer_. Since we're merely overwriting return addresses on the thread's stack, we're not spoofing the remaining areas of the stack itself. Moreover we leave a sequence of `::CreateFileW` addresses which looks very odd and let the thread be unable to unwind its stack. That's because `CreateFile` was meant to solely act as an example, we're making the stack non-unwindable but still obscuring references to our shellcode memory pages.
	143	+	As it's been pointed out to me, the technique here is not _yet_ truly holding up to its name for being a _stack spoofer_. Since we're merely overwriting return addresses on the thread's stack, we're not spoofing the remaining areas of the stack itself. Moreover, we're leaving our call stack _non-unwindable_, making it look anomalous, since the system will not be able to properly walk the entire chain of call stack frames.
107	144
108	145		Although I'm aware of these shortcomings, I've left it as is for now, since I cared mostly about evading automated scanners that could iterate over processes, enumerate their threads, walk those threads' stacks and pick up on any return address pointing back to non-image memory (such as `MEM_PRIVATE`, the type of memory allocated dynamically by `VirtualAlloc` and friends). A focused malware analyst would immediately spot the oddity and consider the thread rather unusual, hunting down our implant. More than sure about it. Yet, I don't believe that nowadays automated scanners such as AV/EDR have heuristics implemented that would _actually walk each thread's stack_ to verify whether it's unwindable `¯\_(ツ)_/¯`.
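
To make the scanner heuristic described above concrete, here is a portable, hedged sketch. It models a process memory map with a small hypothetical `Region` struct instead of querying the OS; on Windows the region type would typically come from `VirtualQueryEx` and the `Type` field of `MEMORY_BASIC_INFORMATION` (`MEM_IMAGE`, `MEM_MAPPED`, `MEM_PRIVATE`). All names in the code are illustrative. Addresses that fall outside every known region are skipped rather than flagged, which is a design assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-ins for the Windows memory types.
enum class RegionType { Image, Mapped, Private };

// Hypothetical model of one committed memory region.
struct Region {
    std::uintptr_t base;
    std::size_t size;
    RegionType type;
};

// Finds the region containing `address`, or nullptr when none does.
const Region* findRegion(const std::vector<Region>& map,
                         std::uintptr_t address) {
    for (const Region& r : map) {
        if (address >= r.base && address - r.base < r.size) {
            return &r;
        }
    }
    return nullptr;
}

// Returns every return address that lands in a known region which is not
// image-backed. Addresses outside all regions are skipped, not flagged.
std::vector<std::uintptr_t> suspiciousReturnAddresses(
        const std::vector<Region>& map,
        const std::vector<std::uintptr_t>& returnAddresses) {
    std::vector<std::uintptr_t> hits;
    for (std::uintptr_t ra : returnAddresses) {
        const Region* r = findRegion(map, ra);
        if (r != nullptr && r->type != RegionType::Image) {
            hits.push_back(ra);
        }
    }
    return hits;
}
```

Because the walk only sees the return addresses it can reach, truncating the call stack before the frame that points into shellcode memory leaves such a scanner with nothing to flag.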
109	146
110	147		Surely this project (and the commercial implementations found in C2 frameworks) gives AV & EDR vendors reasons to consider implementing appropriate heuristics covering such a novel evasion technique.
111	148
112		-	The research on the subject is not yet finished and hopefully will result in a better quality _Stack Spoofing_ in upcoming days. Nonetheless, I'm releasing what I got so far in hope of sparkling inspirations and interest community into further researching this area.
	149	+	In order to improve this technique, one can aim for a true _Thread Stack Spoofer_ by inserting carefully crafted fake stack frames established in a reverse-unwinding process.
	150	+	Read more on this idea below.
113	151
114		-	Next areas for improving the outcome are to research how we can _exchange_ or copy stacks with one of the following ideas:
115	152
116		-	1. utilising [`GetCurrentThreadStackLimits`](https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getcurrentthreadstacklimits)/`NtQueryInformationThread`) from a legitimate thread running `kernel32!Sleep(INFINITE)`
117		-
118		-	2. manipulating our Beacon's thread `TEB/TIB` structures and fields such as `TebBaseAddress`, `NT_TIB.StackBase / NT_TIB.StackLimit` by swapping them with values taken from another legitimate thread.
119		-
120		-	3. playing with `RBP/EBP` and `RSP/ESP` pointers on a paused Beacon's thread to change stacks in a similar manner to ROP chains - by swapping values of these registers while Beacon's thread is suspended.
121		-
122		-	4. Create a new user stack with `RtlCreateUserStack` / `RtlFreeUserStack` and exchange stacks from a Beacons thread into that newly created one
123		-
124		-
125		-	## Implementing a true Thread Stack Spoofer
	153	+	### Implementing a true Thread Stack Spoofer
126	154
127	155		An hours-long conversation with [namazso](https://twitter.com/namazso) taught me that in order to aim for a proper thread stack spoofer we would need to reverse the x64 call stack unwinding process.
128	156		First, one needs to understand the stack unwinding process explained in (a) linked below. When the system traverses a thread's call stack on the x64 architecture, it does not simply rely on return addresses scattered around the thread's stack, but rather it:
		skipped 68 lines
197	225		- c) [`.pdata` section](https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#the-pdata-section)
198	226		- d) [another sample implementation of `RtlpUnwindPrologue`](https://github.com/hzqst/unicorn_pe/blob/master/unicorn_pe/except.cpp#L773)
199	227
200		-
201		-	## Example run
202		-
203		-	Use case:
204		-
205		-	```
206		-	C:\> ThreadStackSpoofer.exe <shellcode> <spoof>
207		-	```
208		-
209		-	Where:
210		-	- `<shellcode>` is a path to the shellcode file
211		-	- `<spoof>` when `1` or `true` will enable thread stack spoofing and anything else disables it.
212		-
213		-
214		-	Example run that spoofs beacon's thread call stack:
215		-
216		-	```
217		-	PS D:\dev2\ThreadStackSpoofer> .\x64\Release\ThreadStackSpoofer.exe .\tests\beacon64.bin 1
218		-	[.] Reading shellcode bytes...
219		-	[.] Hooking kernel32!Sleep...
220		-	[.] Injecting shellcode...
221		-	[+] Shellcode is now running.
222		-	[>] Original return address: 0x1926747bd51. Finishing call stack...
223		-
224		-	===> MySleep(5000)
225		-
226		-	[<] Restoring original return address...
227		-	[>] Original return address: 0x1926747bd51. Finishing call stack...
228		-
229		-	===> MySleep(5000)
230		-
231		-	[<] Restoring original return address...
232		-	[>] Original return address: 0x1926747bd51. Finishing call stack...
233		-	```
	228	+	---
234	229
235	230		## Word of caution
236	231
		skipped 53 lines
