The Perfect Injector: Abusing Windows Address Sanitization And CoW
Only your activity motivates me to keep pasting top threads for you.
This is an injector abusing the implementation of certain features in Windows operating system to make your dll / injection hard to detect by anti-cheats. It will map your image in a special way that makes the memory pages invisible to Windows APIs, not debuggable from user-mode and execute it without creating or interfering with execution of a thread.
It does not require a DSE/PG-disabled session, a kernel driver running after initialization, or a handle. Implementation details and source code can be found both below and on my blog.
It has a few detection vectors, but it is very simple to turn this into an excellent project if you understand how it works.
Usage:
pInjector.exe ProcessName.exe "dll path" (flags if appropriate)
Flags:
NoLoadLib - Uses GetModuleHandleA instead of LoadLibraryA
WaitKey - Waits for F2 key before injecting instead of injecting instantly when the process launches
Not supported:
- KVA Shadowing
- Wow64
- SEH / VEH ( and it cannot be either as this memory region does not seem like a valid one to Windows )
- TLS
- Import mapping
- Any other cancerous PE details
I have only tested it on Windows 7 and Windows 10. When the target process dies or the injection fails, use the F1 key to abort the injection instead of closing pInjector directly; if it is not closed properly after the waiting-for-threads phase, it will leave a permanent mark on the current session's kernel32.
By the end of this post, I aim to make an injector unlike any other: one that by design makes your DLL not debuggable from UM, makes your pages invisible to NtQueryVirtualMemory and NtReadVirtualMemory, and lets you execute code in target process without even having a valid handle; and while doing this I want it to be compatible with Patchguard, have no kernel driver loaded while the target is running and require no handle at all.
Now, this may seem like a stupidly complicated goal, however, it is in fact really simple because Windows will be helping us.
Anyone who has opened ntoskrnl.exe in IDA probably noticed these checks:
Code:
__int64 __usercall MiReadWriteVirtualMemory@<rax>(ULONG_PTR BugCheckParameter1@<rcx>, unsigned __int64 a2@<rdx>, unsigned __int64 a3@<r8>, __int64 a4@<r9>, __int64 a5, int a6)
{
...
if ( v10 < a3 || v9 > 0x7FFFFFFEFFFFi64 || v10 > 0x7FFFFFFEFFFFi64 )
return 0xC0000005i64;
...
}
__int64 __fastcall MmQueryVirtualMemory(__int64 a1, unsigned __int64 a2, __int64 a3, unsigned __int64 a4, unsigned __int64 a5, unsigned __int64 *a6)
{
...
if ( v12 > 0x7FFFFFFEFFFFi64 )
return 0xC000000Di64;
...
}
Here's what makes them so interesting: these constants are hard-coded by the operating system and are NOT what the processor actually uses to decide whether a page is accessible from CPL3.
In case you are not familiar with page tables, here's how virtual memory works:
The first 12 bits (&0xFFF) of a virtual address indicate the offset into the resolved page; the next four 9-bit groups (&0x1FF000, &0x3FE00000, &0x7FC0000000, &0xFF8000000000) are the indices of the entry in the page table, page directory, page directory pointer table and page map level 4, respectively. Apart from linking to the next lower level, these entries also contain flags such as read/write, user/supervisor and execute-disable, as you can see from the definitions below.
Code:
#pragma pack(push, 1)
typedef union CR3_
{
uint64_t value;
struct
{
uint64_t ignored_1 : 3;
uint64_t write_through : 1;
uint64_t cache_disable : 1;
uint64_t ignored_2 : 7;
uint64_t pml4_p : 40;
uint64_t reserved : 12;
};
} PTE_CR3;
typedef union VIRT_ADDR_
{
uint64_t value;
void *pointer;
struct
{
uint64_t offset : 12;
uint64_t pt_index : 9;
uint64_t pd_index : 9;
uint64_t pdpt_index : 9;
uint64_t pml4_index : 9;
uint64_t reserved : 16;
};
} VIRT_ADDR;
typedef uint64_t PHYS_ADDR;
typedef union PML4E_
{
uint64_t value;
struct
{
uint64_t present : 1;
uint64_t rw : 1;
uint64_t user : 1;
uint64_t write_through : 1;
uint64_t cache_disable : 1;
uint64_t accessed : 1;
uint64_t ignored_1 : 1;
uint64_t reserved_1 : 1;
uint64_t ignored_2 : 4;
uint64_t pdpt_p : 40;
uint64_t ignored_3 : 11;
uint64_t xd : 1;
};
} PML4E;
typedef union PDPTE_
{
uint64_t value;
struct
{
uint64_t present : 1;
uint64_t rw : 1;
uint64_t user : 1;
uint64_t write_through : 1;
uint64_t cache_disable : 1;
uint64_t accessed : 1;
uint64_t dirty : 1;
uint64_t page_size : 1;
uint64_t ignored_2 : 4;
uint64_t pd_p : 40;
uint64_t ignored_3 : 11;
uint64_t xd : 1;
};
} PDPTE;
typedef union PDE_
{
uint64_t value;
struct
{
uint64_t present : 1;
uint64_t rw : 1;
uint64_t user : 1;
uint64_t write_through : 1;
uint64_t cache_disable : 1;
uint64_t accessed : 1;
uint64_t dirty : 1;
uint64_t page_size : 1;
uint64_t ignored_2 : 4;
uint64_t pt_p : 40;
uint64_t ignored_3 : 11;
uint64_t xd : 1;
};
} PDE;
typedef union PTE_
{
uint64_t value;
VIRT_ADDR vaddr;
struct
{
uint64_t present : 1;
uint64_t rw : 1;
uint64_t user : 1;
uint64_t write_through : 1;
uint64_t cache_disable : 1;
uint64_t accessed : 1;
uint64_t dirty : 1;
uint64_t pat : 1;
uint64_t global : 1;
uint64_t ignored_1 : 3;
uint64_t page_frame : 40;
uint64_t ignored_3 : 11;
uint64_t xd : 1;
};
} PTE;
#pragma pack(pop)
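The processor, on the other hand, only lets CPL3 access a page when the user bit is set in the entry at every level of the walk:
Code: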
Pte->user & Pde->user & Pdpte->user & Pml4e->user
Code:
Va >= 0xFFFFFFFF80000000
Code:
BOOL ExposeKernelMemoryToProcess( MemoryController& Mc, PVOID Memory, SIZE_T Size, uint64_t EProcess )
{
Mc.AttachTo( EProcess );
BOOL Success = TRUE;
Mc.IterPhysRegion( Memory, Size, [ & ] ( PVOID Va, uint64_t Pa, SIZE_T Sz )
{
auto Info = Mc.QueryPageTableInfo( Va );
Info.Pml4e->user = TRUE;
Info.Pdpte->user = TRUE;
Info.Pde->user = TRUE;
if ( !Info.Pde || ( Info.Pte && ( !Info.Pte->present ) ) )
{
Success = FALSE;
}
else
{
if ( Info.Pte )
Info.Pte->user = TRUE;
}
} );
Mc.Detach();
return Success;
}
PVOID Memory = AllocateKernelMemory( CpCtx, KrCtx, Size );
ExposeKernelMemoryToProcess( Controller, Memory, Size, Controller.CurrentEProcess );
ZeroMemory( Memory, Size );
(I am relying on code I published before, so if you want to see how the linear translation or the resolution of page table entries is implemented, you can check that out.)
0x2: Abusing Copy-on-Write
Now that we are done with hiding the memory, all that is left to do is actually execute it and to do that we will be abusing Copy-on-Write this time.
CoW is a technique used by operating systems to save memory by making processes share certain physical memory regions until they actually get edited.
We know that ntdll.dll gets loaded for every process and its code (.text) region is rarely modified if at all, so why allocate physical memory for it again and again for hundreds of processes? That is exactly why modern operating systems use the technique called CoW.
The implementation is very simple:
- When a PE file gets mapped, if it was mapped to some other process too and its VA is free on the current process as well, simply copy the PFN and set the flag to make it read-only.
- When a PageFault occurs due to an instruction trying to write on the page, allocate new physical memory, set the PFN of the PTE and remove the read-only flag.
How can we hijack a thread with this?
Well, let's pick a commonly called function and hook it: TlsGetValue.
Now, the PML4E changes from process to process, so the kernel memory we exposed is not accessible from all processes; we need to find some padding in KERNEL32.dll to check the pid before we jump to our stub in our lovely kernel page.
The pid check will be very simple:
Code:
std::vector<BYTE> PidBasedHook =
{
0x65, 0x48, 0x8B, 0x04, 0x25, 0x30, 0x00, 0x00, 0x00, // mov rax, gs:[0x30]
0x8B, 0x40, 0x40, // mov eax,[rax+0x40] ; pid
0x3D, 0xDD, 0xCC, 0xAB, 0x0A, // cmp eax, TargetPid
0x0F, 0x85, 0x00, 0x00, 0x00, 0x00, // jne 0xAABBCC
0x48, 0xB8, 0xAA, 0xEE, 0xDD, 0xCC, 0xBB, 0xAA, 0x00, 0x00, // mov rax, KernelMemory
0xFF, 0xE0 // jmp rax
};
In the execution stub, we will have to do some tricks as well. We only want one thread to execute our code, and we want to unhook TlsGetValue before we continue execution. I also noticed that changes in physical memory sometimes did not take effect instantly on the instructions being executed, and we want to make sure they are applied. So we will implement three checks at the beginning of the stub.
Code:
std::vector<BYTE> Prologue =
{
0x00, 0x00, // data
0xF0, 0xFE, 0x05, 0xF8, 0xFF, 0xFF, 0xFF, // lock inc byte ptr [rip-n]
// wait_lock:
0x80, 0x3D, 0xF0, 0xFF, 0xFF, 0xFF, 0x00, // cmp byte ptr [rip-m], 0x0
0xF3, 0x90, // pause
0x74, 0xF5, // je wait_lock
0x48, 0xB8, 0xAA, 0xEE, 0xDD, 0xCC, 0xBB, 0xAA, 0x00, 0x00, // mov rax, 0xAABBCCDDEEAA
// data_sync_lock:
0x0F, 0x0D, 0x08, // prefetchw [rax]
0x81, 0x38, 0xDD, 0xCC, 0xBB, 0xAA, // cmp dword ptr[rax], 0xAABBCCDD
0xF3, 0x90, // pause
0x75, 0xF3, // jne data_sync_lock
0xF0, 0xFE, 0x0D, 0xCF, 0xFF, 0xFF, 0xFF, // lock dec byte ptr [rip-n]
0x75, 0x41, // jnz continue_exec
0x53, // --- start executing DllMain ---
Now that we have all tricks set-up, the implementation of the actual injector is very simple:
- Load vulnerable driver
- Map physical memory to user-mode
- Search for certain offsets (UniqueProcessId, DirectoryTableBase, ActiveProcessLinks)
- Save current EProcess and CR3 values for user-mode use
- Allocate enough kernel pool memory for our injector stub and image
- Unload vulnerable driver
- Map our image to the kernel memory (Fix .relocs and create a stub that gets the imports for us as I cannot bother reading EProcess->Peb)
- Wait for target process
- Expose the kernel page to target process
- Hook TlsGetValue system-wide and make it check for pid before jumping to our stub at kernel memory
- Wait for Stub->SpinningThreadCount to be non zero
- Unhook TlsGetValue, set Stub->Free = TRUE
- Profit.
Source code:
Forgive me for the hasty image mapping implementation, and the debug code left if there is any.
This is meant to be a PoC rather than a ready to go pasta.
PS for nasty boy: Pasted from