A Compact Linux Detours Library (x86_64)
Function detouring is a powerful hooking technique that allows for the interception of C/C++ functions.
cdl86 aims to be a compact detours library for x86_64 Linux.
Overview
Note: This article covers the Linux-specific details of the library. Windows support has since been added.
See: https://github.com/lunarjournal/cdl86
Microsoft Research currently maintains a library known as MS Detours. It allows for the interception of Windows API calls within the memory address space of a process.
This might be useful in certain situations, such as when you are writing a D3D9 (DirectX) hook and need to intercept certain graphics routines. This is commonly done for ESP and wallhacks, where the Z depth buffer needs to be disabled for certain character models. For D3D9 this might involve hooking DrawIndexedPrimitive.
HRESULT WINAPI hkDrawIndexedPrimitive(LPDIRECT3DDEVICE9 pDevice, ...args)
{
    // Check player model strides, primitive count, etc.
    ...
    pDevice->SetRenderState(D3DRS_ZENABLE, false);
    ...
    // Call original function and return
    oDrawIndexedPrimitive(...)
    return ...
}
In order to disable the Z buffer in this example we need access to a valid LPDIRECT3DDEVICE9 context within the running process. This is where detours comes in handy. Generally, the procedure to hook a specific function is as follows:
- Declare a function pointer with target function signature.
typedef HRESULT (WINAPI* tDrawIndexedPrimitive)(LPDIRECT3DDEVICE9 pDevice, ...args);
- Define detour function with same function signature
HRESULT WINAPI hkDrawIndexedPrimitive(LPDIRECT3DDEVICE9 pDevice, ...args)
- Assign the function pointer the target function's address in memory. In this case a VTable entry.
#define DIP 0x55
/* Cast to the function pointer type, not to the variable name. */
tDrawIndexedPrimitive oDrawIndexedPrimitive = (tDrawIndexedPrimitive)SomeVTable[DIP];
- Call DetourFunction
DetourFunction((void**)&oDrawIndexedPrimitive, &hkDrawIndexedPrimitive);
DetourFunction then uses the oDrawIndexedPrimitive function pointer and modifies the instructions at the target function in order to transfer control flow to the detour function.
At this point any calls to DrawIndexedPrimitive within the LPDIRECT3DDEVICE9 class will be rerouted to hkDrawIndexedPrimitive. You can see that this is a very powerful concept and gives us access to the callee's function arguments. As demonstrated, it is possible to hook both C and C++ functions.
The difference generally is that the first argument to a C++ member function is a hidden this pointer. Therefore you can define a C++ detour in C with this extra argument.
Detours is great, but it is only available for Windows. The aim of the cdl86
project is to create a simple, compact detours library for x86_64 Linux. What
follows is a brief explanation on how the library was designed.
Detour methods
Two different approaches to method detouring were investigated and implemented in the cdl86 C library. First let’s have a look at a typical function call for a simple C program. We will be using GDB to inspect the resulting disassembly.
#include <stdio.h>

int add(int x, int y)
{
    return x + y;
}

int main()
{
    printf("%i", add(1,1));
    return 0;
}
Compile with:
gcc main.c -o main
and then debug with GDB:
gdb main
To list all the functions in the binary, supply info functions to the gdb command prompt.
0x0000000000001100 __do_global_dtors_aux
0x0000000000001140 frame_dummy
0x0000000000001149 add
0x0000000000001161 main
0x00000000000011a0 __libc_csu_init
0x0000000000001210 __libc_csu_fini
0x0000000000001218 _fini
Let’s disassemble the main function with disas /r main:
Dump of assembler code for function main:
0x0000000000001161 <+0>: f3 0f 1e fa endbr64
0x0000000000001165 <+4>: 55 push %rbp
0x0000000000001166 <+5>: 48 89 e5 mov %rsp,%rbp
0x0000000000001169 <+8>: be 01 00 00 00 mov $0x1,%esi
0x000000000000116e <+13>: bf 01 00 00 00 mov $0x1,%edi
0x0000000000001173 <+18>: e8 d1 ff ff ff callq 0x1149 <add>
0x0000000000001178 <+23>: 89 c6 mov %eax,%esi
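callq has one operand, which identifies the function being called. In the form shown here it is encoded as a 32-bit displacement relative to the next instruction, which GDB resolves to the absolute address for us. The instruction pushes the current value of %rip (the address of the instruction after the call) onto the stack and then transfers control flow to the target function.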
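To make the operand encoding concrete: the five bytes e8 d1 ff ff ff in the listing above are the call opcode followed by a signed, little-endian 32-bit displacement, measured from the address of the next instruction. The following is a minimal sketch of that arithmetic. call_rel32_target is a hypothetical helper name used only for illustration; it is not part of cdl86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of cdl86): resolve the target of a
 * 5-byte near call, i.e. opcode 0xE8 followed by a signed 32-bit
 * displacement stored in little-endian byte order. */
uint64_t call_rel32_target(uint64_t insn_addr, const uint8_t insn[5])
{
    uint32_t raw;
    int32_t rel;

    assert(insn[0] == 0xE8);

    /* Assemble the displacement byte by byte (little endian). */
    raw = (uint32_t)insn[1] |
          ((uint32_t)insn[2] << 8) |
          ((uint32_t)insn[3] << 16) |
          ((uint32_t)insn[4] << 24);
    rel = (int32_t)raw;

    /* The displacement is relative to the next instruction,
     * which starts 5 bytes after the call itself. */
    return insn_addr + 5 + (uint64_t)(int64_t)rel;
}
```

For the call at 0x1173 above, this works out to 0x1173 + 5 - 0x2f = 0x1149, which is the address of add.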
You may have also noticed the presence of the endbr64 instruction. This instruction is part of Intel’s Control-Flow Enforcement Technology (CET); on processors without CET support it executes as a NOP.
CET is designed to provide hardware protection against ROP (Return-Oriented Programming) and similar methods which manipulate control flow using existing code.
Its two main features are:
- A shadow stack for tracking return addresses.
- Indirect branch tracking, which endbr64 is a part of.
Intel CET however does not prevent us from modifying control flow directly by inserting instructions into memory.
JMP Patching
The first method of function detouring we will explore is by inserting a JMP instruction at the beginning of the target function to transfer control over to the detour function. It should be noted that in order to preserve the stack we need to use a JMP (specifically jmpq) instruction rather than a CALL.
Since jmpq cannot take a 64-bit immediate address as its operand, we first have to store the address we want to jump to in a register. We need to choose a register that is not used to pass arguments under the default calling convention, which on x86_64 Linux is the System V AMD64 ABI rather than __cdecl. %rax is not one of the argument registers, so for simplicity we use this register in our design. One caveat is that for variadic functions the ABI uses %al to carry the number of vector registers used, so clobbering %rax is not entirely free for such targets.
The following is a disassembly of the instructions required for a JMP to a 64-bit immediate address:
0x0000555555561389 <+0>: 48 b8 b1 13 56 55 55 55 00 00 movabs $0x5555555613b1,%rax
0x0000555555561393 <+10>: ff e0 jmpq *%rax
You can see that 12 bytes are required to encode the movabs instruction (which moves the detour address into %rax) as well as the jmpq instruction. Immediate values are stored in little endian (LE) encoding.
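As a rough illustration of that encoding, here is a minimal sketch that writes the same 12 bytes into a buffer. The name encode_jmp_rax and the JMP_PATCH_LEN constant are assumptions made for this example; cdl86 has its own routine for this (cdl_gen_jmpq_rax, whose body is not shown in this article), and it may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_PATCH_LEN 12 /* 10 bytes for movabs + 2 bytes for jmpq */

/* Hypothetical encoder (illustrative only): writes
 * "movabs $dest,%rax; jmpq *%rax" into buf and returns the length. */
size_t encode_jmp_rax(uint8_t *buf, uint64_t dest)
{
    size_t i;

    assert(buf != NULL);

    buf[0] = 0x48; /* REX.W prefix: 64-bit operand size */
    buf[1] = 0xB8; /* mov imm64 into %rax */
    for (i = 0; i < 8; i++)
        buf[2 + i] = (uint8_t)(dest >> (8 * i)); /* little endian */
    buf[10] = 0xFF; /* opcode group that contains the indirect jmp */
    buf[11] = 0xE0; /* ModRM: register-direct, /4 (jmp), %rax */

    return JMP_PATCH_LEN;
}
```

Called with 0x5555555613b1 as the destination, the buffer ends up holding 48 b8 b1 13 56 55 55 55 00 00 ff e0, matching the bytes in the disassembly above.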
We can therefore conclude that we need to patch at least 12 bytes in memory at the location of our target function. Because the patch must end on an instruction boundary, it may need to cover slightly more than 12 bytes. The bytes that get overwritten are important, however, and we cannot simply discard them. Instead we place these bytes at the start of what I will call a ‘trampoline function’. Its layout is as follows:
trampoline <0x23215412>:
(original instruction bytes which were patched)
JMP (target + JMP patch length)
Simply put, the trampoline function behaves as the original, unpatched function. As shown above, it consists of the target function’s original instruction bytes followed by a jump back into the target function, offset by the JMP patch length.
The trampoline generation code for cdl86 is shown below:
uint8_t *cdl_gen_trampoline(uint8_t *target, uint8_t *bytes_orig, int size)
{
    uint8_t *trampoline;
    int prot = 0x0;
    int flags = 0x0;

    /* New function should have read, write and
     * execute permissions.
     */
    prot = PROT_READ | PROT_WRITE | PROT_EXEC;
    flags = MAP_PRIVATE | MAP_ANONYMOUS;

    /* We use mmap to allocate trampoline memory pool. */
    trampoline = mmap(NULL, size + BYTES_JMP_PATCH, prot, flags, -1, 0);
    memcpy(trampoline, bytes_orig, size);
    /* Generate jump to address just after call
     * to detour in trampoline. */
    cdl_gen_jmpq_rax(trampoline + size, target + size);

    return trampoline;
}
You can see that the trampoline is allocated through a call to mmap with the PROT_READ | PROT_WRITE | PROT_EXEC memory protection flags. The correct memory permissions need to be set in two places: on the target function’s code page before it is modified, and on the trampoline function after allocation. Here is a snippet from the cdl86 library for setting memory attributes:
/* Set R/W memory protections for code page. */
int cdl_set_page_protect(uint8_t *code)
{
    int perms = 0x0;
    int ret = 0x0;

    /* Read, write and execute perms. */
    perms = PROT_EXEC | PROT_READ | PROT_WRITE;
    /* Calculate page size */
    uintptr_t page_size = sysconf(_SC_PAGE_SIZE);
    ret = mprotect(code - ((uintptr_t)(code) % page_size), page_size, perms);

    return ret;
}
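The snippet above changes the protection of the single page that contains the start of the code. A patch that begins near the end of a page can spill into the next one, so it can help to compute the full page-aligned range first. The following is a sketch of that calculation only, under the assumption that the page size is a power of two; page_span is a hypothetical helper and not part of cdl86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of cdl86): compute the page-aligned
 * region covering the bytes [addr, addr + len).
 * page_size must be a power of two. */
void page_span(uintptr_t addr, size_t len, uintptr_t page_size,
               uintptr_t *start, size_t *span)
{
    uintptr_t end;

    assert(start != NULL && span != NULL);
    assert(len > 0 && page_size != 0 &&
           (page_size & (page_size - 1)) == 0);

    /* Round the start down and the end up to page boundaries. */
    *start = addr & ~(page_size - 1);
    end = (addr + len + page_size - 1) & ~(page_size - 1);
    *span = (size_t)(end - *start);
}
```

With 4096-byte pages, a 12-byte patch at 0x1ffa gives a start of 0x1000 and a span of 0x2000 (two pages). Those two values could then be handed to mprotect, with the page size taken from sysconf(_SC_PAGE_SIZE) as in the snippet above.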
The general procedure to place the JMP hook is as follows:
- Determine the minimum number of bytes required for a JMP patch
- Create trampoline function
- Set memory permissions (read, write, execute)
- Generate JMP to detour at target function
- Fill unused bytes with NOP
- Assign trampoline address to target function pointer.
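The first and fifth steps can be sketched on plain byte buffers, without touching live code. The sketch below assumes the instruction lengths have already been obtained from a disassembler (the compile command later in this article suggests cdl86 uses udis86 for that, but this is not shown here). patch_length and fill_nops are hypothetical names for illustration, not cdl86 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add up whole-instruction lengths until at
 * least min_len bytes are covered, so the patch never ends in the
 * middle of an instruction. Returns 0 if there are too few bytes. */
size_t patch_length(const size_t *insn_len, size_t count, size_t min_len)
{
    size_t total = 0;
    size_t i;

    assert(insn_len != NULL);

    for (i = 0; i < count && total < min_len; i++)
        total += insn_len[i];

    return total >= min_len ? total : 0;
}

/* Hypothetical helper: pad the bytes after the jump with
 * single-byte NOPs (0x90). */
void fill_nops(uint8_t *patch, size_t used, size_t total)
{
    assert(patch != NULL && used <= total);
    memset(patch + used, 0x90, total - used);
}
```

Using lengths read off the disassembly listings in this article: for add the leading instructions are 4, 1, 3 and 4 bytes long, so the patch is exactly 12 bytes and needs no padding. For main they are 4, 1, 3 and 5 bytes, so the patch covers 13 bytes and the last byte is filled with a NOP.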
Let’s have a look at all of this in action using GDB. I will be using the basic_jmp.c test case in the cdl86 library. The source code for this test case is shown below:
#include "cdl.h"

typedef int add_t(int x, int y);
add_t *addo = NULL;

int add(int x, int y)
{
    printf("Inside original function\n");
    return x + y;
}

int add_detour(int x, int y)
{
    printf("Inside detour function\n");
    return addo(5,5);
}

int main()
{
    struct cdl_jmp_patch jmp_patch = {};
    addo = (add_t*)add;

    printf("Before attach: \n");
    printf("add(1,1) = %i\n\n", add(1,1));

    jmp_patch = cdl_jmp_attach((void**)&addo, add_detour);
    if(jmp_patch.active)
    {
        printf("After attach: \n");
        printf("add(1,1) = %i\n\n", add(1,1));
        printf("== DEBUG INFO ==\n");
        cdl_jmp_dbg(&jmp_patch);
    }

    cdl_jmp_detach(&jmp_patch);
    printf("\nAfter detach: \n");
    printf("add(1,1) = %i\n\n", add(1,1));
    return 0;
}
We compile this source file with the following command (modified from the makefile):
gcc -I../ -g basic_jmp.c ../cdl.c ../lib/libudis86/*.c -g -o basic_jmp
Then load into GDB using:
gdb basic_jmp
Once GDB has loaded, we insert breakpoints at lines 24 and 27 using the commands:
break 24
break 27
We start execution of the program with:
run
GDB will then inform you that the first breakpoint has been triggered. For this first breakpoint we are interested in the add() function’s assembly before the hook has taken place. To inspect this assembly, provide:
disas /r add
Dump of assembler code for function add:
0x0000555555561389 <+0>: f3 0f 1e fa endbr64
0x000055555556138d <+4>: 55 push %rbp
0x000055555556138e <+5>: 48 89 e5 mov %rsp,%rbp
0x0000555555561391 <+8>: 48 83 ec 10 sub $0x10,%rsp
0x0000555555561395 <+12>: 89 7d fc mov %edi,-0x4(%rbp)
This is the disassembly of the unaltered target function. 12 bytes for the JMP patch will have to be written at this address. Therefore the first 4 instructions (4 + 1 + 3 + 4 = 12 bytes) will need to be written to the trampoline function, followed by a JMP to address 0x0000555555561395, and that’s all we need for the trampoline!
Now the fun part! Let’s continue execution to the next breakpoint, where our JMP hook will be placed.
continue
Let’s examine the disassembly of our add() function once again:
Dump of assembler code for function add:
0x0000555555561389 <+0>: 48 b8 b1 13 56 55 55 55 00 00 movabs $0x5555555613b1,%rax
0x0000555555561393 <+10>: ff e0 jmpq *%rax
0x0000555555561395 <+12>: 89 7d fc mov %edi,-0x4(%rbp)
0x0000555555561398 <+15>: 89 75 f8 mov %esi,-0x8(%rbp)
0x5555555613b1 is the address of our detour/intercept function. Let’s examine the disassembly of our detour function:
disas /r 0x5555555613b1
Dump of assembler code for function add_detour:
0x00005555555613b1 <+0>: f3 0f 1e fa endbr64
0x00005555555613b5 <+4>: 55 push %rbp
0x00005555555613b6 <+5>: 48 89 e5 mov %rsp,%rbp
0x00005555555613b9 <+8>: 48 83 ec 10 sub $0x10,%rsp
0x00005555555613bd <+12>: 89 7d fc mov %edi,-0x4(%rbp)
0x00005555555613c0 <+15>: 89 75 f8 mov %esi,-0x8(%rbp)
0x00005555555613c3 <+18>: 48 8d 3d 53 5c 00 00 lea 0x5c53(%rip),%rdi
0x00005555555613ca <+25>: e8 b1 fd ff ff callq 0x555555561180 <puts@plt>
0x00005555555613cf <+30>: 48 8b 05 ba bc 01 00 mov 0x1bcba(%rip),%rax
0x00005555555613d6 <+37>: be 05 00 00 00 mov $0x5,%esi
0x00005555555613db <+42>: bf 05 00 00 00 mov $0x5,%edi
0x00005555555613e0 <+47>: ff d0 callq *%rax
0x00005555555613e2 <+49>: c9 leaveq
0x00005555555613e3 <+50>: c3 retq
We can see that our trampoline function is called through the QWORD (our function pointer) loaded by the RIP-relative mov at <+30>. Its address is the address of the next instruction plus the displacement: 0x5555555613d6 + 0x1bcba = 0x55555557d090. Let’s dereference it:
print /x *(long unsigned int*)(0x55555557d090)
$20 = 0x7ffff7ffb000
So the function pointer is pointing to address 0x7ffff7ffb000, which is our trampoline function. Let’s disassemble it:
x/10i 0x7ffff7ffb000
0x7ffff7ffb000: endbr64
0x7ffff7ffb004: push %rbp
0x7ffff7ffb005: mov %rsp,%rbp
0x7ffff7ffb008: sub $0x10,%rsp
0x7ffff7ffb00c: movabs $0x555555561395,%rax
0x7ffff7ffb016: jmpq *%rax
0x7ffff7ffb018: add %al,(%rax)
0x7ffff7ffb01a: add %al,(%rax)
0x7ffff7ffb01c: add %al,(%rax)
0x7ffff7ffb01e: add %al,(%rax)
You can see that our trampoline contains the first 4 instructions that were replaced when the JMP patch was placed in our target function, followed by a jump back to address 0x555555561395, which was disassembled earlier. The trailing add %al,(%rax) lines are just the zero-filled remainder of the mapping (bytes 00 00) decoded as instructions. This should give you an idea of how the control flow modification is achieved.
INT3 Patching
There is another method of function detouring which involves placing INT3 breakpoints at the start of the target function in memory. INT3 breakpoints are encoded with the 0xCC opcode:
/* Generate int3 instruction. */
uint8_t *cdl_gen_swbp(uint8_t *code)
{
    *(code + 0x0) = 0xCC;
    return code;
}
So rather than placing a JMP patch to the detour, we simply write the byte 0xCC to the target function, being careful to NOP the unused bytes. Once execution reaches the address of an INT3 breakpoint, the Linux kernel sends a SIGTRAP signal to the process.
We can register our own signal handler, but we need some additional information about the signal, such as context information. A context is the state of a program’s registers and stack. We need this information to compare the breakpoint’s RIP value against the addresses of any active global software breakpoints.
This is how the signal handler is registered in cdl86:
struct sigaction sa = {};

/* Initialise cdl signal handler. */
if (!cdl_swbp_init)
{
    /* Request signal context info which
     * is required for RIP register comparison.
     */
    sa.sa_flags = SA_SIGINFO | SA_ONESHOT;
    sa.sa_sigaction = (void *)cdl_swbp_handler;
    sigaction(SIGTRAP, &sa, NULL);
    cdl_swbp_init = true;
}
...
Note the use of SA_SIGINFO to get context information. The software breakpoint handler is then defined as follows:
void cdl_swbp_handler(int sig, siginfo_t *info, struct ucontext_t *context)
{
    int i = 0x0;
    bool active = false;
    uint8_t *bp_addr = NULL;

    /* RIP register point to instruction after the
     * int3 breakpoint so we subtract 0x1.
     */
    bp_addr = (uint8_t *)(context->uc_mcontext.gregs[REG_RIP] - 0x1);

    /* Iterate over all breakpoint structs. */
    for (i = 0; i < cdl_swbp_size; i++)
    {
        active = cdl_swbp_hk[i].active;
        /* Compare breakpoint addresses. */
        if (bp_addr == cdl_swbp_hk[i].bp_addr)
        {
            /* Update RIP and reset context. */
            context->uc_mcontext.gregs[REG_RIP] = (greg_t)cdl_swbp_hk[i].detour;
            setcontext(context);
        }
    }
}
Note that if a match of the RIP value to any known breakpoints occurs, the RIP value for the current context is updated and the new context applied using setcontext(). A trampoline function similar to our JMP patch is allocated and serves the same purpose.
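The address comparison at the heart of the handler can be isolated into a small pure function, which makes the off-by-one adjustment easy to see and to test outside of a signal handler. This is only a sketch: struct swbp_entry and swbp_lookup are hypothetical names, cdl86's own breakpoint struct may be laid out differently, and skipping inactive entries is an assumption made here rather than something the handler above does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical breakpoint record; cdl86's own struct may differ. */
struct swbp_entry {
    uintptr_t bp_addr; /* address where the 0xCC byte was written */
    uintptr_t detour;  /* where execution should continue */
    int active;        /* non-zero while the hook is attached */
};

/* Given the RIP value saved in the signal context, return the detour
 * address of the matching active breakpoint, or 0 if none matches. */
uintptr_t swbp_lookup(const struct swbp_entry *tbl, size_t n, uintptr_t rip)
{
    /* int3 is one byte long, so the saved RIP points one byte
     * past the breakpoint address. */
    uintptr_t bp_addr = rip - 1;
    size_t i;

    assert(tbl != NULL || n == 0);

    for (i = 0; i < n; i++) {
        /* Assumption: inactive entries are ignored. */
        if (tbl[i].active && tbl[i].bp_addr == bp_addr)
            return tbl[i].detour;
    }
    return 0;
}
```

A handler could call such a function with the saved RIP and, on a non-zero result, write that value back into the context before resuming.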
Code Injection
cdl86 assumes that you are operating in the address space of the target process. Therefore code injection is often required in practice and requires the use of an injector.
Once a shared library (.so) has been injected you can use the following code to get the base address of the main executable module:
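#include <dlfcn.h>
#include <inttypes.h>
#include <link.h>
#include <stdio.h>

int __attribute__((constructor)) init()
{
    ...
    /* Relies on glibc, where the dlopen() handle is a struct link_map pointer. */
    struct link_map *lm = dlopen(0, RTLD_NOW);
    printf("base = %" PRIx64 , lm->l_addr);
    ...
}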
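Or find the address of a function by symbol name (the symbol must be present in the dynamic symbol table, for example by linking the executable with -rdynamic):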
void* dl_handle = dlopen(NULL, RTLD_LAZY);
void* add_ptr = dlsym(dl_handle, "add");
API
The API for the cdl86 library is shown below:
struct cdl_jmp_patch cdl_jmp_attach(void **target, void *detour);
struct cdl_swbp_patch cdl_swbp_attach(void **target, void *detour);
void cdl_jmp_detach(struct cdl_jmp_patch *jmp_patch);
void cdl_swbp_detach(struct cdl_swbp_patch *swbp_patch);
void cdl_jmp_dbg(struct cdl_jmp_patch *jmp_patch);
void cdl_swbp_dbg(struct cdl_swbp_patch *swbp_patch);
Source code
You can find the cdl86 source code at https://github.com/lunarjournal/cdl86.
This project was inspired by some reverse engineering research I did for my
undergraduate thesis.