Tiny C Binaries
By default, following the linking stage, GCC generates ELF binaries that contain redundant section data that increase executable size.
ELF Binaries
The standard file format for executable object code on Linux is ELF (Executable and Linkable Format), it is the successor to the older COFF UNIX file format.
ELF Binaries consist of two sections, the ELF header and file data (object code). The ELF header format for 64-bit binaries is shown in the table below:
Offset | Field | Description | Value |
---|---|---|---|
0x00 | e_ident[EI_MAG0] | magic number | 0x7F |
0x04 | e_ident[EI_CLASS] | 32/64 bit | 0x2=64bit |
0x05 | e_ident[EI_DATA] | endianness | 0x1=little 0x2=big |
0x06 | e_ident[EI_VERSION] | elf version | 0x1=original |
0x07 | e_ident[EI_OSABI] | system ABI | 0x00= System V 0x02= NetBSD 0x03= Linux 0x09= FreeBSD |
0x08 | e_ident[EI_ABIVERSION] | ABI Version | * ignored for static-linked binaries * vendor specific for dynamic-linked binaries |
0x09 | e_ident[EI_PAD] | undefined | * padded with zeros |
0x10 | e_type | object type | 0x00= ET_NONE 0x01= ET_REL 0x02= ET_EXEC 0x03= ET_DYN 0x04= ET_CORE |
0x12 | e_machine | system ISA | 0x3E= amd64 0xB7= ARM (v8/64) |
0x14 | e_version | elf version | 0x1=original |
0x18 | e_entry | entry point | 64-bit entry point address |
0x20 | e_phoff | header table offset | 64-bit program header table offset |
0x28 | e_shoff | section table offset | 64-bit section header table offset |
0x30 | e_flags | undefined | vendor specific or pad with zeros |
0x34 | e_ehsize | elf header size | 0x40= 64bits, 0x20= 32bits |
0x36 | e_phentsize | header table size | - |
0x38 | e_phnum | #(num) entries in header table | - |
0x3A | e_shentsize | section table size | - |
0x3C | e_shnum | #(num) entries in section table | - |
0x3E | e_shstrndx | section names index into section table | - |
0x40 | End of 64-bit ELF |
These data fields are used by the Linux PL (program loader) to resolve the entry point for code execution along with various fields such as the ABI version, ISA type, as well as section listings.
A sample hello world program is shown below and was compiled with GCC using gcc
main.c -o example
1
2
3
4
5
6
#include <stdio.h>
int main(int agrc, char *argv[]){
printf("Hello, World!");
return 0;
}
This produced an output executable of almost ~17 KB ! If you’ve ever programmed in assembly you might be surprised at the rather large file size for such a simple program.
GNU-binutils objdump
allows us to inspect the full list of ELF sections with
the -h
flag.
After running objdump -h example
on our sample binary we see that there are a
large number of GCC derived sections: .gnu.version
and .note.gnu.property
attached to the binary image. The question becomes how much data these
additional sections are consuming and to what degree can we ‘strip’ out
redundant data.
GNU-binutils comes with a handy utility called strip
, which attempts to remove
unused ELF sections from a binary. Running strip -s example
results only in a
slightly reduced file of around ~14.5 KB. Clearly, we need to strip much
more!
Size Optimisation
GCC contains a large number of optimisation flags, these include the common :
-O2 -O3 -Os
flags as well as many more less widely used compile time options,
which we will explore further. However, since we have not yet compiled with any
optimisation thus far, and as a first step we recompile the above example with
-Os
, to optimise for size;
And we see no decrease in size! This is expected behaviour however, since the
-Os
flag does not consider all redundant section data for removal, on the
contrary the additional section information placed by GCC in the output binary
is considered useful at this level of optimisation.
In addition, the use of printf
binds object code from the standard library
into the final output executable and so we will instead call through to the
Linux kernel directly to print to the standard output stream.
Linux syscalls
System calls on Linux are invoked with the x86_64 syscall
opcode and syscall
parameters follow a very specific order on 64-bit architectures. For x86_64
(System V ABI - Section
A.2.1), the order
of parameters for linux system calls is as follows:
# | description | register (64-bit) |
---|---|---|
1 | syscall number | rax |
2 | arg 1 | rdi |
3 | arg 2 | rsi |
4 | arg 3 | rdx |
5 | arg 4 | r10 |
6 | arg 5 | r8 |
7 | arg 6 | r9 |
Arguments at user mode level (cdecl calling convention), however, are parsed in the following order:
# | description | register (64-bit) |
---|---|---|
1 | arg 1 | rdi |
2 | arg 2 | rsi |
3 | arg 3 | rdx |
4 | arg 4 | rcx |
5 | arg 5 | r8 |
6 | arg 6 | r9 |
To call through to the linux kernel from C, an assembly wrapper was required to translate user mode arguments (C formal parameters) into kernel syscall arguments:
1
2
3
4
5
6
7
8
9
syscall:
mov rax,rdi
mov rdi,rsi
mov rsi,rdx
mov rdx,rcx
mov r10,r8
mov r8,r9
syscall
ret
We may then make a call to this assembly routine from C using the following function signature:
1
2
3
4
5
6
7
8
void* syscall(
void* syscall_number,
void* param1,
void* param2,
void* param3,
void* param4,
void* param5
);
To write to the standard output stream we invoke syscall 0x1
, which handles
file output. A useful x86_64 Linux syscall table can be found
here.
Syscall 0x1
takes three arguments and has the following signature:
sys_write( unsigned int fd, const char *buf, size_t count)
A file called base.c was created, implementing both syscall and print wrappers:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
// base.c
typedef unsigned long int uintptr;
typedef long int intptr;
void* syscall(
void* syscall_number,
void* param1,
void* param2,
void* param3,
void* param4,
void* param5
);
static intptr print(void const* data, uintptr nbytes)
{
return (intptr)
syscall(
(void*)1, /* sys_write */
(void*)(intptr)1, /* STD_OUT */
(void*)data,
(void*)nbytes,
0,
0
);
}
int main(int agrc, char *argv[]){
print("Hello, World", 12)
return 0;
}
In order to instruct GCC to prevent linking in standard library object code, the
-nostdlib
flag should be passed at compile time. There is one caveat however,
in that certain symbols, such as _start
, which handle program startup and the
parsing of the command line arguments to main
, will be left up to us to
implement, otherwise we will segfault :-/
However, this is quite trivial and luckily program initialisation is well defined by – System V ABI - Section 3.4.
Initially it is specified that register rsp
hold the argument count, while the
address given by rsp+0x8
hold an array of 64-bit pointers to the argument
strings.
From here the argument count and string pointer array index can be passed to
rdi
and rsi
respectively, the first two parameters of main()
. Upon exit,
a call to syscall 0x3c
is then made to handle program termination gracefully.
Both the syscall and program startup assembly wrappers (written in GAS) were
placed in a file called boot.s
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
/* boot.s */
.intel_syntax noprefix
.text
.globl _start, syscall
_start:
xor rbp,rbp /* rbp = 0 */
pop rdi /* rdi = argc, rsp= rsp + 8 */
mov rsi,rsp /* rsi = char *ptr[] */
and rsp,-16 /* align rsp to 16 bytes */
call main
mov rdi,rax /* rax = main return value */
mov rax,60 /* syscall= 0x3c (exit) */
syscall
ret
syscall:
mov rax,rdi
mov rdi,rsi
mov rsi,rdx
mov rdx,rcx
mov r10,r8
mov r8,r9
syscall
ret
Finally gcc was invoked with gcc base.c boot.s -nostdlib -o base
Wait what!? We still get a ~14kb executable after all that work? Yep, and although we have optimised the main object code for our example, we have not yet stripped out redundant ELF code sections which contribute a majority of the file size.
Custom Linker Script
Although it is possible to strip some redundant sections from an ELF binary
using strip
, it is much more efficient to use a custom linker script.
A linker script specifies precisely which ELF sections to include in the output
binary, which means we can eliminate almost all redundancy. Care, however,
must be taken to ensure that essential segments such as .text
, .data
,
.rodata*
are not discarded during linking to avoid a segmentation fault.
The linker script that I came up with is shown below (x86_64.ld):
1
2
3
4
5
6
7
8
9
10
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
"elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SECTIONS
{
. = 0x400000 + SIZEOF_HEADERS;
.text : { *(.text) *(.data*) *(.rodata*) *(.bss*) }
}
The linker script sets the virtual base address of the output binary to 0x400000 and retains only the essential code segments.
Custom linker scripts are parsed to GCC with the -T
switch and the resulting
binary was compiled with: gcc -T x86_64.ld base.c boot.s -nostdlib -o base
This produced an output executable of around ~2.7 KB
This is much better, but there is still some room for improvement using additional GCC compile time switches.
GCC Flags
We have thus far managed to shrink our executable size down to ~2.7KB from our initial file size of ~17kb by stripping redundant section data using a custom linker script and removing standard library object code.
However, GCC has several compile time flags that can further help in removing unwanted code sections, these include:
flag | description |
---|---|
-ffunction-sections | place each function into own section |
-fdata-sections | place each data item into own section |
-Wl,–gc-sections | strip unused sections (linker) |
-fno-unwind-tables | remove unwind tables |
-Wl,–build-id=none | remove build-id section |
-Qn | remove .ident directives |
-Os | optimize code for size |
-s | strip all sections |
Compiling our example again with: gcc -T x86_64.ld base.c boot.s -nostdlib -o
base -ffunction-sections -fdata-sections -Wl,--gc-sections -fno-unwind-tables
-Wl,--build-id=none -Qn -Os -s
This produces an output executable with a size of ~1.5KB but we can still go further!
Additionally, you can include the -static
switch to ensure a static binary.
This results in an output executable of ~640 bytes.
SSTRIP
Despite all our optimisation thus far, there are still a few redundant code and data sections in our dynamically linked output executable. Enter sstrip…
sstrip is a useful utility that
attempts to identify which sections of an ELF binary are to be loaded into
memory during program execution. Based off this, all unused code and data
sections are then subsequently removed. It is comparable to strip
but performs
section removal more aggressively.
Running ./sstrip base
we get our final executable binary with a size of ~830
bytes !
At this point it would probably be best to switch to assembly to get smaller file sizes, however the goal of this journal was to create small executables written in C and I think we’ve done quite well to reduce in size from ~17kb down to ~830 bytes!
As a final comment you might be wondering if we could have simply run sstrip
from our 17kb executable in the first place and the answer would be, no.
I tried doing this and ended up with a binary image of around ~12 KB so it seems the sstrip needs a bit of additional assistance in the form our our manual optimisations to get really tiny binaries!
Source Code
Source code used in this journal is available at: https://github.com/lunarjournal/tinybase