Tiny C Binaries

Written on October 24, 2020

By default, GCC generates ELF binaries that still contain redundant section data after linking, which increases executable size.

  1. ELF Binaries
  2. Size Optimisation
  3. Linux Syscalls
  4. Custom Linker Script
  5. GCC Flags
  6. SSTRIP
  7. Source Code

ELF Binaries

The standard file format for executable object code on Linux is ELF (Executable and Linkable Format). It is the successor to the older COFF UNIX file format.

ELF binaries consist of two parts: the ELF header and the file data (object code). The ELF header format for 64-bit binaries is shown in the table below:

Offset  Field                   Description                      Value
0x00    e_ident[EI_MAG0]        magic number                     0x7F (followed by "ELF")
0x04    e_ident[EI_CLASS]       32/64 bit                        0x2 = 64-bit
0x05    e_ident[EI_DATA]        endianness                       0x1 = little
                                                                 0x2 = big
0x06    e_ident[EI_VERSION]     ELF version                      0x1 = original
0x07    e_ident[EI_OSABI]       system ABI                       0x00 = System V
                                                                 0x02 = NetBSD
                                                                 0x03 = Linux
                                                                 0x09 = FreeBSD
0x08    e_ident[EI_ABIVERSION]  ABI version                      ignored for statically linked binaries
                                                                 vendor specific for dynamically linked binaries
0x09    e_ident[EI_PAD]         undefined                        padded with zeros
0x10    e_type                  object type                      0x00 = ET_NONE
                                                                 0x01 = ET_REL
                                                                 0x02 = ET_EXEC
                                                                 0x03 = ET_DYN
                                                                 0x04 = ET_CORE
0x12    e_machine               system ISA                       0x3E = amd64
                                                                 0xB7 = ARM (v8/64)
0x14    e_version               ELF version                      0x1 = original
0x18    e_entry                 entry point                      64-bit entry point address
0x20    e_phoff                 program header table offset      64-bit program header table offset
0x28    e_shoff                 section header table offset      64-bit section header table offset
0x30    e_flags                 flags                            vendor specific or padded with zeros
0x34    e_ehsize                ELF header size                  0x40 = 64-bit, 0x34 = 32-bit
0x36    e_phentsize             program header table entry size  -
0x38    e_phnum                 entries in program header table  -
0x3A    e_shentsize             section header table entry size  -
0x3C    e_shnum                 entries in section header table  -
0x3E    e_shstrndx              section-name string table index  -
0x40                            end of 64-bit ELF header

The Linux program loader (PL) uses these fields to resolve the entry point for code execution, along with properties such as the ABI version, the ISA type, and the section listings.
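To make the offsets in the table concrete, here is a minimal sketch that pulls a few fields out of a raw 64-bit, little-endian ELF header held in memory. The names elf64_summary, read_le and parse_elf64_header are hypothetical and chosen only for illustration; in real code the structures in <elf.h> are the usual choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary type: field names are illustrative only. */
struct elf64_summary {
    uint16_t type;     /* e_type,    offset 0x10 */
    uint16_t machine;  /* e_machine, offset 0x12 */
    uint64_t entry;    /* e_entry,   offset 0x18 */
    uint16_t ehsize;   /* e_ehsize,  offset 0x34 */
};

/* Read an unsigned little-endian integer of n bytes (n <= 8). */
static uint64_t read_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns 0 on success, -1 if buf does not start with a
 * little-endian 64-bit ELF header. */
static int parse_elf64_header(const unsigned char *buf, size_t len,
                              struct elf64_summary *out)
{
    if (len < 0x40)                        /* 64-bit header is 0x40 bytes */
        return -1;
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                         /* magic number */
    if (buf[0x04] != 0x2 || buf[0x05] != 0x1)
        return -1;                         /* EI_CLASS = 64-bit, EI_DATA = little */

    out->type    = (uint16_t)read_le(buf + 0x10, 2);
    out->machine = (uint16_t)read_le(buf + 0x12, 2);
    out->entry   = read_le(buf + 0x18, 8);
    out->ehsize  = (uint16_t)read_le(buf + 0x34, 2);
    return 0;
}
```

Feeding it the first 64 bytes of a binary and printing entry in hex gives a value that can be compared against the entry point address reported by readelf -h.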

A sample hello world program is shown below. It was compiled with GCC using gcc main.c -o example

#include <stdio.h>

int main(int argc, char *argv[]){
	printf("Hello, World!");
	return 0;
}

This produced an output executable of ~17 KB! If you’ve ever programmed in assembly, you might be surprised at the rather large file size for such a simple program.

GNU-binutils objdump allows us to inspect the full list of ELF sections with the -h flag.

After running objdump -h example on our sample binary, we see a large number of GCC-derived sections, such as .gnu.version and .note.gnu.property, attached to the binary image. The question becomes how much data these additional sections consume and to what degree we can ‘strip’ out redundant data.


GNU-binutils comes with a handy utility called strip, which attempts to remove unused ELF sections from a binary. Running strip -s example results in only a slightly smaller file of ~14.5 KB. Clearly, we need to strip much more! :-o

Size Optimisation

GCC contains a large number of optimisation flags. These include the common -O2, -O3 and -Os flags, as well as many less widely used compile-time options, which we will explore further. Since we have not yet compiled with any optimisation, as a first step we recompile the above example with -Os to optimise for size.

And we see no decrease in size! This is expected behaviour, however, since the -Os flag does not consider redundant section data for removal. On the contrary, the additional section information placed by GCC in the output binary is considered useful at this level of optimisation.

In addition, the use of printf binds object code from the standard library into the final output executable, so we will instead call through to the Linux kernel directly to print to the standard output stream.

Linux Syscalls

System calls on Linux are invoked with the x86_64 syscall opcode, and syscall parameters follow a very specific order on 64-bit architectures. For x86_64 (System V ABI - Section A.2.1), the order of parameters for Linux system calls is as follows:

#  Description     Register (64-bit)
1  syscall number  rax
2  arg 1           rdi
3  arg 2           rsi
4  arg 3           rdx
5  arg 4           r10
6  arg 5           r8
7  arg 6           r9

Arguments at user mode level (System V AMD64 calling convention), however, are passed in the following order:

#  Description  Register (64-bit)
1  arg 1        rdi
2  arg 2        rsi
3  arg 3        rdx
4  arg 4        rcx
5  arg 5        r8
6  arg 6        r9

To call through to the Linux kernel from C, an assembly wrapper is required to translate user mode arguments (C formal parameters) into kernel syscall arguments:

syscall:
	mov rax,rdi
	mov rdi,rsi
	mov rsi,rdx
	mov rdx,rcx
	mov r10,r8
	mov r8,r9
	syscall
	ret

We may then make a call to this assembly routine from C using the following function signature:

void* syscall(
	void* syscall_number,
	void* param1,
	void* param2,
	void* param3,
	void* param4,
	void* param5
);
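As an aside, the same register mapping can be expressed directly in C with GCC's extended inline assembly instead of a separate assembly routine. This is a sketch and not the approach used in this post: raw_syscall3 is a hypothetical name, it handles only three arguments, and it assumes x86_64 Linux with GCC or Clang.

```c
/* Sketch only: x86_64 Linux, GCC/Clang extended inline assembly.
 * raw_syscall3 is a hypothetical helper, not part of base.c. */
static long raw_syscall3(long number, long arg1, long arg2, long arg3)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                        /* rax: return value           */
        : "a"(number),                     /* rax: syscall number         */
          "D"(arg1), "S"(arg2), "d"(arg3)  /* rdi, rsi, rdx: args 1 to 3  */
        : "rcx", "r11", "memory"           /* overwritten by syscall      */
    );
    return ret;  /* on failure the kernel returns a negative errno value */
}
```

The register constraints let the compiler load the arguments into place itself, so no manual mov sequence is needed. The trade-off is that the code is tied to one compiler family and one architecture.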

To write to the standard output stream, we invoke syscall 0x1 (sys_write), which handles file output. A useful x86_64 Linux syscall table can be found here. Syscall 0x1 takes three arguments and has the following signature:

sys_write( unsigned int fd, const char *buf, size_t count)

A file called base.c was created, implementing both syscall and print wrappers:

// base.c
typedef unsigned long int uintptr;
typedef long int intptr;

/* Implemented in assembly (see boot.s) */
void* syscall(
	void* syscall_number,
	void* param1,
	void* param2,
	void* param3,
	void* param4,
	void* param5
);

static intptr print(void const* data, uintptr nbytes)
{
	return (intptr)
		syscall(
		(void*)1, /* sys_write */
		(void*)(intptr)1, /* STD_OUT */
		(void*)data,
		(void*)nbytes,
		0,
		0
		);
}

int main(int argc, char *argv[]){
	print("Hello, World", 12); /* 12 = length of the string in bytes */
	return 0;
}
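The call to print above hard-codes the string length, because strlen is not available once the standard library is out of the picture. A small helper can compute it instead. This is a sketch: cstr_len is a hypothetical name that is not part of the original source, and the uintptr typedef is repeated here only so the snippet stands on its own.

```c
typedef unsigned long int uintptr;

/* Hypothetical helper: counts the bytes before the terminating NUL. */
static uintptr cstr_len(char const *s)
{
	uintptr n = 0;
	while (s[n] != '\0')
		n++;
	return n;
}
```

With this in place the call could be written as print(msg, cstr_len(msg)). One caveat: some GCC versions may recognise such a loop at higher optimisation levels and replace it with a call to strlen, which would then fail to link without the standard library. If that happens, -fno-tree-loop-distribute-patterns is one option to try.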

To prevent GCC from linking in standard library object code, the -nostdlib flag should be passed at compile time. There is one caveat, however: certain symbols, such as _start, which handle program startup and the passing of the command line arguments to main, are left up to us to implement, otherwise we will segfault :-/

However, this is quite trivial, and luckily program initialisation is well defined by the System V ABI (Section 3.4).

At process entry, the ABI specifies that the value at the address held in rsp is the argument count, while rsp+0x8 is the start of an array of 64-bit pointers to the argument strings.

From here, the argument count and the address of the string pointer array can be passed in rdi and rsi respectively, the first two parameters of main(). When main returns, syscall 0x3c (exit) is invoked to terminate the program gracefully.
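The stack layout described above can be sketched in C. The function below takes the value that rsp has at process entry and recovers argc, argv and, as a bonus, the environment pointer array that the ABI places after argv's NULL terminator. The names startup_args and read_initial_stack are hypothetical, and treating every stack slot as a char * is an assumption that holds on x86_64 but is not portable C.

```c
#include <stdint.h>

/* Hypothetical view of the initial process stack (names are illustrative). */
struct startup_args {
    long   argc;
    char **argv;
    char **envp;
};

/* sp is the value of rsp at process entry, viewed as 8-byte slots. */
static struct startup_args read_initial_stack(char **sp)
{
    struct startup_args a;
    a.argc = (long)(intptr_t)sp[0];  /* [rsp]   : argument count            */
    a.argv = sp + 1;                 /* rsp + 8 : argv[0..argc-1], then NULL */
    a.envp = a.argv + a.argc + 1;    /* first slot after argv's NULL         */
    return a;
}
```

The assembly startup code does the same thing with two instructions: popping the first slot yields argc, after which rsp itself points at argv.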

Both the syscall and program startup assembly wrappers (written in GAS) were placed in a file called boot.s :

/* boot.s */
.intel_syntax noprefix
.text
.globl _start, syscall

_start:
	xor rbp,rbp /* rbp = 0 (marks the outermost stack frame) */
	pop rdi /* rdi = argc, rsp = rsp + 8 */
	mov rsi,rsp /* rsi = argv (char *argv[]) */
	and rsp,-16 /* align rsp to 16 bytes */
	call main
	mov rdi,rax /* rdi = main return value (exit status) */
	mov rax,60 /* syscall = 0x3c (exit) */
	syscall
	ret /* not reached: exit does not return */

syscall:
	mov rax,rdi
	mov rdi,rsi
	mov rsi,rdx
	mov rdx,rcx
	mov r10,r8
	mov r8,r9
	syscall
	ret

Finally, gcc was invoked with: gcc base.c boot.s -nostdlib -o base


Wait, what!? We still get a ~14 KB executable after all that work? Yep. Although we have optimised the main object code for our example, we have not yet stripped out the redundant ELF sections, which account for the majority of the file size.

Custom Linker Script

Although it is possible to strip some redundant sections from an ELF binary using strip, it is much more efficient to use a custom linker script.

A linker script specifies precisely which ELF sections to include in the output binary, which means we can eliminate almost all redundancy. Care must be taken, however, to ensure that essential sections such as .text, .data and .rodata* are not discarded during linking, to avoid a segmentation fault.

The linker script that I came up with is shown below (x86_64.ld):

OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
	      "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)

SECTIONS
{
    . = 0x400000 + SIZEOF_HEADERS;
    .text : { *(.text) *(.data*) *(.rodata*) *(.bss*) }
}

The linker script sets the virtual base address of the output binary to 0x400000 and retains only the essential sections.

Custom linker scripts are passed to GCC with the -T switch, and the resulting binary was compiled with: gcc -T x86_64.ld base.c boot.s -nostdlib -o base

This produced an output executable of ~2.7 KB.

This is much better, but there is still some room for improvement using additional GCC compile time switches.

GCC Flags

We have thus far managed to shrink our executable down to ~2.7 KB from our initial file size of ~17 KB by stripping redundant section data using a custom linker script and removing standard library object code.

However, GCC has several compile-time flags that can further help in removing unwanted sections. These include:

Flag                 Description
-ffunction-sections  place each function into its own section
-fdata-sections      place each data item into its own section
-Wl,--gc-sections    strip unused sections (linker)
-fno-unwind-tables   remove unwind tables
-Wl,--build-id=none  remove build-id section
-Qn                  remove .ident directives
-Os                  optimise code for size
-s                   strip symbol table and relocation information

Compiling our example again with: gcc -T x86_64.ld base.c boot.s -nostdlib -o base -ffunction-sections -fdata-sections -Wl,--gc-sections -fno-unwind-tables -Wl,--build-id=none -Qn -Os -s

This produces an output executable with a size of ~1.5 KB, but we can still go further!

Additionally, you can include the -static switch to ensure a static binary. This results in an output executable of ~640 bytes.

SSTRIP

Despite all our optimisation thus far, there are still a few redundant code and data sections in our dynamically linked output executable. Enter sstrip…

sstrip is a useful utility that attempts to identify which parts of an ELF binary are loaded into memory during program execution. Based on this, all unused code and data sections are removed. It is comparable to strip but performs section removal more aggressively.
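To illustrate the idea of keeping only what is loaded, the sketch below walks the program header table of a 64-bit ELF image in memory and reports the file offset at which the last segment ends. Everything beyond that offset is not referenced by any segment. This is only my reading of the general approach, not sstrip's actual source, and loaded_extent is a hypothetical name. It assumes a Linux system that provides <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the number of leading bytes of the file
 * covered by the ELF header, the program header table and all segments,
 * or 0 if the image is not a well-formed 64-bit ELF file. */
static size_t loaded_extent(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        return 0;

    /* The program header table itself must lie inside the file. */
    if (eh.e_phoff > len ||
        (len - eh.e_phoff) / sizeof(Elf64_Phdr) < eh.e_phnum)
        return 0;

    size_t end = sizeof eh;
    size_t table_end = eh.e_phoff + eh.e_phnum * sizeof(Elf64_Phdr);
    if (table_end > end)
        end = table_end;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, image + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_NULL)
            continue;                     /* unused table entry */
        if (ph.p_offset > len || ph.p_filesz > len - ph.p_offset)
            return 0;                     /* segment runs past end of file */
        size_t seg_end = ph.p_offset + ph.p_filesz;
        if (seg_end > end)
            end = seg_end;
    }
    return end;
}
```

Comparing the returned value with the file size gives a rough estimate of how many trailing bytes, typically the section header table and section name strings, a tool of this kind could drop.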

Running ./sstrip base, we get our final executable binary with a size of ~830 bytes!

At this point it would probably be best to switch to assembly to get smaller file sizes. However, the goal of this journal was to create small executables written in C, and I think we’ve done quite well to reduce the size from ~17 KB down to ~830 bytes!


As a final comment, you might be wondering if we could have simply run sstrip on our ~17 KB executable in the first place. The answer is no.

I tried doing this and ended up with a binary image of ~12 KB, so it seems that sstrip needs a bit of additional assistance in the form of our manual optimisations to get really tiny binaries!

Source Code

Source code used in this journal is available at: https://git.lunar.sh/space_hen/tinybasee