
# ELF64 hello world

I like to understand how things work under the hood. Assembly programming is very close to the metal but, even then, the assembler hides some complexity to make our lives easier. Also the linker does lots of work I don’t completely understand.

So I thought I would make an executable by hand, one in which every bit is understood. The resulting ELF64 executable is very small (160 bytes) compared with the executables generated by C compilers and even assemblers.

I found the Wikipedia article on the ELF format quite handy, although I also had to look up other resources (all listed at the end).

The executable has only one program segment that contains both code and data. There are no sections. Segments and sections are different things [1][2].

I also read this article about making the smallest possible ELF32 executable that returns 42. It uses lots of tricks to make the executable as small as possible, including some that abuse the ELF format spec, like overlapping different sections. We are not going to go that far; we will just try to make the simplest hello world. Also, I’m going to make an ELF64 instead. There is also this post on Stack Overflow where they try to make the smallest possible ELF64.

We could make our executable by typing it in a hex editor. But there are other more convenient ways.

In the resources just mentioned, they use NASM with the special flag -f bin. With this flag, only what is specified in the .asm source file appears in the output binary (when we assemble normally, NASM creates the ELF headers and so on for us). This is what one of these .asm files looks like (copied from the aforementioned Stack Overflow post):

bits 64
            org 0x08048000

ehdr:                                     ; Elf64_Ehdr
            db  0x7F, "ELF", 2, 1, 1, 0   ;   e_ident
    times 8 db  0
            dw  2                         ;   e_type
            dw  62                        ;   e_machine
            dd  1                         ;   e_version
            dq  _start                    ;   e_entry
            dq  phdr - $$                 ;   e_phoff
            dq  0                         ;   e_shoff
            dd  0                         ;   e_flags
            dw  ehdrsize                  ;   e_ehsize
            dw  phdrsize                  ;   e_phentsize
            dw  1                         ;   e_phnum
            dw  0                         ;   e_shentsize
            dw  0                         ;   e_shnum
            dw  0                         ;   e_shstrndx

ehdrsize    equ $ - ehdr

phdr:                                     ; Elf64_Phdr
            dd  1                         ;   p_type
            dd  7                         ;   p_flags
            dq  0                         ;   p_offset
            dq  $$                        ;   p_vaddr
            dq  $$                        ;   p_paddr
            dq  filesize                  ;   p_filesz
            dq  filesize                  ;   p_memsz
            dq  0x1000                    ;   p_align

phdrsize    equ $ - phdr

_start:
            mov di,42                     ; only the low byte of the exit code is kept,
                                          ; so we can use di instead of the full edi/rdi
            xor eax,eax
            mov al,60                     ; shorter than mov eax,60
            syscall                       ; perform the syscall

filesize    equ $ - $$


And we can assemble it with:

nasm -f bin min64.asm -o min64


But instead of using NASM to generate the executable, I opted to use a small C program that writes a binary file, because in the future I want to make a compiler in C that doesn’t require an assembler.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

typedef struct Elf64Header {
    char elfMagicNumber[4]; // "\x7fELF"
    u8 bitAmount; // 1: 32-bit, 2: 64-bit
    u8 endian; // 1: little endian, 2: big endian
    u8 elfVersion1; // must be 1
    u8 osAbi; // 0: System V, 3: Linux
    u8 abiVersion; // in statically linked executables has no effect. In dynamically linked executables, if OS_ABI==3, defines dynamic linker features
    u8 unused[7];
    u16 objFileType; // ET_EXEC=2, ET_DYN=3 (DYN is used for PIC)
    u16 arch; // 0x3E: AMD64
    u32 elfVersion2; // the elf version again, must be 1
    u64 entryPointOffset; // virtual address from where the process should start executing
    u64 phtOffset; // start of the program header table
    u64 shtOffset; // start of the section header table
    u32 processorFlags; // processor-specific flags
    u16 headerSize; // the size of this header
    u16 phtEntrySize; // the size of one PHT entry
    u16 numPhtEntries; // num entries in the PHT
    u16 shtEntrySize; // the size of one SHT entry
    u16 numShtEntries; // num entries in the SHT
    u16 namesSht; // the index of the SHT entry that contains the section names
} Elf64Header;

typedef struct Elf64_PhtEntry {
    u32 segmentType; // 1: loadable segment, 2: dynamic linking info
    u32 flags; // segment-dependent flags (position for 64-bit structure)
    u64 offset; // offset of the segment in the file image
    u64 vaddr; // virtual address of the segment in memory
    u64 paddr; // on systems where the physical address is relevant, reserved for the physical address of the segment
    u64 sizeInFile; // size of the segment in the file image
    u64 sizeInMem; // size of the segment in memory
    u64 align; // 0 and 1 specify no alignment. Otherwise should be a positive, integral power of 2, with 'vaddr' equal to 'offset' modulo 'align'
} Elf64_PhtEntry;

// https://godbolt.org/z/vh8aEe
const unsigned char asmCode[] = {
    0xb8, 0x01, 0x00, 0x00, 0x00,             // mov eax, 1 (syscall: write)
    0xbf, 0x01, 0x00, 0x00, 0x00,             // mov edi, 1 (stdout)
    // the displacement is counted from the end of the lea: 0x11 bytes of code remain before helloStr
    0x48, 0x8d, 0x35, 0x11, 0x00, 0x00, 0x00, // lea rsi, [rel helloStr]
    0xba, 0x06, 0x00, 0x00, 0x00,             // mov edx, sizeof(helloStr)
    0x0f, 0x05,                               // syscall
    0xb8, 0x3c, 0x00, 0x00, 0x00,             // mov eax, 0x3c (syscall: exit)
    0x48, 0x31, 0xff,                         // xor rdi, rdi (exit code 0)
    0x0f, 0x05                                // syscall
};
const char helloStr[] = {'h', 'e', 'l', 'l', 'o', '\n'};

int main(void)
{
    const int headersSize = sizeof(Elf64Header) + sizeof(Elf64_PhtEntry);
    const int fileSize = headersSize + sizeof(asmCode) + sizeof(helloStr);

    Elf64Header header = {
        .elfMagicNumber = {0x7F, 'E', 'L', 'F'},
        .bitAmount = 2, // 64-bit
        .endian = 1, // little endian
        .elfVersion1 = 1,
        .osAbi = 0, // system V
        .abiVersion = 0,
        .unused = {0,0,0,0,0,0,0},
        .objFileType = 3, // ET_DYN
        .arch = 0x3E,
        .elfVersion2 = 1,
        .entryPointOffset = 0x400000 + headersSize,
        .phtOffset = sizeof(Elf64Header),
        .shtOffset = 0,
        .processorFlags = 0,
        .headerSize = 64,
        .phtEntrySize = sizeof(Elf64_PhtEntry),
        .numPhtEntries = 1,
        .shtEntrySize = 0,
        .numShtEntries = 0,
        .namesSht = 0
    };

    Elf64_PhtEntry phtEntry = {
        .segmentType = 1, // 1: PT_LOAD
        .flags = 0x7, // bit 0: execute, bit 1: write, bit 2: read
        .offset = 0,
        // this one is quite weird. It looks like the entire executable needs to be inside
        // some segment, including the Elf64Header
        .vaddr = 0x400000, // linux likes this address for x64
        .paddr = 0x400000,
        .sizeInFile = fileSize,
        .sizeInMem = fileSize,
        .align = 0x1000
    };

    // debug output: sizeof returns size_t, so print it with %zu
    printf("%zu\n", sizeof(Elf64Header));
    printf("%zu\n", sizeof(Elf64_PhtEntry));
    printf("%x\n", fileSize-7);

    // "wb": write the bytes untranslated, also on platforms that distinguish text mode
    FILE* file = fopen("raw_exe", "wb");
    if (file == NULL) {
        perror("fopen");
        return 1;
    }

    fwrite(&header, 1, sizeof(header), file);
    fwrite(&phtEntry, 1, sizeof(phtEntry), file);
    fwrite(asmCode, 1, sizeof(asmCode), file);
    fwrite(helloStr, 1, sizeof(helloStr), file);

    fclose(file);
    return 0;
}


Inside the Program header, I struggled a lot with the p_offset field (and I’m not the only one). If you didn’t notice, the value of p_offset is 0. But the spec says:

This member gives the offset from the beginning of the file at which the first byte of the segment resides

And our executable layout looks like this:

Elf64Header 64 bytes
Program Header Entry 56 bytes
Program Segment

So shouldn’t p_offset be 64+56=120?
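
The 64 and 56 figures rely on the compiler not inserting padding into the two structs, since the generator writes them to disk as they sit in memory. Below is a minimal sketch that checks this at compile time and derives the 120 from a header. `HeaderLayout`, `PhtEntryLayout` and `segment_offset_in_file` are hypothetical names for this example; the structs mirror the field order and widths of the ones in the generator, and the helper assumes the single PHT entry directly follows the header, as it does in this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order and widths as the ELF64 file header. */
typedef struct {
    unsigned char ident[16]; /* magic, class, endianness, version, ABI, padding */
    uint16_t type;
    uint16_t machine;
    uint32_t version;
    uint64_t entry;
    uint64_t phoff;
    uint64_t shoff;
    uint32_t flags;
    uint16_t ehsize;
    uint16_t phentsize;
    uint16_t phnum;
    uint16_t shentsize;
    uint16_t shnum;
    uint16_t shstrndx;
} HeaderLayout;

/* Same field order and widths as one ELF64 program header entry. */
typedef struct {
    uint32_t type;
    uint32_t flags;
    uint64_t offset;
    uint64_t vaddr;
    uint64_t paddr;
    uint64_t filesz;
    uint64_t memsz;
    uint64_t align;
} PhtEntryLayout;

/* Compilation fails if the compiler padded either struct. */
_Static_assert(sizeof(HeaderLayout) == 64, "ELF64 header must be 64 bytes");
_Static_assert(sizeof(PhtEntryLayout) == 56, "ELF64 PHT entry must be 56 bytes");

/* File offset of the first byte after the program header table. */
size_t segment_offset_in_file(const HeaderLayout *h)
{
    /* Assumption of this sketch: the PHT starts right after the header. */
    assert(h->phoff == sizeof(HeaderLayout));
    return (size_t)h->phoff + (size_t)h->phnum * sizeof(PhtEntryLayout);
}
```

For a header with `phoff = 64` and `phnum = 1`, the helper returns 64 + 56 = 120.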

That would make sense but there is a little problem. The ELF spec says the following about p_align:

This member holds the value to which the segments are aligned in memory and in the file. Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size. Values of zero and one mean no alignment is required. Otherwise, p_align should be a positive, integral power of two, and p_vaddr should equal p_offset, modulo p_align.

The key is in “p_vaddr should equal p_offset, modulo p_align”. In our case, p_vaddr is 0x400000, which is the usual address for x64 processes [3]. If we had p_offset == 120, the alignment rule would be violated: with a p_align of 0x1000, 0x400000 mod 0x1000 is 0, but 120 mod 0x1000 is 120.
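
The rule quoted from the spec can be written as a small predicate. This is only a sketch of the congruence check as the spec words it, not the check the kernel performs; `segment_is_aligned` and `is_power_of_two` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

bool is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* True when align is 0 or 1 (no alignment required), or when vaddr and
   offset leave the same remainder modulo align. */
bool segment_is_aligned(uint64_t vaddr, uint64_t offset, uint64_t align)
{
    if (align <= 1)
        return true;
    assert(is_power_of_two(align)); /* the spec asks for a power of two */
    /* For a power of two, masking with align - 1 is the same as taking the remainder. */
    return (vaddr & (align - 1)) == (offset & (align - 1));
}
```

With the values discussed here, `segment_is_aligned(0x400000, 120, 0x1000)` is false, while `segment_is_aligned(0x400000, 0, 0x1000)` is true.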

And the obvious question is: why don’t we just decrease the alignment? … The Linux kernel has a minimum alignment that is usually the page size, and I believe the page size is usually 4 KB. So we would have to add padding after the headers to align the segment up to 4 KB. That would make the executable much bigger; therefore we opted to just include the headers in the segment, and set the entry point to 0x400000 + 120.
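
To put a number on “much bigger”, here is a rough sketch of the padding arithmetic, assuming a 4 KB page and that the segment would start at the first page boundary after the headers. `align_up` and `padded_file_size` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of align (align must be a power of two). */
uint64_t align_up(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* File size if the headers are padded so the segment starts on a page boundary. */
uint64_t padded_file_size(uint64_t headersSize, uint64_t payloadSize, uint64_t pageSize)
{
    return align_up(headersSize, pageSize) + payloadSize;
}
```

For the 120 bytes of headers and the 40 bytes of code and data in this file (160 - 120), `padded_file_size(120, 40, 0x1000)` gives 4096 + 40 = 4136 bytes instead of 160.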

But there is something better we can do. We said that 0x400000 is the usual load address for x64 processes. But it turns out it’s not mandatory. We can set p_vaddr to 0x400078 so it matches the p_offset, modulo 0x1000. In this way, the executable is small and we don’t load the headers unnecessarily.
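
As a sketch of what changes in the program header under this scheme: the segment covers only the code and data, so its file offset, virtual address and sizes are derived from the header size. `SegmentPlan` and `plan_segment` are hypothetical names for this example, and the values are my reading of the paragraph above rather than a copy of the final code.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t offset; /* p_offset */
    uint64_t vaddr;  /* p_vaddr */
    uint64_t filesz; /* p_filesz */
    uint64_t memsz;  /* p_memsz */
    uint64_t entry;  /* e_entry in the file header */
} SegmentPlan;

/* Place the segment right after the headers, both in the file and in memory. */
SegmentPlan plan_segment(uint64_t base, uint64_t headersSize,
                         uint64_t payloadSize, uint64_t align)
{
    SegmentPlan p;
    p.offset = headersSize;
    p.vaddr  = base + headersSize;
    p.filesz = payloadSize;
    p.memsz  = payloadSize;
    p.entry  = p.vaddr; /* the code is the first thing in the segment */
    /* The congruence from the spec holds by construction when base is a multiple of align. */
    assert(align <= 1 || p.vaddr % align == p.offset % align);
    return p;
}
```

With `plan_segment(0x400000, 120, 40, 0x1000)` the segment gets `offset = 120` and `vaddr = 0x400078`. The entry point is still 0x400078, the same value as 0x400000 + 120 before; what changes is that the segment now starts there.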

In order to figure this out I had to ask on Stack Overflow.

Here’s the final code

Other interesting resources:
