Most programmers, hackers, and security researchers understand the basic principles of assembly language programming, but not all developers have written it themselves. In earlier decades, when C compilers were less efficient and higher-level languages like BASIC couldn’t deliver acceptable performance for certain tasks, being able to get close to the hardware by writing assembly code (essentially human-readable machine code) was a useful skill.
Nowadays there’s a reasonable argument that assembly isn’t necessary for most software development, and modern compilers can often output more efficient assembly code than the vast majority of programmers could write by hand. Still, if you are serious about understanding how computers work at a low level, and therefore how vulnerabilities and exploits can occur on different hardware and operating systems, a basic working knowledge of assembly is essential.
With assembly, we can get right down to the level of writing data to registers in a processor. When programming for bare metal, assembly lets us write our own kernels and system calls. We won’t be doing bare metal today though. Instead, I’m hoping this tutorial will be accessible to coders who might not have had a ton of experience with low-level development. For this tutorial we will be writing a basic hello world program for x86-64 Intel processors on computers running macOS. There are a number of great tutorials out there on x86-64 assembly, but many of them target Linux, which, much as I love it, simply isn’t as widely used on the desktop as macOS or Windows.
The term “assembly language” generally refers to some form of human-readable machine code. But unlike C/C++, or high-level scripting languages like Python and JavaScript, assembly is not one language: there are several distinct assembly languages, because there are several processor architectures. On top of that, hardware vendors have invented different syntaxes, and there is no single unified standard. Assembly code can also be operating system specific. This may sound strange since we’ve already said that assembly is largely tied to the hardware, but unless you are writing for bare metal, assembly code will typically include system calls that are OS specific. Of course, system calls don’t come from a vacuum. They are themselves written in assembly (or C and other languages) by the creators of the operating system, though that is beyond the scope of this tutorial. If you are interested in learning about creating kernels and basic operating systems, here is a relatively accessible tutorial on writing a bare metal operating system for Raspberry Pi 4 hardware. Let’s now get started with the code.
Follow the steps below to write and assemble a 'Hello World' program. You can also get the code and assembling instructions from the GitHub repo for this tutorial.
Step 1: Create a project directory for our Hello World program. I use a directory called “Projects” for most of my coding related work:
.global start # make the start label visible to the linker
.intel_syntax noprefix # Intel syntax, registers written without a % prefix
start:
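# Write "Hello World"
mov rax, 0x2000004 # system call 4 (write)
mov rdi, 1 # first argument: file descriptor 1 (stdout)
lea rsi, hello_world[rip] # second argument: address of the string
mov rdx, 12 # third argument: number of bytes to write
syscall # invoke the system call
# Exit program
mov rax, 0x2000001 # system call 1 (exit)
mov rdi, 99 # first argument: set the exit code to 99
syscall # invoke the system call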
hello_world: # Definition of hello_world
.asciz "Hello World\n"
At the beginning we have some boilerplate code:
.global start # make the start label visible to the linker
.intel_syntax noprefix # Intel syntax, registers written without a % prefix
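mov rax, 0x2000004 # system call 4 (write)
mov rdi, 1 # first argument: file descriptor 1 (stdout)
lea rsi, hello_world[rip] # second argument: address of the string
mov rdx, 12 # third argument: number of bytes to write
syscall # invoke the system call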
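If the constants in the write call look arbitrary, a little arithmetic may help. The sketch below is in Python purely for illustration and is not part of the tutorial's code. It assumes that on x86-64 macOS the value in rax carries a call class in bits 24 and up, with class 2 selecting the Unix-style calls, and the call number in the low bits. The names SYSCALL_CLASS_SHIFT, SYSCALL_CLASS_UNIX and unix_syscall are my own labels for this example. It also checks where the 12 in rdx comes from.

```python
# Assumption: rax = (class << 24) | call number on x86-64 macOS,
# and class 2 selects the Unix-style system calls.
SYSCALL_CLASS_SHIFT = 24  # illustrative name, not from the tutorial
SYSCALL_CLASS_UNIX = 2    # illustrative name, not from the tutorial

# The same bytes the assembly program writes (without the trailing NUL).
MESSAGE = b"Hello World\n"


def unix_syscall(number: int) -> int:
    """Combine the Unix call class with a system call number."""
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | number


if __name__ == "__main__":
    print(hex(unix_syscall(4)))  # 0x2000004, loaded into rax for write
    print(hex(unix_syscall(1)))  # 0x2000001, loaded into rax for exit
    print(len(MESSAGE))          # 12, loaded into rdx
```

Note that .asciz also stores a terminating zero byte after the string, so the binary holds 13 bytes, but we only ask write to output the first 12.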
mov rax, 0x2000001 # system call 1 (exit)
mov rdi, 99 # first argument: set the exit code to 99
syscall
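To get a feel for what these two system calls look like from the outside, we can mimic them without assembling anything. This is a rough sketch in Python, not the tutorial's program: it starts a child process that writes the same 12 bytes to file descriptor 1 and then exits with status 99, and the parent reports what it observed. The names CHILD and run_hello_lookalike are made up for this example.

```python
import subprocess
import sys

# Hypothetical stand-in for the assembly program: write to fd 1, then exit 99.
CHILD = 'import os; os.write(1, b"Hello World\\n"); os._exit(99)'


def run_hello_lookalike():
    """Run the stand-in child and return (bytes on stdout, exit status)."""
    result = subprocess.run([sys.executable, "-c", CHILD], capture_output=True)
    return result.stdout, result.returncode


if __name__ == "__main__":
    output, status = run_hello_lookalike()
    print(output)  # the bytes the child wrote to stdout
    print(status)  # the value the child passed to exit
```

The parent sees the exit code as the process status, which is the same number a shell would report with echo $? after running the assembled program.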