Why this guide?
A few days before my college exams, I decided to make personal notes on x64 assembly, and then thought of turning them into a small blog. While working on it, I came across a really great guide on x86 assembly for reverse engineering by Sami Alaoui, which I would call excellent, so this blog is partly inspired by it. This guide will not walk through much C-to-disassembly translation; it covers only some basic x64 terminology to keep things easy for beginners. I will probably do that in the next set of blogs, or maybe in an entire series about reading and understanding disassembly for fun. Anyway, let's dive in!
- Hello World
- Jumps(Conditionals & Unconditionals)
- Subroutine & Calls
As with other programming languages, we will first write a simple hello world program and then work out what all those weird terms mean.

    section .data
        message db "My First line in x86_64", 10

    section .text
    global _start

    _start:
        mov rax, 1
        mov rdi, 1
        mov rsi, message
        mov rdx, 23
        syscall

        mov rax, 60
        mov rdi, 0
        syscall

So, we wrote our first Hello World. Now, what are those rax, rdi, rsi and rdx?
These terms are known as registers. I feel this definition is one of the most crystal-clear explanations of what registers actually are in assembly language: Click Here. Once you are familiar with the definition of the term register, we can look at some of them:
Bonus: Wait, what are this (E) prefix and the (D), (W), (B) suffixes etc.?
-> The prefix E stands for Extended: the E registers are the 32-bit versions of the 16-bit registers, so EAX is AX with an extra 16 bits on top. In the same way, the prefix R marks the 64-bit version, so RAX is EAX with an extra 32 bits on top. The X in AX, BX, CX and DX is usually read as Extended as well, since each of them combines two 8-bit registers, for example AH and AL, into one 16-bit register. The (L) suffix in the 8-bit registers means low and (H) means high, that is, the low and the high byte of the 16-bit register; please remember L is not long here. On the newer registers R8 to R15 you will meet three more suffixes: (B) stands for byte and selects the low 8 bits, (W) stands for word and selects the low 16 bits, and (D) stands for doubleword and selects the low 32 bits. As you probably know, a word takes up 16 bits and a doubleword takes up 32 bits. Hopefully the confusion regarding the prefixes and suffixes is cleared up :)
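To see how the differently sized names overlap, here is a minimal sketch, assuming Linux on x86-64 with NASM and ld (for example `nasm -f elf64 regs.asm -o regs.o` followed by `ld regs.o -o regs`, where regs.asm is a made-up file name). It fills rax with a 64-bit constant chosen just for illustration and hands the low byte back as the exit status, which `echo $?` will show in the shell.

```nasm
; Sketch: al, ax and eax are views onto the low parts of rax.
section .text
global _start

_start:
    mov rax, 0x1122334455667742 ; fill all 64 bits of rax
    ; now eax = 0x55667742, ax = 0x7742, ah = 0x77, al = 0x42
    movzx edi, al               ; zero-extend al into edi, so rdi = 0x42 = 66
    mov rax, 60                 ; syscall number 60 = exit
    syscall                     ; exit status is 66
```

Swapping `al` for `ah` in the movzx line should give 0x77 (119) instead, since ah is the high byte of ax.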
What is this mov stuff?
-> MOV is an instruction that copies the contents of the second operand (the source) into the first operand (the destination). The source can be a register, a value read from a memory address, or an immediate value, as in the mov rax, 1 we saw in the snippet above. Let us simplify it a bit:

    mov eax, ecx                    ; copies the contents of ecx into eax
    mov [some_memory_address], eax  ; stores the contents of eax at some_memory_address
    mov eax, [esi+2]                ; loads the four bytes at memory address esi+2 into eax

There are lots of other instructions. You can check this out for a dedicated reference on the other instructions of the x86_64 instruction set.
What are these .data & .text sections?
-> The .text section is the region which contains the instructions, and the .data section is the region where initialized data lives, that is, the non-stack variables and constants.
Cited from Stack Overflow
What is this term syscall and why are we moving values like message into the registers?
-> In layman's terms, using the syscall instruction means we are asking the operating system to perform some task for us, such as read, write, sendfile, or getting the ID of the current process. There are lots of syscalls, each responsible for one such task. Now let us understand this from the above example:

    mov rax, 60     ; syscall number
    mov rdi, 0      ; first and only argument
    syscall         ; invoke the syscall

Syscalls take arguments, and registers are responsible for holding them, but which registers, and how can you know? On x86_64 Linux, rax holds the syscall number and the arguments go, in order, into rdi, rsi, rdx, r10, r8 and r9.
In our small program, RAX stores the syscall ID, which is 60 here. A bit of googling tells us that this is the exit() syscall, which terminates the program and takes only one argument, the exit status code.
void _exit(int status)
Here, the value stored in the rdi register is zero, so the program just exits with no error :-). Wait, we did not explain the first part of the program, where the syscall ID is 1. I will leave it up to you to figure out the arguments; just as a small hint, the syscall with ID = 1 is sys_write.
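If you want to watch the exit status travel from rdi to the shell, the following sketch may help. It assumes Linux on x86-64 and the usual `nasm -f elf64` plus `ld` build; the status value 42 is picked arbitrarily for illustration.

```nasm
; Sketch: a program that does nothing except exit with status 42.
section .text
global _start

_start:
    mov rax, 60     ; syscall number 60 = exit
    mov rdi, 42     ; first and only argument: the exit status
    syscall         ; the kernel ends the process here
```

After running the binary, `echo $?` in the shell should print 42.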
What actually is the _start: ?
-> This is a label: a name in the code which is assigned the address of the current location during assembly. I would also suggest checking this definition out, which I found to be one of the easiest to understand regarding labels. Labels are mostly of two types:
Symbolic Labels: The ones which consist of an identifier, in layman's terms a name, followed by a colon :, and they are defined only once. For example, the label _start in our code is a symbolic label.
Numeric Labels: The ones which consist of a single digit in the range [0-9], also followed by a colon. These belong to the GNU assembler; NASM instead offers local labels, whose names begin with a dot.
### What if we rename the label _start to something else?
When writing our first program, we might be curious: why not change _start to something else? But then the linker complains: cannot find entry symbol _start. This happens because the linker looks for a symbol named _start as the program's entry point by default. Check this out to know more: Check Out.
Finally, we have gone through each and every part of our first basic program. Now we will move ahead with jumps, calls, comparisons, subroutines, the stack and macros!

    section .data
        message db "We will now go through Jumps", 10
        jumped_message dw "Jump has successfully taken place", 10   ; dw pads the text to whole 16-bit words; db is the usual choice for strings
        rcxval db "Test", 10

    section .text
    global _start

    _start:
        mov rax, 1
        mov rdi, 1
        mov rsi, message
        mov rdx, 29
        syscall
        xor eax, eax
        cmp eax, 0
        je _jumpexecute

    _exit:
        mov rax, 60
        mov rdi, 0
        syscall

    _jumpexecute:
        mov rax, 1
        mov rdi, 1
        mov rsi, jumped_message
        mov rdx, 35
        syscall
        mov eax, 0
        cmp eax, 0
        je _valueofrcx

    _valueofrcx:
        mov rax, 1
        mov rdi, 1
        lea rcx, [rcxval]
        mov rsi, rcx
        mov rdx, 5
        syscall
        mov eax, 0
        cmp eax, 0
        je _exit

Before we move ahead: this is our previous hello world program modified a bit. Assemble it with NASM and then link it, and it will print the messages:

    We will now go through Jumps
    Jump has successfully taken place
    Test

So, before moving on to the above code, let us understand EFLAGS. EFLAGS is a register too, but a special one: it holds status information in the form of boolean values (flags). These flags are set or cleared depending on what happens in the program; one flag records an arithmetic carry, another records whether the result of an operation was zero, and so on.
eflags 0x246 [ PF ZF IF ]
Let us take the above snippet from GDB, which we got while debugging the above program. Here PF stands for Parity Flag: it is set because the low byte of the previous result contains an even number of 1 bits; had the count been odd, PF would not have been set. Then comes ZF, the Zero Flag, which means the previous arithmetic result was zero. In our program it matters because the flow of the program depends on the zero flag's value: at je _jumpexecute, ZF is 1, so the program jumps to the label. Then finally IF, the Interrupt Flag, denotes that the CPU will recognize interrupt requests from peripherals. The original FLAGS register is 16 bits wide and its successor EFLAGS is 32 bits wide. But wait, isn't there a successor of EFLAGS? Like RFLAGS maybe?
- Yes, you are correct: RFLAGS is the 64-bit version. But we mostly access only the lower 32 bits (EFLAGS), and of course the lower 16 bits (FLAGS), because the upper 32 bits of RFLAGS are reserved for future CPUs, which means software should never set these bits and should not rely on their value (so that software doesn't break on future CPUs if/when new features are added to the ISA and the bits are actually used for something new).
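Now, let's move forward to something new. In the previous code snippets we have encountered xor and lea, but wait, what's xor eax, eax and what is lea rcx, [rcxval]? XOR-ing a register with itself always gives zero, so xor eax, eax is simply a compact way to set eax to 0 (and it sets ZF as well). LEA stands for Load Effective Address; in layman's terms, it loads the address of the operand on the right into the register on the left. Let us take up an example of two labels: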

    _jumpexecute:
        mov rax, 1
        mov rdi, 1
        mov rsi, jumped_message ; jumped_message = Jump has successfully taken place
        mov rdx, 35
        syscall
        mov eax, 0
        cmp eax, 0
        je _valueofrcx

    _valueofrcx:
        mov rax, 1
        mov rdi, 1
        lea rcx, [rcxval]       ; rcx = &rcxval
        mov rsi, rcx
        mov rdx, 5
        syscall
        mov eax, 0
        cmp eax, 0
        je _exit

Both of the above labels do the same thing, printing a small message. In the first snippet, mov rsi, jumped_message moves the address of the message into rsi, as an argument of the write syscall. In the second snippet, lea rcx, [rcxval] loads the effective address of rcxval into rcx without reading the memory behind it, and then mov rsi, rcx copies that address from rcx into rsi. I would suggest you try out mov rcx, [somevariable] & mov rcx, somevariable! The first will dereference the address of somevariable and store its contents inside rcx, whereas the second one will store the address itself inside rcx.
Citation: Stack Overflow
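Here is a small sketch of that experiment, assuming Linux on x86-64 and a plain `nasm -f elf64` plus `ld` build (a static, non-PIE executable). The variable somevariable and its value 7 are made up for illustration; the program returns the loaded value as its exit status so you can inspect it with `echo $?`.

```nasm
; Sketch: address of a variable versus contents of a variable.
section .data
    somevariable dq 7           ; hypothetical 8-byte variable holding 7

section .text
global _start

_start:
    mov rcx, somevariable       ; rcx = address of somevariable
    lea rdx, [somevariable]     ; rdx = the same address, computed by lea
    mov rdi, [somevariable]     ; rdi = contents of somevariable = 7
    sub rcx, rdx                ; rcx = 0, because both hold the same address
    add rdi, rcx                ; rdi stays 7
    mov rax, 60                 ; syscall number 60 = exit
    syscall                     ; exit status is 7
```

The exit status is 7 only because the two addresses are equal; if mov without brackets and lea produced different values, the subtraction would leave something other than zero in rcx.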
In the above paragraphs we described the zero flag a bit, and yes, it really does affect the flow of the program. We see je _exit, so what actually is je? JE stands for Jump if Equal: the jump is taken when ZF is set to 1. This happens when the value in the destination minus the value in the source yields 0; cmp performs exactly this subtraction, sets ZF and discards the result, and finally the jump to the label named _exit takes place. Let us understand it in a simpler way:
mov eax, 0
This moves 0 into eax, so the value of eax is now 0 after this instruction, then
cmp eax, 0
As eax is already equal to 0 and we are comparing it with 0, 0 - 0 will definitely yield 0, which sets the zero flag to 1, and the flow of the program is now passed on to _exit.
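The same cmp and je pair can be tried in isolation. This is a minimal sketch, assuming Linux on x86-64 with NASM and ld; the labels _equal and _done and the value 5 are invented for the example, and it borrows the unconditional jmp instruction to skip over the other branch.

```nasm
; Sketch: exit status 0 when the compared values are equal, 1 otherwise.
section .text
global _start

_start:
    mov eax, 5
    cmp eax, 5          ; computes 5 - 5 = 0, so ZF = 1
    je _equal           ; taken because ZF = 1
    mov rdi, 1          ; reached only if the two values differ
    jmp _done
_equal:
    mov rdi, 0          ; status for the "equal" case
_done:
    mov rax, 60         ; syscall number 60 = exit
    syscall
```

Changing the second operand of cmp to any value other than 5 should flip the exit status shown by `echo $?` from 0 to 1.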
Let us now check this code out, which demonstrates the setting of OF, the Overflow Flag; the jump is taken if and only if the Overflow Flag is set to 1.
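
    section .data
        value db 40
        jumped_message db "JUMPED", 10

    section .text
    global _start

    _start:
        mov al, 46
        inc al              ; al = 47
        add al, 79          ; al = 126
        add al, [value]     ; 126 + 40 = 166 does not fit into a signed byte
        mov ah, al
        add al, ah          ; overflows the signed byte range again, so OF = 1
        jo _jumpexecute     ; taken only when OF = 1
        jmp _exit           ; without this line, execution would fall into _jumpexecute anyway

    _jumpexecute:
        mov rax, 1
        mov rdi, 1
        mov rsi, jumped_message
        mov rdx, 7
        syscall
        mov eax, 0
        cmp eax, 0
        je _exit

    _exit:
        mov rax, 60
        mov rdi, 0
        syscall
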
In the above code the overflow flag is set because an arithmetic overflow has occurred in the last addition: the result does not fit into a signed 8-bit register. The jo instruction then says: if the overflow flag is set, jump to _jumpexecute and print the message JUMPED!!
If you wish to check out more awesome examples, check this out!
Unconditional jumps are performed using the JMP instruction. They are used to jump to an address or a label without depending on any condition, such as whether the zero flag is set.
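A tiny sketch of an unconditional jump, assuming Linux on x86-64 with NASM and ld. The label _finish and the values 3 and 9 are made up; the point is only that the instruction sitting between jmp and its target never runs.

```nasm
; Sketch: jmp skips the instruction that follows it, whatever the flags say.
section .text
global _start

_start:
    mov rdi, 3          ; exit status we expect to see
    jmp _finish         ; always taken, no condition is checked
    mov rdi, 9          ; skipped, so rdi is never overwritten
_finish:
    mov rax, 60         ; syscall number 60 = exit
    syscall             ; exit status is 3
```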
Subroutines & Calls:

    section .data
        ;message db "Hello World!" ,10
        greet dw "Hey there, What is your name? ", 10
        text2 dw "My name is, ", 10
        askage dw "Hey what's your age? ", 10
        soage dw "So your age is, ", 10

    section .bss
        yourname resb 20
        age resb 5

    section .text
    global _start

    _start:
        call _greet2
        call _getthename
        call _text2
        call _printname
        call _asktheage
        call _getage
        call _text3
        call _printage
        call _exit

    _greet2:
        mov rax, 1
        mov rdi, 1
        mov rsi, greet
        mov rdx, 31
        syscall
        ret

    _getthename:
        mov rax, 0
        mov rdi, 0
        mov rsi, yourname
        mov rdx, 20
        syscall
        ret

    _text2:
        mov rax, 1
        mov rdi, 1
        mov rsi, text2
        mov rdx, 11
        syscall
        ret

    _printname:
        mov rax, 1
        mov rdi, 1
        mov rsi, yourname
        mov rdx, 20
        syscall
        ret

    _asktheage:
        mov rax, 1
        mov rdi, 1
        mov rsi, askage
        mov rdx, 22
        syscall
        ret

    _getage:
        mov rax, 0
        mov rdi, 0
        mov rsi, age
        mov rdx, 5
        syscall
        ret

    _text3:
        mov rax, 1
        mov rdi, 1
        mov rsi, soage
        mov rdx, 17
        syscall
        ret

    _printage:
        mov rax, 1
        mov rdi, 1
        mov rsi, age
        mov rdx, 5
        syscall
        ret

    _exit:
        mov rax, 60
        mov rdi, 0
        syscall

The above program is quite lengthy compared to the previous ones, and we also have two new terms, call and ret, so what actually are these? Basically, the labelled blocks are known as procedures or subroutines: call transfers the program flow to a subroutine, and the terminator ret transfers it back to the instruction right after the call once the block of code has executed. As an example, if we look at the _start label, we first call _greet2, which passes the flow of the program to:

    _greet2:
        mov rax, 1
        mov rdi, 1
        mov rsi, greet
        mov rdx, 31
        syscall
        ret

Then, after the message “Hey there, What is your name?” is printed, control returns to the _start label and execution continues from there. So, for understanding, call is something like an unconditional jump to the subroutine we want to run, except that it first pushes the return address onto the stack so that ret can find its way back. Just to not leave the reader confused: if you look at the _getage subroutine, you can see that a different value, 0, is moved into rax, which means we are using sys_read to read input from the user, store it, and finally print it using sys_write.
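To watch call and ret hand control back and forth without any printing in the way, here is a minimal sketch, assuming Linux on x86-64 with NASM and ld. The subroutine name _addtwo and the numbers are made up for illustration; the result comes back through rdi and ends up as the exit status.

```nasm
; Sketch: call jumps into the subroutine, ret resumes right after the call.
section .text
global _start

_start:
    mov rdi, 20
    call _addtwo        ; pushes the address of the next instruction, then jumps
    mov rax, 60         ; execution resumes here after ret
    syscall             ; exit status is 22

_addtwo:
    add rdi, 2          ; rdi = 22
    ret                 ; pops the return address and jumps back to it
```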

    section .data
        ;message db "Hello World", 10
        initialvariable db 0, 10

    section .text
    global _start

    _start:
        call _addregisters
        call _substractregisters
        call _exit

    _addregisters:
        mov rax, 25
        add rax, 25
        add rax, 2
        mov [initialvariable], al   ; store one byte only, so the newline after it survives
        mov rax, 1
        mov rdi, 1
        mov rsi, initialvariable
        mov rdx, 2
        syscall
        ret

    _substractregisters:
        mov rax, 2
        sub [initialvariable], al   ; subtract from that single byte
        mov rax, 1
        mov rdi, 1
        mov rsi, initialvariable
        mov rdx, 2
        syscall
        ret

    _exit:
        mov rax, 60
        mov rdi, 0
        syscall

Finally, after understanding labels and calls, in the above program we add two subroutines, one for adding up values and the other for subtracting them. As usual, the variable is declared with zero as its initial value, and then the three subroutines _addregisters, _substractregisters and _exit are called. Now:

    _addregisters:
        mov rax, 25
        add rax, 25
        add rax, 2
        mov [initialvariable], al   ; store one byte only, so the newline after it survives
        mov rax, 1
        mov rdi, 1
        mov rsi, initialvariable
        mov rdx, 2
        syscall
        ret

The value 25 is moved into rax, then the add instructions add 25 more and finally 2, so rax now holds the decimal value 52, which is the ASCII code of the character 4 (check out the table here). Then, using the write syscall, we print the value of initialvariable, which turns out to be 4. Then, using the terminator ret, we pass control of the program back to the _start label. Now:

    _substractregisters:
        mov rax, 2
        sub [initialvariable], al   ; subtract from the single byte stored earlier
        mov rax, 1
        mov rdi, 1
        mov rsi, initialvariable
        mov rdx, 2
        syscall
        ret

This part is quite simple: the value 2 is moved into rax and then subtracted from the value in initialvariable, which gives 52 - 2 = 50. The variable now holds 50, which is then printed using the write syscall; it shows up as 2, the character equivalent of 50, and the program finally terminates with the _exit subroutine.
A stack is a data structure used to store and remove data in memory using the PUSH and POP instructions: the first pushes an address, register or operand onto the stack, and the second removes the item on top of the stack. RSP points to the top of the stack, RIP stores the address of the next instruction to be executed, and RBP points to the base of the current stack frame. The stack grows in the reverse direction, towards lower addresses, so every push decreases RSP. It also works in last in, first out order; in layman's terms, the Last Input will be the First Output, meaning the first one to be removed. Just picture it this way:

    1
    ---------------
    2
    ---------------
    3
    ---------------
    4
    ---------------
    5
    ---------------

Here the last input, which is 5, will get removed from the top of the stack before the others. This was just a basic example of how a stack works.
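The last in, first out order can be checked with a few pushes and pops. This is a minimal sketch, assuming Linux on x86-64 with NASM and ld; the pushed values are arbitrary, and the first popped value is returned as the exit status.

```nasm
; Sketch: the value pushed last is the value popped first.
section .text
global _start

_start:
    push 1              ; first in
    push 2
    push 5              ; last in, now on top of the stack
    pop rdi             ; first out: rdi = 5
    pop rax             ; rax = 2
    pop rax             ; rax = 1, the stack is back where it started
    mov rax, 60         ; syscall number 60 = exit
    syscall             ; exit status is 5
```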

    section .data
        message db "Hello World", 10

    section .text
    global _start

    %macro printmessage 1
        mov rax, 1
        mov rdi, 1
        mov rsi, message
        mov rdx, 11
        syscall
    %endmacro

    %macro exit 0
        mov rax, 60
        mov rdi, 0
        syscall
    %endmacro

    _start:
        printmessage 4
        exit

In the above code snippet we see some weird terms such as %macro and %endmacro, so what are they?
Let us understand macros this way: imagine we could bundle a bunch of instructions under one name and use them in various parts of the program without using call, just by writing that single name as if it were an instruction. If you are looking for something like that, a macro does exactly this job; the assembler pastes the bundled instructions in wherever the name appears. Let us take a small example from the above code:

    %macro printmessage 1
        mov rax, 1
        mov rdi, 1
        mov rsi, message
        mov rdx, 11
        syscall
    %endmacro

Here the printmessage macro is declared, which does the work of printing the message “Hello World”. We just need to use the name of the macro, as if it were an instruction, under the _start label! You can define your own macro this way:

    %macro <nameofmacro> <numberofargs>
        your bunch of instructions
    %endmacro
    ; This can be used inside your code: just write the nameofmacro.

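The number after the macro name is the count of parameters, and inside the body they are available as %1, %2 and so on. The sketch below, assuming Linux on x86-64 with NASM and ld, defines a hypothetical macro called print that takes the address and the length of a message; the names print and msglen are invented for this example.

```nasm
; Sketch: a macro that uses its parameters through %1 and %2.
%macro print 2              ; %1 = address of the text, %2 = its length
    mov rax, 1              ; syscall number 1 = write
    mov rdi, 1              ; file descriptor 1 = stdout
    mov rsi, %1
    mov rdx, %2
    syscall
%endmacro

section .data
    message db "Hello World", 10
    msglen  equ $ - message ; length computed by the assembler (12 bytes)

section .text
global _start

_start:
    print message, msglen   ; expands to the five instructions above
    mov rax, 60             ; syscall number 60 = exit
    mov rdi, 0
    syscall
```

Letting the assembler compute msglen avoids counting characters by hand, and the same macro can then print any other message by passing a different address and length.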
Therefore, we have come to the end of this blog. This was just a very basic guide to help people understand x64 assembly using NASM, and it may sometimes help when reversing ELF binaries by making the disassembly a bit easier to understand. I have covered only about 40% of the entire x64 beginners guide, but I think this is enough to get someone started. Hopefully you will find this helpful. If you find any issues in this blog, please reach out to me on Discord at ElementalX#3463; I would be more than happy to improve and correct myself. Thanks to this awesome youtube channel, which gave me a layout for writing this blog. Good day ahead! Thank you for your time.