Trying to fit that x64 in one
Why this guide?
A few days before my college exams, I decided to write personal notes on x64 assembly and then thought of turning them into a small blog. While working on this guide I came across a really great guide on x86 assembly for reverse engineering by Sami Alaoui, and this blog is partly inspired by it. Unlike that guide, this one does not walk through much C-to-disassembly translation; it sticks to basic x64 terminology to keep things easy for beginners. I may cover that in the next set of blogs, or maybe in an entire series about reading and understanding disassembly for fun. Anyway, let's dive in!
Contents
- Hello World
- Registers
- Syscalls
- Sections
- Labels
- FLAGS
- LEA
- Jumps(Conditionals & Unconditionals)
- Subroutine & Calls
- Arithmetic
- Stack
- Macros
Hello World
As with other programming languages, we will first write a simple hello world program and then work out what all those weird terms mean.
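section .data
    message db "My First line in x86_64", 10   ; the text followed by a newline (byte 10)
section .text
    global _start
_start:
    mov rax, 1          ; syscall ID
    mov rdi, 1          ; 1st argument
    mov rsi, message    ; 2nd argument
    mov rdx, 24         ; 3rd argument: 23 characters plus the newline
    syscall
    mov rax, 60         ; syscall ID
    mov rdi, 0          ; 1st argument
    syscall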
So, we wrote our first Hello World. Now, what are rax, rdi and rsi?
These are registers: small, fast storage locations inside the CPU that instructions read from and write to directly. Once you are familiar with the term, let us look at the ones x64 gives us:
Register (64-bit) | 32-bit | 16-bit | 8-bit |
---|---|---|---|
RAX | EAX | AX | AL |
RBX | EBX | BX | BL |
RCX | ECX | CX | CL |
RDX | EDX | DX | DL |
RSI | ESI | SI | SIL |
RDI | EDI | DI | DIL |
RBP | EBP | BP | BPL |
RSP | ESP | SP | SPL |
R8 | R8D | R8W | R8B |
R9 | R9D | R9W | R9B |
R10 | R10D | R10W | R10B |
R11 | R11D | R11W | R11B |
R12 | R12D | R12W | R12B |
R13 | R13D | R13W | R13B |
R14 | R14D | R14W | R14B |
R15 | R15D | R15W | R15B |
Bonus: Wait, what are the (R) and (E) prefixes and the (D), (W), (B) suffixes?
-> The registers grew over time, and the names record that history. The 8-bit halves carry the suffix (L) for low or (H) for high, as in AL and AH. The (X) in AX can be read as extended: AX is the 16-bit register made up of AH and AL. The prefix (E) also stands for extended: EAX extends the 16-bit AX to 32 bits. The prefix (R) marks the full 64-bit register, as in RAX. For R8 to R15 suffixes are used instead: (B) stands for byte (8 bits), (W) for word (16 bits) and (D) for doubleword (32 bits). Please remember that L means low here, not long. Hopefully the confusion about the prefixes and suffixes is cleared up :)
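To see how the differently sized names share one register, here is a minimal sketch (my own illustration, not from the original notes). It writes to AL, AX and EAX in turn and then exits with the final value of RAX as the status code, so you can inspect it with echo $? after assembling with nasm -f elf64 and linking with ld.

```nasm
section .text
    global _start
_start:
    mov rax, 0x1122334455667788  ; fill all 64 bits
    mov al, 0xFF                 ; rax = 0x11223344556677FF (only the low byte changes)
    mov ax, 0xABCD               ; rax = 0x112233445566ABCD (only the low 16 bits change)
    mov eax, 7                   ; rax = 0x0000000000000007 (a 32-bit write zeroes the upper half)
    mov rdi, rax                 ; exit status = 7
    mov rax, 60                  ; exit syscall
    syscall
```

The one surprise is the last write: 8-bit and 16-bit writes leave the rest of the register alone, while a 32-bit write clears the upper 32 bits.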
What is this mov stuff?
-> MOV is an instruction that copies the second operand (the source) into the first operand (the destination). The source can be a register, the contents of a memory address, or an immediate value, as in mov rax, 1 from the snippet above. Let us simplify it a bit:
mov eax, ecx ; copies the contents of ecx into eax
mov [some_memory_address], eax ; stores the contents of eax at some_memory_address
mov eax, [esi+2] ; loads the 4 bytes at memory address esi+2 into eax
There are many more instructions; check this out for a dedicated reference to the x86_64 instruction set.
What are these .data & .text sections?
-> The .text section is the region that contains the instructions, and the .data section is the region where initialized data lives, that is, the non-stack variables and constants.
Cited from Stack Overflow
What is this term syscall, and why are we moving seemingly random values like 1 and message into the registers?
-> In layman's terms, the syscall instruction asks the operating system to perform a task for us, such as read, write, sendfile, or getting the ID of the current process; there is a separate syscall for each such task. Let us understand this from the example above:
mov rax, 60 ; syscall ID
mov rdi, 0 ; first and only argument
syscall ; invoke the syscall
Syscalls take arguments, and registers hold those arguments. But which register holds which argument?
Argument | Registers |
---|---|
ID | RAX |
1st Argument | RDI |
2nd Argument | RSI |
3rd Argument | RDX |
4th Argument | R10 |
5th Argument | R8 |
6th Argument | R9 |
In our small program, RAX holds the syscall ID, which is 60 here. A bit of googling tells us that this is the exit() syscall, which terminates the program and takes only one argument, the exit status code.
void _exit(int status)
Here, the value stored in the rdi register is zero, so the program simply exits with no error :-). Wait, we did not explain the first part of the program, where the syscall ID is 1. I will leave it up to you to figure out the arguments; as a small hint, the syscall with ID = 1 is sys_write.
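If you want to check your answer, here is a sketch of the same program with each sys_write argument labelled. The name message_len and the equ $ - message trick are my own additions, not part of the original program: they let NASM compute the length so we don't have to count characters by hand.

```nasm
section .data
    message db "My First line in x86_64", 10
    message_len equ $ - message   ; current position minus start of message = 24 bytes

section .text
    global _start
_start:
    mov rax, 1              ; syscall ID 1 = sys_write
    mov rdi, 1              ; 1st argument: file descriptor 1 (stdout)
    mov rsi, message        ; 2nd argument: address of the buffer
    mov rdx, message_len    ; 3rd argument: number of bytes to write
    syscall

    mov rax, 60             ; syscall ID 60 = sys_exit
    mov rdi, 0              ; 1st argument: exit status 0
    syscall
```

Because message_len is a constant computed at assembly time, editing the string never leaves a stale length behind.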
What actually is _start?
-> It is a label: a name that is assigned the address of the current location in the code during assembly. I would also suggest checking this definition out, which I found to be one of the easiest definitions of labels to understand. Labels are mostly of two types:
- Symbolic labels: an identifier (in layman's terms, a name) followed by a colon. They are defined only once. For example, _start in our code is a symbolic label.
- Numeric labels: a single digit in the range [0-9], also followed by a colon. These come from the GNU assembler; in NASM the closest equivalent is a local label, whose name starts with a dot.
Citations: Docs
### What if we rename the label _start to something else?
When writing our first program, we might be curious: why not change _start to something else? If we do, the linker complains that it cannot find entry symbol _start. The reason is that ld looks for the symbol _start as the program's entry point by default; a different label works only if you pass its name to the linker with the -e option.
Finally, we have understood each and every part of our first basic program. Now we will move ahead to jumps, calls, comparisons, subroutines, the stack and macros!
section .data
    message db "We will now go through Jumps", 10
    jumped_message db "Jump has successfully taken place", 10   ; db, not dw: a string is a sequence of bytes
    rcxval db "Test", 10
section .text
    global _start
_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, message
    mov rdx, 29                 ; 28 characters plus the newline
    syscall
    xor eax, eax                ; eax = 0
    cmp eax, 0                  ; 0 - 0 = 0, so ZF is set
    je _jumpexecute             ; taken because ZF = 1
_exit:
    mov rax, 60
    mov rdi, 0
    syscall
_jumpexecute:
    mov rax, 1
    mov rdi, 1
    mov rsi, jumped_message
    mov rdx, 34                 ; 33 characters plus the newline
    syscall
    mov eax, 0
    cmp eax, 0
    je _valueofrcx
_valueofrcx:
    mov rax, 1
    mov rdi, 1
    lea rcx, [rcxval]           ; rcx = address of rcxval
    mov rsi, rcx
    mov rdx, 5
    syscall
    mov eax, 0
    cmp eax, 0
    je _exit
The program above is a modified version of our hello world program. Assemble it with NASM (nasm -f elf64) and link it with ld, and it prints:
We will now go through Jumps
Jump has successfully taken place
Test
Explanation:
So, before moving on to the code above, let us understand EFLAGS. Unlike the general-purpose registers, EFLAGS is a special register whose individual bits are boolean status flags. The CPU sets or clears them based on the result of the previous instruction: one flag records an arithmetic carry, another records whether the result was zero, and so on.
FLAGS:
eflags 0x246 [ PF ZF IF ]
Let us take the snippet above, which GDB showed while debugging the program. Here PF stands for Parity Flag: it is set because the low byte of the previous result contains an even number of 1 bits; had the count been odd, PF would not have been set. Then comes ZF, the Zero Flag, which is set when the previous arithmetic result is zero. Our program relies on it to control its flow: at je _jumpexecute ZF is 1, so the jump to the label is taken. Finally, IF is the Interrupt Flag, which means the CPU will respond to interrupt requests from peripherals. The original FLAGS register is 16 bits wide and its successor EFLAGS is 32 bits wide. But wait, isn't there a successor to EFLAGS as well? Like RFLAGS, maybe?
-Yes, you are correct. RFLAGS is the 64-bit version, but we mostly deal with its lower 32 bits (EFLAGS), including the lower 16 bits (FLAGS). The upper 32 bits of RFLAGS are reserved for future CPUs, which means software should never set these bits and should not rely on their value (so that software doesn't break on future CPUs if/when new features are added to the ISA and the bits are actually used for something new).
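If you don't have GDB at hand, here is a small sketch (my own example, not from the program above) that reads the flags from inside the program. It uses pushfq to copy RFLAGS onto the stack, keeps only the PF and ZF bits, and returns them as the exit status.

```nasm
section .text
    global _start
_start:
    xor eax, eax        ; result is 0, so ZF = 1 and PF = 1 (zero 1-bits is an even count)
    pushfq              ; push the 64-bit RFLAGS register onto the stack
    pop rdi             ; rdi = RFLAGS
    and rdi, 0x44       ; keep only PF (bit 2, value 0x04) and ZF (bit 6, value 0x40)
    mov eax, 60         ; exit syscall
    syscall             ; exit status = 0x44 = 68
```

After running it, echo $? should print 68, which corresponds to the PF and ZF entries GDB listed for 0x246.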
LEA
Now, let's move on to something new. In the previous code snippets we encountered mov and xor, but wait, what is lea? And what does lea rcx, [rcxval] do? LEA stands for Load Effective Address: in layman's terms, it computes the address of the memory operand on the right and stores that address in the register on the left, without reading the memory itself. Let us compare two of our labels:
_jumpexecute:
mov rax, 1
mov rdi, 1
mov rsi, jumped_message ; rsi = address of "Jump has successfully taken place"
mov rdx, 34 ; length in bytes, newline included
syscall
mov eax, 0
cmp eax, 0
je _valueofrcx
VS
_valueofrcx:
mov rax, 1
mov rdi, 1
lea rcx, [rcxval] ; rcx = &rcxval (the address of rcxval)
mov rsi,rcx
mov rdx, 5
syscall
mov eax, 0
cmp eax, 0
je _exit
Both labels above do the same thing, printing a small message, and both end up passing an address to sys_write. In the first snippet, mov rsi, jumped_message puts the address of jumped_message straight into rsi as an immediate value. In the second, lea rcx, [rcxval] computes the address of rcxval and stores it in rcx without touching memory, and mov rsi, rcx then copies that address into rsi. I would suggest you try out mov rcx, [somevariable] & mov rcx, somevariable! The first dereferences the address of somevariable and stores its contents in rcx, whereas the second stores the address itself in rcx.
Citation: Stack Overflow
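LEA becomes more interesting once the address involves arithmetic. The sketch below is my own example (the numbers array is made up for illustration): it uses lea to compute the address of the third element of an array and only then dereferences it with mov.

```nasm
section .data
    numbers dq 10, 20, 30           ; three 8-byte values (hypothetical data)

section .text
    global _start
_start:
    lea rbx, [numbers]              ; rbx = address of numbers, no memory read
    mov rcx, 2                      ; index of the element we want
    lea rsi, [rbx + rcx*8]          ; rsi = address of numbers[2], still no memory read
    mov rdi, [rsi]                  ; now dereference: rdi = 30
    mov rax, 60                     ; exit syscall
    syscall                         ; exit status = 30
```

The same bracket syntax means "compute this address" for lea and "read what is at this address" for mov, which is the whole difference between the two.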
Jumps
In the paragraphs above we talked a bit about the zero flag, and yes, flags really do affect the flow of the program. We see je _exit, so what actually is je?
Conditional Jumps
JE stands for Jump if Equal: the jump is taken when ZF is set to 1. How does that happen? The preceding cmp subtracts the source from the destination; if the result is 0, ZF is set, and the jump to the label named _exit takes place. Let us understand this in a simpler way:
mov eax, 0
This moves 0 into eax, so the value of eax is now 0 after this instruction, then
cmp eax, 0
As eax is already equal to 0 and we are comparing it with 0, 0 - 0 definitely yields 0, which sets the zero flag to 1, and the flow of the program is passed on to _exit.
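Conditional jumps are also how loops are built. As a minimal sketch (my own example, not part of the programs above), the code below adds up 5 + 4 + 3 + 2 + 1 using jnz, the opposite of je: it jumps while ZF is 0.

```nasm
section .text
    global _start
_start:
    mov ecx, 5          ; loop counter
    xor edi, edi        ; running sum = 0
_loop:
    add edi, ecx        ; sum += counter
    dec ecx             ; counter -= 1, sets ZF when the counter reaches 0
    jnz _loop           ; ZF = 0: counter is not zero yet, go around again
    mov eax, 60         ; exit syscall
    syscall             ; exit status = 15
```

Running it and then echo $? should print 15.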
Let us now check out this code, which demonstrates the OF, or Overflow Flag: the message is printed if and only if the Overflow Flag is set to 1.
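section .data
    value db 40
    jumped_message db "JUMPED", 10
section .text
    global _start
_start:
    mov al, 46
    inc al                  ; al = 47
    add al, 79              ; al = 126, still within the signed 8-bit range
    add al, [value]         ; 126 + 40 = 166 > 127: signed overflow, OF = 1
    mov ah, al
    add al, ah              ; (-90) + (-90) = -180 < -128: overflows again, OF = 1
    jo _jumpexecute         ; taken when OF = 1
    jmp _exit               ; OF = 0: skip the message
_jumpexecute:
    mov rax, 1
    mov rdi, 1
    mov rsi, jumped_message
    mov rdx, 7
    syscall
    mov eax, 0
    cmp eax, 0
    je _exit
_exit:
    mov rax, 60
    mov rdi, 0
    syscall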
In the code above the overflow flag is set because a signed arithmetic overflow occurred in the addition, and jo means: if the overflow flag is set, jump to _jumpexecute and print the message JUMPED!!
If you wish to check out more awesome examples, check this out!
Unconditional Jumps
These jumps are performed with the JMP instruction. They jump to an address or a label without depending on any condition, such as whether the zero flag is set.
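Here is a tiny sketch of jmp on its own (my own example; the label _done is just a name I picked). The jump is always taken, so the instruction between the jmp and the label never runs.

```nasm
section .text
    global _start
_start:
    mov edi, 1          ; exit status we expect to keep
    jmp _done           ; always taken, no flags are checked
    mov edi, 2          ; skipped: execution never reaches this line
_done:
    mov eax, 60         ; exit syscall
    syscall             ; exit status = 1
```

Running it and then echo $? should print 1, not 2.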
section .data
    ;message db "Hello World!" ,10
    greet db "Hey there, What is your name? ", 10    ; db, not dw: a string is a sequence of bytes
    text2 db "My name is, ", 10
    askage db "Hey what's your age? ", 10
    soage db "So your age is, ", 10
section .bss
    yourname resb 20        ; 20-byte buffer for the name
    age resb 5              ; 5-byte buffer for the age
section .text
    global _start
_start:
    call _greet2
    call _getthename
    call _text2
    call _printname
    call _asktheage
    call _getage
    call _text3
    call _printage
    call _exit
_greet2:
    mov rax, 1
    mov rdi, 1
    mov rsi, greet
    mov rdx, 31
    syscall
    ret
_getthename:
    mov rax, 0              ; sys_read
    mov rdi, 0              ; file descriptor 0 (stdin)
    mov rsi, yourname
    mov rdx, 20
    syscall
    ret
_text2:
    mov rax, 1
    mov rdi, 1
    mov rsi, text2
    mov rdx, 12             ; the text and its trailing space, without the newline
    syscall
    ret
_printname:
    mov rax, 1
    mov rdi, 1
    mov rsi, yourname
    mov rdx, 20
    syscall
    ret
_asktheage:
    mov rax, 1
    mov rdi, 1
    mov rsi, askage
    mov rdx, 22
    syscall
    ret
_getage:
    mov rax, 0              ; sys_read
    mov rdi, 0              ; file descriptor 0 (stdin)
    mov rsi, age
    mov rdx, 5
    syscall
    ret
_text3:
    mov rax, 1
    mov rdi, 1
    mov rsi, soage
    mov rdx, 17
    syscall
    ret
_printage:
    mov rax, 1
    mov rdi, 1
    mov rsi, age
    mov rdx, 5
    syscall
    ret
_exit:
    mov rax, 60
    mov rdi, 0
    syscall
Subroutines & Calls:
The program above is quite lengthy compared to the previous ones, and it has a new term, ret. So what are these? Blocks such as _greet2 are known as procedures or subroutines, and the terminating ret transfers the program flow back to the instruction right after the call once the block has executed. For example, in the _start label we call _greet2, which passes the flow of the program to:
_greet2:
mov rax, 1
mov rdi, 1
mov rsi, greet
mov rdx, 31
syscall
ret
Then, after the message “Hey there, What is your name?” is printed, control returns to the _start label and execution continues with the next call. So call is essentially an unconditional jump to the subroutine that also pushes the return address onto the stack, and ret pops that address to jump back. Just to not leave the reader confused: in the _getage subroutine a different value, 0, is moved into rax, which means we are using sys_read to read input from the user and store it in age; it is finally printed by the _printage subroutine.
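To see call and ret without any syscalls in the way, here is a minimal sketch (my own example; _double is a hypothetical subroutine name). The subroutine doubles the value in edi, and ret brings us back to the instruction right after the call.

```nasm
section .text
    global _start
_start:
    mov edi, 21         ; value to work on
    call _double        ; pushes the address of the next instruction, then jumps
    mov eax, 60         ; execution resumes here after ret
    syscall             ; exit status = 42
_double:
    add edi, edi        ; edi = edi * 2
    ret                 ; pops the return address and jumps back to it
```

Running it and then echo $? should print 42.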
Arithmetic:
section .data
    ;message db "Hello World", 10
    initialvariable db 0, 10        ; one byte for the value, then a newline
section .text
    global _start
_start:
    call _addregisters
    call _substractregisters
    call _exit
_addregisters:
    mov rax, 25
    add rax, 25
    add rax, 2                      ; rax = 52
    mov [initialvariable], al       ; store only the low byte so the newline is not overwritten
    mov rax, 1
    mov rdi, 1
    mov rsi, initialvariable
    mov rdx, 2
    syscall
    ret
_substractregisters:
    mov rax, 2
    sub [initialvariable], al       ; 52 - 2 = 50, a one-byte subtraction
    mov rax, 1
    mov rdi, 1
    mov rsi, initialvariable
    mov rdx, 2
    syscall
    ret
_exit:
    mov rax, 60
    mov rdi, 0
    syscall
Now that we understand labels and calls, the program above adds two subroutines, one for adding values and one for subtracting them. As usual the variable is initialized to zero, then three subroutines, _addregisters, _substractregisters and _exit, are called. Now:
_addregisters:
mov rax, 25
add rax, 25
add rax,2
mov [initialvariable], al ; store only the low byte of rax
mov rax, 1
mov rdi, 1
mov rsi, initialvariable
mov rdx, 2
syscall
ret
The value 25 is moved into rax, then the add instructions add 25 more and finally 2, so rax now holds the decimal value 52. In the ASCII table, 52 is the code for the character 4, so when we print initialvariable with the write syscall, we see 4. Then ret passes control of the program back to the _start label. Now:
_substractregisters:
mov rax, 2
sub [initialvariable], al ; one-byte subtraction
mov rax, 1
mov rdi, 1
mov rsi, initialvariable
mov rdx, 2
syscall
ret
This part is quite simple: the value 2 is moved into rax and then subtracted from the value in initialvariable, which gives 52 - 2 = 50. The variable now holds 50, and printing it with the write syscall shows 2, the character with ASCII code 50. Finally the subroutine ends with ret.
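Since the write syscall prints bytes as characters, printing the number 52 as the text "52" takes one more step: split it into digits and add the ASCII code of '0' to each. The sketch below is my own example (the buffer name is made up) and handles only two-digit numbers.

```nasm
section .bss
    buffer resb 3               ; two digits plus a newline (hypothetical buffer)

section .text
    global _start
_start:
    mov eax, 52                 ; the number we want to print
    xor edx, edx                ; div uses edx:eax as the dividend, so clear edx
    mov ecx, 10
    div ecx                     ; eax = 52 / 10 = 5, edx = 52 % 10 = 2
    add al, '0'                 ; 5 -> character '5'
    add dl, '0'                 ; 2 -> character '2'
    mov [buffer], al
    mov [buffer+1], dl
    mov byte [buffer+2], 10     ; newline

    mov eax, 1                  ; sys_write
    mov edi, 1                  ; stdout
    mov rsi, buffer
    mov edx, 3                  ; two digits and the newline
    syscall

    mov eax, 60                 ; sys_exit
    xor edi, edi
    syscall
```

This prints 52 followed by a newline. A general version would repeat the division in a loop until the quotient reaches zero.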
Stack:
A stack is a data structure in memory where data is stored and removed with the PUSH & POP instructions: the first pushes a register, memory operand or immediate value onto the top of the stack, and the second removes the value at the top. RSP points to the top of the stack, RIP holds the address of the next instruction to be executed, and RBP conventionally points to the base of the current stack frame. The stack grows in the reverse direction, towards lower addresses, so every push decreases RSP. The stack also works in Last In, First Out order: the last value pushed is the first one to be removed. Just picture it this way:
1
_______________
2
--------------
3
--------------
4
--------------
5
---------------
Here 5, the last input, gets removed from the top of the stack before the others. This was just a basic example of how a stack works.
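The same idea in code, as a minimal sketch of my own: three values are pushed and then popped, and the first pop returns the value that was pushed last.

```nasm
section .text
    global _start
_start:
    push 1              ; rsp decreases by 8, stack: 1
    push 2              ; stack: 1 2
    push 3              ; stack: 1 2 3 (3 is on top)
    pop rdi             ; rdi = 3, the last value pushed comes off first
    pop rsi             ; rsi = 2
    pop rdx             ; rdx = 1, rsp is back where it started
    mov eax, 60         ; exit syscall
    syscall             ; exit status = 3
```

Running it and then echo $? should print 3.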
Macros:
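section .data
    message db "Hello World", 10
section .text
    global _start
%macro printmessage 1       ; name, then the number of arguments (the argument is unused here)
    mov rax, 1
    mov rdi, 1
    mov rsi, message
    mov rdx, 11             ; the 11 characters of "Hello World", without the newline
    syscall
%endmacro
%macro exit 0               ; a macro that takes no arguments
    mov rax, 60
    mov rdi, 0
    syscall
%endmacro
_start:
    printmessage 4          ; expands to the five instructions above; the 4 is ignored
    exit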
In the code snippet above we see some weird terms such as %macro. So what are they?
Think of macros this way: imagine we could bundle a bunch of instructions under one name and reuse them in various parts of the program without using call, just by writing that single name. If you are looking for something like that, a macro does exactly this: the assembler pastes the instructions in wherever the name is used. Let us take a small example from the code above:
%macro printmessage 1
mov rax, 1
mov rdi, 1
mov rsi, message
mov rdx, 11
syscall
%endmacro
Here the printmessage macro is declared, which does the work of printing the message “Hello World”; the 1 after its name is the number of arguments it accepts. We just need to write the name of the macro, as if it were an instruction, in the _start label! You can define your own macro this way:
%macro <nameofmacro> <numberofargs>
your bunch of instructions
%endmacro
;This can be used inside your code just need to use the nameofmacro.
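Macro arguments become useful once the body refers to them. In NASM the first argument is written %1, the second %2, and so on. The sketch below is my own variation (the names printstr, hello, bye and the _len constants are made up for illustration): one macro prints any string it is given.

```nasm
%macro printstr 2           ; %1 = address of the string, %2 = its length
    mov rax, 1              ; sys_write
    mov rdi, 1              ; stdout
    mov rsi, %1
    mov rdx, %2
    syscall
%endmacro

section .data
    hello db "Hello World", 10
    hello_len equ $ - hello         ; length computed by the assembler
    bye db "Bye", 10
    bye_len equ $ - bye

section .text
    global _start
_start:
    printstr hello, hello_len       ; expands to the five instructions with hello filled in
    printstr bye, bye_len           ; a second, separate expansion
    mov rax, 60                     ; sys_exit
    mov rdi, 0
    syscall
```

Each use of the macro is a fresh copy of its body, so unlike a subroutine there is no call or ret involved, at the cost of a slightly larger program.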
Conclusion:
So we have come to the end of this blog. This was just a very basic guide to help people understand x64 assembly using NASM, and it may also help when reversing ELF binaries by making the disassembly a bit easier to read. I have covered only about 40% of what a full x64 beginner's guide would, but I think this is enough to get someone started. Hopefully you will find this helpful. If you find any issues in this blog, please reach out to me on Discord at ElementalX#3463; I would be more than happy to improve and correct myself. Thanks to this awesome YouTube channel, which gave me a layout for writing this blog. Good day ahead! Thank you for your time.