Assembly notes

Some of these notes may have been created from other online sources.

They are just my home schooling notes.

 

X64-
code:

#include <iostream>
int main()
{
std::cout << "AAAA";
return 1;
}
———————–

Assembly:
Dump of assembler code for function main:
0x000000000040077d <+0>:     push   %rbp            //store the base pointer
0x000000000040077e <+1>:     mov    %rsp,%rbp
=> 0x0000000000400781 <+4>:     mov    $0x400874,%esi  //address of the string "AAAA" (65 'A'  65 'A'  65 'A'  65 'A')
0x0000000000400786 <+9>:     mov    $0x601060,%edi  //<_ZSt4cout@@GLIBCXX_3.4>
0x000000000040078b <+14>:    callq  0x400680 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
0x0000000000400790 <+19>:    mov    $0x1,%eax       //move the value 1 (return value) into eax
0x0000000000400795 <+24>:    pop    %rbp            //restore the caller's base pointer
0x0000000000400796 <+25>:    retq                   //return from the function: pop the return address into RIP
——————————————————————————————————————–
System calls

(32-bit Linux convention)
load the system call number into eax
load argument 1 into ebx
load argument 2 into ecx
load argument 3 into edx
load argument 4 into esi
load argument 5 into edi
use the instruction int 0x80 to call the kernel

mov     eax,1           ; system call number (sys_exit)
mov     ebx,0           ; argument 1: exit status
int     0x80            ; call kernel

——————————————————————————————————————-
If %eax contains a memory address, then (%eax) retrieves the value located at
that memory address and 8(%eax) retrieves the value at that memory address + 8.

In AT&T syntax:
When referencing a register, the register needs to be prefixed with a “%”.
Constant numbers need to be prefixed with a “$”.

movl    -4(%ebp, %edx, 4), %eax  # Full example: load *(ebp - 4 + (edx * 4)) into eax
movl    -4(%ebp), %eax           # Typical example: load a stack variable into eax
movl    (%ecx), %edx             # No offset: copy the target of a pointer into a register
leal    8(,%eax,4), %eax         # Arithmetic: multiply eax by 4 and add 8
leal    (%eax,%eax,2), %eax      # Arithmetic: multiply eax by 2 and add eax (i.e. multiply by 3)
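
The general AT&T operand form is disp(base, index, scale), and the address it names is base + index * scale + disp. Here is a minimal C++ sketch of that arithmetic. The function name effective_address is made up for these notes, and it only models 32-bit address math, not a real CPU:

```cpp
#include <cstdint>

// Hypothetical helper (not a real API): computes the address named by the
// AT&T operand disp(base, index, scale) using 32-bit wrap-around arithmetic.
constexpr std::uint32_t effective_address(std::uint32_t base,
                                          std::uint32_t index,
                                          std::uint32_t scale,
                                          std::int32_t disp)
{
    // Unsigned arithmetic wraps modulo 2^32, like 32-bit address math.
    return base + index * scale + static_cast<std::uint32_t>(disp);
}
```

With ebp = 0x1000 and edx = 3, effective_address(0x1000, 3, 4, -4) gives 0x1008, which is the address read by -4(%ebp, %edx, 4). With no base and eax = 5, effective_address(0, 5, 4, 8) gives 28, matching leal 8(,%eax,4).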

——————————————————————————————————————–

data bus = moves data between the cpu and other parts of the computer
address bus = defines which location to read data from / write data to
control bus = collection of signals that controls how the processor communicates with the rest of the system

rax = 64bit accumulator.  can be used for any integer, boolean, logical, or memory operation
rbx = 64bit base register.  holds indirect addresses.  can be used for any integer, boolean, logical, or memory operation
rcx = 64bit count register.  used by repetitive instructions that require counting
rdx = 64bit data register.  holds io addresses when accessing data on the bus.  can be used for any integer, boolean, logical, or memory operation
rsi = 64bit source index.  used as a source pointer in instructions that copy memory
rdi = 64bit destination index.  used as a destination pointer in instructions that copy memory
rbp = 64bit base pointer.  used to access parameters and local variables in a procedure. usually points to the saved base pointer of the caller, which sits on the stack just below the return address for the current function
rsp = 64bit stack pointer.  stores the current position in the stack (the top of the stack). anything pushed to the stack gets pushed below this address and this register is updated accordingly

TSS = task state segment. When suspending a task, the processor automatically saves the state of the EFLAGS register in the task state segment.

eax = 32bit accumulator - main register used in arithmetic calculations. It holds the results of arithmetic operations and function return values.
ebx = 32bit base register - traditionally holds a base address (pointer) for memory accesses
ecx = 32bit count register - holds a value representing the number of times a process is repeated. used for loop and string operations
edx = 32bit data register - holds io addresses when accessing data on the bus. also helps extend eax to 64 bits
esi = 32bit source index - used as an offset address in string and array operations. holds the address from where to read data
edi = 32bit destination index - used as an offset address in string and array operations. holds the implied write address of all string operations.
ebp = 32bit base pointer - points to the bottom of the current stack frame. used to access parameters and local variables in a procedure (the frame pointer)
esp = 32bit stack pointer - points to the top of the current stack frame. used to reference local variables.

eflags register 32 bit
rflags register 64 bit

ax = 16bit accumulator - main register used in arithmetic calculations. It holds the results of arithmetic operations and function return values.
bx = 16bit base register - traditionally holds a base address (pointer) for memory accesses
cx = 16bit count register - holds a value representing the number of times a process is repeated. used for loop and string operations
dx = 16bit data register - holds io addresses when accessing data on the bus.
si = 16bit source index - used as an offset address in string and array operations. holds the address from where to read data
di = 16bit destination index - used as an offset address in string and array operations. holds the implied write address of all string operations.
bp = 16bit base pointer - points to the bottom of the current stack frame. used to access parameters and local variables in a procedure (the frame pointer)
sp = 16bit stack pointer - points to the top of the current stack frame. used to reference local variables.
cs = 16bit code segment - base location of the code section (.text section). points at the segment containing the currently executing machine instructions
ds = 16bit data segment - default location for variables. points at global variables for the program.
es = 16bit extra segment - used during string operations. 8086 programs often use this segment register to gain access to segments when it is difficult or impossible to modify the other segment registers.
ss = 16bit stack segment - base location of the stack segment. used when implicitly using SP or ESP, or explicitly using BP or EBP. points at the segment containing the 8086 stack
fs = 16bit segment register
gs = 16bit segment register

al = 8bit ax LO byte
ah = 8bit ax HIGH byte
bl = 8bit bx LO byte
bh = 8bit bx HIGH byte
cl = 8bit cx LO byte
ch = 8bit cx HIGH byte
dl = 8bit dx LO byte
dh = 8bit dx HIGH byte
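
Each 8-bit name is just a view onto half of the matching 16-bit register (al/ah for ax, bl/bh for bx, cl/ch for cx, dl/dh for dx). A small C++ sketch of the split, treating a register as a plain 16-bit value. The helper names low_byte and high_byte are invented for illustration:

```cpp
#include <cstdint>

// Hypothetical helpers: model a 16-bit register as a plain integer value.

// Low byte of a 16-bit register value (al for ax, bl for bx, ...).
constexpr std::uint8_t low_byte(std::uint16_t reg)
{
    return static_cast<std::uint8_t>(reg & 0xFF);
}

// High byte of a 16-bit register value (ah for ax, bh for bx, ...).
constexpr std::uint8_t high_byte(std::uint16_t reg)
{
    return static_cast<std::uint8_t>(reg >> 8);
}
```

If ax holds 0x12AB, then high_byte(0x12AB) is 0x12 (ah) and low_byte(0x12AB) is 0xAB (al).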

On Intel architecture (with the standard 32-bit stack frame), parameters passed on the stack have a positive offset from EBP, while local variables have a negative offset from EBP.
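
A rough sketch of those offsets in C++, assuming the 32-bit cdecl convention with the usual "push ebp / mov ebp, esp" prologue and 4-byte slots. The function names are made up, and the placement of locals is really up to the compiler, so take the local_offset part as an illustration only:

```cpp
// Assumed frame layout (32-bit cdecl, standard prologue):
//   [ebp + 8 + 4*n]  n-th 4-byte parameter (n = 0 is the first one)
//   [ebp + 4]        return address
//   [ebp]            saved ebp of the caller
//   [ebp - 4 - 4*k]  k-th 4-byte local (actual placement is compiler-dependent)

// Offset from ebp of the n-th stack parameter (hypothetical helper).
constexpr int parameter_offset(int n)
{
    return 8 + 4 * n;
}

// Offset from ebp of the k-th local variable (hypothetical helper).
constexpr int local_offset(int k)
{
    return -4 - 4 * k;
}
```

Under these assumptions the first parameter is at ebp + 8 and the first local is at ebp - 4, which is why disassembly is full of operands like 8(%ebp) and -4(%ebp).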

CR0 - CR3 = control registers
CR3 contains the physical address of the table which translates virtual addresses to physical addresses
in 64-bit mode that top-level table is called the page map level 4 (PML4)

little endian = ef cd ab   (the bytes of 0xabcdef in memory, lowest address first)
big endian    = ab cd ef
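
To make the byte order concrete, here is a small C++ sketch that works out which byte of a value ends up at which memory offset in each scheme. It does not inspect real memory, and the function names are invented for these notes:

```cpp
#include <cstdint>

// Byte found at memory offset `offset` when `value` is stored little-endian
// (least significant byte at the lowest address). Hypothetical helper.
constexpr std::uint8_t little_endian_byte(std::uint32_t value, unsigned offset)
{
    return static_cast<std::uint8_t>((value >> (8 * offset)) & 0xFF);
}

// Byte found at memory offset `offset` when a `size`-byte `value` is stored
// big-endian (most significant byte at the lowest address). Hypothetical helper.
constexpr std::uint8_t big_endian_byte(std::uint32_t value, unsigned size,
                                       unsigned offset)
{
    return static_cast<std::uint8_t>((value >> (8 * (size - 1 - offset))) & 0xFF);
}
```

For the 3-byte value 0xabcdef, little_endian_byte gives ef, cd, ab at offsets 0, 1, 2, while big_endian_byte with size 3 gives ab, cd, ef.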

The status flags (bits 0, 2, 4, 6, 7, and 11) of the EFLAGS register indicate the results of arithmetic instructions,
such as the ADD, SUB, MUL, and DIV instructions. The status flag functions are:

CF (bit 0) Carry flag - Set if an arithmetic operation generates a carry or a borrow out of the most-significant
bit of the result; cleared otherwise. This flag indicates an overflow condition for
unsigned-integer arithmetic. It is also used in multiple-precision arithmetic.

PF (bit 2) Parity flag - Set if the least-significant byte of the result contains an even number of 1 bits;
cleared otherwise.

AF (bit 4) Adjust flag - Set if an arithmetic operation generates a carry or a borrow out of bit 3 of the
result; cleared otherwise. This flag is used in binary-coded decimal (BCD) arithmetic.

ZF (bit 6) Zero flag - Set if the result is zero; cleared otherwise.

SF (bit 7) Sign flag - Set equal to the most-significant bit of the result, which is the sign bit of a signed
integer. (0 indicates a positive value and 1 indicates a negative value.)

OF (bit 11) Overflow flag - Set if the integer result is too large a positive number or too small a negative
number (excluding the sign-bit) to fit in the destination operand; cleared otherwise. This
flag indicates an overflow condition for signed-integer (two's complement) arithmetic.

Of these status flags, only the CF flag can be modified directly, using the STC, CLC, and CMC instructions. Also the
bit instructions (BT, BTS, BTR, and BTC) copy a specified bit into the CF flag.
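
As a study aid, the flag rules above can be sketched in C++ for an 8-bit ADD. This is a simplified model written for these notes (the function names are made up), not how the CPU implements it:

```cpp
#include <cstdint>

// Simplified model of the status flags after an 8-bit "add a, b".
// All names here are hypothetical helpers for illustration.

constexpr std::uint8_t add8(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a + b);   // result truncated to 8 bits
}

// CF: carry out of the most-significant bit (unsigned overflow).
constexpr bool carry_flag(std::uint8_t a, std::uint8_t b)
{
    return (static_cast<unsigned>(a) + static_cast<unsigned>(b)) > 0xFF;
}

// AF: carry out of bit 3.
constexpr bool adjust_flag(std::uint8_t a, std::uint8_t b)
{
    return ((a & 0x0F) + (b & 0x0F)) > 0x0F;
}

// ZF: result is zero.
constexpr bool zero_flag(std::uint8_t a, std::uint8_t b)
{
    return add8(a, b) == 0;
}

// SF: copy of the most-significant bit of the result.
constexpr bool sign_flag(std::uint8_t a, std::uint8_t b)
{
    return (add8(a, b) & 0x80) != 0;
}

// OF: operands have the same sign but the result's sign differs.
constexpr bool overflow_flag(std::uint8_t a, std::uint8_t b)
{
    return ((~(a ^ b)) & (a ^ add8(a, b)) & 0x80) != 0;
}

// XOR-folds x so that bit 0 ends up holding the parity of the low 8 bits.
constexpr unsigned fold(unsigned x, unsigned shift)
{
    return x ^ (x >> shift);
}

// PF: set when the low byte of the result has an even number of 1 bits.
constexpr bool parity_flag(std::uint8_t a, std::uint8_t b)
{
    return (fold(fold(fold(add8(a, b), 4), 2), 1) & 1) == 0;
}
```

Two cases worth trying: 0xFF + 0x01 wraps to 0x00, so CF and ZF are set but OF is clear (as signed numbers it is -1 + 1 = 0, which fits). 0x7F + 0x01 gives 0x80, so OF and SF are set but CF is clear (as signed numbers 127 + 1 does not fit in 8 bits).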

When in 64-bit mode, operand size determines the number of valid bits in the destination general-purpose register:
• 64-bit operands generate a 64-bit result in the destination general-purpose register.
• 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose
register.
• 8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the
destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit
operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64 bits.
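
The rules above are easy to mix up, so here is a minimal C++ sketch that models a 64-bit register as a plain integer and shows what a write of each size leaves behind. The function names are invented for illustration:

```cpp
#include <cstdint>

// Hypothetical model: `reg` is the old 64-bit register value, the return
// value is the register after writing `v` with the given operand size.

// 32-bit write: the result is zero-extended, so the old upper half is lost.
constexpr std::uint64_t write32(std::uint64_t /* old value is discarded */,
                                std::uint32_t v)
{
    return v;
}

// 16-bit write: the upper 48 bits of the old value are preserved.
constexpr std::uint64_t write16(std::uint64_t reg, std::uint16_t v)
{
    return (reg & ~0xFFFFULL) | v;
}

// 8-bit write (low byte): the upper 56 bits of the old value are preserved.
constexpr std::uint64_t write8(std::uint64_t reg, std::uint8_t v)
{
    return (reg & ~0xFFULL) | v;
}
```

Starting from a register full of 1 bits, write32(..., 1) leaves 0x0000000000000001, while write16(..., 0x1234) leaves 0xFFFFFFFFFFFF1234.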

The left-most bit is typically called the high order (H.O.) bit.
The right-most bit is typically called the low order (L.O.) bit.

signal low / zero = logic zero
signal high = logic one

for active-low control lines, a low signal means the cpu is taking that action.
for example, if the read line is low, the cpu is reading from memory.

1011100101b   a binary number in yasm (b after the number)
0xa1          a hexadecimal number in yasm (0x before the number)

two hexadecimal digits equal one ascii character

cpu accesses 8 bit bytes.
one hexadecimal digit equals 4 bits
it takes 8 binary bits to represent 1 ascii character
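
Since one hex digit covers 4 bits, a byte splits cleanly into two hex digits. A small C++ sketch of that split (the helper names are made up for these notes):

```cpp
// Hypothetical helpers: turn the two 4-bit halves of a byte into hex digits.

// Hex digit for a 4-bit value (0..15).
constexpr char hex_digit(unsigned nibble)
{
    return "0123456789abcdef"[nibble & 0xF];
}

// Hex digit for the upper 4 bits of a byte.
constexpr char high_hex_digit(unsigned char byte)
{
    return hex_digit(byte >> 4);
}

// Hex digit for the lower 4 bits of a byte.
constexpr char low_hex_digit(unsigned char byte)
{
    return hex_digit(byte & 0xF);
}
```

The ASCII character 'A' is 65, which is 0x41: high_hex_digit('A') is '4' and low_hex_digit('A') is '1'.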

the dq command specifies a quad word data item
the dd command specifies a double word data item
the dw command specifies a word data item
the db command specifies a byte data item

byte  = 8bits
word  = 2 bytes / 16 bits
dword = 4 bytes / 32 bits
qword = 8 bytes / 64 bits
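
For comparison with C++, the fixed-width integer types line up with these sizes. The alias names below are just labels chosen for these notes, matched to the db / dw / dd / dq sizes:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical aliases mirroring the assembler data sizes.
using byte_t  = std::uint8_t;   // db: 1 byte
using word_t  = std::uint16_t;  // dw: 2 bytes
using dword_t = std::uint32_t;  // dd: 4 bytes
using qword_t = std::uint64_t;  // dq: 8 bytes

// Number of bits in a given number of bytes (CHAR_BIT is 8 on x86).
constexpr std::size_t bits_in(std::size_t bytes)
{
    return bytes * CHAR_BIT;
}
```

On x86, bits_in(sizeof(dword_t)) is 32 and bits_in(sizeof(qword_t)) is 64.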

Signed integers are stored in a format called "two's complement". The highest-order (left-most) bit of a signed integer is the sign bit.
If the sign bit is 0, the number is positive (or zero).
If the sign bit is 1, the number is negative.
Bit positions are numbered from right to left, so the left-most bit written below is the sign bit.

101100101010010101010101
|
the sign bit
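
A short C++ sketch of two's complement on 8-bit values: negation is "flip every bit and add 1", and the sign bit decides how the same bit pattern reads as a signed number. The function names are invented for illustration:

```cpp
#include <cstdint>

// Hypothetical helpers working on raw 8-bit patterns.

// Two's complement negation: invert all bits, then add 1 (modulo 256).
constexpr std::uint8_t twos_complement_negate(std::uint8_t x)
{
    return static_cast<std::uint8_t>(~x + 1);
}

// True when the highest-order bit (the sign bit) is set.
constexpr bool sign_bit(std::uint8_t x)
{
    return (x & 0x80) != 0;
}

// Reads an 8-bit pattern as a signed two's complement number.
constexpr int as_signed(std::uint8_t x)
{
    return sign_bit(x) ? static_cast<int>(x) - 256 : static_cast<int>(x);
}
```

Negating 5 (00000101) gives 0xFB (11111011); its sign bit is 1 and as_signed(0xFB) is -5.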

TODO: FIND reserved unassigned opcodes

INTERRUPTS:
ISR = interrupt service routine. It typically saves all the registers and flags (so that it doesn't disturb the computation it interrupts),
does whatever operation is necessary to handle the source of the interrupt, restores the registers and flags, and then resumes execution
of the code it interrupted.
An interrupt is an event (for example an external hardware event) that causes the CPU to interrupt the current instruction sequence and call an ISR.
An ISR should always end with an "iret" (interrupt return).

; = comment
—————————————————————————————————
segments:

.text   - code goes here
.data   - The .data section is used to store global initialized variables
.bss    - The .bss section ("Block Started by Symbol") is used to store global noninitialized variables (meaning they have no value set yet)

yasm -f elf64 -g dwarf2 -l exit.lst exit.asm

-f elf64 64 bit output  which is compatible with linux and gcc

-g dwarf2  selects dwarf2 debugging format for use with debugger

-l exit.lst  asks for a listing file which shows the generated code in hexadecimal

The yasm command produces an object file named exit.o, which
contains the generated instructions and data in a form ready to link with
other code from other object files or libraries.

In the case of an assembly program with the _start function the linking needs to be done with ld

ld -o exit exit.o         produces executable by use of ld (the linker)

-o gives a name to the executable file / Without that option, ld produces a file named a.out

If the assembly program defines main rather than _start, then the linking needs to be done using gcc:
gcc -o exit exit.o

example assembly code file:
—————————
segment .text
global _start

_start:
mov eax,1       ; system call number 1 = sys_exit (32-bit int 0x80 convention)
mov ebx,5       ; argument 1: exit status 5
int 0x80        ; call the kernel
—————————————————————————————————

You can access the registers r8 - r15 as byte, word, or double word by appending b, w, or d to the register name (for example r8b, r8w, r8d).

Code: Where instructions are placed. These memory segments are marked readable and executable.

Data: This is where initialized global and static local variables go. The data segment can actually
be multiple segments, some of which are read-only, and some of which are readable and writable. The
read only sections are usually for things like string literals, and possibly for constant variables.

BSS: This stands for “Block Started by Symbol”. Basically, this is for uninitialized globals and static locals.
They’re initialized to 0 (you don’t care about their initial value because they’re uninitialized).
But it’s silly to have an array of 100 int’s specified in the file as 100 4-byte zeros.
So to save space, the array would be stored as something equivalent to “array 400”.
That tells the loader that it should expand array to be 400 bytes, and it sets it all to zero.
This segment is readable and writable.

Heap: This is the pool of memory used for dynamic allocation (malloc, et al). It is readable and writable.

Stack: This is where function parameters, non-static locals, and other function call data are stored:
where to go when the function returns (the old instruction pointer), plus the "context" of the calling function
(the old stack and base pointers). This is readable and writable.

mov ecx, eax     ; copies the value in eax (here an address) into ecx
mov ecx,[eax]    ; loads the value stored at the memory address held in eax into ecx (not the address itself)

mov ecx,[eax + 1] ; adds 1 to the address held in eax, then loads the value stored at that address into ecx
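
The square brackets work like a pointer dereference in C++, and the arithmetic inside the brackets is done on the address, not on the loaded value. A minimal sketch, assuming a tiny pretend memory of single bytes (a real mov into ecx loads 4 bytes). The names ram and load are made up:

```cpp
// Pretend memory: four bytes at consecutive addresses (hypothetical data).
constexpr unsigned char ram[4] = {0x10, 0x20, 0x30, 0x40};

// Models "[address]": fetch the byte stored at the given address.
constexpr unsigned char load(const unsigned char* address)
{
    return *address;
}
```

If eax points at ram, then load(ram) is 0x10 (like [eax]) and load(ram + 1) is 0x20 (like [eax + 1], the next byte in memory). That is different from load(ram) + 1, which would be 0x11.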

To access an element of an array, you need a function that translates an array index to the address of the
indexed element. For a single dimension array, this function is very simple. It is

Element_Address = Base_Address + ((Index - Initial_Index) * Element_Size)

where Initial_Index is the value of the first index in the array (which you can ignore if zero) and the value Element_Size is the size, in bytes, of an individual element of the array.
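
The formula translates directly into C++. This is only a sketch of the arithmetic, with addresses treated as plain 64-bit numbers and a function name chosen for these notes:

```cpp
#include <cstdint>

// Hypothetical helper implementing:
//   Element_Address = Base_Address + ((Index - Initial_Index) * Element_Size)
constexpr std::uint64_t element_address(std::uint64_t base_address,
                                        std::uint64_t index,
                                        std::uint64_t initial_index,
                                        std::uint64_t element_size)
{
    return base_address + (index - initial_index) * element_size;
}
```

For an array of 4-byte elements starting at 0x1000 with first index 0, element 3 is at element_address(0x1000, 3, 0, 4), which is 0x100C. With 2-byte elements and a first index of 1, element 5 is at element_address(0x1000, 5, 1, 2), which is 0x1008.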

CharArray: char[128];    // Character array with elements 0..127.
IntArray: integer[8];    // "integer" array with elements 0..7.
ByteArray: byte[10];     // Array of bytes with elements 0..9.
PtrArray: dword[4];      // Array of double words with elements 0..3.

You can initialize the values of the array in STATIC and READONLY sections as follows:

RealArray: real32[8] := [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ];
IntegerAry: integer[8] := [ 1, 1, 1, 1, 1, 1, 1, 1 ];
