Machine language and debugging
Overview
Teaching: 0 min
Exercises: 0 min
Questions
Getting closer to the hardware via machine languages
Objectives
First learning objective. (FIXME)
1. Intel x86 processors
- Dominate laptop/desktop/server market
- Evolutionary design
- Backward compatible all the way back to the 8086, introduced in 1978
- Added more features as time goes on
- x86 is a Complex Instruction Set Computer (CISC)
- Many different instructions with many different formats
- But only a small subset is encountered in Linux programs
- Compare: Reduced Instruction Set Computer (RISC)
- RISC: very few instructions, with very few modes for each
- RISC can be quite fast (but Intel still wins on speed!)
- Current RISC renaissance (e.g., ARM, RISC-V), especially for low-power devices
2. Intel x86 processors: machine evolution
| Name | Date | Transistor count |
|---|---|---|
| 386 | 1985 | 0.3M |
| Pentium | 1993 | 3.1M |
| Pentium/MMX | 1997 | 4.5M |
| Pentium Pro | 1995 | 6.5M |
| Pentium III | 1999 | 8.2M |
| Pentium 4 | 2000 | 42M |
| Core 2 Duo | 2006 | 291M |
| Core i7 | 2008 | 731M |
| Core i7 Skylake | 2015 | 1.9B |
- Added features
- Instructions to support multimedia operations
- Instructions to enable more efficient conditional operations (!)
- Transition from 32 bits to 64 bits
- More cores
3. x86 clones: Advanced Micro Devices (AMD)
- Historically
- AMD has followed just behind Intel
- A little bit slower, a lot cheaper
- Then
- Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
- Built Opteron: tough competitor to Pentium 4
- Developed x86-64, their own extension to 64 bits
- Recent Years
- Intel got its act together
- 1995-2011: Leading semiconductor “fab” in the world
- 2018: #2 largest by $$ (#1 is Samsung)
- 2019: reclaimed #1
- AMD fell behind
- Relies on external semiconductor manufacturer GlobalFoundries
- ca. 2019 CPUs (e.g., Ryzen) are competitive again
- 2020 Epyc
4. Machine programming: levels of abstraction
Architecture (also ISA: instruction set architecture): The parts of a processor design that one needs to understand for writing correct machine/assembly code
- Examples: instruction set specification, registers
Machine Code: The byte-level programs that a processor executes
Assembly Code: A text representation of machine code
Microarchitecture: Implementation of the architecture
- Examples: cache sizes and core frequency
- Example ISAs:
- Intel: x86, IA32, Itanium, x86-64
- ARM: Used in almost all mobile phones
- RISC-V: New open-source ISA
5. Assembly/Machine code view
- Machine code (Assembly code) differs greatly from the original C code.
- Parts of processor state that are not visible/accessible from C programs are now visible.
- PC: Program counter
- Contains the address of the next instruction
- Called `%rip` (instruction pointer register)
- Register file
- Contains 16 named locations (registers), each of which can store a 64-bit value.
- These registers can hold addresses (~ C pointers) or integer data.
- Condition codes
- Store status information about the most recent arithmetic or logical operation
- Used for conditional branching (`if`/`while`)
- Vector registers to hold one or more integers or floating-point values.
- Memory
- Is seen as a byte-addressable array
- Contains code and user data
- Stack to support procedures
6. Hands on: assembly/machine code example
- Open a terminal (Windows Terminal or Mac Terminal).
- Reminder: It is `podman` on Windows and `docker` on Mac. Everything else is the same!
- Launch the container:
Windows:
$ podman run --rm --userns keep-id --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it -v /mnt/c/csc231:/home/$USER/csc231:Z localhost/csc-container /bin/bash
Mac:
$ docker run --rm --userns=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it -v /Users/$USER/csc231:/home/$USER/csc231:Z csc-container /bin/bash
- Inside your `csc231` directory, create another directory called `04-machine` and change into this directory.
- Create a file named `mstore.c` with the following contents:
- Run the following commands (the flag is `-Og` with a capital letter O, not the number 0):
$ gcc -Og -S mstore.c
$ cat mstore.s
$ gcc -Og -c mstore.c
$ objdump -d mstore.o
- x86_64 instructions range in length from 1 to 15 bytes.
- The disassembler determines the assembly code based purely on the byte sequence in the machine-code file.
- All lines beginning with `.` are directives to the assembler and linker.
7. Data format
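| C data type | Intel data type | Assembly-code suffix | Size (bytes) |
|---|---|---|---|
| `char` | Byte | `b` | 1 |
| `short` | Word | `w` | 2 |
| `int` | Double word | `l` | 4 |
| `long` | Quad word | `q` | 8 |
| `char *` | Quad word | `q` | 8 |
| `float` | Single precision | `s` | 4 |
| `double` | Double precision | `l` | 8 |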
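The sizes in the table can be checked at compile time. A minimal sketch using C11 `_Static_assert`, assuming the x86-64 System V ABI used by Linux (64-bit Windows, for instance, uses a 4-byte `long`, so the `long` check would fail there). The file name `sizes.c` in the comment is hypothetical; if any check fails, the file does not compile.

```c
#include <stddef.h>

/* Compile-time checks of the data-format table, assuming x86-64 Linux.
   Compile with: gcc -c sizes.c   (hypothetical file name) */
_Static_assert(sizeof(char) == 1, "char is a byte (suffix b)");
_Static_assert(sizeof(short) == 2, "short is a word (suffix w)");
_Static_assert(sizeof(int) == 4, "int is a double word (suffix l)");
_Static_assert(sizeof(long) == 8, "long is a quad word (suffix q)");
_Static_assert(sizeof(char *) == 8, "pointers are quad words (suffix q)");
_Static_assert(sizeof(float) == 4, "float is single precision (suffix s)");
_Static_assert(sizeof(double) == 8, "double is double precision (suffix l)");

/* Small helper so the result can also be inspected at run time. */
size_t long_size(void)
{
    return sizeof(long);
}
```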
8. Integer registers
- An x86_64 CPU contains a set of 16 general-purpose registers storing 64-bit values.
- The original 8086 design has eight 16-bit registers, `%ax` through `%sp`.
- Origin of the names (mostly obsolete)
- `%ax`: accumulator
- `%cx`: counter
- `%dx`: data
- `%bx`: base
- `%si`: source index
- `%di`: destination index
- `%sp`: stack pointer
- `%bp`: base pointer
- After the IA32 extension, these registers grew to 32 bits, labeled `%eax` through `%esp`.
- After the x86_64 extension, these registers were expanded to 64 bits, labeled `%rax` through `%rsp`. Eight new registers were added: `%r8` through `%r15`.
- Instructions can operate on data of different sizes stored in the low-order bytes of the 16 registers.
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
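Reading `%eax`, `%ax`, or `%al` means looking at the low-order 4, 2, or 1 bytes of `%rax`. The sketch below models this with plain integer casts and masks; the helper names are hypothetical. It also models one x86-64 write rule not stated above: writing a 32-bit register (e.g. a `movl` into `%eax`) zeroes the upper 32 bits of the 64-bit register, while writing a 16-bit register leaves the remaining bytes unchanged.

```c
#include <stdint.h>

/* Reads: each narrower register name selects the low-order bytes of %rax. */
uint32_t read_eax(uint64_t rax) { return (uint32_t)rax; } /* low 4 bytes */
uint16_t read_ax(uint64_t rax)  { return (uint16_t)rax; } /* low 2 bytes */
uint8_t  read_al(uint64_t rax)  { return (uint8_t)rax; }  /* low byte    */

/* 32-bit writes zero-extend into the full 64-bit register. */
uint64_t write_eax(uint64_t rax, uint32_t v)
{
    (void)rax;          /* the old upper 32 bits are discarded */
    return (uint64_t)v;
}

/* 16-bit writes replace only the low 2 bytes. */
uint64_t write_ax(uint64_t rax, uint16_t v)
{
    return (rax & ~(uint64_t)0xFFFF) | v;
}
```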
9. Assembly characteristics: operations
- Transfer data between memory and registers
- Load data from memory into a register
- Store register data into memory
- Perform arithmetic functions on register or memory data
- Transfer control
- Unconditional jumps to/from procedures
- Conditional branches
- Indirect branches
10. Data movement
- Example: `movq Source, Dest`
- Note: This is AT&T notation. Intel uses `mov Dest, Source`.
- Operand types:
- Immediate (Imm): Constant integer data.
- Example: `$0x400`, `$-533`.
- Like a C constant, but prefixed with `$`.
- Encoded with 1, 2, or 4 bytes.
- Register (Reg): One of the 16 integer registers
- Example: `%rax`, `%r13`
- `%rsp` is reserved for special use.
- Others have special uses in particular instructions.
- Memory (Mem): 8 (the `q` in `movq`) consecutive bytes of memory at the address given by a register.
- Example: `(%rax)`
- Various other addressing modes (see textbook page 181, Figure 3.3).
- Other `mov` instructions:
- `movb`: move byte
- `movw`: move word
- `movl`: move double word
- `movq`: move quad word
- `movabsq`: move absolute quad word
11. movq Operand Combinations
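| Source | Dest | `movq Src, Dest` | C analog |
|---|---|---|---|
| Imm | Reg | `movq $0x4, %rax` | `tmp = 0x4;` |
| Imm | Mem | `movq $-147, (%rax)` | `*p = -147;` |
| Reg | Reg | `movq %rax, %rdx` | `tmp2 = tmp1;` |
| Reg | Mem | `movq %rax, (%rdx)` | `*p = tmp;` |
| Mem | Reg | `movq (%rax), %rdx` | `tmp = *p;` |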
12. Simple memory addressing mode
- Normal: `(R)` means `Mem[Reg[R]]`
- Register R specifies the memory address
- Aha! Pointer dereferencing in C
- Example: `movq (%rcx),%rax`
- Displacement: `D(R)` means `Mem[Reg[R]+D]`
- Register R specifies the start of a memory region
- Constant displacement D specifies the offset
- Example: `movq 8(%rbp),%rdx`
13. x86_64 Cheatsheet
14. Hands on: data movement
- Create a file named `swap.c` in `04-machine` with the following contents:
- Run the following commands
$ gcc -Og -c swap.c
$ objdump -d swap.o
- Why `%rsi` and `%rdi`?
- Procedure data flow:
- The first six parameters are placed into `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`.
- The remaining parameters are pushed onto the stack of the calling function.
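To see the six-register rule and the stack together, consider a hypothetical function with eight `long` parameters. The comments give where each argument lives on entry, assuming the x86-64 System V calling convention used on Linux. Compiling it with `gcc -Og -c` and running `objdump -d` should show the last two arguments being read from the stack rather than from registers (the exact instructions depend on the compiler version).

```c
/* Hypothetical example; compile with: gcc -Og -c args8.c && objdump -d args8.o */
long sum8(long a1,   /* %rdi */
          long a2,   /* %rsi */
          long a3,   /* %rdx */
          long a4,   /* %rcx */
          long a5,   /* %r8  */
          long a6,   /* %r9  */
          long a7,   /* stack: 8(%rsp) on entry, just above the return address */
          long a8)   /* stack: 16(%rsp) on entry */
{
    return a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}
```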
15. Hands on: data movement
- Create a file named `swap_dsp.c` in `04-machine` with the following contents:
- Run the following commands
$ gcc -Og -c swap_dsp.c
$ objdump -d swap_dsp.o
- What is the meaning of `0x190`?
16. Complete memory addressing mode
- Most general form
- `D(Rb,Ri,S)`: `Mem[Reg[Rb]+S*Reg[Ri]+D]`
- D: Constant displacement of 1, 2, or 4 bytes
- Rb: Base register: any of the 16 integer registers
- Ri: Index register: any, except for `%rsp`
- S: Scale: 1, 2, 4, or 8
- Special cases
- `(Rb,Ri)`: `Mem[Reg[Rb]+Reg[Ri]]`
- `D(Rb,Ri)`: `Mem[Reg[Rb]+Reg[Ri]+D]`
- `(Rb,Ri,S)`: `Mem[Reg[Rb]+S*Reg[Ri]]`
- `(,Ri,S)`: `Mem[S*Reg[Ri]]`
- `D(,Ri,S)`: `Mem[S*Reg[Ri]+D]`
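The general form is plain integer arithmetic on addresses. A sketch (function names hypothetical) that computes `D + Reg[Rb] + S*Reg[Ri]` and checks it against C array indexing: for a `long` array `a`, `&a[i]` is the address that `(Rb,Ri,8)` would form with `Rb = a` and `Ri = i`. The instruction in the second comment is what `gcc -Og` is likely to emit, not a guarantee.

```c
#include <stdint.h>

/* Effective address of D(Rb,Ri,S): D + Reg[Rb] + S*Reg[Ri].
   Only scales 1, 2, 4, and 8 are encodable on x86-64.
   Unsigned arithmetic wraps, which matches the hardware for negative D. */
uintptr_t effective_address(intptr_t d, uintptr_t rb, uintptr_t ri, unsigned s)
{
    return rb + (uintptr_t)s * ri + (uintptr_t)d;
}

/* Indexed load; likely compiled to: movq (%rdi,%rsi,8),%rax */
long load_indexed(const long *a, long i)
{
    return a[i];
}
```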
17. Arithmetic and logical operations: lea
- `lea`: load effective address
- A form of the `movq` instruction
- `lea S, D`: Write `&S` to `D`.
- Can be used to generate pointers
- Can also be used to describe common arithmetic operations.
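Two small hypothetical functions show the two uses of `lea`. With `gcc -Og`, each is likely to compile to a single `lea` (check with `objdump -d`, since the exact output depends on the compiler version): the first computes an address without reading memory, the second performs arithmetic that happens to fit the `D(Rb,Ri,S)` form.

```c
/* Pointer generation: &a[i] is a + 8*i; no memory is read.
   Likely: lea (%rdi,%rsi,8),%rax */
long *element_address(long *a, long i)
{
    return &a[i];
}

/* Arithmetic: x + 4*y + 3 fits D(Rb,Ri,S) with D=3, Rb=x, Ri=y, S=4.
   Likely: lea 0x3(%rdi,%rsi,4),%rax */
long x_plus_4y_plus_3(long x, long y)
{
    return x + 4 * y + 3;
}
```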
18. Hands on: lea
- Create a file named `m12.c` in `04-machine` with the following contents:
- Run the following commands
$ gcc -Og -c m12.c
$ objdump -d m12.o
- Review slide 16
- `%rdi` holds `x`: `(%rdi,%rdi,2)` = `x + 2 * x`
- The above result is moved to `%rdx` with `lea`.
- `0x0(,%rdx,4)` = `4 * (x + 2 * x)` = `12 * x`
- The above result is moved to `%rax` with `lea`.
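The walk-through above implies that `m12.c` computes `x * 12`. Since the original listing is not shown here, the version below is an assumption consistent with that reading, with the two `lea` steps also written out as C so the arithmetic can be checked.

```c
/* Assumed content of m12.c: multiply by 12. */
long m12(long x)
{
    return x * 12;
}

/* The same computation split into the two lea steps described above. */
long m12_as_lea(long x)
{
    long rdx = x + 2 * x;   /* lea (%rdi,%rdi,2),%rdx : 3*x            */
    long rax = 4 * rdx;     /* lea 0x0(,%rdx,4),%rax  : 4*(3*x) = 12*x */
    return rax;
}
```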
19. Other arithmetic operations
- Suffixes (such as the `q` in `addq`) are omitted here, unlike in the book.

| Format | Computation | Description |
|---|---|---|
| `add Src,Dest` | `D <- D + S` | add |
| `sub Src,Dest` | `D <- D - S` | subtract |
| `imul Src,Dest` | `D <- D * S` | multiply |
| `shl Src,Dest` | `D <- D << S` | shift left |
| `sar Src,Dest` | `D <- D >> S` | arithmetic shift right |
| `shr Src,Dest` | `D <- D >> S` | logical shift right |
| `sal Src,Dest` | `D <- D << S` | arithmetic shift left (same as `shl`) |
| `xor Src,Dest` | `D <- D ^ S` | exclusive or |
| `and Src,Dest` | `D <- D & S` | and |
| `or Src,Dest` | `D <- D \| S` | or |
| `inc Dest` | `D <- D + 1` | increment |
| `dec Dest` | `D <- D - 1` | decrement |
| `neg Dest` | `D <- -D` | negate |
| `not Dest` | `D <- ~D` | complement |

- Watch out for argument order (AT&T versus Intel)
- No distinction between signed and unsigned int. Why is that?
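One place where the table does distinguish signedness is the right shift: `sar` copies the sign bit into the vacated positions, while `shr` fills them with zeros. A sketch in portable C (function names hypothetical): `shr` maps directly to `>>` on an unsigned value, while `sar` is written so that it does not rely on the implementation-defined behavior of `>>` on negative signed values.

```c
#include <stdint.h>

/* Logical shift right (shr): zeros enter from the left. k assumed in 0..63. */
uint64_t shr64(uint64_t x, unsigned k)
{
    return x >> k;
}

/* Arithmetic shift right (sar): copies of the sign bit enter from the left.
   For negative x, ~x is non-negative, so shifting it is well defined.
   k assumed in 0..63. */
int64_t sar64(int64_t x, unsigned k)
{
    return x < 0 ? ~(~x >> k) : x >> k;
}
```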
20. Challenge: lea
- Create a file named `scale.c` in `04-machine` with the following contents:
- Run the following commands
$ gcc -Og -c scale.c
$ objdump -d scale.o
- Identify the registers holding x, y, and z.
- Which register contains the final return value?
Solution
- `%rdi`: x
- `%rsi`: y
- `%rdx`: z
- `%rax` contains the final return value.
Midterm
- Friday April 2, 2021
- 24-hour window: 12:00 AM to 11:59 PM, April 2, 2021.
- 75 minutes duration.
- 20 questions (similar in format to the quizzes).
- Everything (including source code) up to slide 20 of Machine language and debugging.
- No class on Friday April 2.
21. Hands on: long arithmetic
- Create a file named `arith.c` in `04-machine` with the following contents:
- Run the following commands
$ gcc -Og -c arith.c
$ objdump -d arith.o
- Understand how the assembly code represents the arithmetic operations in the C code.
22. Quick review: processor state
- Information about the currently executing program
- Temporary data (`%rax`, …)
- Location of the runtime stack (`%rsp`)
- Location of the current code control point (`%rip`, …)
- Status of recent tests (`CF`, `ZF`, `SF`, `OF` in `%EFLAGS`)
23. Condition codes (implicit setting)
- Single-bit registers
- `CF`: the most recent operation generated a carry out of the most significant bit.
- `ZF`: the most recent operation yielded zero.
- `SF`: the most recent operation yielded a negative value.
- `OF`: the most recent operation caused a two’s-complement overflow.
- Implicitly set (as a side effect) by arithmetic operations.
24. Condition codes (explicit setting)
- Explicit setting by the compare instruction
- `cmpq Src2, Src1`
- `cmpq b, a` is like computing `a - b` without setting a destination
- `CF` set if carry/borrow out from the most significant bit (unsigned comparisons)
- `ZF` set if `a == b`
- `SF` set if `(a - b) < 0` (as a signed value)
- `OF` set if two’s-complement (signed) overflow:
(a>=0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
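The flag rules above can be computed in C. A sketch (names hypothetical) that returns the four condition codes `cmpq b, a` would set. The subtraction is done on `uint64_t` so that wraparound is well defined, and the overflow test uses the common sign-bit form: `a` and `b` have different signs and the sign of the result differs from the sign of `a`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition codes as set by "cmpq b, a" (computes a - b, discards the result). */
struct cmp_flags {
    bool cf, zf, sf, of;
};

struct cmp_flags cmpq_flags(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t t = ua - ub;                      /* wrapping subtraction, no UB */
    struct cmp_flags f;
    f.cf = ua < ub;                            /* borrow: unsigned a < unsigned b */
    f.zf = (t == 0);                           /* a == b */
    f.sf = (t >> 63) & 1;                      /* sign bit of the 64-bit result */
    f.of = (((ua ^ ub) & (ua ^ t)) >> 63) & 1; /* signed overflow */
    return f;
}
```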
25. Condition branches (jX)
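- Jump to a different part of the code depending on the condition codes
- Implicit reading of condition codes

| jX | Condition | Description |
|---|---|---|
| `jmp` | 1 | direct (unconditional) jump |
| `je` | `ZF` | equal/zero |
| `jne` | `~ZF` | not equal/not zero |
| `js` | `SF` | negative |
| `jns` | `~SF` | non-negative |
| `jg` | `~(SF^OF) & ~ZF` | greater (signed) |
| `jge` | `~(SF^OF)` | greater or equal (signed) |
| `jl` | `SF^OF` | less (signed) |
| `jle` | `(SF^OF) \| ZF` | less or equal (signed) |
| `ja` | `~CF & ~ZF` | above (unsigned) |
| `jb` | `CF` | below (unsigned) |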
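The conditions in the table can be written directly as C predicates over the four flags. In this hypothetical model, after `cmpq $1, %rax` with `%rax = -1` the result is `-2`, so `SF=1`, `OF=0`, `ZF=0`, and `CF=0` (as unsigned, `0xFFFFFFFFFFFFFFFF` is not below 1). Both `jl` and `ja` are then taken: `-1` is less than `1` as a signed value but above it as an unsigned value.

```c
#include <stdbool.h>

/* Condition-code values (hypothetical model of the flags register). */
struct cc {
    bool cf, zf, sf, of;
};

bool cond_je(struct cc f)  { return f.zf; }
bool cond_jne(struct cc f) { return !f.zf; }
bool cond_jg(struct cc f)  { return !(f.sf ^ f.of) && !f.zf; } /* signed >    */
bool cond_jge(struct cc f) { return !(f.sf ^ f.of); }          /* signed >=   */
bool cond_jl(struct cc f)  { return f.sf ^ f.of; }             /* signed <    */
bool cond_jle(struct cc f) { return (f.sf ^ f.of) || f.zf; }   /* signed <=   */
bool cond_ja(struct cc f)  { return !f.cf && !f.zf; }          /* unsigned >  */
bool cond_jb(struct cc f)  { return f.cf; }                    /* unsigned <  */
```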
26. Hands on: a simple jump
- Create a file named `jump.c` in `04-machine` with the following contents:
- Run the following commands
$ gcc -Og -c jump.c
$ objdump -d jump.o
- Understand how the assembly code uses jumps across instructions to support conditional control flow.
- Rerun the above commands but omit the `-Og` flag. Think about the differences in the resulting assembly code.
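Compilers typically translate an `if`/`else` into a compare, a conditional jump, and an unconditional jump. The same shape can be written in C with `goto`, which mirrors the assembly more closely. Both functions are hypothetical illustrations, not the contents of `jump.c`, and the instructions in the comments are approximate.

```c
/* Structured version. */
long absdiff(long x, long y)
{
    long result;
    if (x < y)
        result = y - x;
    else
        result = x - y;
    return result;
}

/* goto version: one conditional jump to the else branch, one jump past it. */
long absdiff_goto(long x, long y)
{
    long result;
    if (x >= y)          /* roughly: cmpq %rsi,%rdi ; jge .Lelse */
        goto else_branch;
    result = y - x;      /* "then" branch falls through */
    goto done;           /* roughly: jmp .Ldone */
else_branch:
    result = x - y;
done:
    return result;
}
```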
27. Hands on: loop
- Create a file named `fact_loop.c` in `04-machine` with the following contents:
- Run the following commands
$ gcc -Og -c fact_loop.c
$ objdump -d fact_loop.o
- Understand how the assembly code uses jumps across instructions to implement the loop.
- Create `fact_loop_2.c` and `fact_loop_3.c` from `fact_loop.c`.
- Modify `fact_loop_2.c` so that the factorial is implemented with a `while` loop. Study the resulting assembly code.
- Modify `fact_loop_3.c` so that the factorial is implemented with a `for` loop. Study the resulting assembly code.
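A `for` or `while` loop is usually compiled from the same building blocks: an initial test, the loop body, and a conditional jump back. A sketch with hypothetical function names, written in the "guarded do" shape described in CS:APP; GCC may choose a different layout (for example, jumping to a test placed at the end), so compare against your own `objdump -d` output.

```c
/* Factorial with a for loop. */
long fact_for(long n)
{
    long result = 1;
    for (long i = 2; i <= n; i++)
        result *= i;
    return result;
}

/* The same loop as goto code: test once, then a do-while body with a backward jump. */
long fact_for_goto(long n)
{
    long result = 1;
    long i = 2;
    if (!(i <= n))       /* initial test: skip the loop entirely */
        goto done;
loop:
    result *= i;
    i++;
    if (i <= n)          /* conditional jump back to the top of the body */
        goto loop;
done:
    return result;
}
```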
28. Hands on: switch
- Create a file named `switch.c` in `04-machine` with the following contents:
- View `switch.c` and the resulting `switch.s` in a two-column tmux terminal.
$ gcc -Og -S switch.c
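GCC often implements a dense `switch` with a jump table indexed by the switch value, guarded by a single unsigned range check. The sketch below writes that idea out by hand for a hypothetical switch (not the contents of `switch.c`). It uses GCC's "labels as values" extension (`&&label` and `goto *ptr`), which GCC and Clang accept but which is not ISO C.

```c
/* Hypothetical switch statement. */
long switch_eg(long x, long n)
{
    long result;
    switch (n) {
    case 0:  result = x + 1; break;
    case 1:  result = x * 2; break;
    case 2:
    case 3:  result = x - 3; break;
    default: result = 0;     break;
    }
    return result;
}

/* Hand-written jump table using GNU C labels as values. */
long switch_eg_table(long x, long n)
{
    static void *jump_table[4] = { &&case_0, &&case_1, &&case_2_3, &&case_2_3 };
    long result;
    /* One unsigned comparison rejects both n < 0 and n > 3. */
    if ((unsigned long)n > 3)
        goto case_default;
    goto *jump_table[n];
case_0:
    result = x + 1; goto done;
case_1:
    result = x * 2; goto done;
case_2_3:
    result = x - 3; goto done;
case_default:
    result = 0;
done:
    return result;
}
```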
29. Mechanisms in procedures
- Function = procedure (book terminology)
- Support procedure `P` calling procedure `Q`.
- Passing control
- To the beginning of the procedure code
- the starting instruction of `Q`
- Back to the return point
- the next instruction in `P` after `Q`
- Passing data
- Procedure arguments: `P` passes one or more parameters to `Q`.
- Return value: `Q` returns a value back to `P`.
- Memory management
- Allocate during procedure execution and de-allocate upon return
- `Q` needs to allocate space for local variables and free that storage once it finishes.
- Mechanisms all implemented with machine instructions
- The x86-64 implementation of a procedure uses only those mechanisms required
- Machine instructions implement the mechanisms, but the choices are determined by designers. These choices make up the Application Binary Interface (ABI).
30. x86-64 stack
- Region of memory managed with stack discipline
- Memory viewed as an array of bytes.
- Different regions have different purposes.
- (Like the ABI, a policy decision)
- Grows toward lower addresses
- Register `%rsp` contains the lowest stack address.
- address of the “top” element
31. Stack push and pop
`pushq Src`
- Fetch operand at `Src`
- Decrement `%rsp` by 8
- Write operand at the address given by `%rsp`

`popq Dest`
- Read value at the address given by `%rsp`
- Increment `%rsp` by 8
- Store value at `Dest` (usually a register)
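The two sequences above can be modeled in C with a small byte array standing in for the stack region and an index playing the role of `%rsp`. This is a hypothetical simulation for study, with no overflow or underflow checking. It shows that the stack grows toward lower addresses, that `pushq` decrements before writing, and that `popq` reads before incrementing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

struct sim_stack {
    unsigned char mem[STACK_BYTES]; /* stand-in for the stack region of memory */
    size_t rsp;                     /* index of the current "top" element */
};

/* Empty stack: rsp points just past the highest address of the region. */
void sim_init(struct sim_stack *s)
{
    s->rsp = STACK_BYTES;
}

/* pushq src: decrement %rsp by 8, then write the operand at %rsp. */
void sim_pushq(struct sim_stack *s, uint64_t src)
{
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &src, sizeof src);
}

/* popq dest: read the value at %rsp, then increment %rsp by 8. */
uint64_t sim_popq(struct sim_stack *s)
{
    uint64_t dest;
    memcpy(&dest, &s->mem[s->rsp], sizeof dest);
    s->rsp += 8;
    return dest;
}
```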
32. Hands on: function calls
- Create a file named `mult.c` in `04-machine` with the following contents:
- Compile with the `-g` and `-Og` flags and run `gdb` on the resulting executable.
$ gcc -g -Og -o mult mult.c
$ gdb mult
- Set up gdb with a breakpoint at `main` and start running.
- A new GDB command is `s` or `step`: execute the next instruction (machine or code instruction).
- It will execute the highlighted (green and arrowed) instruction in the `code` section.
- Be careful: GDB shows Intel notation in the code segment.
- Run `s` once to execute `sub rsp,0x8`.
- Use `n` to execute the instruction `mov edi,0x8` and step over the next instruction, `call 0x0400410 <malloc@plt>`, which is a function call that we don’t want to step into.
- These are the instructions for the `malloc` call:
mov edi,0x8
call 0x0400410 <malloc@plt>
- Pay attention to the next three instructions after `call 0x0400410 <malloc@plt>`.
- Recall that the return value of `malloc` is placed into `%rax`.
- These instructions set up the parameters for the upcoming `multstore(1,2,p)` call:
mov rdx,rax
mov esi,0x2
mov edi,0x1
- Also, check the values of `rax` and `p`:
gdb-peda$ p $rax
gdb-peda$ p p
- This is the address of the memory block allocated via `malloc` to hold one `long` element.
- Run `s` to step into `multstore(1,2,p)`.
- Inside `multstore`, we immediately prepare to launch `mult2`.
- Run `s` once. This stores the value in `rdx` into `rbx`, saving `rdx` because we need to use it later. The following instructions set up the parameters for the `mult2` call.
- Procedure data flow:
- The first six parameters are placed into `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`.
- The remaining parameters are pushed onto the stack of the calling function (in this case, `multstore`).
- Note that `rsp` (the stack pointer of `multstore`) is currently at `0x7ffffffe320`.
- Run `s` to step into `mult2`.
- The 7th parameter of `mult2` is pushed onto the stack frame of `multstore` and is stored at address `0x7ffffffe318`.
- Function/procedure `mult2` has no local variables, hence its stack frame is almost empty. The stack pointer, `rsp`, of `mult2` is actually pointing toward the address of the 7th parameter.
- Run `s`.
- The subsequent instructions (`<mult2>` through `<mult2+23>`) perform the multiplication.
- The final value will be stored in `rax` to be returned to `multstore`.
- Examine the two screenshots below to see which values specific registers contain:
- `rcx`? `r8`? `r9`?
- What is the value of `rax` after `<mult2+20>`?
- What is the value of `rax` after `<mult2+23>`?
- Continue running `s` to finish the program.
33. Hands on: array and multi-dimensional arrays
- Given an array of data type `T` and length `L`: `T A[L]`
- A contiguously allocated region of `L*sizeof(T)` bytes in memory.
- Identifier `A` can be used as a pointer to array element 0.
- Create a file named `array.c` in `04-machine` with the following contents:
- Run `cat -n array.c` and remember the line number of the `printf` statement.
- Compile with the `-g` flag and run `gdb` on the resulting executable.
$ cat -n array.c
$ gcc -g -o array array.c
$ gdb array
- Set up gdb with a breakpoint at the line number of the `printf` statement and start running.
- Run `i locals` to view all local variables.
- Use `p` and a variety of pointer/array syntax to view their values and addresses: `*`, `[]`, `&`.
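The layout rules above reduce to simple offset arithmetic. A sketch with hypothetical helper names: element `i` of `T A[L]` sits `i * sizeof(T)` bytes past `A`, and for a two-dimensional array `T M[ROWS][COLS]` (stored row by row in C) element `M[i][j]` sits `(i * COLS + j) * sizeof(T)` bytes past `M`. `ROWS` and `COLS` are illustrative constants.

```c
#include <stddef.h>

#define ROWS 3
#define COLS 4

/* Byte offset of A[i] from A, for element size t_size = sizeof(T). */
size_t elem_offset(size_t i, size_t t_size)
{
    return i * t_size;
}

/* Row-major layout of T M[ROWS][COLS]: M[i][j] is (i*COLS + j) elements from M. */
size_t elem_offset_2d(size_t i, size_t j, size_t t_size)
{
    return (i * COLS + j) * t_size;
}
```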
34. Hands on: struct
- A struct creates a new (non-primitive) data type that groups objects of possibly different types into a single object.
- Think classes in object-oriented programming minus the methods.
- Create a file named `struct.c` in `04-machine` with the following contents:
- Compile with the `-g` flag and run `gdb` on the resulting executable.
$ gcc -g -o struct struct.c
$ gdb struct
- Set up gdb with a breakpoint at `main` and start running.
- Use `n` to walk through the program and answer the following questions:
- How many bytes are there in the memory block allocated for variable `p`?
- How are the addresses of `x`, `y`, and `c` related to `p`?
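Accessing a member through a pointer uses the displacement addressing mode `D(R)`: the base register holds the struct's address and `D` is the member's offset, which the compiler knows at compile time. A sketch with a hypothetical struct (not the one in `struct.c`), assuming x86-64 Linux sizes; the instruction in the comment is what `gcc -Og` is likely to emit.

```c
#include <stddef.h>

/* Hypothetical struct; offsets assume x86-64 Linux. */
struct rec {
    long a;   /* offset 0  */
    long b;   /* offset 8  */
    char c;   /* offset 16, followed by 7 bytes of tail padding */
};

/* r->b is a load from base + constant offset.
   Likely: movq 0x8(%rdi),%rax */
long get_b(const struct rec *r)
{
    return r->b;
}

size_t offset_of_b(void) { return offsetof(struct rec, b); }
size_t offset_of_c(void) { return offsetof(struct rec, c); }
```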
35. Data alignment
- Intel recommends that data be aligned to improve memory system performance.
- K-alignment rule: Any primitive object of `K` bytes must have an address that is a multiple of `K`: 1 for `char`, 2 for `short`, 4 for `int` and `float`, and 8 for `long`, `double`, and `char *`.
- Create a file named `alignment.c` in `04-machine` with the following contents:
- Run `cat -n alignment.c` and remember the line number of the `return` statement.
- Compile with the `-g` flag and run `gdb` on the resulting executable.
$ cat -n alignment.c
$ gcc -g -o alignment alignment.c
$ gdb alignment
gdb-peda$ b 17
gdb-peda$ run
- Run `i locals` to view all local variables.
- Use `p` and `&` to view the addresses of the three `char` array variables and the `i` variable.
- Why is the address displacement between the `i` variable and the next array variable not the same as the displacement between the array variables?
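The K-alignment rule also explains the padding inside structs. A sketch with two hypothetical structs holding the same members in a different order, assuming x86-64 Linux sizes and alignments; the offsets in the comments are what `offsetof` reports under that assumption. Ordering members from largest to smallest alignment removes the interior padding.

```c
#include <stddef.h>

/* Padding under the K-alignment rule. */
struct mixed {
    char c;   /* offset 0; then 3 bytes of padding so i is 4-aligned */
    int  i;   /* offset 4 */
    char d;   /* offset 8; then 7 bytes of padding so l is 8-aligned */
    long l;   /* offset 16 */
};            /* sizeof == 24, a multiple of the largest alignment (8) */

/* Same members, largest alignment first. */
struct reordered {
    long l;   /* offset 0 */
    int  i;   /* offset 8 */
    char c;   /* offset 12 */
    char d;   /* offset 13; then 2 bytes of tail padding */
};            /* sizeof == 16 */
```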
Key Points
First key point. Brief Answer to questions. (FIXME)