Machine language and debugging
Overview
Teaching: 0 min
Exercises: 0 minQuestions
Getting closer to the hardware via machine languages
Objectives
First learning objective. (FIXME)
1. Intel x86 processors
- Dominate laptop/desktop/server market
- Evolutionary design
- Backwards compatible up until 8086, introduced in 1978
- Added more features as time goes on
- x86 is a Complex Instruction Set Computer (CISC)
- Many different instructions with many different formats
- But, only small subset encountered with Linux programs
- Compare: Reduced Instruction Set Computer (RISC)
- RISC: very few instructions, with very few modes for each
- RISC can be quite fast (but Intel still wins on speed!)
- Current RISC renaissance (e.g., ARM, RISC V), especially for low-power
2. Intel x86 processors: machine evolution
Name Date Transistor Counts 386 1985 0.3M Pentium 1993 3.1M Pentium/MMX 1997 4.5M Pentium Pro 1995 6.5M Pentium III 1999 8.2M Pentium 4 2000 42M Core 2 Duo 2006 291M Core i7 2000 42M Core i7 Skylake 2006 291M
- Added features
- Instructions to support multimedia operations
- Instructions to enable more efficient conditional operations (!)
- Transition from 32 bits to 64 bits
- More cores
3. x86 clones: Advanced Micro Devices (AMD)
- Historically
- AMD has followed just behind Intel
- A little bit slower, a lot cheaper
- Then
- Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
- Built Opteron: tough competitor to Pentium 4
- Developed x86-64, their own extension to 64 bits
- Recent Years
- Intel got its act together
- 1995-2011: Lead semiconductor “fab” in world
- 2018: #2 largest by $$ (#1 is Samsung)
- 2019: reclaimed #1
- AMD fell behind
- Relies on external semiconductor manufacturer GlobalFoundaries
- ca. 2019 CPUs (e.g., Ryzen) are competitive again
- 2020 Epyc
![]()
4. Machine programming: levels of abstraction
Architecture
: (alsoISA
: instruction set architecture) The parts of a processor design that one needs to understand for writing correct machine/assembly code
- Examples: instruction set specification, registers
Machine Code
: The byte-level programs that a processor executesAssembly Code
: A text representation of machine codeMicroarchitecture
: Implementation of the architecture
- Examples: cache sizes and core frequency
- Example ISAs:
- Intel: x86, IA32, Itanium, x86-64
- ARM: Used in almost all mobile phones
- RISC V: New open-source ISA
5. Assembly/Machine code view
- Machine code (Assembly code) differs greatly from the original C code.
- Parts of processor state that are not visible/accessible from C programs are now visible.
- PC: Program counter
- Contains address of next instruction
- Called
%rip
(instruction pointer register)- Register file
- contains 16 named locations (registers), each can store 64-bit values.
- These registers can hold addresses (~ C pointers) or integer data.
- Condition codes
- Store status information about most recent arithmetic or logical operation
- Used for conditional branching (
if
/while
)- Vector registers to hold one or more integers or floating-point values.
- Memory
- Is seen as a byte-addressable array
- Contains code and user data
- Stack to support procedures
6. Hands on: assembly/machine code example
- Open a terminal (Windows Terminal or Mac Terminal).
- Reminder: It is
podman
on Windows anddocker
on Mac. Everything else is the same!.- Launch the container:
Windows:
$ podman run --rm --userns keep-id --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it -v /mnt/c/csc231:/home/$USER/csc231:Z localhost/csc-container /bin/bash
Mac:
$ docker run --rm --userns=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it -v /Users/$USER/csc231:/home/$USER/csc231:Z csc-container /bin/bash
- Inside your
csc231
, create another directory called04-machine
and change into this directory.- Create a file named
mstore.c
with the following contents:
- Run the following commands It is capital o, not number 0
$ gcc -Og -S mstore.c $ cat mstore.s $ gcc -Og -c mstore.c $ objdump -d mstore.o
- x86_64 instructions range in length from 1 to 15 bytes
- The disassembler determines the assembly code based purely on the byte-sequence in the machine-code file.
- All lines begin with
.
are directirves to the assembler and linker.
7. Data format
C data type Intel data type Assembly-code suffix Size char Byte b 1 short Word w 2 int Double word l 4 long Quad word q 8 char * Quad word q 8 float Single precision s 4 double Double precision l 8
8. Integer registers
- x86_64 CPU contains a set of 16
general purpose registers
storing 64-bit values.- Original 8086 design has eight 16-bit registers,
%ax
through%sp
.
- Origin (mostly obsolete)
%ax
: accumulate%cx
: counter%dx
: data%bx
: base%si
: source index%di
: destination index%sp
: stack pointer%bp
: base pointer- After IA32 extension, these registers grew to 32 bits, labeled
%eax
through%esp
.- After x86_64 extension, these registers were expanded to 64 bits, labeled
%rax
through%rsp
. Eight new registered were added:%r8
through%r15
.- Instructions can operate on data of different sizes stored in low-order bytes of the 16 registers.
Bryant and O’ Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
9. Assembly characteristics: operations
- Transfer data between memory and register
- Load data from memory into register
- Store register data into memory
- Perform arithmetic function on register or memory data
- Transfer control
- Unconditional jumps to/from procedures
- Conditional branches
- Indirect branches
10. Data movement
- Example:
movq Source, Dest
- Note: This is ATT notation. Intel uses
mov Dest, Source
- Operand Types:
- Immediate (Imm): Constant integer data.
$0x400
,$-533
.- Like C constant, but prefixed with
$
.- Encoded with 1, 2, or 4 bytes.
- Register (Reg): One of 16 integer registers
- Example:
%rax
,%r13
%rsp
reserved for special use.- Others have special uses in particular instructions.
- Memory (Mem): 8 (
q
inmovq
) consecutive bytes of memory at address given by register.
- Example:
(%rax)
- Various other addressing mode (See textbook page 181, Figure 3.3).
- Other
mov
:
movb
: move bytemovw
: move wordmovl
: move double wordmovq
: move quad wordmoveabsq
: move absolute quad word
11. movq Operand Combinations
movq
Source Dest Src, Dest C Analog Imm Reg movq $0x4, %rax
tmp = 0x4; Imm Mem movq $-147,(%rax)
*p = -147; Reg Reg movq %rax,%rdx
tmp2 = tmp1; Reg Mem movq %rax,(%rdx)
*p = tmp; Mem Reg movq (%rax),%rdx
tmp = *p;
12. Simple memory addressing mode
- Normal: (R) Mem[Reg[R]]
- Register R specifies memory address
- Aha! Pointer dereferencing in C
movq (%rcx),%rax
- Displacement D(R) Mem[Reg[R]+D]
- Register R specifies start of memory region
- Constant displacement D specifies offset
movq 8(%rbp),%rdx
13. x86_64 Cheatsheet
14. Hands on: data movement
- Create a file named
swap.c
in04-machine
with the following contents:
- Run the following commands
$ gcc -Og -c swap.c $ objdump -d swap.o
- Why
%rsi
and%rdi
?- Procedure Data Flow:
- First six parameters will be placed into
rdi
,rsi
,rdx
,rcx
,r8
,r9
.- The remaining parameters will be pushed on to the stack of the calling function.
15. Hands on: data movement
- Create a file named
swap_dsp.c
in04-machine
with the following contents:
- Run the following commands
$ gcc -Og -c swap_dsp.c $ objdump -d swap_dsp.o
- What is the meaning of
0x190
?
16. Complete memory addressing mode
- Most General Form
D(Rb,Ri,S)
:Mem[Reg[Rb]+S*Reg[Ri]+ D]
- D: Constant displacement 1, 2, or 4 bytes
- Rb: Base register: Any of 16 integer registers
- Ri: Index register: Any, except for
%rsp
- S: Scale: 1, 2, 4, or 8
- Special Cases
(Rb,Ri)
:Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri)
:Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S)
:Mem[Reg[Rb]+S*Reg[Ri]]
(,Ri,S)
:Mem[S*Reg[Ri]]
D(,Ri,S)
:Mem[S*Reg[Ri] + D]
17. Arithmetic and logical operations: lea
lea
: load effective address- A form of
movq
intsruction
lea S, D
: Write&S
toD
.- can be used to generate pointers
- can also be used to describe common arithmetic operations.
18. Hands on: lea
- Create a file named
m12.c
in04-machine
with the following contents:
- Run the following commands
$ gcc -Og -c m12.c $ objdump -d m12.o
- Review slide 16
%rdi
: x(%rdi, %rdi,2)
= x + 2 * x- The above result is moved to
%rdx
withlea
.0x0(,%rdx,4)
= 4 * (x + 2 * x) = 12*x- The above result is moved to
%rax
withlea
.
19. Other arithmetic operations
- Omitting suffixes comparing to the book.
Format Computation Description add Src,Dest
D <- D + S add sub Src,Dest
D <- D - S subtract imul Src,Dest
D <- D * S multiply shl Src,Dest
D <- D « S shift left sar Src,Dest
D <- D » S arith. shift right shr Src,Dest
D <- D » S shift right sal Src,Dest
D <- D « S arith. shift left xor Src,Dest
D <- D ^ S exclusive or and Src,Dest
D <- D & S and or Src,Dest
D <- D | S or inc Src
D <- D + 1 increment dec Src
D <- D - 1 decrement neg Src
D <- -D negate not Src
D <- -D complement
- Watch out for argument order (ATT versus Intel)
- No distinction between signed and unsigned int. Why is that?
20. Challenge: lea
- Create a file named
scale.c
in04-machine
with the following contents:
- Run the following commands
$ gcc -Og -c scale.c $ objdump -d scale.o
- Identify the registers holding x, y, and z.
- Which register contains the final return value?
Solution
%rdi
: x%rsi
: y%rdx
: z%rax
contains the final return value.
Midterm
- Friday April 2, 2021
- 24-hour windows range: 12:00AM - 11:59PM April 2, 2021.
- 75 minutes duration.
- 20 questions (similar in format to the quizzes).
- Everything (including source codes) up to slide 20 of Machine language and debugging.
- No class on Friday April 2.
21. Hands on: long arithmetic
- Create a file named
arith.c
in04-machine
with the following contents:
- Run the following commands
$ gcc -Og -c arith.c $ objdump -d arith.o
- Understand how the Assembly code represents the actual arithmetic operation in the C code.
22. Quick review: processor state
- Information about currently executing program
- temporary data (
%rax
,…)- location of runtime stack (
%rsp
)- location of current code control point (
%rip
,…)- status of recent tests (
CF
,ZF
,SF
,OF
in%EFLAGS
)
23. Condition codes (implicit setting)
- Single-bit registers
CF
: the most recent operation generated a carry out of the most significant bit.ZF
: the most recent operation yielded zero.SF
: the most recent operation yielded negative.OF
: the most recent operation caused a two’s-complement overflow.- Implicitly set (as side effect) of arithmetic operations.
24. Condition codes (explicit setting)
- Exlicit setting by Compare instruction
cmpq Src2, Src1
cmpq b, a
like computinga - b
without setting destinationCF
set if carry/borrow out from most significant bit (unsigned comparisons)ZF
set ifa == b
SF
set if(a - b) < 0
OF
set if two’s complement (signed) overflow
(a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
25. Condition branches (jX)
- Jump to different part of code depending on condition codes
- Implicit reading of condition codes
jX Condition Description jmp
1 direct jump je
ZF equal/zero jne
~ZF not equal/not zero js
SF negative jns
~SF non-negative jg
~(SF^OF) & ~ZF greater jge
~(SF^OF) greater or equal to jl
SF^OF lesser jle
SF^OF | ZF lesser or equal to ja
~CF & ~ZF above jb
CF below
26. Hands on: a simple jump
- Create a file named
jump.c
in04-machine
with the following contents:
- Run the following commands
$ gcc -Og -c jump.c $ objdump -d jump.o
- Understand how the Assembly code enables jump across instructions to support conditional workflow.
- Rerun the above commands but ommit
-Og
flag. Think about the differences in the resulting Assembly code.
27. Hands on: loop
- Create a file named
fact_loop.c
in04-machine
with the following contents:
- Run the following commands
$ gcc -Og -c fact_loop.c $ objdump -d fact_loop.o
- Understand how the Assembly code enables jump across instructions to support loop.
- Create
fact_loop_2.c
andfact_loop_3.c
fromfact_loop
.c.- Modify
fact_loop_2.c
so that the factorial is implemented with awhile
loop. Study the resulting Assembly code.- Modify
fact_loop_3.c
so that the factorial is implemented with afor
loop. Study the resulting Assembly code.
28. Hands on: switch
- Create a file named
switch.c
in04-machine
with the following contents:
- View
switch.c
and the resultingswitch.s
in a two-column tmux terminal.$ gcc -Og -S switch.c
29. Mechanisms in procedures
- Function = procedure (book terminology)
- Support procedure
P
calls procedureQ
.- Passing control
- To beginning of procedure code
- starting instruction of
Q
- Back to return point
- next instruction in
P
afterQ
- Passing data
- Procedure arguments
P
passes one or more parameters toQ
.Q
returns a value back toP
.- Return value
- Memory management
- Allocate during procedure execution and de-allocate upon return
Q
needs to allocate space for local variables and free that storage once finishes.- Mechanisms all implemented with machine instructions
- x86-64 implementation of a procedure uses only those mechanisms required
- Machine instructions implement the mechanisms, but the choices are determined by designers. These choices make up the Application Binary Interface (ABI).
30. x86-64 stack
- Region of memory managed with stack discipline
- Memory viewed as array of bytes.
- Different regions have different purposes.
- (Like ABI, a policy decision)
- Grows toward lower addresses
- Register
%rsp
contains lowest stack address.- address of “top” element
31. Stack push and pop
pushq Src
- Fetch operand at
Src
- Decrement
%rsp
by 8- Write operand at address given by
%rsp
popq Dest
- Read value at address given by
%rsp
- Increment
%rsp
by 8- Store value at Dest (usually a register)
32. Hands on: function calls
- Create a file named
mult.c
in04-machine
with the following contents:
- Compile with
-g
and-Og
flags and rungdb
on the resulting executable.$ gcc -g -Og -o mult mult.c $ gdb mult
- Setup gdb with a breakpoint at
main
and start running.- A new GDB command is
s
orstep
: executing the next instruction (machine or code instruction).
- It will execute the highlighted (greened and arrowed) instruction in the
code
section.- Be careful, Intel notation in the code segment of GDB
- Run
s
once to executesub rsp,0x8
- Use
n
to execute the instruction:mov edi,0x8
and step over the next instruction,call 0x0400410 <malloc@plt>
, which is a function call that we don’t want to step into this call.- These are the instructions for the
malloc
call.mov edi,0x8 call 0x0400410 <malloc@plt>
- Pay attention to the next three instructions after
call 0x0400410 <malloc@plt
.
- Recall that the return value of malloc is placed into
%rax
.- These instructions setup the parameters for the upcoming
multstore(1,2,p)
call.
mov rdx,rax
.mov esi,0x2
mov edi,0x1
- Also, check the values of
rax
andp
gdb-peda$ p $rax gdb-peda$ p p
- This is the value of the memory block allocated via
malloc
to contain onelong
element.- Run
s
to step intomultstore(1,2,p)
- Inside
mulstore
, we immediately prepare to launchmult2
.- Run
s
once. This will store the value inrdx
intorbx
. This is to save the value inrdx
because we need to use it later. parameters for themult2
call.- Procedure Data Flow:
- First six parameters will be placed into
rdi
,rsi
,rdx
,rcx
,r8
,r9
.- The remaining parameters will be pushed on to the stack of the calling function ( in this case
mulstore
).- Note that
rsp
(stack pointer ofmultstore
) is currently at0x7ffffffe320
.- Run
s
to step intomult2
.
- The 7th parameter of
mult2
is pushed on to the stack frame ofmultstore
and is stored at address0x7ffffffe318
.- Function/procedure
mult2
has no local variable, hence the stack frame is almost empty. The stack pointer,rsp
, ofmult2
, is actually pointing toward the address of the 7th parameter.- Run
s
.- The subsequence instructions (
<mult2>
through<mult2+23>
) are for the multiplication.- The final value will be stored in
rax
to be returned tomultstore
.- Examine the two screenshots below to see how specific registers contain certain values?
rcx
?r8
?r9
?- What is value of
rax
after<mult2+20>
?- What is value of
rax
after<mult2+23>
?
![]()
- Continue running
s
to finish the program.
33. Hands on: array and multi-dimensional arrays
- Given array of data type
T
and lengthL
:T A[L]
- Contiguous allocated region of
L
*sizeof(T)
bytes in memory.- Identifier
A
can be used as a pointer to array element 0.- Create a file named
array.c
in04-machine
with the following contents:
- Run
cat -n array.c
and remember the line number of theprintf
statement.- Compile with
-g
flag and rungdb
on the resulting executable.$ cat -n array.c $ gcc -g -o array array.c $ gdb array
- Setup gdb with a breakpoint at the line number for the
printf
statement and start running.- Run
i locals
to view all local variables.- Use
p
and variety of pointer/array syntax to view their values and addresses:*
,[]
,&
.
34. Hands on: struct
- Create a new data type (non-primitive) that groups objects of possibly different types into a single object.
- Think classes in object-oriented programming minus the methods.
- Create a file named
struct.c
in04-machine
with the following contents:
- Compile with
-g
flag and rungdb
on the resulting executable.$ gcc -g -o struct struct.c $ gdb struct
- Setup gdb with a breakpoint at
main
` and start running.- Use
n
to walk through the program and answer the following questions:
- How many bytes are there in the memory block allocated for variable
p
?- How are the addresses of
x
,y
, andc
relative top
?
35. Data alignment
- Intel recommends data to be aligned to improve memory system performance.
- K-alignment rule: Any primitive object of
K
bytes must have an address that is multiple ofK
: 1 forchar
, 2 forshort
, 4 forint
andfloat
, and 8 forlong
,double
, andchar *
.- Create a file named
alignment.c
in04-machine
with the following contents:
- Run
cat -n alignment.c
and remember the line number of thereturn
statement.- Compile with
-g
flag and rungdb
on the resulting executable.$ cat -n array.c $ gcc -g -o array array.c $ gdb array gdb-peda$ b 17 gdb-peda$ run
- Run
i locals
to view all local variables.- Use
p
and&
to view addresses of the threechar
array variables and thei
variable.- Why does address displacement is not an exact match between the
i
variable and the next array variable versus between the array variables?
Key Points
First key point. Brief Answer to questions. (FIXME)