Machine language and debugging
1. Intel x86 processors#
Dominate laptop/desktop/server market
Evolutionary design
Backward compatible all the way back to the 8086, introduced in 1978
Added more features as time goes on
x86 is a Complex Instruction Set Computer (CISC)
Many different instructions with many different formats
But, only small subset encountered with Linux programs
Compare: Reduced Instruction Set Computer (RISC)
RISC: very few instructions, with very few modes for each
RISC can be quite fast (but Intel still wins on speed!)
Current RISC renaissance (e.g., ARM, RISC-V), especially for low-power devices
2. Intel x86 processors: machine evolution#
| Name | Date | Transistor count |
|---|---|---|
| 386 | 1985 | 0.3M |
| Pentium | 1993 | 3.1M |
| Pentium/MMX | 1997 | 4.5M |
| Pentium Pro | 1995 | 6.5M |
| Pentium III | 1999 | 8.2M |
| Pentium 4 | 2000 | 42M |
| Core 2 Duo | 2006 | 291M |
| Core i7 | 2008 | 731M |
| Core i7 Skylake | 2015 | 1.75B |
Added features
Instructions to support multimedia operations
Instructions to enable more efficient conditional operations (!)
Transition from 32 bits to 64 bits
More cores
3. x86 clones: Advanced Micro Devices (AMD)#
Historically
AMD has followed just behind Intel
A little bit slower, a lot cheaper
Then
Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
Built Opteron: tough competitor to Pentium 4
Developed x86-64, their own extension to 64 bits
Recent Years
Intel got its act together
1995-2011: Lead semiconductor “fab” in world
2018: #2 largest by $$ (#1 is Samsung)
2019: reclaimed #1
AMD fell behind
Relies on external semiconductor manufacturer GlobalFoundries
ca. 2019 CPUs (e.g., Ryzen) are competitive again
2020 Epyc
4. Machine programming: levels of abstraction#
- `Architecture`: (also `ISA`: instruction set architecture) The parts of a processor
design that one needs to understand for writing correct machine/assembly code
- Examples: instruction set specification, registers
- `Machine Code`: The byte-level programs that a processor executes
- `Assembly Code`: A text representation of machine code
- `Microarchitecture`: Implementation of the architecture
- Examples: cache sizes and core frequency
- Example ISAs:
- Intel: x86, IA32, Itanium, x86-64
- ARM: Used in almost all mobile phones
- RISC-V: New open-source ISA
5. Assembly/Machine code view#
Machine code (Assembly code) differs greatly from the original C code.
Parts of processor state that are not visible/accessible from C programs are now visible.
PC: Program counter
Contains address of next instruction
Called `%rip` (instruction pointer register)
Register file
Contains 16 named locations (registers), each of which can store a 64-bit value.
These registers can hold addresses (~ C pointers) or integer data.
Condition codes
Store status information about most recent arithmetic or logical operation
Used for conditional branching (`if`/`while`)
Vector registers to hold one or more integers or floating-point values.
Memory
Is seen as a byte-addressable array
Contains code and user data
Stack to support procedures
6. Hands on: assembly/machine code example#
Inside your `csc231` directory, create another directory called `04-machine` and change into this directory.
Create a file named `mstore.c` with the following contents:
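A minimal sketch of what `mstore.c` could contain, assuming it follows the multiply-and-store example from the Bryant and O'Hallaron textbook. The names `multstore` and `mult2` are assumptions, and `mult2` is defined here only so that the sketch is self-contained:

```c
/* mstore.c: hypothetical contents, modeled on the textbook example. */

/* Helper assumed for illustration; the textbook keeps it in a separate file. */
long mult2(long a, long b) {
    return a * b;
}

/* Multiply x by y and store the product at the address held in dest. */
void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);
    *dest = t;
}
```

With `mult2` defined in the same file, the compiler may inline it. Keeping only a declaration (`long mult2(long, long);`) is one way to keep the `call` instruction visible in the disassembly.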
Run the following commands. Note that the optimization flag is `-Og` with a capital letter O, not the digit 0.
$ gcc -Og -S mstore.c
$ cat mstore.s
$ gcc -Og -c mstore.c
$ objdump -d mstore.o
- x86_64 instructions range in length from 1 to 15 bytes
- The disassembler determines the assembly code based purely on the
byte-sequence in the machine-code file.
- All lines beginning with `.` are directives to the assembler and linker.
7. Data format#
| C data type | Intel data type | Assembly-code suffix | Size (bytes) |
|---|---|---|---|
| char | Byte | b | 1 |
| short | Word | w | 2 |
| int | Double word | l | 4 |
| long | Quad word | q | 8 |
| char * | Quad word | q | 8 |
| float | Single precision | s | 4 |
| double | Double precision | l | 8 |
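The sizes in the table can be checked against the local compiler. The sketch below assumes a 64-bit Linux system with gcc; `print_sizes` is a name chosen here for illustration:

```c
#include <stdio.h>

/* Print the size in bytes of each C type from the table; returns the row count. */
int print_sizes(void) {
    printf("char    %zu\n", sizeof(char));
    printf("short   %zu\n", sizeof(short));
    printf("int     %zu\n", sizeof(int));
    printf("long    %zu\n", sizeof(long));
    printf("char *  %zu\n", sizeof(char *));
    printf("float   %zu\n", sizeof(float));
    printf("double  %zu\n", sizeof(double));
    return 7;
}
```

On other platforms some of these values can differ (for example, `long` is 4 bytes on 64-bit Windows), so the table should be read as describing x86_64 Linux.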
8. Integer registers#
An x86_64 CPU contains a set of 16 general-purpose registers storing 64-bit values.
The original 8086 design has eight 16-bit registers, `%ax` through `%sp`.
Origin of the names (mostly obsolete):
- `%ax`: accumulate
- `%cx`: counter
- `%dx`: data
- `%bx`: base
- `%si`: source index
- `%di`: destination index
- `%sp`: stack pointer
- `%bp`: base pointer

After the IA32 extension, these registers grew to 32 bits, labeled `%eax` through `%esp`.
After the x86_64 extension, these registers were expanded to 64 bits, labeled `%rax` through `%rsp`. Eight new registers were added: `%r8` through `%r15`.
Instructions can operate on data of different sizes stored in the low-order bytes of the 16 registers.
*Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, Third Edition*
9. Assembly characteristics: operations#
Transfer data between memory and register
Load data from memory into register
Store register data into memory
Perform arithmetic function on register or memory data
Transfer control
Unconditional jumps to/from procedures
Conditional branches
Indirect branches
10. Data movement#
Example: `movq Source, Dest`
Note: This is ATT notation. Intel uses `mov Dest, Source`.
Operand types:
- Immediate (Imm): Constant integer data.
  - Examples: `$0x400`, `$-533`.
  - Like a C constant, but prefixed with `$`.
  - Encoded with 1, 2, or 4 bytes.
- Register (Reg): One of 16 integer registers.
  - Examples: `%rax`, `%r13`.
  - `%rsp` is reserved for special use.
  - Others have special uses in particular instructions.
- Memory (Mem): 8 (`q` in `movq`) consecutive bytes of memory at the address given by a register.
  - Example: `(%rax)`
  - Various other addressing modes (see textbook page 181, Figure 3.3).

Other `mov` instructions:
- `movb`: move byte
- `movw`: move word
- `movl`: move double word
- `movq`: move quad word
- `movabsq`: move absolute quad word
11. movq Operand Combinations#
| Source | Dest | Src, Dest | C Analog |
|---|---|---|---|
| Imm | Reg | | `tmp = 0x4;` |
| Imm | Mem | | `*p = -147;` |
| Reg | Reg | | `tmp2 = tmp1;` |
| Reg | Mem | | `*p = tmp;` |
| Mem | Reg | | `tmp = *p;` |
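Each row of the table can be isolated in a tiny C function and inspected with `gcc -Og -c` and `objdump -d`. This is a sketch: the function names are made up for illustration, and the instructions in the comments only indicate what a compiler might emit. The register choices and even the suffix can differ (for instance, gcc may use `movl $0x4, %eax` for the first case).

```c
/* Imm -> Reg, e.g. movq $0x4, %rax */
long imm_to_reg(void) {
    long tmp = 0x4;
    return tmp;
}

/* Imm -> Mem, e.g. movq $-147, (%rdi) */
void imm_to_mem(long *p) {
    *p = -147;
}

/* Reg -> Reg, e.g. movq %rdi, %rax */
long reg_to_reg(long tmp1) {
    long tmp2 = tmp1;
    return tmp2;
}

/* Reg -> Mem, e.g. movq %rsi, (%rdi) */
void reg_to_mem(long *p, long tmp) {
    *p = tmp;
}

/* Mem -> Reg, e.g. movq (%rdi), %rax */
long mem_to_reg(long *p) {
    long tmp = *p;
    return tmp;
}
```

There is no Mem to Mem row because x86-64 cannot transfer memory to memory with a single `mov` instruction; such a copy goes through a register.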
12. Simple memory addressing mode#
Normal: (R) Mem[Reg[R]]
Register R specifies memory address
Aha! Pointer dereferencing in C
movq (%rcx),%rax
Displacement D(R) Mem[Reg[R]+D]
Register R specifies start of memory region
Constant displacement D specifies offset
movq 8(%rbp),%rdx
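The two modes correspond to ordinary pointer operations in C. A minimal sketch with hypothetical function names; the instructions in the comments are what a compiler might plausibly emit, not guaranteed output:

```c
/* (R): the pointer value itself is the address, e.g. movq (%rdi), %rax */
long load_normal(long *p) {
    return *p;
}

/* D(R): constant offset from the pointer. One long is 8 bytes,
   so p[1] sits at displacement 8, e.g. movq 8(%rdi), %rax */
long load_displaced(long *p) {
    return p[1];
}
```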
Midterm#
Monday October 25, 2021
12-hour window: 9:00AM - 9:00PM, October 25, 2021.
50 minutes duration.
20-25 questions (similar in format to the quizzes).
Everything (including source code) up to and including the episode on Representing and manipulating information.
No class on Monday October 25, 2021.
13. x86_64 Cheatsheet#
14. Hands on: data movement#
Create a file named `swap.c` in `04-machine` with the following contents:
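A minimal sketch of what `swap.c` could contain, assuming the classic two-pointer swap from the textbook (the function and variable names are assumptions):

```c
/* swap.c: hypothetical contents. Exchange the two longs that xp and yp point to. */
void swap(long *xp, long *yp) {
    long t0 = *xp;   /* load from the address in the first argument  */
    long t1 = *yp;   /* load from the address in the second argument */
    *xp = t1;        /* store back in swapped order                  */
    *yp = t0;
}
```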
Run the following commands
$ gcc -Og -c swap.c
$ objdump -d swap.o
- [Why `%rsi` and `%rdi`?](http://6.s081.scripts.mit.edu/sp18/x86-64-architecture-guide.html)
- Procedure Data Flow:
- First six parameters will be placed into `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`.
- The remaining parameters will be pushed on to the stack of the calling function.
15. Hands on: data movement#
Create a file named `swap_dsp.c` in `04-machine` with the following contents:
Run the following commands
$ gcc -Og -c swap_dsp.c
$ objdump -d swap_dsp.o
- What is the meaning of `0x190`?
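One way a displacement such as `0x190` can arise is pointer arithmetic with a constant offset: `0x190` is 400 in decimal, which is 50 elements of 8 bytes each. The sketch below is a guess at the kind of code involved, not the actual file; the function name and the offset of 50 are assumptions.

```c
/* swap_dsp.c: hypothetical contents. Swap *xp with the long located 50 elements past yp. */
void swap_dsp(long *xp, long *yp) {
    long t0 = *xp;
    long t1 = *(yp + 50);   /* 50 * sizeof(long) = 400 = 0x190 bytes past yp */
    *xp = t1;
    *(yp + 50) = t0;
}
```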
16. Complete memory addressing mode#
Most General Form
`D(Rb,Ri,S)`: `Mem[Reg[Rb]+S*Reg[Ri]+D]`
- D: Constant displacement of 1, 2, or 4 bytes
- Rb: Base register: Any of 16 integer registers
- Ri: Index register: Any, except for `%rsp`
- S: Scale: 1, 2, 4, or 8

Special Cases
- `(Rb,Ri)`: `Mem[Reg[Rb]+Reg[Ri]]`
- `D(Rb,Ri)`: `Mem[Reg[Rb]+Reg[Ri]+D]`
- `(Rb,Ri,S)`: `Mem[Reg[Rb]+S*Reg[Ri]]`
- `(,Ri,S)`: `Mem[S*Reg[Ri]]`
- `D(,Ri,S)`: `Mem[S*Reg[Ri]+D]`
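The address computation can be written out as a small C function. This is a sketch for checking addresses by hand; `effective_address` is a name chosen here, and an omitted base or index is modeled as 0 and an omitted scale as 1.

```c
#include <stdint.h>

/* Effective address for D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
   Pass 0 for an omitted base or index, and 1 for an omitted scale. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, uint64_t s) {
    return rb + s * ri + (uint64_t)d;
}
```

For example, assuming `%rdx` holds `0xf000` and `%rcx` holds `0x0100`: `0x8(%rdx)` gives `0xf008`, `(%rdx,%rcx)` gives `0xf100`, `(%rdx,%rcx,4)` gives `0xf400`, and `0x80(,%rdx,2)` gives `0x1e080`.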
17. Arithmetic and logical operations: lea#
`lea`: load effective address
- A form of the `movq` instruction
- `lea S, D`: Write `&S` to `D`.
- Can be used to generate pointers
- Can also be used to describe common arithmetic operations.
18. Hands on: lea#
Create a file named `m12.c` in `04-machine` with the following contents:
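A minimal sketch of what `m12.c` could contain, assuming the function simply multiplies its argument by 12, as the `lea` breakdown in this section suggests (the function name is an assumption):

```c
/* m12.c: hypothetical contents. Multiply x by 12; the compiler can do this
   with two lea instructions instead of a multiply. */
long m12(long x) {
    return x * 12;
}
```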
Run the following commands
$ gcc -Og -c m12.c
$ objdump -d m12.o
- Review slide 16
- `%rdi`: x
- `(%rdi, %rdi,2)` = x + 2 * x
- The above result is moved to `%rdx` with `lea`.
- `0x0(,%rdx,4)` = 4 * (x + 2 * x) = 12*x
- The above result is moved to `%rax` with `lea`.
19. Other arithmetic operations#
Suffixes are omitted here, compared to the book.

| Format | Computation | Description |
|---|---|---|
| `add S, D` | D <- D + S | add |
| `sub S, D` | D <- D - S | subtract |
| `imul S, D` | D <- D * S | multiply |
| `shl S, D` | D <- D << S | shift left |
| `sar S, D` | D <- D >> S | arith. shift right |
| `shr S, D` | D <- D >> S | shift right (logical) |
| `sal S, D` | D <- D << S | arith. shift left |
| `xor S, D` | D <- D ^ S | exclusive or |
| `and S, D` | D <- D & S | and |
| `or S, D` | D <- D \| S | or |
| `inc D` | D <- D + 1 | increment |
| `dec D` | D <- D - 1 | decrement |
| `neg D` | D <- -D | negate |
| `not D` | D <- ~D | complement |
Watch out for argument order (ATT versus Intel)
No distinction between signed and unsigned int. Why is that?
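One way to explore the question: in two's complement, addition produces the same 64-bit pattern whether the operands are read as signed or unsigned, so a single `add` instruction serves both. The sketch below uses hypothetical helper names to show this.

```c
#include <stdint.h>

/* Add two 64-bit patterns as unsigned values (wraps modulo 2^64). */
uint64_t add_bits(uint64_t a, uint64_t b) {
    return a + b;
}

/* Reinterpret the same 64 bits as a two's-complement signed value.
   The conversion is implementation-defined in C; gcc and clang keep the bits. */
int64_t as_signed(uint64_t bits) {
    return (int64_t)bits;
}
```

For instance, adding the bit patterns of -5 and 3 with `add_bits` and reading the result through `as_signed` gives -2, the same answer as signed addition.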
20. Challenge: lea#
Create a file named `scale.c` in `04-machine` with the following contents:
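A minimal sketch of what `scale.c` could contain, assuming it follows the textbook's `scale` example with three `long` arguments (the function name and the constants 4 and 12 are assumptions):

```c
/* scale.c: hypothetical contents. Combine three arguments with constant
   factors that the compiler can express using lea. */
long scale(long x, long y, long z) {
    long t = x + 4 * y + 12 * z;
    return t;
}
```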
Run the following commands
$ gcc -Og -c scale.c
$ objdump -d scale.o
- Identify the registers holding x, y, and z.
- Which register contains the final return value?
## Solution
- `%rdi`: x
- `%rsi`: y
- `%rdx`: z
- `%rax` contains the final return value.
21. Hands on: long arithmetic#
Create a file named `arith.c` in `04-machine` with the following contents:
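A minimal sketch of what `arith.c` could contain, assuming a chain of simple `long` operations in the style of the textbook's arithmetic example (the function name, variable names, and constants are assumptions):

```c
/* arith.c: hypothetical contents. A sequence of additions and multiplications
   whose intermediate values map onto add, lea, and imul instructions. */
long arith(long x, long y, long z) {
    long t1 = x + y;
    long t2 = z + t1;
    long t3 = x + 4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}
```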
Run the following commands
$ gcc -Og -c arith.c
$ objdump -d arith.o
Understand how the Assembly code represents the actual arithmetic operation in the C code.
22. Quick review: processor state#
Information about currently executing program
- temporary data (`%rax`, …)
- location of runtime stack (`%rsp`)
- location of current code control point (`%rip`, …)
- status of recent tests (`CF`, `ZF`, `SF`, `OF` in `%EFLAGS`)
23. Condition codes (implicit setting)#
Single-bit registers
- `CF`: the most recent operation generated a carry out of the most significant bit.
- `ZF`: the most recent operation yielded zero.
- `SF`: the most recent operation yielded a negative value.
- `OF`: the most recent operation caused a two's-complement overflow.

Implicitly set (as a side effect) by arithmetic operations.
24. Condition codes (explicit setting)#
Explicit setting by the compare instruction
`cmpq Src2, Src1`
`cmpq b, a` is like computing `a - b` without setting a destination.
- `CF` set if carry/borrow out from the most significant bit (used for unsigned comparisons)
- `ZF` set if `a == b`
- `SF` set if `(a - b) < 0`
- `OF` set if two's complement (signed) overflow: `(a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)`
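The flag rules can be modeled in C to check them on concrete values. This is a sketch, not how the hardware is implemented; `struct flags` and `cmpq_flags` are names invented for illustration, and the subtraction is done on unsigned values so that it wraps the way the processor does.

```c
#include <stdint.h>

/* Condition codes as they would look after `cmpq b, a` (which computes a - b). */
struct flags {
    int cf, zf, sf, of;
};

struct flags cmpq_flags(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t diff = ua - ub;          /* wraps modulo 2^64 */
    struct flags f;
    f.cf = ua < ub;                   /* borrow out of the most significant bit */
    f.zf = diff == 0;                 /* a == b */
    f.sf = (int)(diff >> 63);         /* sign bit of the result */
    /* Signed overflow: a and b have different signs, and the sign of the
       result differs from the sign of a. */
    f.of = (int)(((ua ^ ub) & (ua ^ diff)) >> 63);
    return f;
}
```

Comparing 3 with 5 sets `CF` and `SF`, while comparing `INT64_MIN` with 1 sets `OF` and clears `SF`, because the true difference does not fit in 64 signed bits.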
25. Condition branches (jX)#
Jump to different part of code depending on condition codes
Implicit reading of condition codes
| jX | Condition | Description |
|---|---|---|
| `jmp` | 1 | unconditional jump |
| `je` | ZF | equal/zero |
| `jne` | ~ZF | not equal/not zero |
| `js` | SF | negative |
| `jns` | ~SF | non-negative |
| `jg` | ~(SF^OF) & ~ZF | greater (signed) |
| `jge` | ~(SF^OF) | greater or equal (signed) |
| `jl` | SF^OF | less (signed) |
| `jle` | (SF^OF) \| ZF | less or equal (signed) |
| `ja` | ~CF & ~ZF | above (unsigned) |
| `jb` | CF | below (unsigned) |
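The conditions in the table translate directly into boolean expressions over the four flags. The sketch below uses a hypothetical `struct flags` record and `takes_*` helper names; each helper returns nonzero when the corresponding jump would be taken.

```c
/* Hypothetical flag record; fields mirror the CF, ZF, SF, and OF condition codes. */
struct flags {
    int cf, zf, sf, of;
};

int takes_je(struct flags f)  { return f.zf; }
int takes_jne(struct flags f) { return !f.zf; }
int takes_jg(struct flags f)  { return !(f.sf ^ f.of) && !f.zf; }
int takes_jge(struct flags f) { return !(f.sf ^ f.of); }
int takes_jl(struct flags f)  { return f.sf ^ f.of; }
int takes_jle(struct flags f) { return (f.sf ^ f.of) || f.zf; }
int takes_ja(struct flags f)  { return !f.cf && !f.zf; }
int takes_jb(struct flags f)  { return f.cf; }
```

The signed conditions use `SF^OF` rather than `SF` alone because an overflowing subtraction flips the sign bit. With `OF` set and `SF` clear, the first operand is still less than the second in the signed sense.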
26. Hands on: a simple jump#
Create a file named `jump.c` in `04-machine` with the following contents:
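A minimal sketch of what `jump.c` could contain, assuming the `absdiff` function mentioned in this section follows the textbook's absolute-difference example (the variable names are assumptions):

```c
/* jump.c: hypothetical contents. Return |x - y| using an if/else, which the
   compiler can implement with a compare followed by a conditional jump. */
long absdiff(long x, long y) {
    long result;
    if (x > y)
        result = x - y;
    else
        result = y - x;
    return result;
}
```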
Run the following commands
$ gcc -Og -c jump.c
$ objdump -d jump.o
Understand how the Assembly code enables jump across instructions to support conditional workflow.
In the next video, we will look at how `cmp` and `jle` of `absdiff` really behave in an actual execution.
27. Hands on: loop#
Create a file named `factorial.c` in `04-machine` with the following contents:
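A minimal sketch of what `factorial.c` could contain, assuming a `do`-`while` formulation like the textbook's loop example. The function name and loop form are assumptions, and the sketch expects `n >= 1`.

```c
/* factorial.c: hypothetical contents. Compute n! for n >= 1 with a do-while
   loop, which compiles to a loop body followed by a backward conditional jump. */
long factorial(long n) {
    long result = 1;
    do {
        result *= n;
        n = n - 1;
    } while (n > 1);
    return result;
}
```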
Run the following commands
$ gcc -Og -c factorial.c
$ objdump -d factorial.o
Understand how the Assembly code enables jump across instructions to support loop.
Create `factorial_2.c` and `factorial_3.c` from `factorial.c`.
Modify `factorial_2.c` so that the factorial is implemented with a `while` loop. Study the resulting Assembly code.
Modify `factorial_3.c` so that the factorial is implemented with a `for` loop. Study the resulting Assembly code.
Behavior of `factorial` Assembly instructions inside GDB
28. Mechanisms in procedures#
Function = procedure (book terminology)
Suppose procedure `P` calls procedure `Q`.
Passing control
- To beginning of procedure code
  - starting instruction of `Q`
- Back to return point
  - next instruction in `P` after `Q`

Passing data
- Procedure arguments
  - `P` passes one or more parameters to `Q`.
- Return value
  - `Q` returns a value back to `P`.

Memory management
- Allocate during procedure execution and de-allocate upon return
  - `Q` needs to allocate space for local variables and free that storage once it finishes.

Mechanisms all implemented with machine instructions
- x86-64 implementation of a procedure uses only those mechanisms required

Machine instructions implement the mechanisms, but the choices are determined by designers. These choices make up the Application Binary Interface (ABI).
29. x86-64 stack#
Region of memory managed with stack discipline
Memory viewed as array of bytes.
Different regions have different purposes.
(Like ABI, a policy decision)
Grows toward lower addresses
Register `%rsp` contains the lowest stack address.
- address of "top" element
30. Stack push and pop#
`pushq Src`
- Fetch operand at `Src`
- Decrement `%rsp` by 8
- Write operand at address given by `%rsp`

`popq Dest`
- Read value at address given by `%rsp`
- Increment `%rsp` by 8
- Store value at Dest (usually a register)
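The push and pop steps can be mimicked with a byte array and an offset that plays the role of `%rsp`. This is a toy model for illustration: `struct machine`, `machine_init`, and the 64-byte stack size are assumptions, and bounds checks are left out to keep it short.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Toy machine: a small byte-addressable stack region and an %rsp offset into it. */
struct machine {
    uint8_t  mem[STACK_BYTES];
    uint64_t rsp;               /* offset of the current top of stack */
};

void machine_init(struct machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->rsp = STACK_BYTES;       /* empty stack: %rsp starts at the highest address */
}

/* pushq: decrement %rsp by 8, then write the operand at the new top. */
void pushq(struct machine *m, uint64_t src) {
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &src, 8);
}

/* popq: read the value at the top, then increment %rsp by 8. */
uint64_t popq(struct machine *m) {
    uint64_t dest;
    memcpy(&dest, &m->mem[m->rsp], 8);
    m->rsp += 8;
    return dest;
}
```

Pushing two values moves `rsp` from 64 down to 48, and popping returns them in reverse order while `rsp` climbs back to 64. This matches a stack that grows toward lower addresses.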

31. What really happens in memory/registers at the beginning and the end of a function#
The `-Og` flag often combines/reduces these steps.
The memory stack architecture for a function has a base pointer (`%rbp`) and a stack pointer (`%rsp`).
- Base pointer: the bottom of the stack (higher memory address)
- Stack pointer: the top of the stack (lower memory address)

Function prologue
- Push the current base pointer onto the memory stack (to be restored later).
- Set the base pointer (`%rbp`) to the address currently held in the stack pointer.
- Move the stack pointer further down (toward lower addresses) by a distance that accommodates the local variables of the function.

Function prologue (Assembly), ATT notation, assume rbp/ebp and rsp/esp
- `push %rbp`
- `mov %rsp, %rbp`
- `sub $N, %rsp`

Function epilogue
- Reset the stack pointer to the current base pointer, so room reserved in the prologue for local variables is freed.
- Pop the base pointer off the stack, so it is restored to its value before the prologue.
- Return to the calling function by popping the return address (the caller's saved program counter) off the stack and jumping to it.

Function epilogue (Assembly), ATT notation, assume rbp/ebp and rsp/esp
- `mov %rbp, %rsp`
- `pop %rbp`
- `ret`

Video lecture on the slide
32. Hands on: function calls#
Create a file named `mult.c` in `04-machine` with the following contents:
Description of C code:
Compile with the `-g` flag and run `gdb` on the resulting executable.
$ gcc -g -o mult mult.c
$ gdb mult
Set up gdb with a breakpoint at `main` and start running.
A new GDB command is `si`: execute the next machine instruction (rather than the next line of source code).
It will execute the highlighted (greened and arrowed) instruction in the `code` section.
If the Assembly instruction is calling another function, we need to use `ni` if we don't want to step into that function.
Be careful: the code section of GDB uses Intel notation.
`endbr64` is a new instruction that supports Intel's Control-flow Enforcement Technology (CET), which helps prevent attackers from stitching together malicious sequences of Assembly code.
33. Data alignment#
Intel recommends that data be aligned to improve memory system performance.
K-alignment rule: Any primitive object of `K` bytes must have an address that is a multiple of `K`: 1 for `char`, 2 for `short`, 4 for `int` and `float`, and 8 for `long`, `double`, and `char *`.
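The rule can be observed from C by looking at field addresses inside a struct. This is a sketch assuming x86_64 Linux with gcc; `is_aligned` and `struct sample` are names invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if address p is a multiple of k (that is, k-aligned), 0 otherwise. */
int is_aligned(const void *p, size_t k) {
    return ((uintptr_t)p % k) == 0;
}

/* Example struct: the compiler inserts padding so that each field is aligned. */
struct sample {
    char   c;   /* offset 0 */
    int    i;   /* offset 4: three padding bytes follow c */
    double d;   /* offset 8 */
};
```

Here `offsetof(struct sample, i)` is 4 rather than 1, because the `int` must start at a multiple of 4.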