Machine language and debugging

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • Getting closer to the hardware via machine languages

Objectives
  • First learning objective. (FIXME)

1. Intel x86 processors

  • Dominate laptop/desktop/server market
  • Evolutionary design
    • Backwards compatible up until 8086, introduced in 1978
    • Added more features as time goes on
  • x86 is a Complex Instruction Set Computer (CISC)
    • Many different instructions with many different formats
    • But, only small subset encountered with Linux programs
  • Compare: Reduced Instruction Set Computer (RISC)
    • RISC: very few instructions, with very few modes for each
    • RISC can be quite fast (but Intel still wins on speed!)
    • Current RISC renaissance (e.g., ARM, RISC V), especially for low-power

2. Intel x86 processors: machine evolution

Name Date Transistor Counts
386 1985 0.3M
Pentium 1993 3.1M
Pentium/MMX 1997 4.5M
Pentium Pro 1995 6.5M
Pentium III 1999 8.2M
Pentium 4 2000 42M
Core 2 Duo 2006 291M
Core i7 2000 42M
Core i7 Skylake 2006 291M
  • Added features
    • Instructions to support multimedia operations
    • Instructions to enable more efficient conditional operations (!)
    • Transition from 32 bits to 64 bits
    • More cores

3. x86 clones: Advanced Micro Devices (AMD)

  • Historically
    • AMD has followed just behind Intel
    • A little bit slower, a lot cheaper
  • Then
    • Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
    • Built Opteron: tough competitor to Pentium 4
    • Developed x86-64, their own extension to 64 bits
  • Recent Years
    • Intel got its act together
      • 1995-2011: Lead semiconductor “fab” in world
      • 2018: #2 largest by $$ (#1 is Samsung)
      • 2019: reclaimed #1
  • AMD fell behind
    • Relies on external semiconductor manufacturer GlobalFoundaries
    • ca. 2019 CPUs (e.g., Ryzen) are competitive again
    • 2020 Epyc

amd trolls intel 1 amd trolls intel 2

4. Machine programming: levels of abstraction

levels of abstraction

  • Architecture: (also ISA: instruction set architecture) The parts of a processor design that one needs to understand for writing correct machine/assembly code
    • Examples: instruction set specification, registers
    • Machine Code: The byte-level programs that a processor executes
    • Assembly Code: A text representation of machine code
  • Microarchitecture: Implementation of the architecture
    • Examples: cache sizes and core frequency
  • Example ISAs:
    • Intel: x86, IA32, Itanium, x86-64
    • ARM: Used in almost all mobile phones
    • RISC V: New open-source ISA

5. Assembly/Machine code view

  • Machine code (Assembly code) differs greatly from the original C code.
  • Parts of processor state that are not visible/accessible from C programs are now visible.
    • PC: Program counter
      • Contains address of next instruction
      • Called %rip (instruction pointer register)
    • Register file
      • contains 16 named locations (registers), each can store 64-bit values.
      • These registers can hold addresses (~ C pointers) or integer data.
    • Condition codes
      • Store status information about most recent arithmetic or logical operation
      • Used for conditional branching (if/while)
    • Vector registers to hold one or more integers or floating-point values.
    • Memory
      • Is seen as a byte-addressable array
      • Contains code and user data
      • Stack to support procedures

Assembly programmer

6. Hands on: assembly/machine code example

  • Open a terminal (Windows Terminal or Mac Terminal).
  • Reminder: It is podman on Windows and docker on Mac. Everything else is the same!.
  • Launch the container:

Windows:

$ podman run --rm --userns keep-id --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it -v /mnt/c/csc231:/home/$USER/csc231:Z localhost/csc-container /bin/bash

Mac:

$ docker run --rm --userns=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it -v /Users/$USER/csc231:/home/$USER/csc231:Z csc-container /bin/bash
  • Inside your csc231, create another directory called 04-machine and change into this directory.
  • Create a file named mstore.c with the following contents:
  • Run the following commands It is capital o, not number 0
$ gcc -Og -S mstore.c
$ cat mstore.s
$ gcc -Og -c mstore.c
$ objdump -d mstore.o

Assembler code

  • x86_64 instructions range in length from 1 to 15 bytes
  • The disassembler determines the assembly code based purely on the byte-sequence in the machine-code file.
  • All lines begin with . are directirves to the assembler and linker.

7. Data format

C data type Intel data type Assembly-code suffix Size
char Byte b 1
short Word w 2
int Double word l 4
long Quad word q 8
char * Quad word q 8
float Single precision s 4
double Double precision l 8

8. Integer registers

  • x86_64 CPU contains a set of 16 general purpose registers storing 64-bit values.
  • Original 8086 design has eight 16-bit registers, %ax through %sp.
    • Origin (mostly obsolete)
      • %ax: accumulate
      • %cx: counter
      • %dx: data
      • %bx: base
      • %si: source index
      • %di: destination index
      • %sp: stack pointer
      • %bp: base pointer
  • After IA32 extension, these registers grew to 32 bits, labeled %eax through %esp.
  • After x86_64 extension, these registers were expanded to 64 bits, labeled %rax through %rsp. Eight new registered were added: %r8 through %r15.
  • Instructions can operate on data of different sizes stored in low-order bytes of the 16 registers.

general purpose registers Bryant and O’ Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition

9. Assembly characteristics: operations

  • Transfer data between memory and register
    • Load data from memory into register
    • Store register data into memory
  • Perform arithmetic function on register or memory data
  • Transfer control
    • Unconditional jumps to/from procedures
    • Conditional branches
    • Indirect branches

10. Data movement

  • Example: movq Source, Dest
  • Note: This is ATT notation. Intel uses mov Dest, Source
  • Operand Types:
    • Immediate (Imm): Constant integer data.
      • $0x400, $-533.
      • Like C constant, but prefixed with $.
      • Encoded with 1, 2, or 4 bytes.
    • Register (Reg): One of 16 integer registers
      • Example: %rax, %r13
      • %rsp reserved for special use.
      • Others have special uses in particular instructions.
    • Memory (Mem): 8 (q in movq) consecutive bytes of memory at address given by register.
      • Example: (%rax)
      • Various other addressing mode (See textbook page 181, Figure 3.3).
  • Other mov:
    • movb: move byte
    • movw: move word
    • movl: move double word
    • movq: move quad word
    • moveabsq: move absolute quad word

11. movq Operand Combinations

movq Source Dest Src, Dest C Analog
  Imm Reg movq $0x4, %rax tmp = 0x4;
  Imm Mem movq $-147,(%rax) *p = -147;
  Reg Reg movq %rax,%rdx tmp2 = tmp1;
  Reg Mem movq %rax,(%rdx) *p = tmp;
  Mem Reg movq (%rax),%rdx tmp = *p;

12. Simple memory addressing mode

  • Normal: (R) Mem[Reg[R]]
    • Register R specifies memory address
    • Aha! Pointer dereferencing in C
    • movq (%rcx),%rax
  • Displacement D(R) Mem[Reg[R]+D]
    • Register R specifies start of memory region
    • Constant displacement D specifies offset
    • movq 8(%rbp),%rdx

13. x86_64 Cheatsheet

Brown University - Dr. Doeppner

14. Hands on: data movement

  • Create a file named swap.c in 04-machine with the following contents:
  • Run the following commands
$ gcc -Og -c swap.c
$ objdump -d swap.o

swap.c

  • Why %rsi and %rdi?
  • Procedure Data Flow:
    • First six parameters will be placed into rdi, rsi, rdx, rcx, r8, r9.
    • The remaining parameters will be pushed on to the stack of the calling function.

15. Hands on: data movement

  • Create a file named swap_dsp.c in 04-machine with the following contents:
  • Run the following commands
$ gcc -Og -c swap_dsp.c
$ objdump -d swap_dsp.o

swap_dsp.c

  • What is the meaning of 0x190?

16. Complete memory addressing mode

  • Most General Form
    • D(Rb,Ri,S): Mem[Reg[Rb]+S*Reg[Ri]+ D]
    • D: Constant displacement 1, 2, or 4 bytes
    • Rb: Base register: Any of 16 integer registers
    • Ri: Index register: Any, except for %rsp
    • S: Scale: 1, 2, 4, or 8
  • Special Cases
    • (Rb,Ri): Mem[Reg[Rb]+Reg[Ri]]
    • D(Rb,Ri): Mem[Reg[Rb]+Reg[Ri]+D]
    • (Rb,Ri,S): Mem[Reg[Rb]+S*Reg[Ri]]
    • (,Ri,S): Mem[S*Reg[Ri]]
    • D(,Ri,S): Mem[S*Reg[Ri] + D]

17. Arithmetic and logical operations: lea

  • lea: load effective address
  • A form of movq intsruction
    • lea S, D: Write &S to D.
    • can be used to generate pointers
    • can also be used to describe common arithmetic operations.

18. Hands on: lea

  • Create a file named m12.c in 04-machine with the following contents:
  • Run the following commands
$ gcc -Og -c m12.c
$ objdump -d m12.o

m12.c

  • Review slide 16
  • %rdi: x
  • (%rdi, %rdi,2) = x + 2 * x
  • The above result is moved to %rdx with lea.
  • 0x0(,%rdx,4) = 4 * (x + 2 * x) = 12*x
  • The above result is moved to %rax with lea.

19. Other arithmetic operations

  • Omitting suffixes comparing to the book.
Format Computation Description
add Src,Dest D <- D + S add
sub Src,Dest D <- D - S subtract
imul Src,Dest D <- D * S multiply
shl Src,Dest D <- D « S shift left
sar Src,Dest D <- D » S arith. shift right
shr Src,Dest D <- D » S shift right
sal Src,Dest D <- D « S arith. shift left
xor Src,Dest D <- D ^ S exclusive or
and Src,Dest D <- D & S and
or Src,Dest D <- D | S or
inc Src D <- D + 1 increment
dec Src D <- D - 1 decrement
neg Src D <- -D negate
not Src D <- -D complement
  • Watch out for argument order (ATT versus Intel)
  • No distinction between signed and unsigned int. Why is that?

20. Challenge: lea

  • Create a file named scale.c in 04-machine with the following contents:
  • Run the following commands
$ gcc -Og -c scale.c
$ objdump -d scale.o

scale.c

  • Identify the registers holding x, y, and z.
  • Which register contains the final return value?

Solution

  • %rdi: x
  • %rsi: y
  • %rdx: z
  • %rax contains the final return value.

Midterm

  • Friday April 2, 2021
  • 24-hour windows range: 12:00AM - 11:59PM April 2, 2021.
  • 75 minutes duration.
  • 20 questions (similar in format to the quizzes).
  • Everything (including source codes) up to slide 20 of Machine language and debugging.
  • No class on Friday April 2.

21. Hands on: long arithmetic

  • Create a file named arith.c in 04-machine with the following contents:
  • Run the following commands
$ gcc -Og -c arith.c
$ objdump -d arith.o
  • Understand how the Assembly code represents the actual arithmetic operation in the C code.

arith.c

22. Quick review: processor state

  • Information about currently executing program
    • temporary data (%rax,…)
    • location of runtime stack (%rsp)
    • location of current code control point (%rip,…)
    • status of recent tests (CF, ZF, SF, OF in %EFLAGS)

processor state

23. Condition codes (implicit setting)

  • Single-bit registers
    • CF: the most recent operation generated a carry out of the most significant bit.
    • ZF: the most recent operation yielded zero.
    • SF: the most recent operation yielded negative.
    • OF: the most recent operation caused a two’s-complement overflow.
  • Implicitly set (as side effect) of arithmetic operations.

24. Condition codes (explicit setting)

  • Exlicit setting by Compare instruction
    • cmpq Src2, Src1
    • cmpq b, a like computing a - b without setting destination
  • CF set if carry/borrow out from most significant bit (unsigned comparisons)
  • ZF set if a == b
  • SF set if (a - b) < 0
  • OF set if two’s complement (signed) overflow
    • (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)

25. Condition branches (jX)

  • Jump to different part of code depending on condition codes
  • Implicit reading of condition codes
jX Condition Description
jmp 1 direct jump
je ZF equal/zero
jne ~ZF not equal/not zero
js SF negative
jns ~SF non-negative
jg ~(SF^OF) & ~ZF greater
jge ~(SF^OF) greater or equal to
jl SF^OF lesser
jle SF^OF | ZF lesser or equal to
ja ~CF & ~ZF above
jb CF below

26. Hands on: a simple jump

  • Create a file named jump.c in 04-machine with the following contents:
  • Run the following commands
$ gcc -Og -c jump.c
$ objdump -d jump.o
  • Understand how the Assembly code enables jump across instructions to support conditional workflow.

jump.c

  • Rerun the above commands but ommit -Og flag. Think about the differences in the resulting Assembly code.

27. Hands on: loop

  • Create a file named fact_loop.c in 04-machine with the following contents:
  • Run the following commands
$ gcc -Og -c fact_loop.c
$ objdump -d fact_loop.o
  • Understand how the Assembly code enables jump across instructions to support loop.

fact_loop.c

  • Create fact_loop_2.c and fact_loop_3.c from fact_loop.c.
  • Modify fact_loop_2.c so that the factorial is implemented with a while loop. Study the resulting Assembly code.
  • Modify fact_loop_3.c so that the factorial is implemented with a for loop. Study the resulting Assembly code.

28. Hands on: switch

  • Create a file named switch.c in 04-machine with the following contents:
  • View switch.c and the resulting switch.s in a two-column tmux terminal.
$ gcc -Og -S switch.c

switch.c

29. Mechanisms in procedures

  • Function = procedure (book terminology)
  • Support procedure P calls procedure Q.
  • Passing control
    • To beginning of procedure code
      • starting instruction of Q
    • Back to return point
      • next instruction in P after Q
  • Passing data
    • Procedure arguments
      • P passes one or more parameters to Q.
      • Q returns a value back to P.
    • Return value
  • Memory management
    • Allocate during procedure execution and de-allocate upon return
      • Q needs to allocate space for local variables and free that storage once finishes.
  • Mechanisms all implemented with machine instructions
  • x86-64 implementation of a procedure uses only those mechanisms required
  • Machine instructions implement the mechanisms, but the choices are determined by designers. These choices make up the Application Binary Interface (ABI).

30. x86-64 stack

  • Region of memory managed with stack discipline
    • Memory viewed as array of bytes.
    • Different regions have different purposes.
    • (Like ABI, a policy decision)
  • Grows toward lower addresses
    • Register %rsp contains lowest stack address.
    • address of “top” element

stack frames

31. Stack push and pop

  • pushq Src
    • Fetch operand at Src
    • Decrement %rsp by 8
    • Write operand at address given by %rsp
  • popq Dest
    • Read value at address given by %rsp
    • Increment %rsp by 8
    • Store value at Dest (usually a register) Push and pop

32. Hands on: function calls

  • Create a file named mult.c in 04-machine with the following contents:
  • Compile with -g and -Og flags and run gdb on the resulting executable.
$ gcc -g -Og -o mult mult.c
$ gdb mult
  • Setup gdb with a breakpoint at main and start running.
  • A new GDB command is s or step: executing the next instruction (machine or code instruction).
    • It will execute the highlighted (greened and arrowed) instruction in the code section.
  • Be careful, Intel notation in the code segment of GDB
  • Run s once to execute sub rsp,0x8

function

  • Use n to execute the instruction: mov edi,0x8 and step over the next instruction, call 0x0400410 <malloc@plt>, which is a function call that we don’t want to step into this call.
  • These are the instructions for the malloc call.
mov edi,0x8
call 0x0400410 <malloc@plt>
  • Pay attention to the next three instructions after call 0x0400410 <malloc@plt.
    • Recall that the return value of malloc is placed into %rax.
  • These instructions setup the parameters for the upcoming multstore(1,2,p) call.
    • mov rdx,rax.
    • mov esi,0x2
    • mov edi,0x1
  • Also, check the values of rax and p
gdb-peda$ p $rax
gdb-peda$ p p
  • This is the value of the memory block allocated via malloc to contain one long element.
  • Run s to step into multstore(1,2,p)

malloc

  • Inside mulstore, we immediately prepare to launch mult2.
  • Run s once. This will store the value in rdx into rbx. This is to save the value in rdx because we need to use it later. parameters for the mult2 call.
  • Procedure Data Flow:
    • First six parameters will be placed into rdi, rsi, rdx, rcx, r8, r9.
    • The remaining parameters will be pushed on to the stack of the calling function ( in this case mulstore).
    • Note that rsp (stack pointer of multstore) is currently at 0x7ffffffe320.
  • Run s to step into mult2.

multstore

  • The 7th parameter of mult2 is pushed on to the stack frame of multstore and is stored at address 0x7ffffffe318.
  • Function/procedure mult2 has no local variable, hence the stack frame is almost empty. The stack pointer, rsp, of mult2, is actually pointing toward the address of the 7th parameter.
  • Run s.
  • The subsequence instructions (<mult2> through <mult2+23>) are for the multiplication.
  • The final value will be stored in rax to be returned to multstore.
  • Examine the two screenshots below to see how specific registers contain certain values?
    • rcx?
    • r8?
    • r9?
    • What is value of rax after <mult2+20>?
    • What is value of rax after <mult2+23>?

mult2 mult2

  • Continue running s to finish the program.

33. Hands on: array and multi-dimensional arrays

  • Given array of data type T and length L: T A[L]
    • Contiguous allocated region of L * sizeof(T) bytes in memory.
    • Identifier A can be used as a pointer to array element 0.
  • Create a file named array.c in 04-machine with the following contents:
  • Run cat -n array.c and remember the line number of the printf statement.
  • Compile with -g flag and run gdb on the resulting executable.
$ cat -n array.c
$ gcc -g -o array array.c
$ gdb array
  • Setup gdb with a breakpoint at the line number for the printf statement and start running.
  • Run i locals to view all local variables.
  • Use p and variety of pointer/array syntax to view their values and addresses: *, [], &.

array

34. Hands on: struct

  • Create a new data type (non-primitive) that groups objects of possibly different types into a single object.
    • Think classes in object-oriented programming minus the methods.
  • Create a file named struct.c in 04-machine with the following contents:
  • Compile with -g flag and run gdb on the resulting executable.
$ gcc -g -o struct struct.c
$ gdb struct
  • Setup gdb with a breakpoint at main` and start running.
  • Use n to walk through the program and answer the following questions:
    • How many bytes are there in the memory block allocated for variable p?
    • How are the addresses of x, y, and c relative to p?

struct

35. Data alignment

  • Intel recommends data to be aligned to improve memory system performance.
    • K-alignment rule: Any primitive object of K bytes must have an address that is multiple of K: 1 for char, 2 for short, 4 for int and float, and 8 for long, double, and char *.
  • Create a file named alignment.c in 04-machine with the following contents:
  • Run cat -n alignment.c and remember the line number of the return statement.
  • Compile with -g flag and run gdb on the resulting executable.
$ cat -n array.c
$ gcc -g -o array array.c
$ gdb array
gdb-peda$ b 17
gdb-peda$ run
  • Run i locals to view all local variables.
  • Use p and & to view addresses of the three char array variables and the i variable.
  • Why does address displacement is not an exact match between the i variable and the next array variable versus between the array variables?

alignment

Key Points

  • First key point. Brief Answer to questions. (FIXME)