Network programming#

1. In the beginning#

  • 1966: Advanced Research Projects Agency Network (ARPANET) - DOD

  • 1969: Four sites:

    • SRI: Stanford Research Institutte

    • Utah: University of Utah

    • UCLA

    • UCSB

scientific debugging
  • 1977: East and West

    • MITRE Corporation: managed federally funded research and development centers (FFRDC) for a number of U.S. agencies (DOD, FAA, IRS, DVA, DHS, AO, CMMS, NIST)

    • BNN: now a subsidiary of Raytheon

    • Burroughs: now part of Unisys

scientific debugging

2. Now#

  • 2012: the Carna Botnet was unleashed.

    • An ethical hacking experiment in 2012 that used Nmap Scripting Engine (NSE) to scan for random devices with default telnet login username and password.

    • Over 100,000 devices had these features and could easily be accessed.

    • A spider-like crawling approach was used to have the vulnerable devices to scan for other vulnerable devices.

    • In the end, a total of 420,000 devices were assisting the internet search, and of the 4.3 billion IP addresses possible, the Carna Botnet found 1.3 billion.

    • What came from the Carna Botnet was a massive census of the internet in 2012.

carna botnet

3. A client-server transaction#

  • Most network applications are based on the client-server model:

    • A server process and one or more client processes

    • Server manages some resource

    • Server provides service by manipulating resource for clients

    • Server activated by request from client (vending machine analogy)

  • Clients and servers are processes running on hosts (can be the same or different hosts).

scientific debugging

4. Computer networks#

  • A network is a hierarchical system of boxes and wires organized by geographical proximity

    • BAN (Body Area Network) spans devices carried / worn on body

    • SAN* (System Area Network) spans cluster or machine room

    • LAN (Local Area Network) spans a building or campus

    • WAN (Wide Area Network) spans country or world

  • An internetwork (internet) is an interconnected set of networks

5. From the ground up#

  • Lowest level: Ethernet segments

    • consists of a collection of hosts connected by wires (twisted pairs) to a hub (replaced by switches and routers today).

    • Spans room or floor in a building

    • Each Ethernet adapter has a unique 48-bit address called MAC address: 00:16:ea:e3:54:e6

    • Hosts send bits to any other host in chunks called frames

    • Hub copies each bit from each port to every other port

      • Every host sees every bit

  • Next level: bridged Ethernet segments.

    • Spans building or campus

    • Bridges cleverly learn which hosts are reachable from which ports and then selectively copy frames from port to port.

  • Next level: Internet

    • Multiple incompatible LANs can be physically connected by specialized computers called routers.

    • The connected networks are called an internet (lower case)

6. Logical structure of an internet#

  • Ad hoc interconnection of networks

  • No particular topology

  • Vastly different router & link capacities

  • Send packets from source to destination by hopping through networks

    • Router forms bridge from one network to another

    • Different packets may take different routes

Logical structure of an internet

7. The notion of an internet protocol#

  • How is it possible to send bits across incompatible LANs and WANs?

  • Solution: protocol software running on each host and router

    • Protocol is a set of rules that governs how hosts and routers should cooperate when they transfer data from network to network.

    • Smooths out the differences between the different networks

8. What does an internet protocol do?#

  • Provides a naming scheme

    • An internet protocol defines a uniform format for host addresses.

    • Each host (and router) is assigned at least one of these internet addresses that uniquely identifies it.

  • Provides a delivery mechanism

    • An internet protocol defines a standard transfer unit (packet)

    • Packet consists of header and payload:

      • Header: contains info such as packet size, source and destination addresses

      • Payload: contains data bits sent from source host

9. Transferring internet data via encapsulation#

  • Ad hoc interconnection of networks

  • No particular topology

  • Vastly different router & link capacities

  • Send packets from source to destination by hopping through networks

    • Router forms bridge from one network to another

    • Different packets may take different routes

data encapsulation

10. A trip down memory lane#

11. Other issues#

  • We are glossing over a number of important questions:

    • What if different networks have different maximum frame sizes? (segmentation)

    • How do routers know where to forward frames?

    • How are routers informed when the network topology changes?

    • What if packets get lost?

  • These (and other) questions are addressed by the area of systems known as computer networking

12. Global IP Internet (upper case)#

  • Most famous example of an internet

  • Based on the TCP/IP protocol family

    • IP (Internet Protocol)

      • Provides basic naming scheme and unreliable delivery capability of packets (datagrams) from host-to-host

    • UDP (Unreliable Datagram Protocol)

      • Uses IP to provide unreliable datagram delivery from process-to-process.

    • TCP (Transmission Control Protocol)

      • Uses IP to provide reliable byte streams from process-to-process over connections.

  • Accessed via a mix of Unix file I/O and functions from the sockets interface.

13. Hardware and software organization of an Internet Application#

Internet application

14. A programmer’s view of the Internet#

  • Hosts are mapped to a set of 32-bit IP addresses (lookout for IPv6 in the future)

    • 128.2.203.179

    • 127.0.0.1 (always localhost)

  • The set of IP addresses is mapped to a set of identifiers called Internet domain names: 144.26.2.9 is mapped to www.wcupa.edu

  • A process on one Internet host can communicate with a process on another Internet host over a connection.

15. IP addresses#

  • 32-bit IP addresses are stored in an IP address struct.

    • IP addresses are always stored in memory in network byte order (big-endian byte order)

    • True in general for any integer transferred in a packet header from one machine to another.

      • E.g., the port number used to identify an Internet connection

/* Internet address structure */
struct in_addr {
  uint32_t s_addr; /* network byte order (big-endian) */
};
  • Dotted decimal notation

    • By convention, each byte in a 32-bit IP address is represented by its decimal value and separated by a period.

    • IP address: 0x8002C2F2 = 128.2.194.242

  • Use getaddrinfo and getnameinfo functions (described later) to convert between IP addresses and dotted decimal format.

  • Domain Naming System (DNS)

    • The Internet maintains a mapping between IP addresses and domain names in a huge worldwide distributed database called DNS.

    • Conceptually, programmers can view the DNS database as a collection of millions of host entries.

      • Each host entry defines the mapping between a set of domain names and IP addresses.

      • In a mathematical sense, a host entry is an equivalence class of domain names and IP addresses.

$ nslookup localhost
$ hostname -f
$ nslookup www.facebook.com
$ nslookup www.twitter.com
nslookup

16. Internet connections#

  • Clients and servers communicate by sending streams of bytes over connections. Each connection is:

    • Point-to-point: connects a pair of processes.

    • Full-duplex: data can flow in both directions at the same time,

    • Reliable: stream of bytes sent by the source is eventually received by the destination in the same order it was sent.

  • A socket is an endpoint of a connection

    • Socket address is an IPaddress:port pair

  • A port is a 16-bit integer that identifies a process:

    • Ephemeral port: Assigned automatically by client kernel when client makes a connection request.

    • Well-known port: Associated with some service provided by a server (e.g., port 80 is associated with Web servers)

17. Well known service names and ports#

  • Popular services have permanently assigned well-known ports and corresponding well-known service names:

    • echo servers: echo  7

    • ftp servers: ftp 21

    • ssh servers: ssh 22

    • email servers: smtp 25

    • Web servers: http 80

  • Mappings between well-known ports and service names is contained in the file /etc/services on each Linux machine.

18. Socket interface#

  • Set of system-level functions used in conjunction with Unix I/O to build network applications.

  • Created in the early 80’s as part of the original Berkeley distribution of Unix that contained an early version of the Internet protocols.

  • Available on all modern systems: Unix variants, Windows, OS X, IOS, Android, ARM

  • What is a socket?

    • To the kernel, a socket is an endpoint of communication

    • To an application, a socket is a file descriptor that lets the application read/write from/to the network.

    • Remember: All Unix I/O devices, including networks, are modeled as files.

  • Clients and servers communicate with each other by reading from and writing to socket descriptors.

  • The main distinction between regular file I/O and socket I/O is how the application opens the socket descriptors.

19. Hands on: client/server network programming#

  • Create a directory called 07-networks.

  • Create the following files server.c and client.c.

  • Compile the two programs.

  • Open a dual panel tmux and run server and client on each panel.

  • Type strings into client and observe how it shows up on server.

  • Type a corresponding string on server and observe how it shows on client.

  • Type exit on each side to stop client and server.

$ gcc -o server server.c
$ gcc -o client client.c
server/client