Content from Introduction to Linux


Last updated on 2024-06-24

Overview

Questions

  • How did Linux come to be?

Objectives

  • Explain the historical development of Linux

1. Linux, A Brief History


  • First commercial computer by IBM: IBM 701 (1952)
    • Required system operators to help run the programs
  • Second generation: IBM 704 (1956)
    • Incompatible with IBM 701
    • No support from IBM
    • No textbook/documentation
  • IBM 701 system operators
    • Trained by IBM for 701
    • Encouraged by IBM to organize informal meetings to share expertise
  • Concept of time-sharing: Operating System
  • Allows computers to become general-purpose (serving multiple users running different programs)
  • Enables system operators to transition into the role of system administrators.
  • MIT, General Electric (GE), and Bell Labs collaborated to create the first time-sharing system, called Multics
    • Multiplexed Information and Computing Service
  • Bell Labs left Multics in 1969 (over budget, behind schedule)
    • Ken Thompson, Rudd Canaday, and Dennis Ritchie continued exploring operating system ideas on their own
  • Summer 1969
    • Ken Thompson’s wife and kids were out of town for a month
    • One week was assigned to each key component of UNIX (the operating system, the shell, the editor, and the assembler).
  • UNIX
    • UNiplexed Information and Computing Service
    • An emasculated Multics
  • By 1971: as, cal, cat, chdir, chmod, chown, cmp, cp, date, dc, du, ed …
  • By 1973: 16 UNIX installations; new components:
    • C programming language (Dennis Ritchie)
    • Pipe (designed by Doug McIlroy, implemented by Ken Thompson)
  • Ritchie, D.M. and Thompson, K., 1973. The UNIX time-sharing system. ACM SIGOPS Operating Systems Review, 7(4), p.27.
  • After 6 months, the number of UNIX installations tripled.
  • Due to AT&T’s antitrust settlement in 1958, UNIX could not be sold as a product, but Bell Labs still retained licensing rights
    • Source code and licenses were shipped to individual requesters.
    • 1974: Berkeley UNIX (BSD - Berkeley Software Distribution - led by Bob Fabry at the University of California, Berkeley)
    • 1976: V6 (John Lions at University of New South Wales - Australia)
    • 1982: SunOS (Sun Microsystems, co-founded by Bill Joy - a graduate student of Bob Fabry)
    • 1983: AT&T UNIX System V (after court-ordered divestiture of AT&T in 1983)
  • Managing general-purpose computing systems requires a different set of skills.
  • Serving a wide variety of users and applications.
  • Universities were early leaders in fostering system admin groups
    • Purdue, Utah, Colorado-Boulder, and SUNY Buffalo were the initial hotbeds
  • Evi Nemeth: Mother of system administration
    • Graduate student administrative team.
  • A system administrator:
    • Jack of all trades
    • Rabid jacks of all trades: Hardware, software, system configuration, programming …
  • 1989: First edition of UNIX and Linux System Administration Handbook
  • In late 1990, UNIX was gaining ground everywhere …
  • 1992: AT&T filed a copyright lawsuit against BSDI (Berkeley Software Design, Inc.) and the Regents of the University of California
  • 1994: The lawsuit was settled and three files were removed from the BSD code base.
    • The impact was lasting
    • Many users moved to Microsoft Windows during the legal uncertainty
  • 1984: Andrew Tanenbaum of Vrije Universiteit in Amsterdam developed MINIX as a teaching operating system for his students.
  • 1991: Linus Torvalds, an undergraduate at the University of Helsinki, Finland, developed his own OS called Linux, with inspiration from both UNIX and MINIX.
  • UNIX administration skill sets apply directly to Linux
Email announcement of the creation of Linux
  • Cloud
  • IoT (small devices)

2. The course


  • We will follow the book, but not the chapters in the book’s order.
  • Rather, we will start from the perspective of a normal user and eventually learn enough Linux to become administrators.
  • Operating system: Linux
    • Most servers are remote, even when you are in the same data center.
    • Remote access via terminal.
  • Know your terminal and how to launch it:
    • Linux
    • Mac
      • Mac Terminal
    • Windows
      • Windows Terminal, or
      • Git Bash
  • As an admin, you need to know how to launch terminals on any computer.

Content from Linux System Administrators


Last updated on 2024-06-24

Overview

Questions

  • What are the key responsibilities of a Linux system administrator?
  • What are the required technical skills of a Linux system administrator?

Objectives

  • Understand the capabilities and responsibilities of a Linux system administrator

1. Essential duties of a Linux system administrator (sysadmin)

  • Controlling Access
  • Adding Hardware
  • Automating Tasks
  • Overseeing Backups
  • Installing and Upgrading Software
  • Monitoring
  • Troubleshooting
  • Maintaining Local Documentation
  • Vigilantly Monitoring Security
  • Tuning Performance
  • Developing Site Policies
  • Working With Vendors
  • Fire Fighting
  • Create new user accounts
  • Remove expired accounts
  • Handle all account-related issues
    • Access control
  • Adding/removing physical components from the system
  • Installing/configuring corresponding hardware drivers
  • Leverage script programming (scripting) and Linux/Unix system commands to automate repetitive and time-consuming tasks.
  • Reduce human errors
  • Improve response time
  • Indispensable to administering and managing large clusters of computers.
  • Example script installing and configuring Docker for all user accounts on CloudLab:
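  • A minimal sketch of what such a script might look like (hypothetical; the actual CloudLab script is not shown here, and the docker.io package and UID range assume an Ubuntu-style system):

BASH

#!/bin/bash
# Hypothetical sketch: install Docker and grant access to all regular users.
set -e
apt-get update
apt-get install -y docker.io          # assumes an Ubuntu/Debian package repository
systemctl enable --now docker         # start Docker now and at every boot
# Add every human account (UID >= 1000, excluding nobody) to the docker group.
for user in $(awk -F: '$3 >= 1000 && $3 < 65534 {print $1}' /etc/passwd); do
  usermod -aG docker "$user"
done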
  • Computing systems will fail.
  • Large computing systems will fail frequently.
  • Backup is time consuming, tedious, and highly critical.
    • Should be automated!
  • Installing software as needed.
  • Upgrading/patching security holes of existing software as needed.
    • Juggling multiple versions of same software.
  • Manage software to manage installed software.
  • Help identify issues related to the computing systems.
    • Collecting and analyzing log files
    • Monitoring resource availability (CPU and memory utilization, storage availability)
Commands to monitor memory and CPU usage
  • SSH to molly. Refer to the Setup page if you need a refresher on how to do so.
  • Run the following commands to observe the system.
    • The $ is not meant to be typed into the terminal.
    • It implies that the rest of the command (htop in this case) is to be typed into a terminal.
$ htop
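  • A few other standard monitoring commands are also worth knowing (all are stock Linux utilities):

BASH

free -h        # memory and swap usage, in human-readable units
uptime         # load averages over the past 1, 5, and 15 minutes
vmstat 2 5     # CPU, memory, and I/O statistics: 5 samples, 2 seconds apart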
  • The sysadmin identifies the issue via monitoring or users’ complaints
  • The sysadmin needs to fix the issue.
  • Over time, computing systems become customized according to the preferences and styles of their current sysadmins.
    • Software vendors
    • Deployment methods
    • Automation scripts
  • It is critical that sysadmins maintain detailed documentation so that they and others can understand/remember how the systems work (and evolved) for maintenance and upgrade purposes.
  • Attempts within 12 hours on molly:
    • This is only the most naive type of hacking attempt; there are others.
  • Exercise:
Example of unauthorized attempts to access molly
  • Based on system monitors, sysadmins can, and should, configure system components (operating system configurations, software-specific configuration) in order to improve users’ application performance.
  • In many cases, sysadmins need to work with users to ensure that users apply application-specific run-time configuration to ensure optimal performance.
  • The main responsibility of sysadmins is to deploy and maintain complex computing systems that support a diverse set of applications and users.
  • This includes developing appropriate documents regarding:
    • Acceptable use of computer systems
    • Management and retention of data
    • Privacy and security of networks and systems
    • Adherence to regulations (local and governmental)
    • Any specifics that you want (hope) the users to follow …
  • Be the liaison between vendors and the institutions (businesses).
    • Hardware vendors
    • Software vendors
    • Cloud providers
  • On-the-fly troubleshooting of critical issues
    • Most of the time user-related
    • Critical patching of security issues (outside of normal maintenance schedule)
2. Required technical skills of a sysadmin

  • Highly comfortable with the keyboard (say goodbye to the mouse)
  • Know your environment (laptop)
  • Text editors: nano or vim
  • Scripting: bash (or Python or Ruby)
3. Related careers

  • DevOps
  • Site-reliability engineers
  • Security operations (SecOps) engineers
  • Network administrators
  • Database administrators
  • Network operations center (NOC) engineers
  • Data center technicians
  • System architects

Content from Introduction to the Linux Shell


Last updated on 2024-06-24

Overview

Questions

  • What is a shell?
  • How does a shell differ from a traditional GUI like the Windows desktop?

Objectives

  • Understand the Linux shell
  • Be able to carry out basic commands inside the Linux shell

Prior to this lesson, you need to launch an experiment on CloudLab.

  • Traditional computers: Graphical User Interface (GUI)
    • Included in modern Linux distributions (distros)
  • Remote Linux cluster of computers: Command-Line Interface (CLI)
    • Great for automation
    • Familiarity with CLI and shell scripting is essential
  • Linux CLI: The Shell
    • Is a program where users can type commands
    • Tasks that are often managed by a mouse click are now carried out by these commands and their respective options (flags)
  • Shell scripting:
    • Sequence of commands can be combined into a script to automate the workflow.
  • This is an example comparing the contents of a directory between a GUI view (left) and a CLI view (right).
    • Both display the contents of a home directory on the Windows Subsystem for Linux
Comparing GUI versus CLI interfaces
  • SSH into your CloudLab experiment
  • Run the following commands to prepare the environment.

BASH

wget --no-check-certificate https://www.cs.wcupa.edu/lngo/data/shell-lesson-data.zip

unzip shell-lesson-data.zip
  • File System: an Operating System component responsible for managing files and directories.
  • Perspective:
    • On a GUI, you click to move from one place to another, so you are outside the file system space looking in.
    • On a CLI, you need to explicitly provide direction (path) for the command to know with which file/directory it is supposed to interact. The perspective is more inside the file system space.
Command line interface perspective
  • Key commands:
    • pwd: path of working (current) directory
    • ls: listing
    • cd: change directory
  • pwd returns the absolute path to the current working directory (i.e.: where you are when you are in the terminal).

BASH

pwd
Path to current working directory
  • ls returns the list of files and directories in the target directory.

BASH

ls /
Listing of directories and files in the root directory
  • There are many options available for different commands. To view the documentation, run the following:
    • As a sys admin, you have to become very good at reading documentation!

BASH

ls --help
View help documentation for ls
  • A detailed manual can be viewed using the following command:
    • Use the Space key to move down page by page
    • How do you quit? (Hint: press q)

BASH

man ls
View help documentation for ls using man

Challenge: exploring more flags

  • You can also use two options at the same time. What does the command ls do when used with the -l option? What about if you use both the -l and the -h option?
  • Some of its output is about properties that we do not cover in this lesson (such as file permissions and ownership), but the rest should be useful nevertheless.
  • The -l option makes ls use a long listing format, showing not only the file/directory names but also additional information, such as the file size and the time of its last modification.
  • If you use both the -h option and the -l option, this makes the file size human readable, i.e. displaying something like 5.3K instead of 5369.

Challenge: Listing in reverse chronological order

  • By default, ls lists the contents of a directory in alphabetical order by name. The command ls -t lists items by time of last change instead of alphabetically. The command ls -r lists the contents of a directory in reverse order.
  • Which file is displayed last when you combine the -t and -r options? Hint: You may need to use the -l option to see the last changed dates.

The most recently changed file is listed last when using -rt. This can be very useful for finding your most recent edits or checking to see if a new output file was written.

  • Running ls by itself will list the contents of the current directory.

BASH

ls
  • cd allows users to change the current directory (the outcome of pwd) to a target directory.
    • Run man cd or cd --help to read the documentation for cd.
    • The general syntax for cd is cd DESTINATION, where DESTINATION can be an absolute path, a relative path, or a special path.
  • Change to root directory and view contents of root:

BASH

cd /

ls 
  • Special paths:
    • ~: home directory
    • .: current directory
    • ..: a directory that is one level above the current directory
  • Change to your home directory using either the special paths or /home/YOURUSERNAME (YOURUSERNAME: your username on molly)
    • Check the content of your home directory to confirm that you have the shell-lesson-data directory.
    • Change into shell-lesson-data directory and view the contents of this directory

BASH

cd ~

ls

cd shell-lesson-data

ls 
Change directories

Challenge: Reading comprehension

  • Use the filesystem diagram below.
  • If pwd displays /Users/backup, and -r tells ls to display things in reverse order, what command(s) will result in the following output:

BASH

pnas_sub/ pnas_final/ original/
Filesystem diagram
  1. ls pwd
  2. ls -r -F
  3. ls -r -F /Users/backup
  1. No: pwd is not the name of a directory.
  2. Yes: ls without directory argument lists files and directories in the current directory.
  3. Yes: uses the absolute path explicitly.
Structure of a shell command (ls -F /)
  • ls is the command, with an option -F and an argument /.
  • Option:
    • either start with a single dash (-) or two dashes (--),
    • change the behavior of a command.
    • can be referred to as either switches or flags.
  • Arguments tell the command what to operate on (e.g. files and directories).
  • Sometimes options and arguments are referred to as parameters.
    • The shell is essentially a process, and these options and arguments are passed as parameters to the shell’s function that is responsible for executing the command.
  • A command can be called with more than one option and more than one argument, but a command doesn’t always require an argument or an option.
  • Each part is separated by spaces: if you omit the space between ls and -F the shell will look for a command called ls-F, which doesn’t exist.
  • Capitalization can be important.
    • ls -s will display the size of files and directories alongside the names
    • ls -S will sort the files and directories by size
  • Check where you are, change back to your home directory, then navigate to exercise-data.

BASH

pwd

cd ~

cd shell-lesson-data

cd exercise-data/writing

ls -F
  • Create a directory called thesis, and check for its existence.
    • Also check that there is nothing inside the newly created directory.

BASH

mkdir thesis

ls -F

Challenge: creating multiple directories

  • What is the role of the -p flag in the following commands:

BASH

mkdir ../project/data 

ls -F ../project

mkdir -p ../project/data

mkdir -p ../project/report ../project/results

ls -F ../project

-p allows the creation of all directories on the specified path, regardless of whether the intermediate directories already exist.

  • Important rules for directory and file names in Linux!!!
    • Do not use spaces or special characters in file and directory names.
    • Use -, _, and . for annotation, but do not begin names with them.
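  • To see why spaces are painful, try this small demonstration (safe anywhere you have write access):

BASH

mkdir 'my thesis'    # legal, but every later command now needs quotes
ls my thesis         # error: ls looks for two names, "my" and "thesis"
ls 'my thesis'       # works, but the quoting is required forever
rmdir 'my thesis'    # clean up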
  • The Linux terminal environment is text-only, hence its editors are text-based as well.
    • nano
    • vim
    • emacs.
  • Fun read: One does not simply exit Vim
  • We are using nano (lowest learning curve).
  • Create a file named draft.txt inside thesis.
    • Type in the contents shown in the screenshot.

BASH

pwd

ls

cd thesis

nano draft.txt
Nano editor
  • To save the text, you need to press Ctrl + O keys:
    • Press and hold Ctrl then press O.
    • You will be asked whether to keep the same file name or to edit the name. Press Enter to confirm.
  • To quit nano, press Ctrl + X.
    • If you have not saved the text before, nano will ask whether you want to save the file (Y or N) and then confirm the file name.
  • mv is short for move. It will move a file/directory from one location to another.

BASH

cd ~/shell-lesson-data/exercise-data/writing

ls thesis

mv thesis/draft.txt thesis/quotes.txt

ls thesis

mv thesis/quotes.txt .

ls thesis

ls 

Challenge: Moving files to a new folder

  • After running the following commands, Jamie realizes that she put the files sucrose.dat and maltose.dat into the wrong folder. The files should have been placed in the raw folder.

BASH

ls -F

analyzed/ raw/

ls -F analyzed

fructose.dat glucose.dat maltose.dat sucrose.dat

cd analyzed
  • Fill in the blanks to move these files to the raw folder:

BASH

mv sucrose.dat maltose.dat ____/_____

BASH

mv sucrose.dat maltose.dat ../raw
  • cp stands for copy. It copies a file or directory to a new location, possibly with a new name.

BASH

cp quotes.txt thesis/quotations.txt

ls quotes.txt thesis/quotations.txt

cp -r thesis thesis_backup

ls thesis thesis_backup

Challenge: Renaming files

  • Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it: statstics.txt
  • After creating and saving this file you realize you misspelled the filename! You want to correct the mistake, which of the following commands could you use to do so?
  1. cp statstics.txt statistics.txt
  2. mv statstics.txt statistics.txt
  3. mv statstics.txt .
  4. cp statstics.txt .
  1. No. While this would create a file with the correct name, the incorrectly named file still exists in the directory and would need to be deleted.
  2. Yes, this would work to rename the file.
  3. No, the period(.) indicates where to move the file, but does not provide a new file name; identical file names cannot be created.
  4. No, the period(.) indicates where to copy the file, but does not provide a new file name; identical file names cannot be created.

Challenge: Moving and copying

  • What is the output of the last ls command in the sequence shown below?

BASH

pwd

/home/rammy/data

ls

proteins.dat

mkdir recombined

mv proteins.dat recombined/

cp recombined/proteins.dat ../proteins-saved.dat

ls
  1. proteins-saved.dat recombined
  2. recombined
  3. proteins.dat recombined
  4. proteins-saved.dat
  1. No, proteins-saved.dat is located at /home/rammy/
  2. Yes
  3. No, proteins.dat is located at /home/rammy/data/recombined
  4. No, proteins-saved.dat is located at /home/rammy/
  • Returning to the shell-lesson-data/exercise-data/writing directory, let’s tidy up this directory by removing the quotes.txt file we created.
  • The command we’ll use for this is rm (short for ‘remove’):

BASH

cd ~/shell-lesson-data/exercise-data/writing

ls 

rm quotes.txt

ls quotes.txt

rm thesis

rm -r thesis
  • * is a wildcard, which matches zero or more characters.
    • Inside shell-lesson-data/exercise-data/proteins directory:
      • *.pdb matches ethane.pdb, propane.pdb, and every file that ends with ‘.pdb’.
      • p*.pdb only matches pentane.pdb and propane.pdb, because the ‘p’ at the front only matches filenames that begin with the letter ‘p’.
  • ? is also a wildcard, but it matches exactly one character. So
    • ?ethane.pdb would match methane.pdb
    • *ethane.pdb matches both ethane.pdb, and methane.pdb.
  • Wildcards can be used in combination with each other
    • ???ane.pdb matches any three characters followed by ane.pdb: cubane.pdb, ethane.pdb, octane.pdb.
  • When the shell sees a wildcard, it expands the wildcard to create a list of matching filenames before running the command that was asked for. It is the shell, not the other programs, that deals with expanding wildcards.
  • Change into shell-lesson-data/exercise-data/proteins and try the following commands

BASH

ls *t*ane.pdb

ls *t?ne.*

ls *t??ne.pdb

ls ethane.*
Outcome of various wildcards

Challenge: more on wildcards

Sam has a directory containing calibration data, datasets, and descriptions of the datasets:

BASH

.
├── 2015-10-23-calibration.txt
├── 2015-10-23-dataset1.txt
├── 2015-10-23-dataset2.txt
├── 2015-10-23-dataset_overview.txt
├── 2015-10-26-calibration.txt
├── 2015-10-26-dataset1.txt
├── 2015-10-26-dataset2.txt
├── 2015-10-26-dataset_overview.txt
├── 2015-11-23-calibration.txt
├── 2015-11-23-dataset1.txt
├── 2015-11-23-dataset2.txt
├── 2015-11-23-dataset_overview.txt
├── backup
│   ├── calibration
│   └── datasets
└── send_to_bob
    ├── all_datasets_created_on_a_23rd
    └── all_november_files

Before heading off to another field trip, Sam wants to back up her data and send datasets created on the 23rd of any month to Bob. Sam uses the following commands to get the job done:

BASH

cp *dataset* backup/datasets

cp ____calibration____ backup/calibration

cp 2015-____-____ send_to_bob/all_november_files/

cp ____ send_to_bob/all_datasets_created_on_a_23rd/

Help Sam by filling in the blanks.

The resulting directory structure should look like this:

BASH

.
├── 2015-10-23-calibration.txt
├── 2015-10-23-dataset1.txt
├── 2015-10-23-dataset2.txt
├── 2015-10-23-dataset_overview.txt
├── 2015-10-26-calibration.txt
├── 2015-10-26-dataset1.txt
├── 2015-10-26-dataset2.txt
├── 2015-10-26-dataset_overview.txt
├── 2015-11-23-calibration.txt
├── 2015-11-23-dataset1.txt
├── 2015-11-23-dataset2.txt
├── 2015-11-23-dataset_overview.txt
├── backup
│   ├── calibration
│   │   ├── 2015-10-23-calibration.txt
│   │   ├── 2015-10-26-calibration.txt
│   │   └── 2015-11-23-calibration.txt
│   └── datasets
│       ├── 2015-10-23-dataset1.txt
│       ├── 2015-10-23-dataset2.txt
│       ├── 2015-10-23-dataset_overview.txt
│       ├── 2015-10-26-dataset1.txt
│       ├── 2015-10-26-dataset2.txt
│       ├── 2015-10-26-dataset_overview.txt
│       ├── 2015-11-23-dataset1.txt
│       ├── 2015-11-23-dataset2.txt
│       └── 2015-11-23-dataset_overview.txt
└── send_to_bob
    ├── all_datasets_created_on_a_23rd
    │   ├── 2015-10-23-dataset1.txt
    │   ├── 2015-10-23-dataset2.txt
    │   ├── 2015-10-23-dataset_overview.txt
    │   ├── 2015-11-23-dataset1.txt
    │   ├── 2015-11-23-dataset2.txt
    │   └── 2015-11-23-dataset_overview.txt
    └── all_november_files
        ├── 2015-11-23-calibration.txt
        ├── 2015-11-23-dataset1.txt
        ├── 2015-11-23-dataset2.txt
        └── 2015-11-23-dataset_overview.txt

BASH

cp *calibration.txt backup/calibration

cp 2015-11-* send_to_bob/all_november_files/

cp *-23-dataset* send_to_bob/all_datasets_created_on_a_23rd/

Content from Access Control


Last updated on 2024-06-24

Overview

Questions

  • Who has the ability to carry out privileged tasks in a Linux system?

Objectives

  • Understand the scopes of administrative controls inside Linux
  • Understand common access control models
  • Access control decisions depend on which user is attempting to perform an operation and on that user’s membership in a UNIX group.
  • Objects have owners. Owners have broad (but not necessarily unrestricted) control over their objects.
  • You own the objects you create.
  • The special user account root can act as the owner of any object. Only root can perform certain sensitive administrative operations.
  • The omnipotent administrative user (superuser)
  • Can perform all restricted operations:
    • Creating device files
    • Setting the system clock
    • Raising resource usage limits and process priorities
    • Setting the system’s hostname
    • Configuring network interfaces
    • Opening privileged network ports (those below 1024)
    • Shutting down the system

4. Rootly powers

  • sudo: Runs a command as another user. If no username is provided, the command runs as root.
  • For security purposes, the password of the root account should always be very complicated and not be given out lightly.
  • Administrative teams are often granted sudo power, meaning that they can execute commands in the name of other accounts, including root.
Sudo demonstration
  • How does sudo help with security, given that technically everyone with sudo power has rootly power anyway?
  • SSH into the CloudLab experiment launched earlier.
  • whoami: Gives the effective user ID of the user running the shell.
  • Run the following bash commands to observe the power of sudo:

BASH

whoami

sudo whoami

cat /etc/shadow

sudo cat /etc/shadow
  • setuid
    • Grant privilege to the task (the program), not the user
    • Possible by leveraging a process’ user ID:
      • real user ID (ruid)
      • effective user ID (euid)
      • saved user ID (suid)

BASH

id
  • A way to grant privileges to non-root and non-sudo accounts.

BASH

man chown

man chmod

cat /etc/shadow

which cat

cp $(which cat) mycat

./mycat /etc/shadow

sudo chown root mycat

sudo chmod 4755 mycat

./mycat /etc/shadow
  • Why direct login to the root account is a bad idea:
    • Root logins leave no record of what operations were performed as root.
    • We also don’t know who logged in as root.
  • By default, most systems allow root logins to be disabled on terminals, through the window systems, and across the network.
    • A root account with no valid password (so it cannot be logged into directly) is another solution.
  • If root is accessible, password must be really good.

Challenge: granting sudo power

  • Run the following command to create a new account called student.

BASH

sudo useradd -s /bin/sh -d /home/student -m student
  • Search the documentation to find out how to give the student account passwordless sudo power (a solution sketch follows).
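  • One possible solution sketch (assumes the conventional /etc/sudoers.d drop-in mechanism; verify against your system’s documentation):

BASH

# Create a sudoers drop-in granting student passwordless sudo.
echo 'student ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/student
sudo chmod 440 /etc/sudoers.d/student   # sudoers files must have restrictive permissions
sudo -l -U student                      # list student's sudo privileges to verify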
  • Root access presents a potential single point of failure.
  • The setuid alternative is difficult to manage due to potential capability leaks from complex software suites.
  • Minimal control over network security.
  • Group management cannot be done by users (more work for administrators).
  • Access control rules are embedded in individual programs and cannot easily be rewritten.
  • Little to no support for auditing and logging.
  • PAM: Pluggable Authentication Modules
    • Wrapper for various method-specific authentication libraries
    • SSO (Single Sign-On)
  • Kerberos: network cryptographic authentication
    • Authentication rather than access control
    • Uses trusted third party to perform authentication for an entire network.
  • Filesystem access control lists (ACLs)
    • Set permissions for multiple users and groups at once.
  • Linux capabilities
    • man capabilities
    • Privileges traditionally associated with superuser are divided into units, known as capabilities, which can be independently enabled and disabled.
    • Capabilities are a per-thread attribute.
    • This is used extensively by higher-level systems like AppArmor and Docker; a small demonstration follows.
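  • A small demonstration of capabilities (setcap and getcap come from the libcap tools; the copy of ping is only to avoid touching the system binary):

BASH

cp $(which ping) ./myping
./myping -c 1 8.8.8.8                 # may fail: raw sockets normally require privilege
sudo setcap cap_net_raw+ep ./myping   # grant only the raw-socket capability
getcap ./myping                       # inspect the file's capabilities
./myping -c 1 8.8.8.8                 # now works without being setuid root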
  • Linux namespaces
    • Processes can be separated into hierarchical partitions (namespaces) from which they see only a subset of the system’s files, network ports, and processes.
    • Preemptive access control.
    • Foundation for software containerization
    • Docker
  • Linux’s standard access control model is considered discretionary access control (DAC)
    • Owners of access-controlled entities set the permissions on them.
    • Bad example: users expose their home directories.
  • Mandatory access control (MAC)
    • Administrators write access control policies that override or supplement DAC.
    • Enabling technology for new security models.
    • Principle of least privilege
  • Role-based access control (RBAC)
    • Added layer of indirection to access control calculations
    • Permissions are granted to intermediate constructs (roles), and roles are assigned to users.
    • roles can have hierarchical relationships (easier to administer)
  • SELinux: Security-Enhanced Linux
    • MAC model
    • Created by NSA
    • Difficult to administer and troubleshoot

Content from Process Control


Last updated on 2024-06-24

Overview

Questions

  • What is a process?
  • How can we monitor running processes?

Objectives

  • Understand how to observe and monitor running processes.
  • Represents a running program
  • An address space
  • A set of data structures within the kernel
    • Address space map
    • Current status of the process
    • Execution priority of the process
    • Information about resources
    • Information about files and open ports
    • Signal mask
    • Owner
Mapping of a process in memory
  • PID: process ID number
  • PPID: parent PID - how was the process spawned?
  • UID: user identifier of the person who created the process
  • EUID: effective user identifier - what resources the process has access to at any given moment
  • GID: group identifier of the person who created the process
  • EGID: effective group identifier
  • When the system boots, the first process (init or systemd) is created with a process ID of 1
  • All other processes are created through fork
    • fork() creates a copy of the parent process.
    • In the copy (child), fork() returns 0.
    • In the original (parent), fork() returns the new PID of the child process.
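  • You can watch fork in action from Bash itself: a subshell is created with fork(), and $BASHPID (unlike $$) reports the PID of the current subshell:

BASH

echo "parent shell PID: $BASHPID"
( echo "child subshell PID: $BASHPID; \$\$ still reports $$" )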
  • Signals: process-level interrupt requests
  • There are approximately thirty signals, which:
    • Are sent as means of communication
    • Are sent by terminal driver to kill/interrupt/suspend processes using Ctrl-C or Ctrl-Z
    • Are sent by the kill command to terminate processes
    • Are sent by the kernel when faults happen (e.g., division by zero)
    • Are sent by kernel to notify the process of interesting events (I/O data is available, child process is dead …)
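  • You can experiment with signals safely using a throwaway process:

BASH

sleep 600 &     # start a long-running background process
kill -l         # list all available signals
kill -STOP %1   # suspend the background job (Ctrl-Z sends the related SIGTSTP)
kill -CONT %1   # resume it
kill -TERM %1   # politely ask it to terminate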

BASH

ps aux
  • USER, PID, %CPU, %MEM
  • VSZ: Virtual size of the process
  • RSS: Resident set size (number of pages)
  • TTY: control terminal ID
  • STAT: current process status (Process State Codes from manual)
  • TIME: CPU time process consumed
  • COMMAND: command and arguments.
  • top
  • htop
  • Spend 10-15 minutes to read and learn how to use tmux
  • This is to be done on CloudLab
  • strace: check what a process is doing
    • Create a tmux session with two horizontal panes.
    • Run top in the first pane
    • In the second pane:
      • Identify the process ID of top
      • Run strace on this process ID:

BASH

sudo strace -p <top_process_ID>
  • The system slows down!
  • Use ps and top (htop) to identify the processes that soak up CPU and memory
  • Check filesystem disk usage: df -h
  • Check directory usage: du -h
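  • For example:

BASH

df -h                                  # free space on each mounted filesystem
du -h --max-depth=1 /home | sort -h    # largest directories under /home (may need sudo)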
  • The cron daemon allows the execution of commands on a predetermined schedule.
  • Run the following:

BASH

crontab -e
  • Select an editor (recommend nano)
  • Type * * * * * echo $(/bin/date) >> /users/<your_username>/cron.log into the file
  • Save and quit nano (we did this before)
  • Wait for a few minutes, then check the content of cron.log
  • Common usage:
    • Sending mail
    • Cleaning up a file system
    • Rotating a log file
    • Running batch jobs
    • Backing up and mirroring
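  • A couple of illustrative crontab entries (the scripts named here are hypothetical):

BASH

# minute hour day-of-month month day-of-week command
0 2 * * *      /usr/local/bin/backup.sh       # nightly backup at 2:00 AM
*/15 * * * *   /usr/local/bin/check_disk.sh   # disk check every 15 minutes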
  • A legacy!!!

Content from The Filesystem


Last updated on 2024-06-24

  • Processes
  • Audio devices
  • Kernel data structures and tuning parameters
  • Inter-process communication channels
  • A namespace
  • An API
  • Security models
  • An implementation
  • Single unified hierarchy starting at the root: /
  • Absolute path: path name starts from root
  • Relative path: path name starts from current directory: . or subdirectory name
  • The root file system is composed of smaller chunks (smaller filesystems)
  • Smaller file systems are attached to the tree with the mount command, which …
    • Maps a directory within the existing filesystem tree, called the mount point, to the root of the newly attached filesystem.
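  • A sketch of mounting and unmounting (the device name /dev/sdb1 is an assumption; substitute a real device on your system):

BASH

sudo mkdir -p /mnt/data           # create the mount point
sudo mount /dev/sdb1 /mnt/data    # attach the filesystem to the tree
mount | grep /mnt/data            # verify the mount
sudo umount /mnt/data             # detach it again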

BASH

man fuser

sudo fuser -cv /users
  • Instead of rebooting, it may suffice to unmount and remount the offending filesystem.
Standard directories and their contents
  • Character/block device file: standard communication interface provided by device drivers.
  • Local domain sockets: connections between processes that allow them to communicate hygienically.
  • Named pipes allow communication between two processes running on the same host.
  • Symbolic links: point to a file by name
  • Hard links: create an illusion that a file exists in more than one place at the same time.
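  • A quick way to see the difference (file names are placeholders; safe in any scratch directory):

BASH

echo "hello" > original.txt
ln -s original.txt soft.txt             # symbolic link: refers to the file by name
ln original.txt hard.txt                # hard link: a second name for the same inode
ls -li original.txt soft.txt hard.txt   # the hard link shares the inode number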
File type encoding used by ls
  • Traditionally 12 bits for each file: the file’s mode (plus 4 more bits: the file’s type)
  • 9 permission bits - read, write, execute for owner, group, others
  • setuid & setgid bits (4000, 2000)
    • setgid on a directory: newly created files take the group ownership of the directory (not the primary group of the user creating them)
  • sticky bit (1000)
    • on regular files: ignored (original meaning: keep program text on the swap device)
    • on directories: only the owner of a file and the owner of that directory may remove the file from that directory
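  • A few chmod examples covering both octal and mnemonic syntax (file names are placeholders):

BASH

chmod 644 notes.txt     # rw-r--r--: owner read/write, everyone else read-only
chmod u+x script.sh     # mnemonic form: add execute permission for the owner
chmod 2775 shared/      # setgid directory for group collaboration
chmod +t dropbox/       # sticky bit: only file owners may delete their files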
File type permission
File type permission using mnemonic syntax
  • supported for ext2, ext3, ext4, reiserfs, XFS, JFS: mount -o [no]acl
  • allows rwx to be set independently for any user/group combination: getfacl, setfacl (plus man acl)
  • NFSv4 - superset of POSIX ACLs plus all permission bits and most semantics from Windows
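  • For example (the user and group names are placeholders):

BASH

setfacl -m u:alice:rw report.txt   # grant user alice read/write on one file
setfacl -m g:devs:rx project/      # grant group devs read/execute on a directory
getfacl report.txt                 # inspect the resulting ACL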

Content from User Management


Last updated on 2024-06-24

  • Nothing more than a number (user ID - UID)
  • Everything else revolves around this number
  • The system maps a UID to an additional set of information through an API.
  • login names: ≤ 32 chars, case sensitive and (in some cases) even special chars (☹, …)
  • encrypted password (or *) - do NOT leave it empty
    • DES, MD5 ($1$), Blowfish ($2y$), SHA-256 ($5$), SHA-512 ($6$)
    • check /etc/login.defs or (formerly) /etc/default/passwd, plus PAM and, on RHEL/CentOS, authconfig
  • UID (32-bit integer)
    • 0 for root by default
    • do not recycle them (or recycle as late as possible) - why?
    • should be unique in the whole organization (otherwise NFS problems, …)
  • GID
  • GECOS (finger’s interpretation)
  • home dir
  • login shell
  • /etc/group contains the names of UNIX groups and a list of each group’s members
  • useradd
  • userdel
  • usermod
  • pwconv
  • pwunconv
  • groupadd
  • groupmod
  • groupdel
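  • A typical account life cycle using these commands (the names alice and devs are placeholders):

BASH

sudo useradd -m -s /bin/bash alice   # create the account with a home directory
sudo passwd alice                    # set an initial password
sudo groupadd devs                   # create a new group
sudo usermod -aG devs alice          # add alice to the devs group
sudo userdel -r alice                # remove the account and its home directory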

Content from Software Installation and Configuration


Last updated on 2024-06-24

  • Instantiate an experiment from the CloudLab profile created last week
  • SSH into the experiment once it is ready.
  • Two approaches:
    • From package management system
    • From source
  • Hands-on:
    • Install nginx from package management system
    • Install another version of nginx into /opt/nginx/VERSION
  • The purpose of this document is to create a LAMP (Linux/Apache/MySQL/PHP) installation on your experiment.
  • With this in place, we’ll also install two common LAMP applications:
    • phpMyAdmin: a web app for managing MySQL databases
    • Drupal: a content management system.
  • More passwords:
    • The MySQL installation needs an administrative (root) password to get started. This is an important one, but we will make it so that you, with machine root access, do not have to remember it.
    • phpMyAdmin has its own administrative database/user/password; fortunately you do not need to remember this password.
    • Drupal also has its own administrative database/user/password which you don’t have to remember. Drupal also requires a site administrator login/password, which you do have to remember.
  • Run the following commands

BASH

sudo apt update

sudo apt install -y mysql-server

sudo systemctl start mysql.service
  • Open mysql

BASH

mysql
  • Run the following commands inside MySQL
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';

exit
  • Set up MySQL using mysql_secure_installation: first, run the following command

BASH

sudo mysql_secure_installation
  • Answer the questions on the prompt as follows:
Securing the MySQL server deployment.

Enter password for user root:

VALIDATE PASSWORD COMPONENT can be used to test passwords
and improve security. It checks the strength of password
and allows the users to set only those passwords which are
secure enough. Would you like to setup VALIDATE PASSWORD component?

Press y|Y for Yes, any other key for No: n
Using existing password for root.
Change the password for root ? ((Press y|Y for Yes, any other key for No) : n

 ... skipping.

By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.

Remove anonymous users? (Press y|Y for Yes, any other key for No) : n

 ... skipping.

Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n

 ... skipping.

By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.

Remove test database and access to it? (Press y|Y for Yes, any other key for No) : n

 ... skipping.

Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : Y
Success.

All done!
  • Run the following command and provide the password to test the MySQL server:
    • mysql -u root -p
    • To quit MySQL, type \q
  • Create a file named .my.cnf directly inside /root (run sudo nano /root/.my.cnf) with the following content:
[client]

user=root

password="MYSQL_ROOT_PASS"
  • Test the effectiveness of this passwordless setup
sudo su

mysql

\q

exit

sudo -H mysql

\q
  • Run the following commands
$ sudo apt install -y  apache2 php libapache2-mod-php php-cli php-mysql php-cgi php-curl php-json php-apcu php-gd php-xml php-mbstring
  • Test the installation once it has completed by opening a web browser and going to your CloudLab experiment’s host name (the same one that you SSH to)
  • The Apache service often needs to be restarted because of configuration changes. The Apache server is controlled as a service through the systemctl command:

BASH

sudo systemctl  COMMAND  apache2.service

or, more simply:

BASH

sudo systemctl  COMMAND  apache2

where COMMAND can be: status, start, stop, restart, reload.

  • If something goes wrong, the first place to look is usually this log file: /var/log/apache2/error_log
  • This error log file is readable by you, a system admin, without invoking sudo. Often a useful thing to do is to follow the tail of this file as messages are generated, using the tail -f command:

BASH

tail -f /var/log/apache2/error_log
  • If you make a configuration change, you can test its effectiveness by running this prior to attempting to reset the service.

BASH

sudo apachectl configtest
  • Apache understands user sites as the automatic association of a special directory owned by you (by default, ~/public_html) to the URL http://hostname/~LOGIN

BASH

mkdir ~/public_html
  • User directories are not enabled by default. Enable the Apache userdir module with:
$ sudo a2enmod userdir

$ sudo systemctl reload apache2
  • The first command simply creates a symbolic link. Check it yourself:
$ ll /etc/apache2/mods-enabled/userdir.*
  • Visit http://hostname/~LOGIN
    • Does it work?
    • Why?
  • Run $ sudo nano /etc/apache2/mods-available/userdir.conf
    • Make the appropriate changes or additions
    • Save and reload
    • Try visiting the site again
  • To enable PHP, edit the file /etc/apache2/mods-enabled/php7.4.conf
  • Comment out the last five lines as per the instructions in the comments
  • Restart apache2
  • Create a file named hello.php in your public_html directory with the following contents:
<?php

echo "Hello from PHP";

?>
  • Refresh your home page and view this file.
  • Run the following command to install phpmyadmin
sudo apt-get install -y phpmyadmin
  • Press the Space button to check apache2, then Tab to move to Ok and press Enter.
  • Accept the default Yes answer on Configuring database for phpmyadmin with dbconfig-common.
  • Select and enter a password for phpmyadmin.
    • Press the Tab key to go to Ok, then press Enter
  • Re-enter the above password for confirmation, then Tab to Ok and Enter.
  • Provide the password of account root for MySQL (from the MySQL installation).
    • Press the Tab key to go to Ok, then press Enter
  • Run the following commands

BASH

sudo su

cd

wget https://ftp.drupal.org/files/projects/drupal-9.4.2.tar.gz
  • Find your machine’s short hostname
    • Going forward, MACHINE will refer to the outcome of this command.

BASH

hostname -f | awk -F\. '{print $1}'
  • Find your machine’s full hostname
    • Going forward, HOSTNAME will refer to the outcome of this command.

BASH

hostname -f
  • Think of a password for your drupal database.
    • Going forward, DRUPAL_DB_PASS will refer to this value.

BASH

mysql

mysql> create database drupal;

mysql> create user drupal@localhost identified by "DRUPAL_DB_PASS";

mysql> grant all on drupal.* to drupal@localhost;

mysql> quit;
  • Install drupal

BASH

tar xzf drupal-9.4.2.tar.gz

mv drupal-9.4.2 /var/www/html/$(hostname -f | awk -F\. '{print $1}')_drupal
  • Visit http://HOSTNAME/MACHINE_drupal to start the browser-based configuration process for Drupal.

  • On the first window, select the language English and click Save and Continue.

  • Next, select Standard then click Save and Continue.

  • We need to address two errors and one warning.


BASH

clear

cd /var/www/html/$(hostname -f | awk -F\. '{print $1}')_drupal

pwd

apt install -y acl

mkdir sites/default/files

setfacl -m g:www-data:rwx sites/default/files
  • Scroll to bottom of page, click try again to confirm that the File system error message is gone.
  • Confirm that you are still inside the drupal directory.

BASH

pwd

cp sites/default/default.settings.php sites/default/settings.php

setfacl -m g:www-data:rw sites/default/settings.php
  • Scroll to bottom of page, click try again to confirm that the Settings File error message is gone.
  • Confirm that you are still inside the drupal directory.

BASH

pwd

a2enmod rewrite

mv .htaccess /etc/apache2/conf-available/drupal.conf
  • Edit /etc/apache2/conf-available/drupal.conf and add <Directory /var/www/html/MACHINE_drupal> as first line and </Directory> as last line.

BASH

a2enconf drupal

systemctl reload apache2
  • Scroll to bottom of page, click try again to confirm that the warning error message is gone and the configuration has moved on to the next step.
  • Provide the authentication for the drupal username and database table as created earlier.
  • Wait for the installation to complete.
  • For configuration:
    • Site name: MACHINE
    • Site email address: your email address.
    • Other options can be selected as you see fit.
  • Challenge: Create a first page via Drupal and display the content.
  • Ansible is one of the better-known configuration management tools at the disposal of sysadmins.
  • It helps facilitate the automated installation and configuration of various software.
  • Tasks:
    • Install Ansible on CloudLab experiment
    • Follow instructions at Digital Ocean to setup and create a LAMP stack using Ansible playbooks.
    • Integrate everything into the CloudLab experiment so that all is automated!

Content from Bash Scripting


Last updated on 2024-06-24

Based on previous materials by Dr. Robert Kline

  • Just like the script for a movie tells the actors what they should say and do, a script for a computer tells the computer what it should do or say.
  • A bash script is a plain text file which contains a series of commands.
  • Anything you can run normally on the command line can be put into a script, and it will do exactly the same thing. Similarly, anything you can put into a script can also be run normally on the command line and will do exactly the same thing.
  • In your CloudLab experiment, run the following:

BASH

wget --no-check-certificate https://cs.wcupa.edu/lngo/assets/src/bash_basics.zip

unzip bash_basics.zip

cd bash_basics

ls
  • These scripts will be used to illustrate concepts in the remainder of this slide deck.
  • There is far too much content in the Bash language to be covered in any single document like this one, a tutorial, or even an introductory textbook. Inevitably, if you need to write programs in Bash, you will have to consult the online manual: https://linux.die.net/man/1/bash
  • Bash script files can be named as you like. Unlike Windows systems, the extension is not an essential feature which determines the usage. The .sh extension is merely a convention which can assist editor recognition. All scripts can be executed explicitly using the bash executable:

BASH

bash SOME-SCRIPT.sh
  • Create a file named hello.sh with the following content:

BASH

echo "hello world"
  • Execute the file using the following command:

BASH

bash hello.sh
  • The file itself must be executable by you.
  • If you are the owner of the script you can add that permission with statements like:

BASH

chmod +x SOME-SCRIPT.sh          

or

BASH

chmod 700 SOME-SCRIPT.sh       
  • The file must either be locatable by its path prefix or have its containing directory in the PATH variable. A full path to the script might be: /usr/local/bin/SOME-SCRIPT.sh

  • If the script is in the shell’s current directory, this is also a full path: ./SOME-SCRIPT.sh

  • The file must identify itself as self-executing.

  • If the first two characters are #!, this indicates that the file is a text-based script file, and that the remaining portion of the first line provides the program to run the script. Thus, a Bash script begins with this first line: #!/bin/bash

  • Edit and add #!/bin/bash to the first line of hello.sh

BASH

chmod +x hello.sh

./hello.sh
  • The Bash language has three main functions:
    • execute commands interactively
    • extend the set of commands via scripts
    • build up, via sourcing, the user environment with variables, aliases, functions
  • In particular, Bash, per se, is not a general purpose programming script language like, say, Perl, Python or TCL.
    • Its main orientation is towards executing the standard UNIX command set and Bash scripts rely heavily on the standard UNIX commands.
  • When a shell is run interactively, the lines of a bash program are entered one by one.
  • The shell usually considers itself interactive if the prompt variable PS1 is defined, since all statements receive this prompt before entry.
  • In interactive execution, Bash will source each statement, which is a form of execution in which all variable settings are retained.
  • Interactive execution also permits many user-friendly control features not necessary in script execution such as:
    • line repeat control with up and down arrows
    • line editing and extension features
    • tab-based command and filename completion
  • The program scalars.sh illustrates basic principles of Bash variables and values. In particular, the only scalar data type is a string. Values are created in several ways:

  • within uninterpolated quotes: ’ ’

  • within interpolated quotes: ” ”

  • the output of a command within shell-evaluated backquotes (`...`) or within $( )

  • a bareword which is not a Bash reserved word and contains no special operator characters

  • The most basic operation on strings is concatenation, which, in Bash, is simply juxtaposition. In general, whitespace sequences are collapsed into a single blank; whitespace sequences at the ends of strings are truncated (i.e., trimmed).
  • Variables are defined using the assign operator = in a very strict sort of way.
  • Once a variable, v, is defined, its value is automatically used with the expression $v.
  • A double-quoted variable’s value, like "$y", can behave differently from $y when the value has internal whitespace. If there is any doubt, it is recommended to always use double quotes.
  • A newline is interpreted as a statement terminator. A semicolon (;) can also be used as a statement terminator if you want two or more statements on the same line.
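  • A few of these rules in miniature (scalars.sh explores them in more depth):

BASH

a='single quotes: $HOME is not expanded'
b="double quotes: $HOME is expanded"
c=$(date)              # capture the output of a command
d=$a$b                 # concatenation is simple juxtaposition
echo "$d"; echo "$c"   # a semicolon separates two statements on one line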
  • View, then execute scalars.sh
  • Observe the corresponding outcomes versus the code

BASH

more scalars.sh

./scalars.sh
  • Type something and hit Enter to exit this script.
  • Although echo is the most common output statement, Bash also supports the C-style printf statement, e.g.,

BASH

printf "num=%05d\n" 27

echo AFTER
  • There is an equivalent to sprintf (printf to a variable) in the form of

BASH

printf -v num "%05d" 27

echo $num
  • For most situations, echo is more common. It is easy to use and, for the most part, does what you want in a simple manner. One problem spot is printing control characters like \t for tab.
    • The bash syntax for this control character has the cumbersome form: $'\t'
  • For example, these two statements generate the same output:

BASH

echo   $'\t'foo

printf "\tfoo\n"
  • As you can imagine, the printf version is more memorable. One feature available to echo which is not available to printf is colorization. When used with the -e flag, echo interprets certain special (convoluted) escape sequences as an indication to change the color of the output. For example, this prints “HELLO” in bold red followed by “THERE” in (normal) black:

BASH

echo -e "\033[01;31m HELLO \033[0m THERE"
  • The output need not be separated like this; we are simply making it easier to see.
  • Bash, just as other languages, does support additional structured data types in the form of lists and maps (associative lists).
  • It also provides a way of assigning a type to a variable through the declare statement. View and execute the following script for observation:

BASH

more scalar-declares.sh

./scalar-declares.sh
  • One of the primary purposes of the bash language is to extend the set of commands. For this reason, Bash provides simple access to the command-line parameters. Bash uses the variables $1, $2, etc. The expression $0 is the command name itself. They should be double-quoted. Use these test runs:

BASH

$ more args.sh

$ ./args.sh 

$ ./args.sh  a     b    c

$ ./args.sh "a     b"   c
  • The bash if-else syntax is unusual compared to other languages. The format looks like this:

BASH

if ...

then

  some statements

elif ...

  some statements

else

  some statements

fi

The “…” sections represent boolean “tests”. The chained elif and the else parts are optional. The “then” syntax is often written on the same line as the if portion like this: if ...; then

BASH

more pingtest.sh

./pingtest.sh 

./pingtest.sh 8.8.8.8

./pingtest.sh 2.2.2.2
  • What is happening is that the ping operation, with the options used, is a single ping which can either succeed or fail within 2 seconds, with these two possible outcomes:
    • it succeeds with exit status 0: the test is true and the if part is executed.
    • it fails with a non-zero exit status: the test is false and the else part is executed.
  • The notion of true and false in these bash tests can be counter-intuitive: an exit status of 0 means true, non-zero means false. The $? construct used in echo status=$? is a Bash special variable which gives the exit status of a previous command (and so it has to come before the second echo statement).
  • The && and || operators are much the same sense as other languages using short-circuit execution.
  • In Bash they are often used to express the chaining of operations based on success or failure. A good example is: c++ myprog.cc && a.out, in which we only run the compiled program if the compilation succeeds.
  • A boolean expression in an if test uses this syntax:

BASH

if [ BOOLEAN-EXPRESSION ]; then

  statements ...

fi
  • The only value regarded as false is the empty string. Bash does not recognize any numerical types per se, only strings used in a numerical context. An undefined value is, in every way, equivalent to the empty string in Bash.
  • You have to be careful about using an undefined variable in a script since it may be an exported variable and, thereby, implicitly defined. You can always explicitly undefine a variable x with unset x.
  • You can verify the values of false by viewing and running this sample script: falsetest.sh

BASH

more falsetest.sh

./falsetest.sh
  • An example usage is this line in pingtest.sh:

BASH

[ "$host" ] || { echo usage: $(basename $0) "<host or ip>"; exit 1; }
  • In this example host is the first parameter; if undefined, give a “usage” message.
  • A number of common Bash constructions use the unary “-” prefix file test operators, e.g.,
    • -e NAME: NAME exists as a file (of some type)
    • -f NAME: NAME exists as a regular file
    • -d NAME: NAME exists as a directory
  • An example of this appears in the ~/.bashrc startup script:

BASH

if [ -f ~/.bash_aliases ]; then

    . ~/.bash_aliases

fi
  • The if operator (and other tests) can be used with boolean expressions using appropriate syntax.
  • The test expressions are normally within single brackets [ .. ].
    • There is a single space after [ and before ].
  • Within these we have these operator usages:
    • =, !=: lexicographic comparison
    • -eq, -ne, -lt, -le, -gt, -ge: numerical comparison
  • However both double brackets [[ .. ]] and double parentheses (( .. )) can serve as delimiters.
  • The operators < and > normally represent file redirection, but can be used for lexicographic comparison, within [[ .. ]] and numerical comparison within (( .. )).
  • You can view and observe some examples from: test-values.sh

BASH

more test-values.sh

./test-values.sh
  • The way Bash deals with strings has certain unexpected consequences. Consider the program errors.sh:

BASH

more errors.sh

./errors.sh
  • When executed, 3 of the 4 test lines are flagged as errors:
    • line 4: [a: command not found
    • line 5: [: missing `]'
    • line 7: [: a: unary operator expected
  • The first two mistakes were caused by having the expression $x touch a bracket.
  • The last was caused by the missing quotes around the $y expression in which case it interpreted the inserted expression "a b" as the operator a with argument b.
  • Bash uses primitive globbing patterns for various matching operations.
  • The most common is the usage of * which matches any sequence of characters.
  • Less common is ? which matches any single character and even less common are character sets, such as [A-Z] and [^0-9].
  • These type of expressions stand in contrast to more powerful regular expression pattern generators which, in Bash, are only available through auxiliary commands.
  • Glob patterns are simple, familiar patterns such as those used commonly in file listing:
    • ls *.html # all HTML files (not starting with “.”)
    • ls .??* # all dot files except “.” and “..”
    • ls test[0-3] # “test0”, “test1”, “test2”, “test3”
  • The Bash case statement distinguishes itself from an if/else constructions primarily by its ability to test its cases by matching the argument against glob patterns. The syntax is like this:

BASH

case "$file" in

  *.txt)  # treat "$file" like a text file

          ;;

  *.gif)  # treat it like a GIF file

          ;;

  *) # catch-all

     ;;

esac 
  • Unlike in C++ or Java, break is not used to end a case; each case is terminated by ;;. In Bash, break exits an enclosing loop.

Bash has both for and while loops. However, the control for these is typically not numerical. The most common looping structure in Bash is the for/in structure:

BASH

for x in ...
do
  statements involving $x
done

  • The “…” is a list of things generated in a number of ways. The x is the loop variable, which iterates through each item in the list. For example, try running this program in the current directory:

BASH

more fileinfo.sh

./fileinfo.sh

  • In this case the items iterated over are the files in the current directory.
  • One can use numerical-style looping with double parentheses, like those used for numerical comparison:

BASH

for ((i=1; i<=10; ++i)); do
  echo $i
done

  • The while loop also has an advantage in its ability to read live input. For example, this simple program reads and echoes input lines:

BASH

while read line; do

  echo "$line"

done
  • In a programmatic setting, it is often useful to process lines generated from the output of some command.
  • Say we want to process all words starting with my in the system dictionary (/usr/share/dict/words) by removing the initial my part.
  • The following two scripts represent two possible ways of doing so:

BASH

more process-lines-1.sh

more process-lines-2.sh
  • The command grep ^my /usr/share/dict/words is used to generate the target information.
  • The two respective approaches to processing this are:
    • input redirection into the while ... done loop using process substitution as a manufactured “input device”: < <(grep ^my /usr/share/dict/words)
    • piping (i.e., |) the command into the “while … done” loop.
  • It turns out that only the former method works as we want it to. The problem with the latter method is that the count variable is manipulated in a subshell created by the pipe operation, so its value cannot be used after the while loop exits.
    • The former method, with its odd-looking process-substitution syntax < <(..), is therefore the more useful of the two.
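  • The two scripts are not reproduced here; a minimal sketch of the two approaches (the count variable follows the description above) might be:

BASH

# Approach 1 (process-lines-1.sh): process substitution; count survives the loop.
count=0
while read word; do
  echo "${word#my}"               # strip the leading "my"
  count=$((count + 1))
done < <(grep ^my /usr/share/dict/words)
echo "processed $count words"     # correct: the loop ran in this shell

# Approach 2 (process-lines-2.sh): a pipe; count is lost.
count=0
grep ^my /usr/share/dict/words | while read word; do
  echo "${word#my}"
  count=$((count + 1))
done
echo "processed $count words"     # prints 0: the loop ran in a subshell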
  • Command-line arguments commonly consist of option arguments beginning with a “-”. Consider, for example, the following unzip command, which extracts FILE.zip into /usr/local,
    • doing so with no output (-q) and
    • overriding existing files (-o).
    • The FILE.zip portion is the argument and the others are options.
    • Some options, like -d, take an argument themselves.
  • The unzip command takes many more options (mostly prior to the argument).

BASH

unzip -q -o FILE.zip -d /usr/local
  • The options can be “compressed” under certain circumstances. For example, this is an equivalent call:

BASH

unzip -qo FILE.zip -d /usr/local
  • The Bash built-in getopts is meant to assist in extracting these options from the command line.
  • Consider the program getopts-test.sh:

BASH

more getopts-test.sh

./getopts-test.sh
  • Running this command

BASH

./getopts-test.sh -q -o FILE.zip -d /usr/local

yields the output:

BASH

q 2

o 3

? 3

FILE.zip

d 3 /usr/local

? 3
  • The while loop while getopts runs through the arguments looking for -n, -o, -q, -s options.
    • OPTIND gives the 1-based index of the next command-line argument to be processed.
    • When a non-option argument is encountered the while loop terminates with flag set to ?. We can keep on going by shifting everything out and resetting OPTIND back to 1.
  • The second part of the option search uses: while getopts "d:" flag
    • The “d:” syntax indicates that the d option also takes an argument. In this case, the $OPTARG expression captures that value.
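  • getopts-test.sh itself is not reproduced here; a sketch consistent with the description and the output above might be:

BASH

#!/bin/bash
# First pass: scan for -n, -o, -q, -s until a non-option argument appears.
while getopts "noqs" flag; do
  echo $flag $OPTIND            # e.g., "q 2" then "o 3"
done
echo $flag $OPTIND              # the loop exits with flag set to "?"
shift $((OPTIND - 1))           # shift the processed options away
echo $1                         # the non-option argument, e.g., FILE.zip
shift                           # shift it away as well
OPTIND=1                        # reset so getopts starts over
# Second pass: "d:" means -d takes an argument, captured in $OPTARG.
while getopts "d:" flag; do
  echo $flag $OPTIND $OPTARG    # e.g., "d 3 /usr/local"
done
echo $flag $OPTIND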
  • A useful style of option sensing is to set option flag variables, as shown in optflags.sh. Try the following:

BASH

./optflags.sh

./optflags.sh -abc foo -d bar foobar barfoo
  • What is happening is that the variables opt_a, opt_b, and opt_c are created through deferred evaluation using the Bash eval built-in.
  • The actual $flag, say “b”, substitutes into the evaluated expression eval "opt_$flag=1", thus defining opt_b and setting it. We can later test for the presence of the “b” flag with: if [ "$opt_b" ]; then …
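  • Stripped of the option parsing, the eval mechanism can be seen in isolation (the flag value is hard-coded here for illustration):

BASH

flag="b"
eval "opt_$flag=1"        # expands to: opt_b=1
if [ "$opt_b" ]; then     # non-empty, so the test succeeds
  echo "the b flag is set"
fi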
  • The Bash language itself has very unintuitive string-processing operations. Later we’ll see how to use UNIX commands to do string processing.

BASH

more string-processing.sh

./string-processing.sh
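  • string-processing.sh is not reproduced here, but the flavor of Bash’s built-in string operations can be seen in a few parameter expansions (the sample string is arbitrary):

BASH

s="hello-world.txt"
echo ${#s}              # length: 15
echo ${s%.txt}          # strip suffix: hello-world
echo ${s#hello-}        # strip prefix: world.txt
echo ${s/world/bash}    # substitute: hello-bash.txt
echo ${s:0:5}           # substring: hello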
  • Functions offer an improvement over aliases. They must be defined before being used. In practice, they are often grouped into Bash files which are sourced within the script which uses them.
  • Functions are supposed to emulate the way commands work. They do not return values in the usual way; any value sent back by the return statement must be an integer which acts like the exit code of an executable.

BASH

more functions.sh

./functions.sh
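  • functions.sh is not reproduced here; a minimal sketch of defining and using a function in the command-like style described above (the names are assumptions):

BASH

greet() {
  echo "hello, $1"       # "results" are written to standard output
  return 0               # return can only set an integer exit code
}

greet world                      # prints: hello, world
msg=$(greet sysadmin)            # capture the output like any command
greet world > /dev/null && echo "greet exited with status 0"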
  • The Bash language relies heavily on the UNIX-like environment in which it resides in order to create utility scripts. This environment includes many standard UNIX string processing operations such as these:
  • sed: (stream editor) for regular-expression substitution
  • grep: can be used to perform match testing, with the -c (count) option counting matching lines; grep patterns are regular expressions, not glob patterns (the -e option introduces a pattern explicitly)
  • awk: captures the fields of a line (separated by whitespace) and does operations on these fields;
  • tr: translate from one list of characters to another; often used to convert case of a string
  • sed, grep, awk, and tr are used in Bash via standard I/O. sed, grep, and awk act on text files when given file names as parameters, or read from standard input otherwise; tr always reads from standard input.
  • A common bash expression which uses an external OPERATION to compute some internal value looks something like this: result="$(echo "input string" | OPERATION)"
  • The pipe operator “|” is crucial for passing the input string to OPERATION via echo. The following program illustrates some of these external operations.

BASH

more string-operations.sh

./string-operations.sh
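  • string-operations.sh is not reproduced here; the result="$(echo "input string" | OPERATION)" pattern with the commands above looks like this (the sample input is arbitrary):

BASH

input="one two THREE"
upper="$(echo "$input" | tr 'a-z' 'A-Z')"       # ONE TWO THREE
second="$(echo "$input" | awk '{print $2}')"    # two
subbed="$(echo "$input" | sed 's/two/2/')"      # one 2 THREE
count="$(echo "$input" | grep -c 'two')"        # 1 (number of matching lines)
echo "$upper | $second | $subbed | $count"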

Content from More Linux Bash Scripting


Last updated on 2024-06-24 | Edit this page

Overview

Questions

  • How can sys admins set up complex workflows in Linux?

Objectives

  • Understand pipe and filter tools
  • It is possible to combine multiple Linux commands into one
  • Settings:
    • Data files have .pdb extension.
  • Question:
    • Which of these files contains the fewest lines?
  • SSH into your CloudLab experiment.
  • Run the following commands to prepare the environment.

BASH

clear

cd

pwd

wget --no-check-certificate https://www.cs.wcupa.edu/lngo/data/shell-lesson-data.zip

unzip shell-lesson-data.zip

cd ~/shell-lesson-data/exercise-data/proteins

ls -l *.pdb
List files in current directory
  • To get counts of characters, words, and lines in a file, we use wc.

BASH

man wc

wc *.pdb

wc -l *.pdb
Running wc command
  • We can use the > to redirect output to a file
    • > redirects output and creates a new file.
    • >> appends output to a file (if the file already exists, else creates a new file)

BASH

ls

wc -l *.pdb > lengths.txt

ls

cat lengths.txt

wc -l *.pdb >> lengths.txt

cat lengths.txt

wc -l *.pdb > lengths.txt

cat lengths.txt
Redirect outputs to a file
  • We can sort the contents of lengths.txt using sort

BASH

man sort

Challenge: what does sort -n do?

  • Explain what -n does by observing the following two commands

BASH

sort ~/shell-lesson-data/exercise-data/numbers.txt

10

19

2

22

6

BASH

sort -n ~/shell-lesson-data/exercise-data/numbers.txt

2

6

10

19

22
  • The -n option specifies a numerical rather than an alphanumerical sort.
  • Let’s look at lengths.txt:
sort -n lengths.txt

sort -n lengths.txt > sorted-lengths.txt

cat sorted-lengths.txt
Redirect sorted outputs to a file
  • We can use the head command to get the first line
head -n 1 sorted-lengths.txt
  • We used intermediate files to store output. We can use a pipe (|) to combine them together.
sort -n lengths.txt | head -n 1
  • We can combine multiple commands
wc -l *.pdb | sort -n | head -n 1

Challenge: piping commands together

  • In our current directory, we want to find the 3 files which have the least number of lines. Which command listed below would work?
  1. wc -l * > sort -n > head -n 3
  2. wc -l * | sort -n | head -n 1-3
  3. wc -l * | head -n 3 | sort -n
  4. wc -l * | sort -n | head -n 3
  • Option 4 is the solution. The pipe character | is used to connect the output from one command to the input of another. > is used to redirect standard output to a file. Try it in the shell-lesson-data/exercise-data/proteins directory!

Challenge: pipe reading comprehension

  • A file called animals.csv (in the shell-lesson-data/exercise-data/animal-counts folder) contains the following data:

BASH

cat ~/shell-lesson-data/exercise-data/animal-counts/animals.csv

2012-11-05,deer,5

2012-11-05,rabbit,22

2012-11-05,raccoon,7

2012-11-06,rabbit,19

2012-11-06,deer,2

2012-11-06,fox,4

2012-11-07,rabbit,16

2012-11-07,bear,1
  • What text passes through each of the pipes and the final redirect in the pipeline below? Note, the sort -r command sorts in reverse order.

BASH

cat animals.csv | head -n 5 | tail -n 3 | sort -r > final.txt

BASH

2012-11-06,rabbit,19

2012-11-06,deer,2

2012-11-05,raccoon,7

Challenge: pipe construction

  • For the file animals.csv from the previous exercise, consider the following command:

BASH

man cut

cut -d , -f 2 animals.csv
  • The uniq command filters out adjacent matching lines in a file. How could you extend this pipeline (using uniq and another command) to find out what animals the file contains (without any duplicates in their names)?

BASH

cut -d , -f 2 animals.csv | sort | uniq

Challenge: which pipe?

  • The file animals.csv contains 8 lines of data formatted as follows:

BASH

2012-11-05,deer,5

2012-11-05,rabbit,22

2012-11-05,raccoon,7

2012-11-06,rabbit,19

...

The uniq command has a -c option which gives a count of the number of times a line occurs in its input. Assuming your current directory is shell-lesson-data/exercise-data/animal-counts, what command would you use to produce a table that shows the total count of each type of animal in the file?

  1. sort animals.csv | uniq -c
  2. sort -t, -k2,2 animals.csv | uniq -c
  3. cut -d, -f 2 animals.csv | uniq -c
  4. cut -d, -f 2 animals.csv | sort | uniq -c
  5. cut -d, -f 2 animals.csv | sort | uniq -c | wc -l

Option 4 is the correct answer.

  • Nelle has run her samples through the assay machines and created 17 files in the north-pacific-gyre directory described earlier. Let’s check the integrity of this data:

BASH

cd ~/shell-lesson-data/north-pacific-gyre

ls -l 
  • How do we check for data integrity, especially when there are thousands of files?

BASH

wc -l *.txt | sort -n | head -n 5
  • This is possible by looking at metadata (line counts, word counts, etc)
  • There are also files containing Z in their names,

BASH

ls *Z.txt
  • It is important to be careful when using wildcards if we don’t want to include these strange files in our calculations.

Suppose we have several hundred genome data files named basilisk.dat, minotaur.dat, and unicorn.dat. For this example, we’ll use the exercise-data/creatures directory which only has three example files, but the principles can be applied to many more files at once.

The structure of these files is the same:

  • The common name, classification, and updated date are presented on the first three lines
  • The DNA sequences on the following lines.

Let’s look at the files:

BASH

cd ~/shell-lesson-data/exercise-data/creatures/

head -n 5 basilisk.dat minotaur.dat unicorn.dat
  • We would like to print out the classification for each species, which is given on the second line of each file.
  • For each file, we would need to execute the command head -n 2 and pipe this to tail -n 1.
  • We’ll use a loop to solve this problem, but first let’s look at the general form of a loop:
for thing in list_of_things

do

    operation_using $thing    # Indentation within the loop is not required, but aids legibility

done

and we can apply this to our example like this:

BASH

for filename in basilisk.dat minotaur.dat unicorn.dat

> do

>   head -n 2 $filename | tail -n 1

> done

Follow the prompt

The shell prompt changes from $ to > and back again as we were typing in our loop. The second prompt, >, is different to remind us that we haven’t finished typing a complete command yet. A semicolon, ;, can be used to separate two commands written on a single line.

  • When the shell sees the keyword for, it knows to repeat a command (or group of commands) once for each item in a list.
  • Inside the loop, we call for the variable’s value by putting $ in front of it. The $ tells the shell interpreter to treat the variable as a variable name and substitute its value in its place, rather than treat it as text or an external command.
  • In this example, the list is three filenames: basilisk.dat, minotaur.dat, and unicorn.dat. Each time the loop iterates, it will assign a file name to the variable filename and run the head command.
    • The first time through the loop, $filename is basilisk.dat. The interpreter runs the command head on basilisk.dat and pipes the first two lines to the tail command, which then prints the second line of basilisk.dat.
    • For the second iteration, $filename becomes minotaur.dat. This time, the shell runs head on minotaur.dat and pipes the first two lines to the tail command, which then prints the second line of minotaur.dat.
    • For the third iteration, $filename becomes unicorn.dat, so the shell runs the head command on that file, and tail on the output of that.
    • Since the list was only three items, the shell exits the for loop.
  • Here we see > being used as a shell prompt, whereas > is also used to redirect output.
  • Similarly, $ is used as a shell prompt, but, as we saw earlier, it is also used to ask the shell to get the value of a variable.
  • If the shell prints > or $ then it expects you to type something, and the symbol is a prompt.
  • If you type > or $ yourself, it is an instruction from you that the shell should redirect output or get the value of a variable.
  • When using variables it is also possible to put the names into curly braces to clearly delimit the variable name:
    • $filename is equivalent to ${filename}, but is different from ${file}name. You may find this notation in other people’s programs.

Challenge: write your own loop

  • How would you write a loop that echoes all 10 numbers from 0 to 9?

BASH

for loop_variable in 0 1 2 3 4 5 6 7 8 9

> do

>   echo $loop_variable

> done
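The same list can also be generated with Bash brace expansion (or the seq command), so an equivalent loop is:

BASH

for loop_variable in {0..9}
do
  echo $loop_variable
done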

Challenge: variables in loops

  • This exercise refers to the shell-lesson-data/exercise-data/proteins directory.
  • Run the following commands, observe the outputs, and answer the questions:

BASH

cd ~/shell-lesson-data/exercise-data/proteins/

ls *.pdb
  • What is the output of the following code?

BASH

for datafile in *.pdb

> do

>   ls *.pdb

> done
  • Now, what is the output of the following code?

BASH

for datafile in *.pdb

> do

>   ls $datafile

> done
  • Why do these two loops give different outputs?
  • The first code block gives the same output on each iteration through the loop.
    • Bash expands the wildcard *.pdb within the loop body (as well as before the loop starts) to match all files ending in .pdb and then lists them using ls.
  • The second code block lists a different file on each loop iteration. The value of the datafile variable is evaluated using $datafile, and then listed using ls.

Challenge: limiting sets of files

  • What would be the output of running the following loop in the shell-lesson-data/exercise-data/proteins directory?

BASH

cd ~/shell-lesson-data/exercise-data/proteins/

for filename in c*

> do

>   ls $filename

> done
  1. No files are listed.
  2. All files are listed.
  3. Only cubane.pdb, octane.pdb and pentane.pdb are listed.
  4. Only cubane.pdb is listed.
  • How would the output differ from using this command instead?

BASH

cd ~/shell-lesson-data/exercise-data/proteins/

for filename in *c*

> do

>   ls $filename

> done
  1. The same files would be listed.
  2. All the files are listed this time.
  3. No files are listed this time.
  4. The files cubane.pdb and octane.pdb will be listed.
  5. Only the file octane.pdb will be listed.
  • For the first loop, option 4 is the correct answer: c* matches any file name starting with the letter c followed by zero or more other characters, so only cubane.pdb is listed.
  • For the second loop, option 4 is the correct answer: *c* matches any file name containing the letter c, with zero or more characters before and after it, so cubane.pdb and octane.pdb are listed.

Challenge: saving to a file in a Loop

  • In the shell-lesson-data/exercise-data/proteins directory, what is the effect of this loop?

BASH

cd ~/shell-lesson-data/exercise-data/proteins/

for alkanes in *.pdb

> do

>   echo $alkanes

>   cat $alkanes > alkanes.pdb

> done
  1. Prints cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb and propane.pdb, and the text from propane.pdb will be saved to a file called alkanes.pdb.
  2. Prints cubane.pdb, ethane.pdb, and methane.pdb, and the text from all three files would be concatenated and saved to a file called alkanes.pdb.
  3. Prints cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, and pentane.pdb, and the text from propane.pdb will be saved to a file called alkanes.pdb.
  4. None of the above.
  • Also in the shell-lesson-data/exercise-data/proteins directory, what would be the output of the following loop?

BASH

cd ~/shell-lesson-data/exercise-data/proteins/

for datafile in *.pdb

> do

>   cat $datafile >> all.pdb

> done
  1. All of the text from cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, and pentane.pdb would be concatenated and saved to a file called all.pdb.
  2. The text from ethane.pdb will be saved to a file called all.pdb.
  3. All of the text from cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb and propane.pdb would be concatenated and saved to a file called all.pdb.
  4. All of the text from cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb and propane.pdb would be printed to the screen and saved to a file called all.pdb.
  • For the first loop, option 1 is the correct answer: the text from each file in turn gets written to alkanes.pdb, but the file gets overwritten on each loop iteration, so the final content of alkanes.pdb is the text from propane.pdb.
  • For the second loop, option 3 is the correct answer: >> appends to a file rather than overwriting it, so the text from all six files is concatenated into all.pdb. Since the output from cat has been redirected, nothing is printed to the screen.
  • Run the following loop
    • The shell starts by expanding *.dat to create the list of files it will process.
    • The loop body then executes two commands for each of those files.
      • The first command, echo, prints its command-line arguments to standard output. In this case, since the shell expands $filename to be the name of a file, echo $filename prints the name of the file.
      • Finally, the head and tail combination selects lines 81-100 from whatever file is being processed (assuming the file has at least 100 lines).

BASH

cd ~/shell-lesson-data/exercise-data/creatures

for filename in *.dat

> do

>   echo $filename

>   head -n 100 $filename | tail -n 20

> done
  • We would like to modify each of the files in shell-lesson-data/exercise-data/creatures, but also save a version of the original files, naming the copies original-basilisk.dat, original-minotaur.dat, and original-unicorn.dat.
  • We can’t use:

BASH

cp *.dat original-*.dat

because that would expand to:

BASH

cp basilisk.dat minotaur.dat unicorn.dat original-*.dat

This wouldn’t back up our files, instead we get an error:

BASH

cp: target `original-*.dat' is not a directory
  • This problem arises when cp receives more than two inputs. When this happens, it expects the last input to be a directory where it can copy all the files it was passed. Since there is no directory named original-*.dat in the creatures directory we get an error.
  • Instead, we can use a loop:

BASH

for filename in *.dat

> do

>   cp $filename original-$filename

> done
  • Since the cp command does not normally produce any output, it’s hard to check that the loop is doing the correct thing. However, we learned earlier how to print strings using echo, and we can modify the loop to use echo to print our commands without actually executing them. As such we can check what commands would be run in the unmodified loop.
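  • For example, a dry-run version of the copy loop (a sketch of the modification just described) is:

BASH

for filename in *.dat
do
  echo cp $filename original-$filename    # print the command instead of running it
done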

Running this dry-run version prints each cp command without executing it, demonstrating how the judicious use of echo is a good debugging technique.

Nelle is now ready to process her data files using goostats.sh — a shell script written by her supervisor. This calculates some statistics from a protein sample file, and takes two arguments:

  1. an input file (containing the raw data)
  2. an output file (to store the calculated statistics)

Since she’s still learning how to use the shell, she decides to build up the required commands in stages. Her first step is to make sure that she can select the right input files — remember, these are ones whose names end in ‘A’ or ‘B’, rather than ‘Z’. Starting from her home directory, Nelle types:

BASH

cd ~/shell-lesson-data/north-pacific-gyre

for datafile in NENE*A.txt NENE*B.txt

> do

>     echo $datafile

> done

Her next step is to decide what to call the files that the goostats.sh analysis program will create. Prefixing each input file’s name with ‘stats’ seems simple, so she modifies her loop to do that:

BASH

for datafile in NENE*A.txt NENE*B.txt

> do

>     echo $datafile stats-$datafile

> done

She hasn’t actually run goostats.sh yet, but now she’s sure she can select the right files and generate the right output filenames.

Typing in commands over and over again is becoming tedious, though, and Nelle is worried about making mistakes, so instead of re-entering her loop, she presses the up-arrow key. In response, the shell redisplays the whole loop on one line (using semi-colons to separate the pieces):

BASH

for datafile in NENE*A.txt NENE*B.txt; do echo $datafile stats-$datafile; done

Using the left arrow key, Nelle backs up and changes the command echo to bash goostats.sh:

BASH

for datafile in NENE*A.txt NENE*B.txt; do bash goostats.sh $datafile stats-$datafile; done

When she presses Enter, the shell runs the modified command. However, nothing appears to happen; there is no output. After a moment, Nelle realizes that since her script doesn’t print anything to the screen any longer, she has no idea whether it is running, much less how quickly. She kills the running command by typing Ctrl+C, uses the up-arrow key to repeat the command, and edits it to read:

BASH

for datafile in NENE*A.txt NENE*B.txt; do echo $datafile;

bash goostats.sh $datafile stats-$datafile; done

When she runs her program now, it produces one line of output every five seconds or so. 1518 times 5 seconds, divided by 60, tells her that her script will take about two hours to run. As a final check, she opens another terminal window, goes into north-pacific-gyre, and uses cat stats-NENE01729B.txt to examine one of the output files. It looks good, so she decides to get some coffee and catch up on her reading.

Another way to repeat previous work is to use the history command to get a list of the last few hundred commands that have been executed, and then to use !123 (where ‘123’ is replaced by the command number) to repeat one of those commands. For example, if Nelle types this:

BASH

history | tail -n 5

   456  ls -l NENE0*.txt

   457  rm stats-NENE01729B.txt.txt

   458  bash goostats.sh NENE01729B.txt stats-NENE01729B.txt

   459  ls -l NENE0*.txt

   460  history

then she can re-run goostats.sh on NENE01729B.txt simply by typing !458.

Challenge: doing a dry run

  • A loop is a way to do many things at once — or to make many mistakes at once if it does the wrong thing. One way to check what a loop would do is to echo the commands it would run instead of actually running them.
  • Suppose we want to preview the commands the following loop will execute without actually running those commands:

BASH

for datafile in *.pdb

> do

>   cat $datafile >> all.pdb

> done
  • What is the difference between the two loops below, and which one would we want to run?

BASH

# Version 1

for datafile in *.pdb

> do

>   echo cat $datafile >> all.pdb

> done

BASH

# Version 2

for datafile in *.pdb

> do

>   echo "cat $datafile >> all.pdb"

> done
  • The second version is the one we want to run. It prints to the screen everything enclosed in the quote marks, expanding the loop variable name because we have prefixed it with a dollar sign. It does not modify or create the file all.pdb, as the >> is treated literally as part of a string rather than as a redirection instruction.
  • The first version appends the output from the command echo cat $datafile to the file all.pdb. That file will just contain the list: cat cubane.pdb, cat ethane.pdb, cat methane.pdb, etc.
  • Try both versions for yourself to see the output! Be sure to change to the proper directory and open all.pdb file to view its contents.

Challenge: nested loops


  • Suppose we want to set up a directory structure to organize some experiments measuring reaction rate constants with different compounds and different temperatures. What would be the result of the following code:

BASH

for species in cubane ethane methane

> do

>    for temperature in 25 30 37 40

>    do

>       mkdir $species-$temperature

>     done

> done
  • We have a nested loop, i.e. a loop contained within another loop, so for each species in the outer loop, the inner loop (the nested loop) iterates over the list of temperatures, and creates a new directory for each combination.
  • Try running the code for yourself to see which directories are created!
  • Let’s start by going back to ~/shell-lesson-data/exercise-data/proteins and creating a new file, middle.sh, which will become our shell script:

BASH

cd ~/shell-lesson-data/exercise-data/proteins

nano middle.sh

cat middle.sh
  • Add the following line to middle.sh and save:
    • head -n 15 octane.pdb | tail -n 5
  • Once we have saved the file, we can ask the shell to execute the commands it contains. Our shell is called bash, so we run the following command:

BASH

bash middle.sh
  • What if we want to select lines from an arbitrary file? We could edit middle.sh each time to change the filename, but that would probably take longer than typing the command out again in the shell and executing it with a new file name. Instead, let’s edit middle.sh and make it more versatile:
    • Edit middle.sh and replace the text octane.pdb with the special variable called $1.
      • Wrap $1 inside double quotes: "$1".
    • $1 means ‘the first filename (or other argument) on the command line’.

BASH

nano middle.sh

cat middle.sh

bash middle.sh octane.pdb

bash middle.sh pentane.pdb
  • Currently, we need to edit middle.sh each time we want to adjust the range of lines that is returned. Let’s fix that by configuring our script to instead use three command-line arguments.
  • Each additional argument that we provide is accessible via the special variables $1, $2, $3, and so on, which refer to the first, second, and third command-line arguments, respectively.
  • Edit middle.sh and replace 15 with "$2" and 5 with "$3"

BASH

nano middle.sh

cat middle.sh

bash middle.sh pentane.pdb 15 5
  • By changing the arguments to our command we can change our script’s behaviour:

BASH

bash middle.sh pentane.pdb 20 5
  • This works, but it may take the next person who reads middle.sh a moment to figure out what it does. We can improve our script by adding some comments at the top:
    • A comment starts with a # character and runs to the end of the line.
    • Add the following comments to middle.sh at the top:
      • # Select lines from the middle of a file.
      • # Usage: bash middle.sh filename end_line num_lines
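  • After these edits, middle.sh should look something like this:

BASH

# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"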
  • What if we want to process many files in a single pipeline? For example, if we want to sort our .pdb files by length, we would type the following command because wc -l lists the number of lines in the files and sort -n sorts things numerically.

BASH

wc -l *.pdb | sort -n
  • We could put this in a file, but then it would only ever sort a list of .pdb files in the current directory. If we want to be able to get a sorted list of other kinds of files, we need a way to get all those names into the script.
  • We can’t use $1, $2, and so on because we don’t know how many files there are.
  • Instead, we use the special variable $@, which means, ‘All of the command-line arguments to the shell script’.
  • We also should put $@ inside double-quotes to handle the case of arguments containing spaces ("$@" is special syntax and is equivalent to "$1" "$2" …).
  • Create a file called sorted.sh inside shell-lesson-data/exercise-data/proteins with the following contents:

BASH

# Sort files by their length.

# Usage: bash sorted.sh one_or_more_filenames

wc -l "$@" | sort -n
  • Observe the following commands:

BASH

cd ~/shell-lesson-data/exercise-data/proteins

nano sorted.sh

cat sorted.sh

bash sorted.sh *.pdb ../creatures/*.dat
  • To turn your script into an executable file (run without bash command), the following line must be at the top of your script:

BASH

#!/bin/bash
  • and your script file must have executable permission:

BASH

chmod 755 sorted.sh

./sorted.sh

Challenge: list unique species


  • Leah has several hundred data files, each of which is formatted like this:

BASH

2013-11-05,deer,5

2013-11-05,rabbit,22

2013-11-05,raccoon,7

2013-11-06,rabbit,19

2013-11-06,deer,2

2013-11-06,fox,1

2013-11-07,rabbit,18

2013-11-07,bear,1
  • An example of this type of file is given in shell-lesson-data/exercise-data/animal-counts/animals.csv.
  • We can use the command cut -d , -f 2 animals.csv | sort | uniq to produce the unique species in animals.csv.
  • In order to avoid having to type out this series of commands every time, a scientist may choose to write a shell script instead.
  • Write a shell script called species.sh that takes any number of filenames as command-line arguments, and uses a variation of the above command to print a list of the unique species appearing in each of those files separately.

BASH

#!/bin/bash

# Script to find unique species in csv files where species is the second data field

# This script accepts any number of file names as command line arguments

# Loop over all files

for file in "$@"

do

  echo "Unique species in $file:"

  # Extract species names

  cut -d , -f 2 "$file" | sort | uniq

done
  • Suppose we have just run a series of commands that did something useful — for example, that created a graph we’d like to use in a paper. We’d like to be able to re-create the graph later if we need to, so we want to save the commands in a file.
  • Instead of typing them in again (and potentially getting them wrong) we can do this:

BASH

history | tail -n 5 > redo-figure-3.sh

The file redo-figure-3.sh could now contain:

BASH

297 bash goostats.sh NENE01729B.txt stats-NENE01729B.txt

298 bash goodiff.sh stats-NENE01729B.txt /data/validated/01729.txt > 01729-differences.txt

299 cut -d ',' -f 2-3 01729-differences.txt > 01729-time-series.txt

300 ygraph --format scatter --color bw --borders none 01729-time-series.txt figure-3.png

301 history | tail -n 5 > redo-figure-3.sh
  • After a moment’s work in an editor to remove the serial numbers on the commands, and to remove the final line where we called the history command, we have a completely accurate record of how we created that figure.
  • In practice, most people develop shell scripts by running commands at the shell prompt a few times to make sure they’re doing the right thing, then saving them in a file for re-use.
  • This style of work allows people to recycle what they discover about their data and their workflow with one call to history and a bit of editing to clean up the output and save it as a shell script.
  • Nelle’s supervisor insisted that all her analytics must be reproducible. The easiest way to capture all the steps is in a script.

  • First we return to Nelle’s project directory:

BASH

cd ../../north-pacific-gyre/
  • Then she creates a file using nano:

BASH

nano do-stats.sh
  • …which contains the following:

BASH

#!/bin/bash

# Calculate stats for data files.

for datafile in "$@"

do

    echo $datafile

    bash goostats.sh $datafile stats-$datafile

done
  • … saves this in a file called do-stats.sh and sets executable mode (e.g., chmod 755 do-stats.sh) so that she can now re-do the first stage of her analysis by typing:

BASH

./do-stats.sh NENE*A.txt NENE*B.txt
  • She can also do the following so that the output is just the number of files processed rather than the names of the files that were processed.

BASH

./do-stats.sh NENE*A.txt NENE*B.txt | wc -l
  • One thing to note about Nelle’s script is that it lets the person running it decide what files to process. She could have written it as:

BASH

#!/bin/bash

# Calculate stats for Site A and Site B data files.

for datafile in NENE*A.txt NENE*B.txt

do

    echo $datafile

    bash goostats.sh $datafile stats-$datafile

done
  • The advantage is that this always selects the right files:
    • she doesn’t have to remember to exclude the ‘Z’ files.
  • The disadvantage is that it always selects just those files — she can’t run it on all files (including the ‘Z’ files), or on the ‘G’ or ‘H’ files her colleagues in Antarctica are producing, without editing the script.
  • She could modify her script to check for command-line arguments, and use NENE*A.txt NENE*B.txt if none were provided. Of course, this introduces another tradeoff between flexibility and complexity.

Challenge: variables in shell scripts


  • In the proteins directory, imagine you have a shell script called script.sh containing the following commands:

BASH

#!/bin/bash

head -n $2 $1

tail -n $3 $1

While you are in the proteins directory, you type the following command:

BASH

./script.sh '*.pdb' 1 1

Which of the following outputs would you expect to see?

  1. All of the lines between the first and the last lines of each file ending in .pdb in the proteins directory
  2. The first and the last line of each file ending in .pdb in the proteins directory
  3. The first and the last line of each file in the proteins directory
  4. An error because of the quotes around *.pdb
  • The correct answer is 2.
  • The special variables $1, $2 and $3 represent the command line arguments given to the script, such that the commands run are:

BASH

head -n 1 cubane.pdb ethane.pdb octane.pdb pentane.pdb propane.pdb

tail -n 1 cubane.pdb ethane.pdb octane.pdb pentane.pdb propane.pdb
  • The shell does not expand '*.pdb' because it is enclosed by quote marks.
  • As such, the first argument to the script is '*.pdb', which is expanded by the shell within the script when the unquoted $1 is passed to head and tail.

Challenge: find the longest file with a given extension


  • Write a shell script called longest.sh that takes the name of a directory and a filename extension as its arguments, and prints out the name of the file with the most lines in that directory with that extension. For example:

BASH

./longest.sh shell-lesson-data/data/pdb pdb

would print the name of the .pdb file in shell-lesson-data/data/pdb that has the most lines.

Feel free to test your script on another directory, e.g. bash longest.sh shell-lesson-data/writing/data txt

BASH

#!/bin/bash

# Shell script which takes two arguments:

#    1. a directory name

#    2. a file extension

# and prints the name of the file in that directory

# with the most lines which matches the file extension.

wc -l $1/*.$2 | sort -n | tail -n 2 | head -n 1
  • The first part of the pipeline, wc -l $1/*.$2 | sort -n, counts the lines in each file and sorts them numerically (largest last). When there’s more than one file, wc also outputs a final summary line, giving the total number of lines across all files. We use tail -n 2 | head -n 1 to throw away this last line.
  • With wc -l $1/*.$2 | sort -n | tail -n 1 we’ll see the final summary line: we can build our pipeline up in pieces to be sure we understand the output.

Challenge: script reading comprehension


  • For this question, consider the shell-lesson-data/exercise-data/proteins directory once again. This contains a number of .pdb files in addition to any other files you may have created.
  • Explain what each of the following three scripts would do when run as bash script1.sh *.pdb, bash script2.sh *.pdb, and bash script3.sh *.pdb respectively.

BASH

# Script 1

echo *.*

BASH

# Script 2

for filename in $1 $2 $3

do

  cat $filename

done

BASH

# Script 3

echo $@.pdb

In each case, the shell expands the wildcard in *.pdb before passing the resulting list of file names as arguments to the script.
  • Script 1 would print out a list of all files containing a dot in their name. The arguments passed to the script are not actually used anywhere in the script.
  • Script 2 would print the contents of the first 3 files with a .pdb file extension. $1, $2, and $3 refer to the first, second, and third argument respectively.
  • Script 3 would print all the arguments to the script (i.e. all the .pdb files), followed by .pdb. $@ refers to all the arguments given to a shell script. The output would be:

BASH

 cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb.pdb

Challenge: debugging scripts


  • Suppose you have saved the following script in a file called do-errors.sh in Nelle’s north-pacific-gyre/scripts directory:

BASH

# Calculate stats for data files.

for datafile in "$@"

do

  echo $datfile

  bash goostats.sh $datafile stats-$datafile

done
  • When you run it from the north-pacific-gyre directory, the output is blank.

BASH

bash do-errors.sh NENE*A.txt NENE*B.txt
  • To figure out why, re-run the script using the -x option:

BASH

bash -x do-errors.sh NENE*A.txt NENE*B.txt
  • What is the output showing you?
  • Which line is responsible for the error?
  • The -x option causes bash to run in debug mode.
  • This prints out each command as it is run, which will help you to locate errors.
  • In this example, we can see that echo isn’t printing anything. We have made a typo in the loop variable name, and the variable datfile doesn’t exist, hence returning an empty string.

Content from Networked File System


Last updated on 2024-06-24 | Edit this page


Sandberg, R., Goldberg, D., Kleiman, S., Walsh, D., & Lyon, B. (1985, June). Design and implementation of the Sun network filesystem. In Proceedings of the Summer USENIX conference (pp. 119-130)

  • Machine and operating system independence
  • Crash recovery
  • Transparent access
  • UNIX semantics maintained on client
  • Reasonable performance (target 80% as fast as local disk)
  • NFS protocol
  • Server side implementation
  • Client side implementation
  • Remote Procedure Call (RPC) mechanism
    • Simplify the definition, organization, and implementation of remote services.
  • Stateless protocol
    • Parameters to each procedure call contain all the information necessary to complete the call.
    • The server does not keep track of past requests. This makes crash recovery easy.
  • Transport independent (works with both TCP and UDP).
  • Key procedure parameter: a file handler (fh)
  • Commit modified data to stable storage before returning RPC calls
    • Write to disk
  • New parameter, generation number, for inode and file system id.
  • Allow mount to attach remote file system
  • New Unix kernel interface for all file system types: Virtual FileSystem (VFS)
    • Allows system calls to be unified
    • VFS will automatically determine/interact with the correct file system types, including networked file systems.
Networked File System Design
  • Modify CloudLab’s experimental profile with the following requirements
    • Two nodes
    • One node setup as Ansible master (public IP)
    • One node setup as Ansible host (public IP)
    • On Ansible host, automate the setup of Ansible stack
  • Key outcome:
    • By the time everything is set up, a webserver is available on the Ansible host.
    • Which means we are doing two things:
      • Manual setup (for testing purposes)
      • Convert setup commands to bash scripts …
  • Relevant documents:

Content from SSO: Single Sign On


Last updated on 2024-06-24 | Edit this page

  • A centralized directory store that contains user identity and authorization information.
  • A tool for managing user information in the directory.
  • A mechanism for authenticating user identities. It could be the LDAP store itself, or a Kerberos ticket-based authentication system.
  • Centralized-identity-and-authentication-aware versions of the C library routines that look up user attributes. This is often configured through the name service switch file, /etc/nsswitch.conf.
  • Assumptions:
    • Data objects are relatively small.
    • The database will be widely replicated and cached.
    • The information is attribute-based.
    • Data are read often but written infrequently.
    • Searching is a common operation.
  • Common usage: A central repository for login names, passwords, and other account attributes.
  • Property lists (entries).
  • Each entry consists of a set of named attributes along with those attributes’ values.
  • Every attribute can have multiple values.

Callout

Example attributes

BASH

dn: uid=ghopper,ou=People,dc=navy,dc=mil

objectClass: top

objectClass: person

objectClass: organizationalPerson

objectClass: posixAccount

objectClass: shadowAccount

uid: ghopper

cn: Grace Hopper

userPassword: {crypt}$1$pZaGA2RL$MPDJoc0afuhHY6k8HQFp0

loginShell: /bin/bash

uidNumber: 1202

gidNumber: 1202

homeDirectory: /home/ghopper
Attribute   | Stands for          | What it is
o           | Organization        | Identifies a site’s top-level entry (not used at sites that model their hierarchy on DNS)
ou          | Organizational unit | A logical subdivision, e.g. “marketing”
cn          | Common name         | The most natural name to represent the entry
dc          | Domain component    | Used at sites that model their hierarchy on DNS
objectClass | Object class        | Schema to which this entry’s attributes conform

5. Hands-on: update your webserver profile

  • Create a new branch from webserver and call it webserver-ldap.
Setup a new branch from webserver
  • Update profile.py to match the following content.
  • Update and instantiate an experiment from this new branch.
  • Connect into the ldap node and run the following commands
BASH

clear

sudo apt update

sudo apt install -y slapd ldap-utils
  • Provide a (simple) password for LDAP server
    • Press the Tab key to go to Ok, then press Enter.
    • Retype the password, then Tab, Ok, and Enter again.
  • Setup OpenLDAP server
sudo dpkg-reconfigure slapd
  • Refuse to omit OpenLDAP server configuration
    • Keep the default No (or make sure that you stay on No), then press Enter.
  • Enter wcupa.edu as default DNS domain name
    • Press the Tab key to go to Ok, then press Enter.
  • Enter wcupa.edu as the name of the organization to use in the base DN
    • Press the Tab key to go to Ok, then press Enter.
  • Enter the password (previously created) for your LDAP directory
    • Press the Tab key to go to Ok, then press Enter.
  • Enter the password again for your LDAP directory
    • Press the Tab key to go to Ok, then press Enter.
  • Select Yes to remove the database when slapd is purged.
    • Press the Tab key to go to Yes, then press Enter.
  • Select Yes to move old database
    • Press the Tab key to go to Yes, then press Enter.
  • Enable firewall rules
sudo ufw allow ldap
  • Create a file named basedn.ldif with the following contents
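  • The contents of basedn.ldif are not shown above; based on the entries that ldapadd reports below (and the gidNumber used later for the group), something along these lines would work:

BASH

dn: ou=People,dc=wcupa,dc=edu
objectClass: organizationalUnit
ou: People

dn: ou=Groups,dc=wcupa,dc=edu
objectClass: organizationalUnit
ou: Groups

dn: cn=CSC,ou=Groups,dc=wcupa,dc=edu
objectClass: posixGroup
cn: CSC
gidNumber: 5000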
  • Run the following command to populate LDAP.
    • Enter the password for LDAP previously created, then press Enter.
ldapadd -x -D cn=admin,dc=wcupa,dc=edu -W -f basedn.ldif

Enter LDAP Password:

adding new entry "ou=People,dc=wcupa,dc=edu"



adding new entry "ou=Groups,dc=wcupa,dc=edu"



adding new entry "cn=CSC,ou=Groups,dc=wcupa,dc=edu"
  • Run the following command to generate a password hash
    • The password is rammy
slappasswd

New password:

Re-enter new password:

{SSHA}N8Rfc9lvnKb8A3oUOxUOBlDen4v8FYL/
  • Create a file named users.ldif using the following content
    • Replace the hash in userPassword field with the password hash you just created.
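  • The contents of users.ldif are not shown above; a reconstruction consistent with the ldapsearch and getent output later in this lesson (attributes such as sn and givenName are assumptions) is:

BASH

dn: uid=student,ou=People,dc=wcupa,dc=edu
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
uid: student
sn: Ram
givenName: Golden
cn: student
displayName: Golden Ram
uidNumber: 10000
gidNumber: 5000
userPassword: {SSHA}N8Rfc9lvnKb8A3oUOxUOBlDen4v8FYL/
gecos: Golden Ram
loginShell: /bin/dash
homeDirectory: /home/student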
  • Populate LDAP with user info
ldapadd -x -D cn=admin,dc=wcupa,dc=edu -W -f users.ldif

Enter LDAP Password:

adding new entry "uid=student,ou=People,dc=wcupa,dc=edu"
  • Test LDAP
ldapsearch -x -LLL -b dc=wcupa,dc=edu 'uid=student' cn gidNumber

dn: uid=student,ou=People,dc=wcupa,dc=edu

cn: student

gidNumber: 5000
  • Connect to webserver node and run the following commands
BASH

clear

sudo apt update

sudo apt-get update

sudo apt install -y libnss-ldap libpam-ldap ldap-utils
  • Configure ldap-auth-config
    • Based on the profile.py, ldap will have 192.168.1.3 as a predefined IP address.
    • You can verify this by running cat /etc/hosts on ldap.
    • Enter ldap://192.168.1.3 as LDAP server Uniform Resource Identifier.
      • PAY ATTENTION TO ALL CHARACTERS
    • Distinguished name of the search base: dc=wcupa,dc=edu
    • LDAP version to use: 3
    • Make local root Database admin: Yes
    • Does the LDAP database require login? No
    • LDAP account for root: cn=admin,dc=wcupa,dc=edu
    • LDAP root account password: Use the password you created earlier
  • Enable LDAP profile for NSS
    • Run sudo nano /etc/nsswitch.conf
    • Change the configurations of passwd and group to: compat systemd ldap
    • Save and quit
  • Enable LDAP profile PAM
    • Run sudo nano /etc/pam.d/common-password
      • Find the line with the phrase use_authtok and delete that phrase
  • Run sudo nano /etc/pam.d/common-session
    • Add the following line to the end: session optional pam_mkhomedir.so skel=/etc/skel umask=077
  • Test that now you can authenticate user student on webserver via LDAP
getent passwd student

student:x:10000:5000:Golden Ram:/home/student:/bin/dash

lngo@webserver:~$ su student

Password:

$

Content from Practice Scenarios


Last updated on 2024-06-24 | Edit this page

  • Launch the webserver-ldap experiment on CloudLab. We will assume that there are three connected nodes:
    • webserver
    • observer
    • ldap.
  • The webserver should have an apache server ready (setup_apache.sh).
  • The ldap should have an ldap server ready. There should be one user account (student/rammy) created.
  • Enable the public_html directory and create an index.html page in that directory that displays Hello World when accessed.
  • Secure this location by requiring viewers to authenticate via the ldap server with the student/rammy login/password.
  • Add the following users to the LDAP server
    • Make sure to change the password hash (password remains rammy)
    • Confirm that the users were added correctly by viewing the page from scenario 1 using users merino and dorper.
  • Set up the NFS server on ldap.
    • Create a directory called nfs/home and make it available via NFS for both webserver and observer.
  • Setup NFS clients on webserver and observer.
    • Create nfs/home and mount /nfs/home from ldap
  • Using su (do not use sudo), confirm that you can switch users, and that their home directories are shared across ldap, webserver, and observer.
  • Review https://github.com/CSC586-WCU/csc586cloud/tree/webserver for corrections of errors from the previous class.
  • Instantiate the webserver profile.
  • Setup the Ansible control node to be an LDAP node.
  • Configure the apache server on the Ansible host node (previously installed via Ansible lamp stack) to be authenticated with the LDAP server on the control node.
    • You can use the template users.ldif file from the lecture.

Content from Introduction to Docker Containers


Last updated on 2024-06-24 | Edit this page

  • Go to your GitHub project repository (created on the first day), create a new branch called docker from the main branch, and modify it to add the following components from this link:
    • The docker_config directory and its content (daemon.json).
    • The install_docker.sh file.
    • The profile.py file.
  • Check and make sure all the contents are correctly copied!
  • Go to CloudLab, open your profile, switch to Edit mode and click Update. The new docker branch should show up.
  • Instantiate an experiment from this branch.
  • Only log in after the Startup column becomes Finished, then type the following command: sudo docker info | grep "Docker Root Dir"
  • Confirm that you have something similar to the screenshot below
placeholder
  • SSH into your CloudLab experiment.
  • Check version of Docker:
$ docker version
placeholder
  • Docker is a client-server application.
    • Docker daemon (Engine): receives and processes incoming Docker API requests; requires root privilege.
    • Docker Hub registry: collection of public images (https://hub.docker.com/).
    • Docker client: talks to the Docker daemon via the Docker API and to the registry via the registry API.
  • Docker containers are instantiated from Docker images.
  • You can check availability of local images and containers.
$ docker image ls

$ docker container ls
placeholder
  • We can issue the following to start a service that will echo hello world to the screen.
  • This requires a Linux container to run the echo command.
$ docker run alpine echo hello world
  • docker: invoke the container engine.
  • run: subcommand to run a container.
  • alpine: name of the image based on which a container will be launched.
  • echo hello world: the command to be executed in the container environment.
$ docker image ls

$ docker container ls

$ docker container ls --all

$ docker run alpine echo hello world

$ docker container ls --all
placeholder
  • We can launch a container and get into the shell of the container.
$ docker run -it ubuntu bash
placeholder
  • You are now in a new prompt: a shell inside the container
  • -it: combination of -i and -t.
    • -i tells Docker to connect to the container’s stdin for interactive mode
    • -t tells Docker that we want a pseudo-terminal
  • The following commands are done inside the container.
  • Let’s attempt to run figlet
# figlet hello
  • There will be an error.
  • The current container does not have the figlet program yet.
  • The following commands are done inside the container.
# apt-get update

# apt-get install -y figlet

# figlet hello
placeholder
  • Type exit to shutdown the container and back to your normal terminal.
  • Repeat the process of launching an interactive container from start and try running figlet again.
  • Is the program still there?
  • You should have already exited out of the container shell and back to the CloudLab environment.
  • Run the following command
  • Press Ctrl-C to stop after a few time stamps.
$ docker run jpetazzo/clock
  • Run the following command
$ docker run -d jpetazzo/clock

$ docker ps
  • Use the first four characters of your container ID to view the log of the running Docker container
  • Use --tail N to only look at the tail of the log.
$ docker container ls

$ docker logs --tail 5 YOUR_CONTAINER_ID
  • Find out how to kill a running container by using docker kill.
  • Image = files + metadata
  • The files form the root filesystem of the container
  • The metadata describes things such as:
    • The author of the image
    • The command to execute in container when starting it
    • Environment variables to be set
  • Images are made of layers, conceptually stacked on top of each other.
  • Each layer can add, change, and remove files and/or metadata.
  • Images can share layers to optimize disk usage, transfer times, and memory use.
  • CentOS base layer
  • Packages and configuration files added by our local IT
  • JRE
  • Tomcat
  • Our application’s dependencies
  • Our application code and assets
  • Our application configuration
placeholder
  • An image is a read-only filesystem.
  • A container is an encapsulated set of processes running in a read-write copy of that filesystem.
  • To optimize container boot time, copy-on-write is used instead of regular copy.
  • docker run starts a container from a given image.
placeholder
  • Object-oriented analogy
    • Images are conceptually similar to classes
    • Layers are conceptually similar to inheritance
    • Containers are conceptually similar to instances
  • An image is read-only, so we don’t modify it directly.
  • Instead, we create a new container from the image.
  • We make changes to the container.
  • When we are satisfied with the changes, we transform them into a new layer.
  • A new image is created by stacking the new layer on top of the old image.
  • Official images (ubuntu, busybox, …)
    • Root namespace.
    • Small, distro images to be used as bases for the building process.
    • Ready-to-use components and services (redis, postgresql …)
  • User (and organization) images: <registry_name>/<image_name>:[version]
    • jpetazzo/clock:latest
    • linhbngo/csc331:latest
  • Self-hosted images
    • Images hosted by third party registry
    • URL/<image_name>
  • If this is a new experiment, go ahead and run the following commands to get some images loaded.
$ docker run hello-world

$ docker run alpine echo This is alpine

$ docker run ubuntu echo This is ubuntu

$ docker image ls
placeholder
  • We can search for available images in the public Docker Hub
$ docker search mysql
placeholder
  • Create a container using an appropriate base distro
  • Inside the container, install and setup the necessary software
  • Review the changes in the container
  • Turn the container into a new image
  • Tag the image
placeholder
  • Remember to note your container ID.
$ docker run -it ubuntu
# apt-get update

# apt-get install -y figlet

# exit
placeholder
  • Remember to note your container ID.
$ docker diff 16b0
placeholder
  • A: A file or directory was added
  • D: A file or directory was deleted
  • C: A file or directory was changed
  • Remember to note your container ID.

BASH

$ docker commit 16b0 ubuntu_figlet_$USER

$ docker image ls

$ docker history fe101
placeholder
  • From the screenshot:
    • The docker commit ... command created a new image named ubuntu_figlet_lngo that has the following unique id: fe101865e2ed.
    • The docker image ls command shows this image.
    • The docker history fe101 shows the layers making up this image, which include the layer that is the base ubuntu image 54c9d.
  • Test run the new ubuntu_figlet image by launching an interactive container using this image, then immediately run figlet hello world.
  • A build recipe for a container image.
  • Contains a series of instructions telling Docker/Podman how an image is to be constructed.
  • The docker build command builds an image from a Dockerfile.
  • The following commands are done in the terminal (Ubuntu WSL on Windows/Mac Terminal).
$ cd

$ mkdir myimage

$ cd myimage

$ nano Dockerfile
  • Type the following contents into the nano editor
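  • The exact contents are not shown above; given the FROM/RUN description below and the figlet example, a Dockerfile along these lines is intended:

BASH

FROM ubuntu
RUN apt-get update
RUN apt-get install -y figlet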
  • FROM: the base image for the build

  • RUN: represents one layer of execution.

  • RUN commands must be non-interactive.

  • Save and quit after you are done.

  • To build the image

  • The following commands are done in the terminal (Ubuntu WSL on Windows/Mac Terminal).
  • Check that you are still inside myimage
$ pwd

$ docker build -t figlet_$USER .
  • -t indicates that the tag figlet_$USER will be applied to the image.
  • . indicates that the Dockerfile file is in the current directory.
placeholder
  • The build context is the Dockerfile file in the current directory (.) and is sent to the container engine. This context allows constructions of images with additional resources from local files inside the build context.
  • The base image is Ubuntu.
  • For each RUN statement, a container is created from the image produced by the previous step to execute the commands. Afterward, the resulting container is committed into an image that becomes the base for the next RUN.
  • Use docker image ls and docker history ... to check which layer is reused for this image.
  • Test run the new figlet image by launching an interactive container using this image, then immediately run figlet hello world.
  • Edit your Dockerfile so that it has the following content
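  • A sketch of the expected contents (the exact CMD arguments are an assumption, chosen to be consistent with the figlet example):

# Sketch: same layers as before, plus a default command.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y figlet
CMD figlet hello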
  • CMD: The command to be run if the container is invoked without any command.
  • Rebuild the image with the tag figlet_cmd_$USER.
  • Run the following command
$ docker run figlet_cmd_$USER
  • Question: Did we use any additional storage for this new image?
  • With CMD defined, running with -it alone does not drop you into a shell as you might expect: the container still runs CMD.
  • To override CMD, we can provide a command
$ docker run -it figlet_cmd_$USER

$ docker run -it figlet_cmd_$USER bash

  • ENTRYPOINT defines a base command (and its parameters) for the container.
  • The command line arguments are appended to those parameters.
  • Edit Dockerfile as follows:
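  • A sketch of the expected contents (the exec/JSON form of ENTRYPOINT is an assumption; it keeps appended arguments separate and clean):

# Sketch: figlet is the base command; run-time arguments become its parameters.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y figlet
ENTRYPOINT ["figlet"]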

  • Rebuild the image with the tag figlet_entry_$USER.
  • Run the following:

$ docker run figlet_entry_$USER golden rams
  • ENTRYPOINT and CMD can be used together.
  • The command line arguments are appended to those parameters.
  • Edit Dockerfile as follows:
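  • A sketch of the expected contents (the default argument in CMD is an assumption; any command line arguments replace it):

# Sketch: ENTRYPOINT supplies the command, CMD the default arguments.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y figlet
ENTRYPOINT ["figlet"]
CMD ["hello world"]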
  • Rebuild the image with the tag figlet_both_$USER.
  • Run the following:
$ docker run figlet_both_$USER golden rams

$ docker run figlet_both_$USER
  • Passing bash as the command does not work as expected: the argument is appended to ENTRYPOINT, so figlet simply prints the word bash. To get a shell, override the entry point with --entrypoint:
$ docker run -it figlet_both_$USER bash

$ docker run -it --entrypoint bash figlet_both_$USER

# exit
  • Create the following file called hello.c:
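  • A minimal sketch (any hello-world C program will do; the original source is not shown here):

/* hello.c: minimal program used to demonstrate COPY during a build. */
#include <stdio.h>

int main(void)
{
    puts("Hello, world!");
    return 0;
}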
  • Create the following Dockerfile called Dockerfile.hello:
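  • A sketch of a possible Dockerfile.hello (the compiler package and build steps are assumptions; make compiles hello.c into hello via its built-in rule):

# Sketch: copy the source in, compile it, and run the binary by default.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y build-essential
COPY hello.c /
RUN make hello
CMD /hello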
  • You can build an image with a specific Dockerfile
$ docker build -t hello_$USER -f Dockerfile.hello .

$ docker run hello_$USER
  • Create an account on Docker Hub.
  • Find out how to login from the command line and push the recently created hello image to your Docker Hub account.
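  • One possible approach (a sketch; <your_dockerhub_id> stands for your own Docker Hub account name):
$ docker login

$ docker tag hello_$USER <your_dockerhub_id>/hello_$USER

$ docker push <your_dockerhub_id>/hello_$USER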
  • How can services provided by a container become available to the world?
$ docker run -d -P nginx

$ docker ps
  • -P: make this service reachable from other computers (--publish-all)
  • -d: run in the background (--detach)
  • Where is the port?
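  • One way to find it (a sketch; replace <container_id> with the ID reported by docker ps):
$ docker port <container_id> 80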

  • This is described in the Dockerfile and can be inspected.
  • The keyword for this action is EXPOSE.
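  • For example (a sketch; the Go-template filter below prints just the exposed ports of the nginx image):
$ docker inspect --format '{{ .Config.ExposedPorts }}' nginx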
  • Why do we have to map ports?
    • Containers cannot have public IPv4 addresses.
    • We are running low on IPv4 addresses anyway.
    • Internally to the host, containers have their own private addresses
      • Services have to be exposed port by port.
      • These have to be mapped to avoid conflicts.
$ docker run -d -p 8000:80 nginx

$ docker run -d -p 8080:80 -p 8888:80 nginx
  • Convention: port-on-host:port-on-container
  • Check out the web servers at all of these ports
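  • For example (assuming curl is available on the host):
$ curl localhost:8000

$ curl localhost:8080

$ curl localhost:8888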
  • Manually add the containers to the infrastructure using the container-generated public ports.
  • Predetermine a port on the infrastructure, then set the corresponding port mapping when running the containers.
  • Use a network plugin to connect the containers with network tunnels/VLANs …
  • Deploy containers across a physical cluster using Kubernetes.
  • Provide the notion of a network to connect containers
  • Provide top level command to manipulate and observe these networks:
    • docker network
$ docker network

$ docker network ls
  • What’s in a container network?
    • Conceptually, it is a virtual switch
    • It can be local to a single Engine (on a single host) or global (spanning multiple hosts).
    • It has an associated IP subnet.
    • The container engine will allocate IP addresses to the containers connected to a network.
    • Containers can be connected to multiple networks.
    • Containers can be given per-network names and aliases.
    • The name and aliases can be resolved via an embedded DNS server.
$ docker network create ramnet

$ docker network ls
$ docker run -d --name es --net ramnet elasticsearch:2

$ docker run -it --net ramnet alpine sh

# ping es

# exit