
This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.

...

Macs and Linux have a Terminal program built-in
Windows options:
- Windows 10+
  - Command Prompt and PowerShell programs have ssh and scp (may require latest Windows updates)
    - Start menu → Search for Command
  - Putty – http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
    - simple Terminal and file copy programs
    - download either the Putty installer or just putty.exe (Terminal) and pscp.exe (secure copy client)
  - Windows Subsystem for Linux – Windows 10 Professional includes an Ubuntu-like bash shell
    - See https://docs.microsoft.com/en-us/windows/wsl/install-win10
    - We recommend the Ubuntu Linux distribution, but any Linux distribution will have an SSH client

Use ssh (secure shell) to log in to a remote computer.

SSH to a remote computer:

# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu

...

cat outputs all the contents of its input (one or more files and/or standard input)
- cat -n prefixes each line of output with its line number
- CAUTION – only use on small files!
zcat <file.gz> – like cat, but understands the gzip (.gz) format and decompresses the data before writing it to standard output
- CAUTION – only use on small files!
- Another CAUTION – does not understand .zip or .bz2 compression formats
more and less pagers
- both display their (possibly very long) input one Terminal "page" at a time
- in more:
  - use spacebar to advance a page
  - use q or Ctrl-c to exit more
- in less:
  - q – quit
  - Ctrl-f or space – page forward
  - Ctrl-b – page backward
  - /<pattern> – search for <pattern> in forward direction
    - n – next match
    - N – previous match
  - ?<pattern> – search for <pattern> in backward direction
    - n – previous match going back
    - N – next match going forward
- use less -N to display line numbers
- less -I says to do pattern matching ignoring case
- can be used directly on .gz format files
head and tail
- show you the first or last 10 lines (by default) of their input
- head -n 20 or just head -20 shows the first 20 lines
- tail -n 2 or just tail -2 shows the last 2 lines
- tail -n +100 shows lines starting at line 100
- tail -n +100 | head -20 shows 20 lines starting at line 100
- tail -f shows the last lines of a file, then follows the output as more lines are written (Ctrl-c to quit)
gunzip -c <file.gz> | more (or less) – like zcat, un-compresses lines of <file.gz> and outputs them to standard output
- <file.gz> is not altered on disk
- always pipe the output to a pager!
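
The viewing commands above can be tried safely on a small generated file. This is a minimal sketch, assuming bash and the usual GNU utilities; the temporary directory and the demo.txt file name are made up for illustration.

```shell
# Build a small 200-line demo file in a temporary directory (hypothetical data)
demo_dir=$(mktemp -d)
seq 1 200 | sed 's/^/line /' > "$demo_dir/demo.txt"

# Write a compressed copy next to it, keeping the original
gzip -c "$demo_dir/demo.txt" > "$demo_dir/demo.txt.gz"

# First 3 lines, each prefixed with its line number
head -3 "$demo_dir/demo.txt" | cat -n

# Last 2 lines
tail -n 2 "$demo_dir/demo.txt"

# 5 lines starting at line 100
window=$(tail -n +100 "$demo_dir/demo.txt" | head -5)
echo "$window"

# Decompress to standard output (the .gz file is not altered) and limit the output
gz_first=$(gunzip -c "$demo_dir/demo.txt.gz" | head -1)
echo "$gz_first"
```

For real data files, replace the final head with more or less so that the output is paged rather than truncated.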

...

cut command lets you isolate ranges of data from its input lines
- cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
  - use -d <delim> to change the field delimiter (Tab by default)
- cut -c <character_number(s)> extracts one or more characters (-c) from each line of input
- the <numbers> can be
  - a comma-separated list of numbers (e.g. 1,4,7)
  - a hyphen-separated range (e.g. 2-5)
  - a trailing hyphen says "and all items after that" (e.g. 3,7-)
- cut does not re-order fields, so cut -f 5,3,1 acts like -f 1,3,5
sort sorts its input lines using an efficient algorithm
- by default sorts each line lexically (as strings), low to high
  - use -n to sort numerically
  - use -V for Version sort (numbers with surrounding text)
  - use -r to reverse the sort order
- use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
  - e.g. -k1,1 -k2,2nr to sort field 1 lexically and field 2 as a number high-to-low
  - by default, fields are delimited by whitespace -- one or more spaces or Tabs
    - use -t <delim> to change the field delimiter (e.g. -t $'\t' to split on Tab only, so that spaces are not treated as delimiters)
uniq -c counts groupings of its input (which must be sorted) and reports the text and count for each group
- use cut | sort | uniq -c for a quick-and-dirty histogram
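
As a rough illustration of the cut | sort | uniq -c idiom and of keyed sorting, here is a sketch that assumes bash (for the $'\t' quoting) and uses a small invented table of sample names, tissues and read counts.

```shell
# Hypothetical tab-delimited data: sample name, tissue, read count
data=$(printf 'S1\tliver\t120\nS2\tbrain\t45\nS3\tliver\t9\nS4\theart\t300\nS5\tliver\t77\n')

# Quick-and-dirty histogram of field 2:
# extract the field, sort so equal values are adjacent, count each group,
# then order the groups by count, high to low
histogram=$(echo "$data" | cut -f 2 | sort | uniq -c | sort -k1,1nr)
echo "$histogram"

# Sort on field 3 numerically, high to low, splitting on Tab only,
# and report the sample name (field 1) of the top line
top_sample=$(echo "$data" | sort -t $'\t' -k3,3nr | head -1 | cut -f 1)
echo "$top_sample"
```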

awk

awk is a powerful scripting language that is easily invoked from the command line. Its field-oriented capabilities make it the go-to tool for manipulating table-like delimited lines of text.

awk '<script>' – the '<script>' is applied to each line of input (generally piped in)
- always enclose '<script>' in single quotes to inhibit shell evaluation, because awk has its own set of metacharacters that are different from the shell's

Example that prints the average of its input numbers (echo -e converts backslash escape characters like newline \n to the ASCII newline character so that the numbers appear on separate lines)
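
One way such an averaging command might look is sketched below. It assumes bash, whose built-in echo supports -e; the four input numbers and the variable name total are chosen only for illustration.

```shell
# echo -e turns each \n into a real newline, so awk sees one number per line.
# The main block adds field 1 of every line to a running total;
# the END block runs once after the last line, where NR holds the line count.
average=$(echo -e "10\n20\n30\n40" | awk '{ total += $1 } END { print total / NR }')
echo "$average"
```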

...

Here is an excellent awk tutorial, very detailed and in-depth

cut versus awk

The basic functions of cut and awk are similar – both are field oriented. Here are the main differences:

Default field separators
- Tab is the default field separator for cut
- whitespace (one or more spaces or Tabs) is the default field separator for awk
Re-ordering
- cut cannot re-order fields
- awk can re-order fields, based on the order you specify
awk is a full-featured programming language while cut is just a single-purpose utility.
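
The differences listed above can be seen side by side in a short sketch. It assumes bash; the input lines are invented for the comparison.

```shell
# Hypothetical tab-delimited line with three fields
line=$(printf 'chr1\t100\tgeneA')

# cut keeps the input order no matter how the fields are listed
cut_out=$(echo "$line" | cut -f 3,1)
echo "$cut_out"

# awk prints fields in the order given; OFS sets the output separator
awk_out=$(echo "$line" | awk 'BEGIN { OFS = "\t" } { print $3, $1 }')
echo "$awk_out"

# With space-separated input, cut needs -d to change its delimiter,
# while awk splits on whitespace by default
spaced='alpha beta gamma'
cut_spaced=$(echo "$spaced" | cut -d ' ' -f 2)
awk_spaced=$(echo "$spaced" | awk '{ print $2 }')
echo "$cut_spaced $awk_spaced"
```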

...

grep -P '<pattern>' searches for <pattern> in its input, and only outputs lines containing it
- always enclose '<pattern>' in single quotes to inhibit shell evaluation!
  - pattern-matching metacharacters in grep are very different from those in the shell
- -P says to use Perl patterns, which are much more powerful (and more widely standardized) than the default grep patterns
- -v (inverse match) – only print lines with no match
- -n (line number) – prefix output with the line number of the match
- -i (case insensitive) – ignore case when matching
- -l says return only the names of files that do contain the pattern match
- -L says return only the names of files that do not contain the pattern match
- -c says just return a count of line matches
- -A <n> (After) and -B <n> (Before) – output <n> lines after or before each match
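
The options above can be exercised on a couple of throwaway files. This sketch assumes GNU grep built with -P support; the log file names and their contents are hypothetical.

```shell
# Two small hypothetical log files in a temporary directory
log_dir=$(mktemp -d)
printf 'Error: disk full\nwarning: low memory\nerror: timeout\ninfo: ok\n' > "$log_dir/app.log"
printf 'info: all good\n' > "$log_dir/clean.log"

# -i ignores case, -n prefixes each matching line with its line number
numbered=$(grep -P -i -n '^error' "$log_dir/app.log")
echo "$numbered"

# -c reports only the number of matching lines
error_count=$(grep -P -i -c '^error' "$log_dir/app.log")

# -v keeps only the lines that do NOT match
non_info=$(grep -P -v '^info' "$log_dir/app.log")

# -l lists files with a match, -L lists files without one
with_errors=$(cd "$log_dir" && grep -P -i -l '^error' app.log clean.log)
without_errors=$(cd "$log_dir" && grep -P -i -L '^error' app.log clean.log)

# -A 1 also prints one line of context after each match
context=$(grep -P -A 1 '^warning' "$log_dir/app.log")
echo "$context"
```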

A regular expression (regex) is a pattern of literal characters to search for and metacharacters that control and modify how matching is done.

...

A regex <pattern> can contain special match metacharacters and modifiers. The ones below are Perl metacharacters, which are the "gold standard", supported by most languages and tools (e.g. grep -P)

^ – matches beginning of line
$ – matches end of line
. – (period) matches any single character
* – modifier; place after an expression to match 0 or more occurrences
+ – modifier, place after an expression to match 1 or more occurrences
? – modifier, place after an expression to match 0 or 1 occurrences
\s – matches any whitespace character (\S any non-whitespace)
\d – matches digits 0-9
\w – matches any word character: A-Z, a-z, 0-9 and _ (underscore)
\t – matches Tab
\n – matches linefeed; \r – matches carriage return
[xyz123] – matches any single character (including special characters) among those listed between the brackets [ ]
- this is called a character class
- use [^xyz123] to match any single character not listed in the class
(Xyz|Abc) – matches either Xyz or Abc or any text or expressions inside parentheses separated by | characters
- note that parentheses ( ) may also be used to capture matched sub-expressions for later use
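
To make a few of these metacharacters concrete, the sketch below runs several patterns against four invented input lines. It assumes bash and a grep that supports -P.

```shell
# Hypothetical input: some lines have a Tab followed by a number
lines=$(printf 'gene_1\t42\nGene-2\t7\nsample A\t1000\nnotes\n')

# ^ anchors at the start of the line; \w+ is one or more word characters; \t is a Tab.
# Only gene_1 qualifies: "-" and the space are not word characters.
word_then_tab=$(echo "$lines" | grep -P '^\w+\t')
echo "$word_then_tab"

# Character classes: [Gg] matches either case, [_-] matches underscore or hyphen
gene_ids=$(echo "$lines" | grep -P '^[Gg]ene[_-]\d')
echo "$gene_ids"

# \d+ is one or more digits; $ anchors at the end of the line (-c counts matching lines)
numeric_tail=$(echo "$lines" | grep -P -c '\t\d+$')
echo "$numeric_tail"

# Alternation: lines that start with either word
alternatives=$(echo "$lines" | grep -P '^(notes|sample)')
echo "$alternatives"
```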

Regular expression modules are available in nearly every programming language (Perl, Python, Java, PHP, awk, even R)

...

| utility | default delimiter | how to change | example |
|---|---|---|---|
| cut | Tab | -d or --delimiter option | `cut -d ':' -f 1 /etc/passwd` |
| sort | whitespace (one or more spaces or Tabs) | -t or --field-separator option | `sort -t ':' -k1,1 /etc/passwd` |
| awk | whitespace (one or more spaces or Tabs). Note: some older versions of awk do not treat Tabs as field separators. | In the BEGIN { } block: FS= (input field separator), OFS= (output field separator); or the -F or --field-separator option | `cat /etc/fstab \| grep -v '^#' \| awk 'BEGIN{OFS="\t"}{print $2,$1}'` `cat /etc/passwd \| awk -F ":" '{print $1}'` |
| join | one or more spaces | -t option | `join -t $'\t' -j 2 file1 file2` |
| perl | whitespace (one or more spaces or Tabs) when auto-splitting input with -a | -F'/<pattern>/' option | `cat /etc/fstab \| grep -v '^#' \| perl -F'/\s+/' -a -n -e 'print "$F[1]\t$F[0]\n";'` |
| read | whitespace (one or more spaces or Tabs) | IFS= (input field separator) option | Note that a bare IFS= removes any field separator, so whole lines are read each loop iteration. |
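
The read and join rows have no runnable example, so here is a hedged sketch of both, plus awk -F for comparison. It assumes bash (for here-strings and $'\t'); the record, variable names and .tsv file names are invented.

```shell
# read: setting IFS for just this one command splits the line on ':'
record='alice:x:1001:/home/alice'
IFS=':' read -r user pw uid home <<< "$record"
echo "$user $uid $home"

# A bare IFS= disables splitting, so the whole line (leading spaces included) is kept
IFS= read -r whole <<< "  two words"
echo "[$whole]"

# awk: -F changes the input field separator
first_field=$(echo "$record" | awk -F ':' '{ print $1 }')
echo "$first_field"

# join: -t sets the delimiter; both inputs must be sorted on the join field (field 1 here)
join_dir=$(mktemp -d)
printf 'a\t1\nb\t2\n' > "$join_dir/left.tsv"
printf 'a\tx\nb\ty\n' > "$join_dir/right.tsv"
joined=$(join -t $'\t' "$join_dir/left.tsv" "$join_dir/right.tsv")
echo "$joined"
```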

...
