This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.

...

  • Macs and Linux have a Terminal program built-in
  • Windows options:

Use ssh (secure shell) to log in to a remote computer.

SSH to a remote computer (bash)
# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu

...

  • cat outputs all the contents of its input (one or more specified files, or standard input if no file is given)
    • cat -n prefixes each line of output with its line number
    • CAUTION – only use on small files!
  • zcat <file.gz> is like cat, but understands the gzip (.gz) format, and decompresses the data before writing it to standard output
    • CAUTION – only use on small files!
    • Another CAUTION – does not understand .zip or .bz2 compression formats
  • more and less pagers
    • both display their (possibly very long) input one Terminal "page" at a time
    • in more:
      • use spacebar to advance a page
      • use q or Ctrl-c to exit more
    • in less:
      • q – quit
      • Ctrl-f or space – page forward
      • Ctrl-b – page backward
      • /<pattern> – search for <pattern> in forward direction
        • n – next match
        • N – previous match
      • ?<pattern> – search for <pattern> in backward direction
        • n – previous match going back
        • N – next match going forward
    • use less -N to display line numbers
    • less -I says to do pattern matching ignoring case
    • can be used directly on .gz format files
  • head and tail
    • show you the first or last 10 lines (by default) of their input
    • head -n 20 or just head -20 shows the first 20 lines
    • tail -n 2 or just tail -2 shows the last 2 lines
    • tail -n +100 shows lines starting at line 100
    • tail -n +100 | head -20 shows 20 lines starting at line 100
    • tail -f shows the last lines of a file, then follows the output as more lines are written (Ctrl-c to quit)
  • gunzip -c <file.gz> | more (or less) – like zcat, un-compresses lines of <file.gz> and outputs them to standard output
    • <file.gz> is not altered on disk
    • always pipe the output to a pager!
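
The viewing commands above can be tried safely on a small scratch file. This is a minimal sketch, assuming a throwaway directory from mktemp and a hypothetical file named numbers.txt (neither comes from the course materials); it only uses options already listed above.

```shell
# Scratch directory so nothing in the current directory is touched
workdir=$(mktemp -d)

# Hypothetical sample file: 200 lines, where line N contains the number N
seq 1 200 > "$workdir/numbers.txt"

# First 3 lines and last 2 lines
first3=$(head -n 3 "$workdir/numbers.txt")
last2=$(tail -n 2 "$workdir/numbers.txt")

# 5 lines starting at line 100
window=$(tail -n +100 "$workdir/numbers.txt" | head -n 5)

# Compress a copy, then read it back; the .gz file on disk is not altered
gzip -c "$workdir/numbers.txt" > "$workdir/numbers.txt.gz"
gz_first=$(gunzip -c "$workdir/numbers.txt.gz" | head -n 1)

echo "$first3"
echo "$last2"
echo "$window"
echo "$gz_first"

# Remove the scratch directory
rm -rf "$workdir"
```

On a real (large) file, replace the final head with less so the output is paged rather than dumped to the Terminal.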

...

  • cut command lets you isolate ranges of data from its input lines
    • cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
      • use -d <delim> to change the field delimiter (Tab by default)
    • cut -c <character_number(s)> extracts one or more characters (-c) from each line of input
    • the <numbers> can be
      • a comma-separated list of numbers (e.g. 1,4,7)
      • a hyphen-separated range (e.g. 2-5)
      • a trailing hyphen says "and all items after that" (e.g. 3,7-)
    • cut does not re-order fields, so cut -f 5,3,1 acts like -f 1,3,5
  • sort sorts its input lines using an efficient algorithm
    • by default sorts each line lexically (as strings), low to high
      • use -n to sort numerically
      • use -V for Version sort (numbers with surrounding text)
      • use -r to reverse the sort order
    • use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
      • e.g. -k1,1 -k2,2nr  to sort field 1 lexically and field 2 as a number high-to-low
      • by default, fields are delimited by whitespace -- one or more spaces or Tabs 
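        • use -t <delim> to change the field delimiter (e.g. -t $'\t' for Tab only, so that spaces are not treated as delimiters; note that -t "\t" does not work because sort expects a single literal character)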
  • uniq -c counts groupings of its input (which must be sorted) and reports the text and count for each group
    • use cut | sort | uniq -c for a quick-and-dirty histogram
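
As a rough illustration of the cut, sort and uniq -c options above, here is a small sketch. The Tab-separated sample data (sample, gene, count columns) is made up for the example, and the literal Tab for sort -t is produced with printf so the snippet does not depend on any particular quoting syntax.

```shell
tab=$(printf '\t')   # a literal Tab character, for use with sort -t

# Hypothetical sample data: Tab-separated columns of sample, gene, count
data=$(printf 'sampleA\tgeneX\t10\nsampleB\tgeneY\t2\nsampleA\tgeneY\t33\nsampleC\tgeneX\t7\n')

# Fields 1 and 3 (Tab is cut's default delimiter)
cols=$(printf '%s\n' "$data" | cut -f 1,3)

# Quick-and-dirty histogram: how many lines mention each gene (field 2)?
# The input to uniq -c must be sorted so identical values are adjacent.
hist=$(printf '%s\n' "$data" | cut -f 2 | sort | uniq -c)

# Sort on field 1 lexically, then on field 3 numerically, high to low
sorted=$(printf '%s\n' "$data" | sort -t "$tab" -k1,1 -k3,3nr)

echo "$cols"
echo "$hist"
echo "$sorted"
```

The counts printed by uniq -c are padded with leading spaces; pipe the result through sort -k1,1nr if you want the most frequent values first.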

awk

awk is a powerful scripting language that is easily invoked from the command line. Its field-oriented capabilities make it the go-to tool for manipulating table-like delimited lines of text.

  • awk '<script>' – the '<script>' is applied to each line of input (generally piped in)
  • always enclose '<script>' in single quotes to inhibit shell evaluation, because awk has its own set of metacharacters that are different from the shell's

Example that prints the average of its input numbers (echo -e converts backslash escape characters like newline \n to the ASCII newline character so that the numbers appear on separate lines)

...

Here is an excellent awk tutorial, very detailed and in-depth

cut versus awk

The basic functions of cut and awk are similar – both are field oriented. Here are the main differences:

  • Default field separators
    • Tab is the default field separator for cut
    • whitespace (one or more spaces or Tabs) is the default field separator for awk
  • Re-ordering
    • cut cannot re-order fields
    • awk can re-order fields, based on the order you specify
  • awk is a full-featured programming language while cut is just a single-purpose utility.
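
The differences listed above can be seen on a single line of input. This is only a sketch: the sample line is invented, and it has three Tab-separated fields, the second of which contains a space.

```shell
# Hypothetical input: 3 Tab-separated fields; field 2 contains a space
line=$(printf 'one\ttwo three\tfour')

# cut splits on Tab only and keeps input order, even when asked for 3,1
cut_out=$(printf '%s\n' "$line" | cut -f 3,1)

# awk splits on any whitespace by default, so "two three" counts as two fields
awk_nf=$(printf '%s\n' "$line" | awk '{print NF}')

# With -F '\t' awk splits on Tab only, and prints fields in the order requested
awk_out=$(printf '%s\n' "$line" | awk -F '\t' 'BEGIN{OFS="\t"}{print $3,$1}')

echo "$cut_out"
echo "$awk_nf"
echo "$awk_out"
```

Here cut returns fields 1 and 3 in their original order, while awk reports 4 whitespace-delimited fields by default and can emit field 3 before field 1 once told to split on Tab.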

...

  • grep -P '<pattern>' searches for <pattern> in its input, and only outputs lines containing it
    • always enclose '<pattern>' in single quotes to inhibit shell evaluation!
      • pattern-matching metacharacters in grep are very different from those in the shell
    • -P says to use Perl patterns, which are much more powerful (and standard) than default grep patterns
    • -v (inverse match) – only print lines with no match
    • -n (line number) – prefix output with the line number of the match
    • -i (case insensitive) – ignore case when matching
    • -l says return only the names of files that do contain the pattern match
    • -L says return only the names of files that do not contain the pattern match
    • -c says just return a count of line matches
    • -A <n> (After) and -B <n> (Before) – output <n> lines after or before a match
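
A short sketch of the grep options above, run against two small made-up files (menu.txt and sides.txt are hypothetical names, created in a scratch directory). It assumes a grep that supports -P, such as GNU grep on Linux.

```shell
# Scratch directory with two hypothetical sample files
workdir=$(mktemp -d)
printf 'apple pie\nBanana split\ncherry tart\napple crumble\n' > "$workdir/menu.txt"
printf 'green salad\n' > "$workdir/sides.txt"

matches=$(grep -P 'apple' "$workdir/menu.txt")          # lines containing "apple"
count=$(grep -c -P 'apple' "$workdir/menu.txt")         # number of matching lines
nomatch=$(grep -v -P 'apple' "$workdir/menu.txt")       # lines with no match
numbered=$(grep -n -i -P 'banana' "$workdir/menu.txt")  # ignore case, show line number
after=$(grep -A 1 -P 'Banana' "$workdir/menu.txt")      # match plus 1 line after it

# File names only: files that do (-l) or do not (-L) contain a match
has_match=$(cd "$workdir" && grep -l -P 'apple' menu.txt sides.txt)
no_match=$(cd "$workdir" && grep -L -P 'apple' menu.txt sides.txt)

echo "$matches"; echo "$count"; echo "$nomatch"
echo "$numbered"; echo "$after"; echo "$has_match"; echo "$no_match"

# Remove the scratch directory
rm -rf "$workdir"
```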

A regular expression (regex) is a pattern of literal characters to search for and metacharacters that control and modify how matching is done.

...

A regex <pattern> can contain special match metacharacters and modifiers. The ones below are Perl metacharacters, which are the "gold standard", supported by most languages and tools (e.g. grep -P)

  • ^ – matches beginning of line
  • $ – matches end of line
  • .  – (period) matches any single character
  • * – modifier; place after an expression to match 0 or more occurrences
  • + – modifier, place after an expression to match 1 or more occurrences
  • ? – modifier, place after an expression to match 0 or 1 occurrences
  • \s – matches any whitespace character (\S any non-whitespace)
  • \d – matches digits 0-9
  • \w – matches any word character: A-Z, a-z, 0-9 and _ (underscore)
  • \t – matches Tab
  • \n – matches linefeed
  • \r – matches carriage return
  • [xyz123] – matches any single character (including special characters) among those listed between the brackets [ ]
    • this is called a character class.
    • use [^xyz123] to match any single character not listed in the class
  • (Xyz|Abc) – matches either Xyz or Abc; any number of alternative texts or expressions can be listed inside the parentheses, separated by | characters
    • note that parentheses ( ) may also be used to capture matched sub-expressions for later use

Regular expression modules are available in nearly every programming language (Perl, Python, Java, PHP, awk, even R)
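
A few of the metacharacters above in action, as a sketch using grep -P. The Tab-separated sample lines are invented for the example, and GNU grep with -P support is assumed.

```shell
# Hypothetical sample lines: three Tab-separated records and one comment
lines=$(printf 'chr1\t100\tgeneA\nchr2\t2500\tgeneB\nchrX\t77\tlncRNA_7\n# comment line\n')

# ^ anchors at line start; \d+ is one or more digits; \t is a Tab
digits_chr=$(printf '%s\n' "$lines" | grep -P '^chr\d+\t')

# [^#] is a negated character class: first character is anything but "#"
not_comment=$(printf '%s\n' "$lines" | grep -c -P '^[^#]')

# (geneA|geneB) is alternation; $ anchors at line end
a_or_b=$(printf '%s\n' "$lines" | grep -c -P '\t(geneA|geneB)$')

# Exactly two digits between Tabs; \w+ then matches the final word-character field
two_digit=$(printf '%s\n' "$lines" | grep -P '\t\d\d\t\w+$')

echo "$digits_chr"
echo "$not_comment"
echo "$a_or_b"
echo "$two_digit"
```

Note that chrX is not matched by '^chr\d+\t' because X is not a digit, and only the 77 record has exactly two digits in its second field.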

...

For each utility: default delimiter, how to change it, and an example.

  • cut
    • default delimiter: Tab
    • how to change: -d or --delimiter option
    • example: cut -d ':' -f 1 /etc/passwd
  • sort
    • default delimiter: whitespace (one or more spaces or Tabs)
    • how to change: -t or --field-separator option
    • example: sort -t ':' -k1,1 /etc/passwd
  • awk
    • default delimiter: whitespace (one or more spaces or Tabs)
      • Note: some older versions of awk do not treat Tabs as field separators.
    • how to change:
      • In the BEGIN {  } block
        • FS= (input field separator)
        • OFS= (output field separator)
      • -F or --field-separator option
    • examples:
      • cat /etc/fstab | grep -v '^#' | awk 'BEGIN{OFS="\t"}{print $2,$1}'
      • cat /etc/passwd | awk -F ":" '{print $1}'
  • join
    • default delimiter: one or more spaces
    • how to change: -t option
    • example: join -t $'\t' -j 2 file1 file2
  • perl
    • default delimiter: whitespace (one or more spaces or Tabs) when auto-splitting input with -a
    • how to change: -F'/<pattern>/' option
    • example: cat /etc/fstab | grep -v '^#' | perl -F'/\s+/' -a -n -e 'print "$F[1]\t$F[0]\n";'
  • read
    • default delimiter: whitespace (one or more spaces or Tabs)
    • how to change: IFS= (input field separator) option
    • Note that a bare IFS= removes any field separator, so whole lines are read each loop iteration.
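
The read behavior described above, sketched with made-up input. The colon-delimited lines imitate the /etc/passwd style, and the variable names (user, pass, uid, padded) are chosen only for illustration.

```shell
# Hypothetical colon-delimited input, in the style of /etc/passwd
input=$(printf 'alice:x:1001\nbob:x:1002\n')

# IFS=: applies only to the read command and splits each line on ":"
names=""
while IFS=: read -r user pass uid; do
    names="$names$user($uid) "
done <<EOF
$input
EOF

padded='  padded line  '

# Default IFS: leading and trailing whitespace is trimmed
read -r trimmed <<EOF
$padded
EOF

# A bare IFS= disables splitting and trimming, so each loop
# iteration receives the whole line unchanged
whole=""
while IFS= read -r line; do
    whole="$whole[$line]"
done <<EOF
$padded
EOF

echo "$names"
echo "[$trimmed]"
echo "$whole"
```

The here-documents are used instead of a pipe so that the loops run in the current shell and the variables they set are still visible afterwards.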

...