...

  • echo <text> prints the specified text on standard output
    • evaluation of metacharacters (special characters) inside the text may be performed first
    • -e says to enable interpretation of backslash escapes such as \t (Tab) and \n (newline)
    • -n says not to output the trailing newline
  • wc -l reports the number of lines (-l) in its input
    • wc -c reports the number of bytes (-c) in its input (the same as the character count for plain ASCII text)
    • wc -w reports the number of words (-w) in its input
  • history lists your command history to the terminal
    • redirect to a file to save a history of the commands executed in a shell session
    • pipe to grep to search for a particular command
    • re-execute a previous command via !<NN> where <NN> is the history line number
  • env lists all the environment variables currently defined in your login session
  • seq N produces the numbers 1 through N, each on a separate line
  • xargs transfers data on its standard input to the command line of the specified command
    • e.g. ls ~/*.txt | xargs echo
  • which <pgm> searches all $PATH directories to find <pgm> and reports its full pathname
    • will report all the places it looked if <pgm> was not found
    • type <pgm> is more general and works for functions and aliases
  • du <file_or_directory..file_or_directory>
    • shows the disk usage (size) of the specified files/directories
    • -h says report the size in human-readable form (e.g. 12M instead of 12201749)
    • -s says summarize the directory size for directories
    • -c says print a grand total when multiple items are specified
  • groups - lists the Unix groups you belong to
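
A few of the commands above can be tried on small, made-up inputs. The sketch below assumes a bash-like shell (where echo accepts -n); the variable names are illustrative only.

```shell
# seq 3 prints 1, 2 and 3 on separate lines; wc -l counts those lines.
line_count=$(seq 3 | wc -l)

# xargs moves the three input lines onto the command line of a single echo.
joined=$(seq 3 | xargs echo)

# echo -n suppresses the trailing newline, so wc -c sees only the 5 letters.
char_count=$(echo -n "hello" | wc -c)

# wc -w counts whitespace-separated words.
word_count=$(echo "one two three" | wc -w)

echo "lines=$line_count joined=$joined chars=$char_count words=$word_count"
```

Capturing each result with $( ) keeps the pipelines short and makes the individual counts easy to inspect.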

...

  • cut command lets you isolate ranges of data from its input lines
    • cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
      • use -d <delim> to change the field delimiter (Tab by default)
    • cut -c <character_number(s)> extracts one or more characters (-c) from each line of input
    • the <numbers> can be
      • a comma-separated list of numbers (e.g. 1,4,7)
      • a hyphen-separated range (e.g. 2-5)
      • a trailing hyphen says "and all items after that" (e.g. 3,7-)
    • cut does not re-order fields, so cut -f 5,3,1 acts like -f 1,3,5
  • sort sorts its input lines using an efficient algorithm
    • by default sorts each line lexically (as strings), low to high
      • use -n to sort numerically
      • use -V for Version sort (numbers with consistent surrounding text)
      • use -r to reverse the sort order
    • use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
      • e.g. -k1,1 -k2,2nr to sort field 1 lexically and field 2 as a number high-to-low
      • by default, fields are delimited by whitespace -- one or more spaces or Tabs
        • use -t <delim> to change the field delimiter (e.g. -t $'\t' for Tab only, so spaces are not treated as delimiters)
  • uniq -c counts groupings of its input (which should be sorted) and reports the text and count for each group
    • use cut | sort | uniq -c for a quick-and-dirty histogram
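
As a rough illustration of the cut | sort | uniq -c idiom and of multi-key sorting, the sketch below uses a small, hypothetical Tab-separated data set (a name column and a category column). The data and variable names are assumptions made for the example.

```shell
# Hypothetical sample data: name <Tab> category
tab=$(printf '\t')
data=$(printf 'a\tfruit\nb\tveg\nc\tfruit\nd\tfruit\ne\tveg\n')

# Quick-and-dirty histogram of column 2: cut isolates the field, sort groups
# identical values together, uniq -c counts each group, and the final sort
# orders the groups by count, high to low.
histogram=$(printf '%s\n' "$data" | cut -f 2 | sort | uniq -c | sort -k1,1nr)
echo "$histogram"

# Multi-key sort on Tab-delimited fields: category (field 2) lexically,
# then name (field 1) in reverse order within each category.
sorted=$(printf '%s\n' "$data" | sort -t "$tab" -k2,2 -k1,1r)
echo "$sorted"
```

Note that uniq -c left-pads its counts with spaces; the numeric sort (-n) ignores that leading whitespace.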

...

  • awk is a powerful scripting language that is easily invoked from the command line
    • awk '<script>' - the '<script>' is applied to each line of input (generally piped in)
      • always enclose '<script>' in single quotes to inhibit shell evaluation
      • awk has its own set of metacharacters that are different from the shell's
  • General structure of an awk script:
    • BEGIN {<expressions>}  –  use to initialize variables before any script body lines are executed
      • e.g. BEGIN {FS=":"; OFS="\t"; sum=0}
        • says use colon ( : ) as the input field separator (FS), and Tab ( \t ) as the output field separator (OFS)
          • the default input field separator (FS) is whitespace
            • one or more spaces or Tabs
          • the default output field separator (OFS) is a single space
        • initializes the variable sum to 0
    • {<body expressions>}  – expressions to apply to each line of input
      • use $1, $2, etc. to pick out specific input fields
      • e.g. {print $3,$4} outputs fields 3 and 4 of the input, separated by the output field separator.
      • some special expressions:
        • NF - Number of Fields on the line
        • NR - Number of the Record (i.e., line number)
    • END {<expressions>} – executed after all input is complete
      • e.g. END {print sum}
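
The BEGIN / body / END structure described above can be sketched with a small, hypothetical colon-delimited input (name:count). The input data and the variable name report are assumptions for illustration.

```shell
# BEGIN sets the separators and initializes sum before any input is read.
# The body runs once per line: it adds field 2 to sum and prints the
# record number (NR), field 1 and the number of fields (NF), Tab-separated.
# END runs after the last line and prints the accumulated total.
report=$(printf 'alpha:3\nbeta:4\ngamma:5\n' |
  awk 'BEGIN {FS=":"; OFS="\t"; sum=0}
       {sum += $2; print NR, $1, NF}
       END {print "total", sum}')
echo "$report"
```

Because the commas in print insert the output field separator, every output line here is Tab-delimited.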

...

The grep command

  • grep -P '<pattern>' searches for <pattern> in its input, and only outputs lines containing it
    • always enclose '<pattern>' in single quotes to inhibit shell evaluation!
      • pattern-matching metacharacters are very different from those in the shell
    • -P says to use Perl patterns, which are much more powerful than standard grep patterns
    • -v (inverse match) – only print lines with no match
    • -n (line number) – prefix output with the line number of the match
    • -i  (case insensitive) – ignore case when matching
    • -l says return only the names of files that do contain the pattern match
    • -L says return only the names of files that do not contain the pattern match
    • -c says just return a count of line matches
    • -A <n> (After) and -B <n> (Before) – also output <n> lines after or before each match
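
The options above can be combined freely. The sketch below runs a few of them against a small, made-up four-line input; it assumes a grep built with Perl pattern support (-P), as GNU grep usually is.

```shell
# Hypothetical sample text, one item per line
text=$(printf 'Apple pie\nbanana split\napple tart\ncherry cake\n')

# -c with -i: count lines that start with "apple", ignoring case
match_count=$(printf '%s\n' "$text" | grep -c -i -P '^apple')

# -v with -i: keep only lines that do NOT contain "apple"
non_matching=$(printf '%s\n' "$text" | grep -v -i -P 'apple')

# -n: prefix each matching line with its line number
numbered=$(printf '%s\n' "$text" | grep -n -P 'split')

# -A 1: print each matching line plus the one line after it
with_context=$(printf '%s\n' "$text" | grep -A 1 -P 'banana')

echo "$match_count"
echo "$non_matching"
echo "$numbered"
echo "$with_context"
```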

...

A regular expression (regex) is a pattern of characters to search for and metacharacters that control and modify how search matching is done.

  • <pattern> (a regular expression, or regex) can contain special match metacharacters and modifiers. The ones below are Perl metacharacters, which are supported by most languages and by tools such as grep -P
    • ^ – matches beginning of line
    • $ – matches end of line
    • .  – (period) matches any single character
    • * – modifier; place after an expression to match 0 or more occurrences
    • + – modifier, place after an expression to match 1 or more occurrences
    • ? – modifier, place after an expression to match 0 or 1 occurrences
    • \s – matches any whitespace character (\S any non-whitespace)
    • \d – matches digits 0-9
    • \w – matches any word character: A-Z, a-z, 0-9 and _ (underscore)
    • \t matches Tab
    • \n matches Linefeed
    • \r matches Carriage return
    • [xyz123] – matches any single character (including special characters) among those listed between the brackets [ ]
      • this is called a character class.
      • use [^xyz123] to match any single character not listed in the class
    • (Xyz|Abc) – matches either Xyz or Abc or any text or expressions inside parentheses separated by | characters
      • note that parentheses ( ) may also be used to capture matched sub-expressions for later use
  • in Perl, where a pattern is delimited by //, modifiers appear after the pattern:
    • i - perform case-insensitive text matching
    • g - perform the specified substitution globally on each input record, not just on the 1st match
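
To make the metacharacters above concrete, the sketch below counts matches of a few patterns in a hypothetical three-line, Tab-separated input. The sample lines and variable names are assumptions, and grep -P support is assumed to be available.

```shell
# Hypothetical input: an identifier, a Tab, then a value
lines=$(printf 'sample_01\t42\nsample 02\tNA\ncontrol_A\t7\n')

# ^ and $ anchor the whole line: word characters, a Tab, then digits only.
# Lines 1 and 3 qualify; line 2 contains a space and a non-numeric value.
numeric_rows=$(printf '%s\n' "$lines" | grep -c -P '^\w+\t\d+$')

# Alternation inside parentheses: lines starting with sample_ or control_
prefixed=$(printf '%s\n' "$lines" | grep -c -P '^(sample|control)_')

# Negated character class: any character that is neither a word character
# nor a Tab (only the space on line 2 qualifies)
odd_chars=$(printf '%s\n' "$lines" | grep -c -P '[^\w\t]')

# ? makes the preceding "s" optional; [_ ] accepts an underscore or a space
sample_ids=$(printf '%s\n' "$lines" | grep -c -P 'samples?[_ ]\d+')

echo "$numeric_rows $prefixed $odd_chars $sample_ids"
```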

...

  • cut
    • default delimiter: Tab
    • how to change: -d or --delimiter option
    • example: cut -d ':' -f 1 /etc/passwd
  • sort
    • default delimiter: whitespace (one or more spaces or Tabs)
    • how to change: -t or --field-separator option
    • example: sort -t ':' -k1,1 /etc/passwd
  • awk
    • default delimiter: whitespace (one or more spaces or Tabs)
      • Note: some older versions of awk do not treat Tabs as field separators.
    • how to change:
      • In the BEGIN {  } block
        • FS= (input field separator)
        • OFS= (output field separator)
      • -F or --field-separator option
    • examples:
      • cat /etc/fstab | grep -v '^#' | awk 'BEGIN{OFS="\t"}{print $2,$1}'
      • cat /etc/passwd | awk -F ":" '{print $1}'
  • join
    • default delimiter: one or more spaces
    • how to change: -t option
    • example: join -t $'\t' -j 2 file1 file12
  • perl
    • default delimiter: whitespace (one or more spaces or Tabs) when auto-splitting input with -a
    • how to change: -F'/<pattern>/' option
    • example: cat /etc/fstab | grep -v '^#' | perl -F'/\s+/' -a -n -e 'print "$F[1]\t$F[0]\n";'
  • read
    • default delimiter: whitespace (one or more spaces or Tabs)
    • how to change: IFS= (input field separator) variable assignment
    • Note that a bare IFS= removes any field separator, so whole lines are read each loop iteration.
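
The delimiter settings described above can be exercised on a small, hypothetical set of passwd-style records. The sketch below is illustrative only: the records and variable names are made up, and only cut, awk and read are shown.

```shell
# Hypothetical colon-delimited records: name:password:uid
records=$(printf 'root:x:0\ndaemon:x:1\n')

# cut: -d replaces the default Tab delimiter
cut_names=$(printf '%s\n' "$records" | cut -d ':' -f 1 | xargs echo)

# awk: -F sets the input field separator on the command line
awk_uids=$(printf '%s\n' "$records" | awk -F ':' '{print $3}' | xargs echo)

# read: IFS=: splits each line on colons; the last variable (rest)
# receives everything after the first field.
read_names=""
while IFS=: read -r name rest; do
  read_names="$read_names$name "
done <<EOF
$records
EOF

# read: a bare IFS= disables splitting, so each whole line lands in $line.
whole_lines=""
while IFS= read -r line; do
  whole_lines="$whole_lines[$line]"
done <<EOF
$records
EOF

echo "$cut_names | $awk_uids | $read_names| $whole_lines"
```

Setting IFS on the same line as read limits the change to that one command, so the shell's normal word splitting is unaffected elsewhere.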

...