
...

cut lets you isolate ranges of data from each of its input lines
- cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
  - use -d <delim> to change the field delimiter (Tab by default)
- cut -c <character_number(s)> extracts one or more characters (-c) from each line of input
- the <numbers> can be
  - a comma-separated list of numbers (e.g. 1,4,7)
  - a hyphen-separated range (e.g. 2-5)
  - a trailing hyphen says "and all items after that" (e.g. 3,7-)
- cut does not re-order fields, so cut -f 5,3,1 acts like -f 1,3,5
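
As a quick illustration of the cut options above, here is a minimal sketch. The sample data and the variable names (data, cut_fields, and so on) are made up for this example and are not part of the original notes.

```shell
# Hypothetical sample data: two Tab-separated lines of five fields each
data=$(printf 'a\tb\tc\td\te\n1\t2\t3\t4\t5\n')

# Field 1, plus field 3 and everything after it (Tab is the default delimiter)
cut_fields=$(printf '%s\n' "$data" | cut -f 1,3-)

# Asking for 5,3,1 still yields fields in input order: 1, 3, 5
cut_reorder=$(printf '%s\n' "$data" | cut -f 5,3,1)

# Use -d to split on a colon instead of Tab
cut_colon=$(printf 'root:x:0:0\n' | cut -d ':' -f 1,3)

# Characters 2 through 4 of each line
cut_chars=$(printf 'abcdefg\n' | cut -c 2-4)

printf '%s\n' "$cut_fields" "$cut_reorder" "$cut_colon" "$cut_chars"
```

Note that cut keeps the input delimiter in its output, so the colon example prints root:0.
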
sort sorts its input lines using an efficient algorithm
- by default sorts each line lexically (as strings), low to high
  - use -n to sort numerically
  - use -V for Version sort (numbers with surrounding text)
  - use -r to reverse the sort order
- use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
  - e.g. -k1,1 -k2,2nr to sort field 1 lexically and field 2 as a number high-to-low
  - by default, fields are delimited by whitespace -- one or more spaces or Tabs
    - use -t <delim> to change the field delimiter (e.g. -t $'\t' in bash for Tab only, so that spaces are treated as part of a field)
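
The sort options above can be tried out with a small sketch like the following. The sample data and variable names are invented for illustration, and LC_ALL=C is set only as an assumption to make lexical ordering reproducible across systems.

```shell
# Assumption: force byte-order collation so lexical results are reproducible
export LC_ALL=C

# Default sort is lexical, so "100" sorts before "9"
sorted_lex=$(printf '10\n9\n100\n' | sort)
# -n compares as numbers; adding -r reverses the order
sorted_num=$(printf '10\n9\n100\n' | sort -n)
sorted_rev=$(printf '10\n9\n100\n' | sort -n -r)

# Hypothetical whitespace-delimited data: name and count
data=$(printf 'pear 3\napple 10\napple 2\nfig 7\n')
# Field 1 lexically, then field 2 numerically, high to low
sorted_keys=$(printf '%s\n' "$data" | sort -k1,1 -k2,2nr)

# Tab-only delimiter: the space in "b c" is then part of field 1
tab=$(printf '\t')
sorted_tab=$(printf 'b c\t2\na\t10\n' | sort -t "$tab" -k2,2n)

printf '%s\n' "$sorted_keys"
```

Building the Tab character with printf is one portable way to pass it to -t; in bash, -t $'\t' does the same thing.
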
uniq -c counts groupings of its input (which must be sorted) and reports the text and count for each group
- use cut | sort | uniq -c for a quick-and-dirty histogram
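
A minimal sketch of the cut | sort | uniq -c histogram idiom, using made-up Tab-separated data (the chromosome and gene names are hypothetical):

```shell
# Hypothetical data: chromosome in field 1, gene name in field 2
data=$(printf 'chr1\tgeneA\nchr2\tgeneB\nchr1\tgeneC\nchr1\tgeneD\nchr2\tgeneE\n')

# Pull out field 1, sort so equal values are adjacent, then count each group
hist=$(printf '%s\n' "$data" | cut -f 1 | sort | uniq -c)

# Optionally re-sort by the count (field 1 of uniq's output), high to low
hist_by_count=$(printf '%s\n' "$hist" | sort -k1,1nr)

printf '%s\n' "$hist_by_count"
```

uniq -c pads the count with leading spaces, and the amount of padding differs between implementations, so downstream steps should treat the output as whitespace-delimited.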

sed (stream editor) can be used to edit text using pattern substitution.
- general form: sed 's/<search pattern>/<replacement>/'
- note that sed uses POSIX basic regular expressions by default, so its pattern syntax differs from the Perl-style patterns accepted by grep -P (e.g. sed has no \d)
- Grymoire sed tutorial
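
A few hedged examples of the substitution form above. The input strings and variable names are invented for illustration. The g flag and the \( \) capture group are standard sed features that the notes above do not cover.

```shell
# Replace only the first match on each line
sed_first=$(printf 'foo bar foo\n' | sed 's/foo/baz/')

# A trailing g flag replaces every match on the line
sed_all=$(printf 'foo bar foo\n' | sed 's/foo/baz/g')

# Basic-regex capture group: \( \) captures, \1 refers back to the capture
sed_group=$(printf 'sample_42.txt\n' | sed 's/sample_\([0-9]*\)\.txt/\1/')

printf '%s\n' "$sed_first" "$sed_all" "$sed_group"
# prints:
#   baz bar foo
#   baz bar baz
#   42
```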

awk is a powerful scripting language that is easily invoked from the command line
- awk '<script>' - the '<script>' is applied to each line of input (generally piped in)
  - always enclose '<script>' in single quotes to inhibit shell evaluation
  - awk has its own set of metacharacters that are different from the shell's
General structure of an awk script:
- BEGIN {<expressions>} – use to initialize variables before any script body lines are executed
  - e.g. BEGIN {FS=":"; OFS="\t"; sum=0}
    - says use colon ( : ) as the input field separator (FS), and Tab ( \t ) as the output field separator (OFS)
      - the default input field separator (FS) is whitespace (one or more spaces or Tabs)
      - the default output field separator (OFS) is a single space
    - initializes the variable sum to 0
- {<body expressions>} – expressions to apply to each line of input
  - use $1, $2, etc. to pick out specific input fields
  - e.g. {print $3,$4} outputs fields 3 and 4 of the input, separated by the output field separator.
- END {<expressions>} – executed after all input is complete
  - e.g. END {print sum}
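
Putting the three parts together, here is a minimal sketch of a complete awk script. The colon-delimited sample data is hypothetical, and summing field 3 is only an assumed use for the sum variable.

```shell
# Hypothetical colon-delimited input with numbers in fields 3 and 4
data=$(printf 'alice:x:10:20\nbob:x:5:7\n')

awk_out=$(printf '%s\n' "$data" | awk '
  BEGIN { FS=":"; OFS="\t"; sum=0 }   # runs once, before any input is read
  { print $3, $4; sum = sum + $3 }    # runs for every input line
  END { print sum }                   # runs once, after all input is read
')

printf '%s\n' "$awk_out"
```

Because OFS is set to Tab, the comma in print $3, $4 produces a Tab between the two fields. The final line of output is the total of field 3 (15 for this sample).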

...
