...

  • cut command lets you isolate ranges of data from its input lines
    • cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
      • use -d <delim> to change the field delimiter (Tab by default)
    • cut -c <character_number(s)> extracts one or more characters (-c) from each line of input
    • the <numbers> can be
      • a comma-separated list of numbers (e.g. 1,4,7)
      • a hyphen-separated range (e.g. 2-5)
      • a trailing hyphen says "and all items after that" (e.g. 3,7-)
    • cut does not re-order fields, so cut -f 5,3,1 acts like -f 1,3,5
  • sort sorts its input lines using an efficient algorithm
    • by default sorts each line lexically (as strings), low to high
      • use -n to sort numerically
      • use -V for Version sort (numbers with surrounding text)
      • use -r to reverse the sort order
    • use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
      • e.g. -k1,1 -k2,2nr to sort on field 1 lexically, then on field 2 numerically, high to low
      • by default, fields are delimited by whitespace (one or more spaces or Tabs)
        • use -t <delim> to change the field delimiter (e.g. in bash, -t $'\t' for Tab only, ignoring spaces; sort does not interpret a quoted "\t" as a Tab)
  • uniq -c collapses each run of identical adjacent lines and reports the count and text for each group; the input must be sorted so that identical lines are adjacent
    • use cut | sort | uniq -c for a quick-and-dirty histogram
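
The cut, sort, and uniq options above can be tried on a small sample. This is a minimal sketch: the Tab-separated sample data and the variable names (data, tab, names_scores, sorted, histogram) are invented for illustration.

```shell
# Fix the collation so the sort order is reproducible across systems.
export LC_ALL=C

# Hypothetical sample data: three Tab-separated columns (name, chromosome, score).
data=$(printf 'geneA\tchr2\t10\ngeneB\tchr1\t7\ngeneC\tchr2\t3\ngeneD\tchr10\t7\n')

# A literal Tab character, built portably (bash users can write $'\t' instead).
tab=$(printf '\t')

# cut -f: keep fields 1 and 3; "-f 3,1" would print them in the same order.
names_scores=$(printf '%s\n' "$data" | cut -f 1,3)
printf '%s\n' "$names_scores"

# cut -c: keep the first four characters of each line.
printf '%s\n' "$data" | cut -c 1-4

# sort: field 2 lexically, then field 3 numerically, high to low.
sorted=$(printf '%s\n' "$data" | sort -t "$tab" -k2,2 -k3,3nr)
printf '%s\n' "$sorted"

# Quick-and-dirty histogram: how many lines carry each chromosome value?
histogram=$(printf '%s\n' "$data" | cut -f 2 | sort | uniq -c)
printf '%s\n' "$histogram"
```

Because field 2 is sorted lexically, chr10 lands between chr1 and chr2; a Version sort (-V, where available) is the usual way to get chr1, chr2, chr10 instead.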

sed

  • sed (stream editor) can be used to edit text using pattern substitution.
    • general form: sed 's/<search pattern>/<replacement>/'
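    • note that sed's default pattern syntax (basic regular expressions) differs from the extended (grep -E) and Perl-style (grep -P) patterns often used with grep; for example, grouping parentheses are written \( and \)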
    • Grymoire sed tutorial
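
A few substitutions illustrate the general form. This is a minimal sketch: the input strings and variable names (first, all, swapped) are invented for illustration, and the g flag and \1 back-references are standard sed features that go slightly beyond what is listed above.

```shell
# Without a flag, only the first match on each line is replaced.
first=$(printf 'chr1 chr1\n' | sed 's/chr/chromosome/')
printf '%s\n' "$first"

# The trailing g flag replaces every match on the line.
all=$(printf 'chr1 chr1\n' | sed 's/chr/chromosome/g')
printf '%s\n' "$all"

# \( \) captures part of the match; \1 and \2 refer back to the captures.
swapped=$(printf 'first,second\n' | sed 's/\([^,]*\),\(.*\)/\2,\1/')
printf '%s\n' "$swapped"
```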

awk

  • awk is a powerful scripting language that is easily invoked from the command line
    • awk '<script>' applies the '<script>' to each line of input (generally piped in)
      • always enclose '<script>' in single quotes to inhibit shell evaluation
      • awk has its own set of metacharacters that are different from the shell's
  • General structure of an awk script:
    • BEGIN {<expressions>}  –  executed once, before any input is read; use it to initialize variables
      • e.g. BEGIN {FS=":"; OFS="\t"; sum=0}
        • says use colon ( : ) as the input field separator (FS), and Tab ( \t ) as the output field separator (OFS)
          • the default input field separator (FS) is whitespace
            • one or more spaces or Tabs
          • the default output field separator (OFS) is a single space
        • initializes the variable sum to 0
    • {<body expressions>}  – expressions to apply to each line of input
      • use $1, $2, etc. to pick out specific input fields
      • e.g. {print $3,$4} outputs fields 3 and 4 of the input, separated by the output field separator.
    • END {<expressions>} – executed after all input is complete
      • e.g. END {print sum}
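
The three parts can be combined into one script. This is a minimal sketch: the colon-delimited sample input and the variable name result are invented for illustration, and the statement that adds field 4 to sum is an assumption about how sum is meant to be accumulated.

```shell
# Hypothetical colon-delimited input with four fields per line.
result=$(printf 'a:x:3:10\nb:y:4:20\nc:z:5:30\n' |
  awk 'BEGIN {FS=":"; OFS="\t"; sum=0}
       {print $3,$4; sum = sum + $4}   # assumed: accumulate field 4 into sum
       END {print sum}')

# Prints fields 3 and 4 of each line, Tab-separated, then the total of field 4.
printf '%s\n' "$result"
```

The comma in print $3,$4 is what inserts the output field separator; writing print $3 $4 would concatenate the two fields with nothing between them.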

...