...
- cut command lets you isolate ranges of data from its input lines
- cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
- use -d <delim> to change the field delimiter (Tab by default)
- cut -c <character_number(s)> extracts one or more characters (-c) from each line of input
- the <numbers> can be
- a comma-separated list of numbers (e.g. 1,4,7)
- a hyphen-separated range (e.g. 2-5)
- a trailing hyphen says "and all items after that" (e.g. 3,7-)
- cut does not re-order fields, so cut -f 5,3,1 acts like -f 1,3,5
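The cut options above can be tried on a small sample. This is a minimal sketch: the colon-delimited records in `data` are made up for illustration and are not from any real file.

```shell
# Hypothetical sample data: three colon-delimited records.
data='alice:x:1001:staff:/home/alice
bob:x:1002:staff:/home/bob
carol:x:1003:admin:/home/carol'

# Fields 1 and 4, using ':' instead of the default Tab delimiter.
fields=$(printf '%s\n' "$data" | cut -d ':' -f 1,4)
printf '%s\n' "$fields"

# Characters 1 through 3 of each line.
chars=$(printf '%s\n' "$data" | cut -c 1-3)
printf '%s\n' "$chars"

# Field 3 and all fields after it (trailing hyphen).
rest=$(printf '%s\n' "$data" | cut -d ':' -f 3-)
printf '%s\n' "$rest"

# Listing fields out of order does not re-order the output:
# -f 4,1 yields the same lines as -f 1,4.
same=$(printf '%s\n' "$data" | cut -d ':' -f 4,1)
printf '%s\n' "$same"
```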
- sort sorts its input lines using an efficient algorithm
- by default sorts each line lexically (as strings), low to high
- use -n to sort numerically
- use -V for Version sort (numbers with surrounding text)
- use -r to reverse the sort order
- use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
- e.g. -k1,1 -k2,2nr to sort field 1 lexically and field 2 as a number high-to-low
- by default, fields are delimited by whitespace -- one or more spaces or Tabs
- use -t <delim> to change the field delimiter (e.g. -t $'\t' in bash for Tab only, ignoring spaces; sort does not interpret a quoted "\t" as a Tab)
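A short sketch of the sort options above. The sample values are invented for illustration, and the Tab character is produced with printf so the example does not depend on bash's $'\t' quoting.

```shell
# Lexical (default) versus numeric ordering of the same values.
lexical=$(printf '%s\n' 10 9 2 | sort)
numeric=$(printf '%s\n' 10 9 2 | sort -n)
reversed=$(printf '%s\n' 10 9 2 | sort -n -r)
printf '%s\n' "$lexical" "$numeric" "$reversed"

# Hypothetical Tab-delimited data: name, score.
data=$(printf '%s\t%s\n' beta 9 alpha 10 alpha 2 beta 30)

# A literal Tab for -t.
tab=$(printf '\t')

# Field 1 lexically, then field 2 numerically, high to low.
sorted=$(printf '%s\n' "$data" | sort -t "$tab" -k1,1 -k2,2nr)
printf '%s\n' "$sorted"
```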
- uniq -c counts groupings of its input (which must be sorted) and reports the text and count for each group
- use cut | sort | uniq -c for a quick-and-dirty histogram
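The histogram idiom can be sketched as follows. The name:group records are hypothetical, and the final numeric sort is an optional extra step that ranks the most frequent value first.

```shell
# Hypothetical sample data: name:group records.
data='alice:staff
bob:staff
carol:admin
dave:staff'

# Extract the group field, sort so equal values are adjacent, then count.
hist=$(printf '%s\n' "$data" | cut -d ':' -f 2 | sort | uniq -c)
printf '%s\n' "$hist"

# Optional: rank by count, highest first.
ranked=$(printf '%s\n' "$hist" | sort -k1,1nr)
printf '%s\n' "$ranked"
```

The amount of leading padding in front of each count differs between uniq implementations, so downstream steps should not depend on a fixed column width.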
sed
- sed (stream editor) can be used to edit text using pattern substitution.
- general form: sed 's/<search pattern>/<replacement>/'
- note that sed uses POSIX basic regular expressions by default (as plain grep does); Perl-style syntax such as \d (available with grep -P) is not supported
- Grymoire sed tutorial
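A few substitutions in the general form above, as a minimal sketch on made-up input strings. The g flag and the \( \) capture groups are standard sed features that the notes above do not cover; they are included here only for illustration.

```shell
line='chr1 100 200'

# Without a flag, only the first match on each line is replaced.
first=$(printf '%s\n' "$line" | sed 's/ /_/')
printf '%s\n' "$first"

# The g flag replaces every match on the line.
all=$(printf '%s\n' "$line" | sed 's/ /_/g')
printf '%s\n' "$all"

# Capture groups in basic regular expression syntax use \( \),
# and are referenced in the replacement as \1, \2, ...
swapped=$(printf '%s\n' 'key=value' | sed 's/\(.*\)=\(.*\)/\2=\1/')
printf '%s\n' "$swapped"
```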
awk
- awk is a powerful scripting language that is easily invoked from the command line
- awk '<script>' - the '<script>' is applied to each line of input (generally piped in)
- always enclose '<script>' in single quotes to inhibit shell evaluation
- awk has its own set of metacharacters that are different from the shell's
- General structure of an awk script:
- BEGIN {<expressions>} – use to initialize variables before any script body lines are executed
- e.g. BEGIN {FS=":"; OFS="\t"; sum=0}
- says use colon ( : ) as the input field separator (FS), and Tab ( \t ) as the output field separator (OFS)
- the default input field separator (FS) is whitespace
- one or more spaces or Tabs
- the default output field separator (OFS) is a single space
- initializes the variable sum to 0
- {<body expressions>} – expressions to apply to each line of input
- use $1, $2, etc. to pick out specific input fields
- e.g. {print $3,$4} outputs fields 3 and 4 of the input, separated by the output field separator.
- END {<expressions>} – executed after all input is complete
- e.g. END {print sum}
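The three parts of an awk script can be combined as in the sketch below. The colon-delimited input is hypothetical, and accumulating field 3 into sum is just one illustrative use of the body block.

```shell
# Hypothetical sample data: id:rank:value records.
data='a:1:10
b:2:20
c:3:30'

# BEGIN sets the separators and initializes sum before any input is read;
# the body runs once per line, printing fields 1 and 3 and adding field 3 to sum;
# END runs after the last line and prints the total.
out=$(printf '%s\n' "$data" | awk 'BEGIN {FS=":"; OFS="\t"; sum=0}
{print $1,$3; sum += $3}
END {print sum}')
printf '%s\n' "$out"
```

Because the script is in single quotes, the shell passes $1 and $3 to awk unchanged rather than expanding them itself.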
...