
Overview

So far we've covered a number of "framework" topics – argument handling, stream handling, error handling – in a systematic way. This section presents various tips and tricks for actually manipulating data, which can be useful both in writing scripts and when working on the command line.

...

The IFS= clears all of read's default input field separators, which are normally whitespace (one or more space characters or tabs).
- This is needed so that read will set the line variable to exactly the contents of the input line, and not strip leading whitespace from it.
Note the odd syntax for incrementing the line number variable.
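
As a minimal sketch of the points above, the loop below numbers each input line while preserving its leading whitespace. The sample text and the variable names lineNo and numbered are made up for illustration, and the arithmetic expansion $(( ... )) is only one common way to increment a counter in bash; the original example may use a different form.

```bash
#!/usr/bin/env bash
# Hypothetical input: two lines, the first with leading spaces.
input=$'  indented line\nplain line'

lineNo=0
numbered=""
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r line; do
  lineNo=$(( lineNo + 1 ))            # arithmetic expansion increments the counter
  numbered+="${lineNo}: ${line}"$'\n' # collect the numbered lines
done <<< "$input"

printf '%s' "$numbered"
```

Without the IFS= prefix, read would strip the two leading spaces from the first line before storing it in line.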

...

The double quotes around "$line" are important to preserve special characters inside the original line (here tab characters).
- Without the double quotes, the line fields would be separated by spaces, and the cut field delimiter would need to be changed.
Some lines have an empty job name field; we replace job and sample names in this case.
We assign file descriptor 4 to the file data being read (4< sampleinfo.txt after the done keyword), and read from it explicitly (read line <&4 in the while line).
- This avoids conflict with any global redirection of standard output (e.g. from automatic logging).

```bash
# be sure to use a different file descriptor (here 4)
while IFS= read line <&4; do
  # fields are tab-delimited: field 1 is the job name, field 3 the sample name
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  # an empty job name field gets placeholder job and sample names
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< sampleinfo.txt
```
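
To see why the double quotes matter, here is a small sketch that runs the same cut command on a quoted and an unquoted expansion. The record contents and the variable names quoted and unquoted are hypothetical, chosen only for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical tab-delimited record: job name, a middle column, sample name.
line=$'job1\tlane 2\tsampleA'

# Quoted: tabs survive, so cut (default delimiter: tab) sees three fields.
quoted=$( echo "$line" | cut -f 3 )

# Unquoted: the shell word-splits on whitespace and echo re-joins the words
# with single spaces, so there are no tabs left for cut to split on.
unquoted=$( echo $line | cut -f 3 )

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

In the unquoted case cut finds no tab delimiter at all and so passes the whole space-joined line through unchanged, which is rarely what is wanted.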

...

Strip the header line off the input using process substitution (anonymous pipe) syntax: <(tail -n +2 sampleinfo.txt).
- This syntax runs the expression in parentheses in a sub-shell and makes its standard output available as if it were a file, so it can be used as input instead of a file:
  - 4< <(tail -n +2 sampleinfo.txt) instead of 4< sampleinfo.txt.
Save all output from the while loop to a file by piping to tee.
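
Putting these two changes together might look like the sketch below. It is self-contained for illustration only: it writes a made-up three-line stand-in for sampleinfo.txt into a temporary directory, and the names workdir and loop_output.txt are assumptions, not part of the original script.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for sampleinfo.txt: a header plus two tab-delimited rows.
workdir=$(mktemp -d)
printf 'job\tlane\tsample\n'  > "$workdir/sampleinfo.txt"
printf 'jobA\t1\tsample1\n'  >> "$workdir/sampleinfo.txt"
printf '\t2\tsample2\n'      >> "$workdir/sampleinfo.txt"

# tail -n +2 starts its output at line 2, dropping the header.
# The loop reads from descriptor 4; its standard output is piped to tee,
# which both prints it and saves it to loop_output.txt.
while IFS= read -r line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< <(tail -n +2 "$workdir/sampleinfo.txt") | tee "$workdir/loop_output.txt"
```

One thing to keep in mind with this form: because the loop is now part of a pipeline, bash runs it in a sub-shell, so variables set inside the loop are not visible after the done line.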

...

As we've already seen, field delimiters are tricky! Be aware of the default field delimiter for the various bash utilities, and how to change them:

| utility | default delimiter | how to change | example |
| --- | --- | --- | --- |
| cut | tab | -d or --delimiter option | `cut -d ':' -f 1 /etc/passwd` |
| sort | whitespace (one or more spaces or tabs) | -t or --field-separator option | `sort -t ':' -k1,1 /etc/passwd` |
| awk | whitespace (one or more spaces or tabs) for input; a single space for output | FS (input field separator) and/or OFS (output field separator) variable in BEGIN{ } block, or the -F or --field-separator option (input only) | `cat sampleinfo.txt \| awk 'BEGIN{ FS=OFS="\t" } {print $1,$3}'` or `cat sampleinfo.txt \| awk -F "\t" '{ print $1,$3 }'` |
| join | whitespace (one or more spaces or tabs) | -t option | `join -t $'\t' -j 2 file1 file2` |
| perl | whitespace (one or more spaces or tabs) when auto-splitting input with -a | -F'/<pattern>/' option | `cat sampleinfo.txt \| perl -F'/\t/' -ane 'print "$F[0]\t$F[2]\n";'` |
| read | whitespace (one or more spaces or tabs) | IFS= variable assignment before read | see example above |
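
The differences in the table are easiest to see on a record whose tab-delimited field contains a space. The following is a small sketch using a hypothetical record; the variable names (record, cutField, awkDefault, awkTab, f1, f2, f3) are made up for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical record with a space inside the second tab-delimited field.
record=$'jobA\tlane 1\tsample1'

# cut splits on tabs by default, so field 2 keeps its embedded space.
cutField=$( printf '%s\n' "$record" | cut -f 2 )

# awk splits on any run of spaces or tabs by default, so "lane 1" is two fields.
awkDefault=$( printf '%s\n' "$record" | awk '{ print $2 }' )

# Setting FS and OFS to tab makes awk read and write tab-delimited fields.
awkTab=$( printf '%s\n' "$record" | awk 'BEGIN{ FS=OFS="\t" } { print $1,$3 }' )

# read also splits on whitespace unless IFS is narrowed to a tab.
IFS=$'\t' read -r f1 f2 f3 <<< "$record"

echo "cut field 2:     $cutField"
echo "awk default \$2:  $awkDefault"
echo "awk tab fields:  $awkTab"
echo "read field 2:    $f2"
```

Here cut and the tab-aware read both report "lane 1" for the second field, while awk with its default separator reports only "lane".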

Viewing special characters in text

...
