Overview

So we've covered a number of "framework" topics – argument handling, stream handling, error handling – in a systematic way. This section presents various tips and tricks for actually manipulating data, which can be useful both in writing scripts and in command line manipulations.

...

  • The IFS= clears all of read's default input field separators, which are normally whitespace (one or more space characters or tabs).
    • This is needed so that read will set the line variable to exactly the contents of the input line, and not strip leading whitespace from it.
  • Note the odd syntax for incrementing the line number variable.
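
The pattern these notes describe can be sketched as follows. This is a minimal illustration, not the original example: the helper name number_lines, the variable names and the temporary input file are all assumptions, the -r option is an addition that keeps backslashes literal, and the arithmetic expansion shown is only one common way to increment a counter.

```bash
# number_lines is a hypothetical helper used only for this sketch.
# It prints each line of the file named in $1, prefixed by its line number.
number_lines() {
  local lineNo=0 line
  while IFS= read -r line; do
    lineNo=$(( lineNo + 1 ))              # arithmetic expansion increments the counter
    printf '%s: %s\n' "$lineNo" "$line"   # leading whitespace in $line is preserved
  done < "$1"
}

# Illustrative input: the second line starts with three spaces.
demoFile=$(mktemp)
printf 'first line\n   indented line\n' > "$demoFile"
number_lines "$demoFile"
rm -f "$demoFile"
```

Dropping the IFS= prefix from the read command would strip the three leading spaces from the second line before it is printed.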

...

  • The double quotes around "$line" are important to preserve special characters inside the original line (here tab characters).
    • Without the double quotes, the line fields would be separated by spaces, and the cut field delimiter would need to be changed.
  • Some lines have an empty job name field; we replace job and sample names in this case.
  • We assign file descriptor 4 to the file data being read (4< sampleinfo.txt after the done keyword), and read from it explicitly (read line <&4 in the while line).
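    • This keeps the loop's input off standard input, which avoids conflicts with commands inside the loop that read standard input and with any global redirection of the standard streams (e.g. from automatic logging).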

Code Block
language: bash
# be sure to use a different file descriptor (here 4)
while IFS= read line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< sampleinfo.txt
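
To see why the double quotes around "$line" matter, the following small sketch compares the quoted and unquoted forms. The record contents are made up for illustration, and the sketch assumes the shell's default IFS.

```bash
# A tab-delimited record built for illustration (hypothetical values).
line=$(printf 'job1\tlaneA\tsample1')

# Quoted: the tabs survive, so cut finds three tab-delimited fields.
quoted=$( echo "$line" | cut -f 3 )

# Unquoted: the shell splits the line into words and echo joins them with
# single spaces, so no tab is left and cut prints the whole line.
unquoted=$( echo $line | cut -f 3 )

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

In the unquoted case cut would need -d ' ' to find the third field, and that in turn would break on any field that itself contains a space.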

...

  • Strip the header line off the input using process substitution (anonymous pipe) syntax: <(tail -n +2 sampleinfo.txt).
    • This syntax runs the command in parentheses in a sub-shell and presents its standard output as if it were a file, so it can be used as input in place of a file name:
      • 4< <(tail -n +2 sampleinfo.txt) instead of 4< sampleinfo.txt.
  • Save all output from the while loop to a file by piping to tee.
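
Both points can be combined as in the sketch below, which reuses the loop shown earlier. The stand-in sampleinfo.txt contents, the temporary directory and the output file name loop_output.txt are assumptions made for illustration, and the -r option on read is an addition that keeps backslashes literal.

```bash
# Hypothetical stand-in for sampleinfo.txt: a header line plus two
# tab-delimited records, the second with an empty job name field.
workDir=$(mktemp -d)
printf 'job\tlane\tsample\nj1\tL1\ts1\n\tL2\ts2\n' > "$workDir/sampleinfo.txt"

# File descriptor 4 reads from the process substitution, which skips the header;
# everything the loop prints is shown on the terminal and saved by tee.
while IFS= read -r line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< <(tail -n +2 "$workDir/sampleinfo.txt") | tee "$workDir/loop_output.txt"

# Keep a copy of the saved output, then clean up the temporary directory.
savedOutput=$(cat "$workDir/loop_output.txt")
rm -rf "$workDir"
```

Because the loop is now the left side of a pipeline, it runs in a sub-shell, so variables set inside it (jobName, sampleName) are not visible after the done line.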

...

As we've already seen, field delimiters are tricky! Be aware of the default field delimiters of the various command-line utilities, and how to change them:

| utility | default delimiter | how to change | example |
|---|---|---|---|
| cut | tab | -d or --delimiter option | cut -d ':' -f 1 /etc/passwd |
| sort | whitespace (one or more spaces or tabs) | -t or --field-separator option | sort -t ':' -k1,1 /etc/passwd |
| awk | whitespace (one or more spaces or tabs) for input; a single space for output | FS (input field separator) and/or OFS (output field separator) variable in BEGIN{ } block, or the -F or --field-separator option | awk 'BEGIN{ FS=OFS="\t" } {print $1,$3}' sampleinfo.txt or awk -F "\t" '{ print $1,$3 }' sampleinfo.txt |
| join | one or more blanks (spaces or tabs) | -t option | join -t $'\t' -j 2 file1 file2 |
| perl | whitespace (one or more spaces or tabs) when auto-splitting input with -a | -F'/<pattern>/' option | perl -F'/\t/' -ane 'print "$F[0]\t$F[2]\n";' sampleinfo.txt |
| read | whitespace (one or more spaces or tabs) | IFS= variable assignment before read | see example above |
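
The practical effect of these differing defaults shows up as soon as a field contains a space. The sketch below runs cut and awk on the same tab-delimited record; the record contents and variable names are made up for illustration.

```bash
# One tab-delimited record whose second field contains a space (hypothetical data).
record=$(printf 'job1\tsample A\tlane3')

# cut splits on tabs by default, so field 2 stays intact.
cutField=$( echo "$record" | cut -f 2 )

# awk's default splitting breaks the second field at its space.
awkDefault=$( echo "$record" | awk '{ print $2 }' )

# Telling awk to split on tabs only restores the intended field.
awkTab=$( echo "$record" | awk -F '\t' '{ print $2 }' )

echo "cut:          $cutField"
echo "awk default:  $awkDefault"
echo "awk -F tab:   $awkTab"
```

Setting the delimiter explicitly, even when the default would happen to work, makes a command's behavior easier to predict when the data changes.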

Viewing special characters in text

...