...

The read builtin can be used to read input one line at a time. The full details of read are complicated (see https://unix.stackexchange.com/questions/209123/understanding-ifs-read-r-line), but this read-a-line-at-a-time idiom works nicely.

Code Block
languagebash
lineNo=1
while IFS= read -r line; do
  echo "Line $lineNo: '$line'"
  lineNo=$(( lineNo + 1 ))
done < sampleinfo.txt
  • The IFS= clears all of read's default input field separators (whitespace).
    • This is needed so that read sets the line variable to exactly the contents of the input line, without stripping leading or trailing whitespace.
  • The -r option stops read from treating backslashes in the input as escape characters.
  • The line number is incremented using arithmetic expansion, $(( ... )), inside which variable names can be used without a leading $.
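
The effect of IFS= (and of the -r option to read) can be checked on a single line. This is only a small sketch: the file name ifs_demo.txt and its contents are made up for illustration.

```bash
# Hypothetical one-line input file: three leading spaces and one backslash
printf '   path\\name\n' > ifs_demo.txt

# Default IFS: leading (and trailing) whitespace is stripped
read -r stripped < ifs_demo.txt

# Empty IFS: the whitespace is preserved
IFS= read -r exact < ifs_demo.txt

# Without -r, the backslash is treated as an escape character and dropped
IFS= read unescaped < ifs_demo.txt

echo "default IFS:     '$stripped'"
echo "IFS= with -r:    '$exact'"
echo "IFS= without -r: '$unescaped'"

rm -f ifs_demo.txt
```

Only the IFS= read -r form returns the line exactly as it appears in the file.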

Once a line has been read, it can be parsed, for example using cut, as shown below. Other notes:

  • The double quotes around "$line" are important to preserve special characters inside the original line (here, tab characters).
    • Without the double quotes, the shell splits the line into words and echo rejoins them with single spaces, so the fields would be separated by spaces and the cut field delimiter would need to be changed.
  • Some lines have an empty job name field; in this case we set the job name to "none" and the sample name to "Undetermined".
  • We assign file descriptor 4 to the file data being read (4< sampleinfo.txt after the done keyword), and read from it explicitly (<&4 on the read command in the while line). This leaves standard input free, so commands inside the loop that read standard input cannot consume the file's lines, and it avoids conflict with any global redirection of the standard streams (e.g. from automatic logging).
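
The effect of the double quotes can be checked in isolation. In this sketch the tab-separated line is a made-up example (job name, an arbitrary middle field, sample name), not a line from sampleinfo.txt.

```bash
# Hypothetical tab-separated line: job name, middle field, sample name
line=$(printf 'JA19001\tL001\tsample A')

# Quoted: tabs are preserved, so cut (default delimiter: tab) finds field 3
quoted=$( echo "$line" | cut -f 3 )

# Unquoted: word splitting turns each run of whitespace into a single space,
# so no tabs remain and cut -f 3 prints the whole line unchanged
unquoted=$( echo $line | cut -f 3 )

echo "quoted:   '$quoted'"
echo "unquoted: '$unquoted'"
```

Note that the unquoted form also breaks the sample name apart at its embedded space, so switching cut to -d ' ' would not fully repair it.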

Code Block
languagebash
# read from a file descriptor other than 0, 1 or 2 (here 4)
while IFS= read -r line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< sampleinfo.txt
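
To see why a separate file descriptor can help, here is a small experiment in which a command inside the loop reads standard input. It is a sketch under stated assumptions: the file fd_demo.txt is made up, and cat stands in for any command that happens to read standard input.

```bash
# Hypothetical three-line input file
printf 'a\nb\nc\n' > fd_demo.txt

# Loop fed through standard input: cat inherits the same standard input
# and swallows the remaining lines, so the loop body runs only once
count_stdin=0
while IFS= read -r line; do
  count_stdin=$(( count_stdin + 1 ))
  cat > /dev/null
done < fd_demo.txt

# Loop fed through file descriptor 4: cat reads standard input
# (here /dev/null) and leaves the file's lines alone
count_fd4=0
while IFS= read -r line <&4; do
  count_fd4=$(( count_fd4 + 1 ))
  cat > /dev/null
done 4< fd_demo.txt < /dev/null

echo "lines seen via standard input:    $count_stdin"
echo "lines seen via file descriptor 4: $count_fd4"

rm -f fd_demo.txt
```

The first loop sees 1 line and the second sees all 3.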

Two final modifications:

  • Strip the header line off the input using process substitution syntax: <(tail -n +2 sampleinfo.txt).
    • This syntax runs the command in parentheses in a sub-shell and makes its standard output available to be read as if it were a file.
      • So we write 4< <(tail -n +2 sampleinfo.txt) instead of 4< sampleinfo.txt.
  • Save all output from the while loop to a file by piping to tee.

Code Block
languagebash
# read from a file descriptor other than 0, 1 or 2 (here 4)
while IFS= read -r line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< <(tail -n +2 sampleinfo.txt) | tee read_line_output.txt
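
The two modifications can be tried on a tiny input first. This is a minimal sketch: the file names procsub_demo.txt and procsub_output.txt and their contents are invented for the example, and it needs bash because of the <( ... ) syntax.

```bash
# Hypothetical input file: a header line followed by two data lines
printf 'header\nrow1\nrow2\n' > procsub_demo.txt

# tail -n +2 starts its output at line 2, so the header is skipped;
# <( ... ) lets the loop read that output through file descriptor 4
while IFS= read -r line <&4; do
  echo "data line: '$line'"
done 4< <(tail -n +2 procsub_demo.txt) | tee procsub_output.txt

# tee displayed the loop's output and also saved a copy of it
saved=$(cat procsub_output.txt)

rm -f procsub_demo.txt procsub_output.txt
```

One consequence of piping the loop to tee is that, in bash's default configuration, the loop runs in a sub-shell, so variables set inside it (such as a counter) are not visible after the done line.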

exercise 2

Using the above code as a guide, use the job name and sample name information to construct a pathname of the form Project_<job name>/<sample name>.fastq.gz, and write these paths to a file. Skip any entries with no job name.

Solution
Code Block
languagebash
# read from a file descriptor other than 0, 1 or 2 (here 4)
while IFS= read -r line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then continue; fi
  echo "Project_${jobName}/${sampleName}.fastq.gz"
done 4< <(tail -n +2 sampleinfo.txt) | tee pathnames.txt

field delimiter issues

Always be aware of the default field delimiter for each of the various command-line utilities, and how to change it:

...