
...

The read builtin can be used to read input one line at a time. While the full details of read are complicated (see https://unix.stackexchange.com/questions/209123/understanding-ifs-read-r-line), this read-a-line-at-a-time idiom works nicely.

Code Block

language	bash

lineNo=1
while IFS= read line; do
  echo "Line $lineNo: '$line'"
  lineNo=$(( lineNo + 1 ))
done < sampleinfo.txt

The IFS= prefix clears read's default input field separators (whitespace) for that one command.
- This is needed so that read sets the line variable to exactly the contents of the input line, and does not strip leading or trailing whitespace from it.
Note the arithmetic expansion syntax, $(( ... )), used to increment the line number variable.
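
The effect of IFS= is easiest to see on a line that starts and ends with spaces. The following is a minimal sketch: the temporary file and the variable names (demoFile, stripped, preserved) are made up for illustration and are not part of the original example.

```bash
#!/usr/bin/env bash
# Hypothetical demo file: a single line with leading and trailing spaces.
demoFile=$(mktemp)
printf '   indented line   \n' > "$demoFile"

# With the default IFS, read strips leading and trailing whitespace.
read line < "$demoFile"
stripped="$line"

# With IFS cleared for this one command, the line is stored as written.
IFS= read line < "$demoFile"
preserved="$line"

echo "default IFS: '$stripped'"
echo "empty IFS:   '$preserved'"

# Arithmetic expansion $(( ... )) evaluates an integer expression.
lineNo=1
lineNo=$(( lineNo + 1 ))
echo "lineNo is now $lineNo"

rm -f "$demoFile"
```

If the input may contain backslashes, adding the -r option (IFS= read -r line) also stops read from treating them as escape characters, which is the form discussed in the linked question.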

Once a line has been read, it can be parsed, for example, using cut, as shown below. Other notes:

The double quotes around "$line" are important to preserve special characters inside the original line (here tab characters)
- without the double quotes, the line fields would be separated by spaces, and the cut field delimiter would need to be changed.
Some lines have an empty job name field; we replace job and sample names in this case.
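We assign file descriptor 4 to the file data being read (4< sampleinfo.txt after the done keyword), and read from it explicitly (read line <&4 in the while line). This keeps the file data separate from standard input, so a command inside the loop that reads standard input cannot consume lines meant for read, and it avoids conflict with any global redirection of the standard streams (e.g. from automatic logging).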
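
The quoting note can be checked on a single line. This is a small sketch using a made-up tab-separated line; the variable names quoted, unquoted and unquotedSpace are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical tab-separated line: job name, lane, sample name.
line=$(printf 'JA1\t1\tS1')

# Quoted: tabs survive, so cut's default tab delimiter finds field 3.
quoted=$( echo "$line" | cut -f 3 )

# Unquoted: the shell splits the line into words and echo joins them
# with single spaces, so cut finds no tab and prints the whole line.
unquoted=$( echo $line | cut -f 3 )

# Unquoted still works if cut is told to split on a space instead.
unquotedSpace=$( echo $line | cut -d ' ' -f 3 )

echo "quoted:              '$quoted'"
echo "unquoted:            '$unquoted'"
echo "unquoted, -d space:  '$unquotedSpace'"
```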

Code Block

language	bash

# be sure to use a different file descriptor (here 4)
while IFS= read line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< sampleinfo.txt
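
One situation where a dedicated file descriptor helps is a loop body that runs a command which itself reads standard input (ssh is a well-known example). The sketch below is an illustration under stated assumptions, not part of the original example: it uses a made-up three-line file and cat as a stand-in for such a command, and the function names countSharedStdin and countSeparateFd are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical three-line input file, created only for this demo.
demoFile=$(mktemp)
printf 'JA1\t1\tS1\nJA2\t2\tS2\nJA3\t3\tS3\n' > "$demoFile"

# The loop reads the file on standard input. cat in the loop body also
# reads standard input, so it swallows every line read has not yet seen.
countSharedStdin() {
  local n=0
  while IFS= read line; do
    cat > /dev/null
    n=$(( n + 1 ))
  done < "$demoFile"
  echo "$n"
}

# The loop reads the file on file descriptor 4, so cat cannot touch it.
countSeparateFd() {
  local n=0
  while IFS= read line <&4; do
    cat > /dev/null
    n=$(( n + 1 ))
  done 4< "$demoFile"
  echo "$n"
}

sharedCount=$( countSharedStdin )
# Standard input is pointed at /dev/null here so that cat returns at once
# instead of waiting for keyboard input.
separateCount=$( countSeparateFd < /dev/null )

echo "iterations when sharing standard input: $sharedCount"
echo "iterations when reading from fd 4:      $separateCount"

rm -f "$demoFile"
```

With the shared standard input the loop body runs only once, because cat consumes the remaining lines; with file descriptor 4 it runs once per input line.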

Two final modifications:

Strip the header line off the input using process substitution (anonymous pipe) syntax: <(tail -n +2 sampleinfo.txt).
- This syntax runs the command in parentheses in a sub-shell and makes its standard output available to be read as if it were a file:
  - 4< <(tail -n +2 sampleinfo.txt) instead of 4< sampleinfo.txt.
Save all output from the while loop to a file by piping to tee.

Code Block

language	bash

# be sure to use a different file descriptor (here 4)
while IFS= read line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< <(tail -n +2 sampleinfo.txt) | tee read_line_output.txt
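
The header-skipping step can be tried in isolation. The following sketch assumes a made-up three-line file with one header line; it requires bash, since process substitution is not available in plain sh, and the names demoFile, dataLines and firstJob are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical input: one header line followed by two data lines.
demoFile=$(mktemp)
printf 'job\tlane\tsample\nJA1\t1\tS1\nJA2\t2\tS2\n' > "$demoFile"

dataLines=0
firstJob=""
# tail -n +2 prints the file starting at line 2, which drops the header.
# The loop reads that output on file descriptor 4.
while IFS= read line <&4; do
  dataLines=$(( dataLines + 1 ))
  if [[ $dataLines -eq 1 ]]; then
    firstJob=$( echo "$line" | cut -f 1 )
  fi
done 4< <(tail -n +2 "$demoFile")

echo "data lines read: $dataLines"
echo "first job name:  $firstJob"

rm -f "$demoFile"
```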

exercise 2

Using the above code as a guide, use the job name and sample name information to construct a pathname of the form Project_<job name>/<sample name>.fastq.gz, and write these paths to a file. Skip any entries with no job name.

Solution

Code Block

language	bash

# be sure to use a different file descriptor (here 4)
while IFS= read line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then continue; fi
  echo "Project_${jobName}/${sampleName}.fastq.gz"
done 4< <(tail -n +2 sampleinfo.txt) | tee pathnames.txt

field delimiter issues

Always be aware of the default field delimiter for each of the various bash utilities, and how to change it:

...
