...
The read built-in command can be used to read input one line at a time. While the full details of read are complicated (see https://unix.stackexchange.com/questions/209123/understanding-ifs-read-r-line), this read-a-line-at-a-time idiom works nicely.
```bash
lineNo=1
# -r keeps backslashes in the input literal instead of treating them as escapes
while IFS= read -r line; do
  echo "Line $lineNo: '$line'"
  lineNo=$(( lineNo + 1 ))
done < sampleinfo.txt
```
- The IFS= prefix sets read's input field separator to the empty string, for this read command only, clearing its default whitespace separators.
- This is needed so that read sets the line variable to exactly the contents of the input line, without stripping leading or trailing whitespace from it.
- Note the arithmetic expansion syntax, $(( ... )), used to increment the line number variable.
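A minimal sketch of what IFS= (and read's -r option, which keeps backslashes literal) changes, using a hypothetical two-line input held in a here-string rather than the real sampleinfo.txt:

```bash
#!/usr/bin/env bash
# Hypothetical input: a line with leading spaces, then a line with a backslash
input=$'   indented line\nback\\slash'

# Default IFS: leading and trailing whitespace is stripped
while read -r line; do
  echo "default IFS: '$line'"
done <<< "$input"
# prints: default IFS: 'indented line'
#         default IFS: 'back\slash'

# Empty IFS: each line is kept exactly as it appears
while IFS= read -r line; do
  echo "empty IFS:   '$line'"
done <<< "$input"
# prints: empty IFS:   '   indented line'
#         empty IFS:   'back\slash'

# Without -r, read treats the backslash as an escape character and removes it
while IFS= read line; do
  echo "without -r:  '$line'"
done <<< "$input"
# prints: without -r:  '   indented line'
#         without -r:  'backslash'
```

Using IFS= and -r together is the usual way to get each line back byte-for-byte.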
Once a line has been read, it can be parsed, for example, using cut, as shown below. Other notes:
- The double quotes around "$line" are important to preserve special characters inside the original line (here tab characters). Without the double quotes, word splitting would leave the fields separated by single spaces, and the cut field delimiter would need to be changed.
- Some lines have an empty job name field; in this case we substitute placeholder names ("none" for the job, "Undetermined" for the sample).
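- We assign file descriptor 4 to the file data being read (4< sampleinfo.txt after the done keyword) and read from it explicitly (the <&4 redirection on the read command in the while line). Because the loop's input does not arrive on standard input, commands in the loop body that read standard input cannot accidentally consume lines from the file, and the loop does not conflict with any global redirection of the standard streams (e.g. from automatic logging).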
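To see what the double quotes around "$line" protect, here is a small sketch using a hypothetical three-field, tab-separated line in place of a real sampleinfo.txt entry:

```bash
#!/usr/bin/env bash
# Hypothetical tab-separated line: job name, lane, sample name
line=$'JA12345\tlane1\tWT_rep1'

# Quoted: the tabs survive, so cut's default tab delimiter finds field 3
echo "$line" | cut -f 3            # prints WT_rep1

# Unquoted: word splitting breaks the line into words that echo rejoins with
# single spaces; with no tab left, cut prints the whole line unchanged
echo $line | cut -f 3              # prints JA12345 lane1 WT_rep1

# The fields are now space-separated, so the delimiter must be changed
echo $line | cut -d ' ' -f 3       # prints WT_rep1
```

The unquoted form also breaks as soon as a field itself contains a space, which is another reason to keep the quotes.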
```bash
# be sure to use a file descriptor other than standard input (here 4)
while IFS= read -r line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< sampleinfo.txt
```
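A minimal sketch of why the separate file descriptor matters. It uses a hypothetical three-line temporary file, and cat > /dev/null stands in for any command in the loop body that happens to read standard input:

```bash
#!/usr/bin/env bash
# Hypothetical three-line input file
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'line1\nline2\nline3\n' > "$tmp"

# Loop input on standard input: the cat in the body also reads standard
# input, so it swallows the remaining lines
echo "standard input version:"
while IFS= read -r line; do
  echo "  got '$line'"
  cat > /dev/null
done < "$tmp"
# prints only:   got 'line1'

# Loop input on file descriptor 4: the body's standard input is separate
# (redirected from /dev/null here so the demo never waits on the keyboard),
# so every line reaches read
echo "fd 4 version:"
while IFS= read -r line <&4; do
  echo "  got '$line'"
  cat > /dev/null
done 4< "$tmp" < /dev/null
# prints got 'line1', got 'line2' and got 'line3'
```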
Two final modifications:
- Strip the header line off the input using process substitution syntax, <(tail -n +2 sampleinfo.txt).
  - This syntax runs the command in parentheses in a sub-shell and makes its standard output available as a file-like input that can be used instead of a file: 4< <(tail -n +2 sampleinfo.txt) instead of 4< sampleinfo.txt.
- Save all output from the while loop to a file by piping it to tee.
```bash
# be sure to use a file descriptor other than standard input (here 4)
while IFS= read -r line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< <(tail -n +2 sampleinfo.txt) | tee read_line_output.txt
```
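A small sketch of process substitution on its own, using a hypothetical tab-separated temporary file with a header line. The exact path bash substitutes for <( ... ) varies by system, so it is only shown, not relied on:

```bash
#!/usr/bin/env bash
# Hypothetical tab-separated file with a header line
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'job\tlane\tsample\nJA1\t1\tS1\nJA2\t2\tS2\n' > "$tmp"

# Bash replaces <( ... ) with a file name connected to the command's output
echo <(true)        # typically something like /dev/fd/63

# tail -n +2 starts output at line 2, so the header never reaches the loop
while IFS= read -r line <&4; do
  echo "$line" | cut -f 3
done 4< <(tail -n +2 "$tmp")
# prints: S1
#         S2
```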
exercise 2
Using the above code as a guide, use the job name and sample name information to construct a pathname of the form Project_<job name>/<sample name>.fastq.gz, and write these paths to a file. Skip any entries with no job name.
One possible solution:

```bash
# be sure to use a file descriptor other than standard input (here 4)
while IFS= read -r line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  # skip entries with no job name
  if [[ "$jobName" == "" ]]; then continue; fi
  echo "Project_${jobName}/${sampleName}.fastq.gz"
done 4< <(tail -n +2 sampleinfo.txt) | tee pathnames.txt
```
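For comparison, the same task can be sketched with awk instead of a read loop. This is a swapped-in technique, not the approach used above, and it runs on a hypothetical temporary file laid out like sampleinfo.txt (job name in column 1, sample name in column 3, one entry with no job name):

```bash
#!/usr/bin/env bash
# Hypothetical sampleinfo.txt-style input: header, then tab-separated fields
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'job\tlane\tsample\nJA1\t1\tS1\n\t2\tS2\nJA3\t3\tS3\n' > "$tmp"

# -F'\t' makes tab the only field separator, so an empty first field is kept
# as $1 == "" instead of being collapsed away; NR > 1 skips the header line
awk -F'\t' 'NR > 1 && $1 != "" { print "Project_" $1 "/" $3 ".fastq.gz" }' "$tmp" \
  | tee pathnames.txt
# prints: Project_JA1/S1.fastq.gz
#         Project_JA3/S3.fastq.gz
```

awk does its own field splitting, so no quoting or cut delimiter issues arise, at the cost of switching to a second language.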
field delimiter issues
Always be aware of the default field delimiter for the various bash utilities, and how to change them:
...