Overview
So we've covered a number of "framework" topics – argument handling, stream handling, error handling – in a systematic way. This section presents various tips and tricks for actually manipulating data, which can be useful both in writing scripts and in command line manipulations.
...
- The IFS= clears all of read's default input field separators, which are normally whitespace (one or more space characters or tabs).
- This is needed so that read will set the line variable to exactly the contents of the input line, and not strip leading whitespace from it.
- Note the odd syntax for incrementing the line number variable.
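The points above can be checked with a minimal sketch. It assumes a hypothetical two-line input held in a variable, uses arithmetic expansion `$(( ))` as one common way to increment a counter, and adds `-r` to read (an addition here, so that backslashes are kept literally).

```shell
# Hypothetical input: the second line starts with two spaces.
input=$'first\n  indented'

# With IFS= cleared, read keeps the leading spaces.
lineNo=0
kept=""
while IFS= read -r line; do
  lineNo=$(( lineNo + 1 ))    # arithmetic expansion increments the counter
  kept+="$lineNo:[$line] "
done <<< "$input"
echo "$kept"

# With the default IFS, read strips leading and trailing whitespace.
lineNo=0
stripped=""
while read -r line; do
  lineNo=$(( lineNo + 1 ))
  stripped+="$lineNo:[$line] "
done <<< "$input"
echo "$stripped"
```

Comparing the two echoed strings shows that only the first loop preserves the indentation of the second line.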
...
- The double quotes around "$line" are important to preserve special characters inside the original line (here tab characters).
- Without the double quotes, the line fields would be separated by spaces, and the cut field delimiter would need to be changed.
- Some lines have an empty job name field; we replace job and sample names in this case.
- We assign file descriptor 4 to the file data being read (4< sampleinfo.txt after the done keyword), and read from it explicitly (read line <&4 in the while line).
  - This avoids conflict with any global redirection of standard output (e.g. from automatic logging).
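The effect of the double quotes can be seen in isolation with a small sketch. The tab-separated line below is a hypothetical stand-in for one line of sampleinfo.txt, with an empty middle field.

```shell
# Hypothetical tab-separated line with an empty second field.
line=$'job1\t\tsampleA'

# Quoted: the tabs survive, so cut finds the third field.
quoted=$( echo "$line" | cut -f 3 )

# Unquoted: the shell splits on whitespace and echo rejoins with single spaces,
# so cut sees no tab at all and passes the whole line through.
unquoted=$( echo $line | cut -f 3 )

echo "quoted:   [$quoted]"
echo "unquoted: [$unquoted]"
```

In the unquoted case the empty field also disappears, so even switching cut to a space delimiter would not recover the original field positions.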
```shell
# be sure to use a different file descriptor (here 4)
while IFS= read line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  if [[ "$jobName" == "" ]]; then
    sampleName="Undetermined"; jobName="none"
  fi
  echo "job $jobName - sample $sampleName"
done 4< sampleinfo.txt
```
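A related situation where a dedicated file descriptor helps is when a command inside the loop reads standard input itself. The sketch below is an illustration under that assumption, not the case described in the text: the temporary file and the use of cat as the stdin-reading command are hypothetical.

```shell
# Hypothetical three-line input file.
tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"

# Loop input on stdin: cat (a stand-in for any stdin-reading command)
# swallows the remaining lines, so the loop body runs only once.
viaStdin=0
while IFS= read -r line; do
  viaStdin=$(( viaStdin + 1 ))
  cat > /dev/null
done < "$tmp"

# Loop input on descriptor 4: stdin (here /dev/null) is left alone,
# so every line of the file reaches read.
viaFd4=0
while IFS= read -r line <&4; do
  viaFd4=$(( viaFd4 + 1 ))
  cat > /dev/null
done 4< "$tmp" < /dev/null

echo "stdin: $viaStdin lines, fd 4: $viaFd4 lines"
rm -f "$tmp"
```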
...
- Strip the header line off the input using anonymous pipe (process substitution) syntax: <(tail -n +2 sampleinfo.txt).
  - This syntax runs the expression in parentheses in a sub-shell and makes its standard output available as input in place of a file: 4< <(tail -n +2 sampleinfo.txt) instead of 4< sampleinfo.txt.
- Save all output from the while loop to a file by piping to tee.
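Both changes can be combined roughly as follows. This is a sketch only: the three-column sampleinfo.txt written to a temporary directory and the output name report.txt are hypothetical stand-ins.

```shell
# Hypothetical sample file with a header line and two data lines.
workdir=$(mktemp -d)
printf 'job\tlane\tsample\njobA\t1\ts1\njobB\t2\ts2\n' > "$workdir/sampleinfo.txt"

# tail -n +2 starts output at line 2, dropping the header;
# tee copies everything the loop prints to report.txt and to the terminal.
while IFS= read -r line <&4; do
  jobName=$( echo "$line" | cut -f 1 )
  sampleName=$( echo "$line" | cut -f 3 )
  echo "job $jobName - sample $sampleName"
done 4< <(tail -n +2 "$workdir/sampleinfo.txt") | tee "$workdir/report.txt"
```

Note that process substitution is a bash feature, so the script needs to run under bash rather than a plain POSIX sh.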
...
As we've already seen, field delimiters are tricky! Be aware of the default field delimiter for each of the various bash utilities, and how to change it:
utility | default delimiter | how to change | example |
---|---|---|---|
cut | tab | -d or --delimiter option | cut -d ':' -f 1 /etc/passwd |
sort | whitespace (one or more spaces or tabs) | -t or --field-separator option | sort -t ':' -k1,1 /etc/passwd |
awk | whitespace (one or more spaces or tabs) for input; a single space for output | -F option for input; OFS variable for output | awk -F "\t" '{ print $1,$3 }' sampleinfo.txt |
join | one or more spaces | -t option | |
perl | whitespace (one or more spaces or tabs) when auto-splitting input with -a | -F'/<pattern>/' option | perl -F'/\t/' -ane 'print "$F[0]\t$F[2]\n";' sampleinfo.txt |
read | whitespace (one or more spaces or tabs) | IFS= option | see example above |
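To make the delimiter options concrete, here is a small sketch that applies several of them to the same colon-separated data. The two-line input is made up for illustration.

```shell
# Hypothetical colon-delimited records.
data=$'b:2:x\na:1:y'

# cut: select field 1 using ':' instead of the default tab.
cutOut=$( echo "$data" | cut -d ':' -f 1 )

# sort: order on field 1 using ':' as the field separator.
sortOut=$( echo "$data" | sort -t ':' -k1,1 )

# awk: split input on ':'; the comma in print joins output fields with a space.
awkOut=$( echo "$data" | awk -F ':' '{ print $1,$3 }' )

# read: set IFS for this one command to split a line into three variables.
IFS=':' read -r f1 f2 f3 <<< "b:2:x"

echo "$cutOut"
echo "$sortOut"
echo "$awkOut"
echo "$f1 $f2 $f3"
```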
Viewing special characters in text
...