...
For data, we'll use a couple of files produced by the GSAF's (Genome Sequencing and Analysis Facility) automated processing, which delivers sequencing data to customers. These files have information about sequencing runs (a machine run, with many customer jobs), sequencing jobs (each representing a set of customer samples), and samples (a library of DNA molecules to sequence on the machine).
...
Code Block | ||
---|---|---|
| ||
mkdir ~/test; cd ~/test
ln -s -f ~/workshop/stor/work/CCBB_Workshops_1/data/sampleinfo.txt
ls -l |
...
Code Block | ||
---|---|---|
| ||
cd; rm -f test/*.*
ln -s -f -t test ~/workshop/stor/work/CCBB_Workshops_1/data/*.txt
ls -l |
What about the case where the files you want are scattered in sub-directories? Here's a solution using find and xargs:
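As a rough sketch of how a find and xargs approach can work, the example below builds a throwaway directory tree and links every `.txt` file in it into a single target directory. The `src` and `dest` directories and the file names are made up for illustration, and the `-t` option of `ln` assumes GNU coreutils (as in the command above).

```shell
# Hypothetical scratch tree standing in for a real data directory
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/runA" "$src/runB/sub"
touch "$src/runA/one.txt" "$src/runB/sub/two.txt" "$src/runB/skip.log"

# find emits every matching path, NUL-separated so odd file names are safe;
# xargs passes them all to one ln call, which links them into $dest
find "$src" -type f -name '*.txt' -print0 | xargs -0 ln -s -f -t "$dest"

ls -l "$dest"
```

Because `find` is given an absolute starting directory, the links it creates point at absolute paths and remain valid wherever the target directory lives.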
...
Are all the job/run pairs unique?
...
Code Block | ||
---|---|---|
| ||
set +o pipefail # only the exit code of the last pipe component is returned
cat joblist.txt | head -5000 | cut -f 2 | sort | uniq -c | sort -k1,1nr | head -1 | awk '{print $1}'
echo $? # exit code will be 0 |
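To see the effect of the pipefail option without needing joblist.txt, here is a minimal bash sketch that runs the same two-command pipeline with the option off and then on. The variable names `status_off` and `status_on` are just for illustration.

```shell
# 'false' always exits with 1 and 'true' always exits with 0
set +o pipefail     # default: pipeline status is that of the last command
false | true
status_off=$?

set -o pipefail     # pipeline status is the last non-zero status, if any
false | true
status_on=$?

set +o pipefail     # restore the default behavior
echo "pipefail off: $status_off"   # prints "pipefail off: 0"
echo "pipefail on:  $status_on"    # prints "pipefail on:  1"
```

With the option off, the failure of `false` is hidden by the success of `true`. With it on, the failure is reported as the exit code of the whole pipeline.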
Quotes matter
In the Quoting subtleties section, we saw that quoting variable evaluation preserves the caller's argument quoting. But more specifically, quoting preserves any special characters in the variable value's text (e.g. tab or linefeed characters).
...
Code Block | ||
---|---|---|
| ||
runs=$( grep 'SA1903.$' joblist.txt | cut -f 2 )
echo "$runs" # preserves linefeeds
echo $runs # linefeeds converted to spaces
for run in $runs; do
echo "Run name is: $run"
done |
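The same behavior can be checked without joblist.txt. The sketch below uses a made-up value containing a tab and a linefeed, and a small helper function, `count_args` (hypothetical, defined here only for the demonstration), that reports how many arguments the shell passes in each case. It assumes the default `IFS` setting.

```shell
val=$(printf 'a\tb\nc')   # made-up value: a<TAB>b<LF>c

echo "$val"    # quoted: the tab and linefeed pass through unchanged
echo $val      # unquoted: split on whitespace, then joined with single spaces

# Report the number of arguments received
count_args() { echo $#; }

n_quoted=$(count_args "$val")    # one argument containing the whole text
n_unquoted=$(count_args $val)    # three arguments: a, b and c
echo "quoted: $n_quoted, unquoted: $n_unquoted"   # prints "quoted: 1, unquoted: 3"
```

This word splitting is why the `for` loop above leaves `$runs` unquoted: each whitespace-separated run name becomes its own loop item.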
...