...

For data, we'll use a couple of files from the automated processing that the GSAF (Genome Sequencing and Analysis Facility) uses to deliver sequencing data to customers. These files have information about sequencing runs (a machine run, with many customer jobs), sequencing jobs (representing a set of customer samples), and samples (a library of DNA molecules to sequence on the machine).

...

Code Block
languagebash
mkdir ~/test; cd ~/test
ln -s -f /stor/work/CCBB_Workshops_1/data/sampleinfo.txt
ls -l

...

Code Block
languagebash
cd; rm -f test/*.*
ln -s -f -t test /stor/work/CCBB_Workshops_1/data/*.txt
ls -l test

What about the case where the files you want are scattered in sub-directories? Here's a solution using find and xargs:
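As a rough sketch of the general shape such a solution can take: the directory layout below (`src`, `a`, `b`, `links`) is entirely hypothetical and is built in a temporary directory so the commands can be tried safely. `find` locates the files wherever they sit, and `xargs` hands them all to one `ln` call. This assumes GNU `ln`, which supports the `-t` (target directory) option.

```bash
# Build a throwaway layout with .txt files scattered in sub-directories
# (all names here are made up for illustration).
tmp=$(mktemp -d)
mkdir -p "$tmp/src/a" "$tmp/src/b/c" "$tmp/links"
touch "$tmp/src/a/one.txt" "$tmp/src/b/two.txt" "$tmp/src/b/c/three.txt" "$tmp/src/b/skip.log"

# find prints every matching file, NUL-separated so unusual file names are safe;
# xargs passes them to ln, which creates the symbolic links in the target directory.
find "$tmp/src" -name '*.txt' -print0 | xargs -0 ln -s -f -t "$tmp/links"

ls -l "$tmp/links"
nlinks=$(ls "$tmp/links" | wc -l)   # count the links that were created
echo "links created: $nlinks"
```

Because `find` is given an absolute starting directory here, the link targets are absolute paths and remain valid no matter where the links live.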

...

Expand
titleSolution
Code Block
languagebash
cut -f 1 joblist.txt | sort | uniq | wc -l
# there are 3167

Are all the job/run pairs unique?
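One way to approach a question like this is to compare the total number of lines with the number of distinct lines, or to ask `uniq -d` for any repeats. The sketch below uses a tiny made-up file (`pairs.txt`, with hypothetical job and run names) rather than the real `joblist.txt`, and assumes tab-separated data with the job in column 1 and the run in column 2.

```bash
# Small invented data set: the pair JA1/SA1 appears twice.
tmp=$(mktemp -d)
printf 'JA1\tSA1\nJA1\tSA2\nJA2\tSA2\nJA1\tSA1\n' > "$tmp/pairs.txt"

# If every pair is unique, these two counts are equal.
total=$(cut -f 1,2 "$tmp/pairs.txt" | wc -l)
distinct=$(cut -f 1,2 "$tmp/pairs.txt" | sort | uniq | wc -l)
echo "total=$total distinct=$distinct"

# uniq -d prints one copy of each line that occurs more than once.
dups=$(cut -f 1,2 "$tmp/pairs.txt" | sort | uniq -d)
echo "repeated pairs: $dups"
```

Note that `uniq` only collapses adjacent duplicates, which is why the data is sorted first.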

...

Code Block
languagebash
set +o pipefail # only the exit code of the last pipe component is returned
cat joblist.txt | head -5000 | cut -f 2 | sort | uniq -c | sort -k1,1nr | head -1 | awk '{print $1}'
echo $?         # exit code will be 0
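To see the difference the `pipefail` option makes, here is a minimal sketch that does not depend on the workshop data. It uses `yes`, which writes lines forever, feeding `head -1`, which exits after the first line; the writer is then terminated when it tries to write to the closed pipe. The variable names `status_off` and `status_on` are just illustrative.

```bash
# Without pipefail: the pipeline's exit code is that of the last command (head).
set +o pipefail
yes | head -1 > /dev/null
status_off=$?     # 0, because head succeeded

# With pipefail: a failure in any component makes the whole pipeline fail.
set -o pipefail
yes | head -1 > /dev/null
status_on=$?      # non-zero; typically 141 (128 + 13, the SIGPIPE signal number)

set +o pipefail   # restore the default behavior
echo "pipefail off: $status_off, pipefail on: $status_on"
```

The same thing can happen to any early pipeline stage that is still writing when a later `head` exits, which is why a pipeline ending in `head` can report failure under `pipefail` even though it produced the expected output.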

Quotes matter

In the Quoting subtleties section, we saw that quoting variable evaluation preserves the caller's argument quoting. But more specifically, quoting preserves any special characters in the variable value's text (e.g. tab or linefeed characters).

...

Code Block
languagebash
runs=$( grep 'SA1903.$' joblist.txt | cut -f 2 )
echo "$runs"   # preserves linefeeds
echo $runs     # linefeeds converted to spaces

# $runs is left unquoted on purpose: word splitting yields one run name per iteration
for run in $runs; do
  echo "Run name is: $run"
done
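The effect of quoting can also be checked without the workshop files. In this small sketch the variable `text` holds made-up content containing a tab and a linefeed, and counting the lines that `echo` emits shows what survives in each case.

```bash
# Two lines, each with a tab between its fields (illustrative data only).
text=$(printf 'a\tb\nc\td')

quoted_lines=$(echo "$text" | wc -l)    # quoted: the linefeed is preserved
unquoted_lines=$(echo $text | wc -l)    # unquoted: tabs and linefeeds become single spaces
echo "quoted: $quoted_lines line(s), unquoted: $unquoted_lines line(s)"

# Reading the quoted text line by line keeps each line, tabs included, intact.
echo "$text" | while IFS= read -r line; do
  echo "Line is: $line"
done
```

Unquoted, the shell splits the value on whitespace and `echo` rejoins the pieces with single spaces, so both the tabs and the linefeed are lost.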

...