...

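  • -p tells perl to read its input line by line, apply the script to each line, and print the (possibly modified) line
  • -e introduces the perl script (always enclose it in single quotes to protect it from shell evaluation)
  • s is the perl pattern substitution operator; with -p it acts on each input line (a leading ~, as in ~s, is not part of the operator and can be left out)
  • forward slashes ("s/pattern/replacement/") separate the regex search pattern from the replacement text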

Handling multiple FASTQ files example

Here's an example of how to work with multiple FASTQ files, such as those produced when a Next Generation Sequencing (NGS) core facility like the GSAF sequences a library of DNA molecules submitted by a user. These FASTQ files are generally compressed with gzip to save space and are often delivered in sub-directories, one per sample. For example, running tree on one such directory shows its layout:

Code Block
languagebash
tree /stor/work/CCBB_Workshops_1/bash_scripting/fastq/
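If you can't reach /stor/work, you can build a stand-in directory instead. This is a minimal sketch. It creates a temporary directory holding four gzipped FASTQ files, each containing one fake read. The sample directories and file names are hypothetical and are modeled on the common Illumina <sample>_S<n>_L00<lane>_R1_001.fastq.gz naming pattern.

```bash
# Throwaway stand-in for the workshop fastq/ directory (all names hypothetical)
demo=$( mktemp -d )
for spec in WT_1:S1:L001 WT_1:S1:L002 KO_1:S2:L001 KO_1:S2:L002; do
  # split "sample:Snum:lane" into three variables
  IFS=: read -r sample snum lane <<< "$spec"
  mkdir -p "$demo/fastq/$sample"
  # write one fake 4-line FASTQ record and gzip it
  printf '@read1\nACGT\n+\nIIII\n' | gzip > "$demo/fastq/$sample/${sample}_${snum}_${lane}_R1_001.fastq.gz"
done

# list the files (find works even where tree is not installed)
find "$demo/fastq" -name "*.fastq.gz" | sort
```

You can then point the commands in this section at "$demo/fastq". Run rm -r "$demo" when you are finished.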



There are 4 FASTQ files we want to manipulate. Let's start by listing their full paths with find, then use a for loop to extract just the file names without the _R1_001.fastq.gz suffix:

Code Block
languagebash
# This is how to get all 4 full pathnames:
find /stor/work/CCBB_Workshops_1/bash_scripting/fastq -name "*.fastq.gz"


# In a for loop, strip off the directory and the common _R1_001.fastq.gz file suffix
for path in $( find /stor/work/CCBB_Workshops_1/bash_scripting/fastq -name "*.fastq.gz" ); do
  pfx=$( basename "$path" )        # file name without the directory
  pfx=${pfx%%_R1_001.fastq.gz}     # remove the common suffix
  echo "$pfx"
done
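The directory-and-suffix stripping in the loop can be written in a few equivalent ways. The sketch below compares them on a single path, which is hypothetical.

```bash
# Hypothetical path, for illustration only
path="/some/dir/WT_1/WT_1_S1_L001_R1_001.fastq.gz"

# 1) basename, then ${var%%pattern} to strip the suffix
pfx=$( basename "$path" )
pfx=${pfx%%_R1_001.fastq.gz}

# 2) basename removes a trailing suffix passed as its second argument
pfx2=$( basename "$path" _R1_001.fastq.gz )

# 3) pure bash: ${path##*/} deletes everything through the last "/"
pfx3=${path##*/}
pfx3=${pfx3%_R1_001.fastq.gz}

echo "$pfx $pfx2 $pfx3"    # WT_1_S1_L001 WT_1_S1_L001 WT_1_S1_L001
```

All three give the same prefix. One caveat applies to all of them inside the loop: for path in $( find ... ) splits find's output on whitespace, so the loop assumes the paths contain no spaces.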

Next, shorten the lane numbers by removing "00", and remove the _S<number> sample number (matched by the perl regex _S\d+), which is not part of the user's sample name:

Code Block
languagebash
# This is how to get all 4 full pathnames:
find /stor/work/CCBB_Workshops_1/bash_scripting/fastq -name "*.fastq.gz"


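# In a for loop, strip off the directory and the common _R1_001.fastq.gz file suffix,
# then shorten the lane number and remove the _S<number> sample number
for path in $( find /stor/work/CCBB_Workshops_1/bash_scripting/fastq -name "*.fastq.gz" ); do
  pfx=$( basename "$path" )
  pfx=${pfx%%_R1_001.fastq.gz}
  # sed: change "_L00" to "_L" (assumes Illumina-style lane tags such as L001),
  #      so a "00" inside the sample name itself is left alone
  # perl: delete "_S" followed by one or more digits (the sample number)
  pfx=$( echo "$pfx" | sed 's/_L00/_L/' | perl -pe 's/_S\d+//' )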
  echo "$pfx"
done