...
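- -p tells perl to loop over its input line by line and print each line after the script has run, so the substitution results are printed
- -e introduces the perl script (always enclose it in single quotes to protect it from shell evaluation)
- s is the perl pattern substitution operator; the leading ~ written before it in these examples is not required, because the substitution applies to the current input line by default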
- forward slashes ("/ / /") enclose the regex search pattern and the replacement text
Handling multiple FASTQ files example
Here's an example of how to work with multiple FASTQ files, such as those produced when a Next Generation Sequencing (NGS) core facility like the GSAF sequences a library of DNA molecules provided to it. These FASTQ files (generally compressed with gzip to save space) are often provided in sub-directories, each associated with a sample. For example, the output of running tree on one such directory is shown below.
Code Block |
---|
|
tree /stor/work/CCBB_Workshops_1/bash_scripting/fastq/ |
[Image: output of the tree command above]
There are 4 FASTQ files we want to manipulate. Let's start with a for loop to get their full paths, and just the FASTQ file names without the _R1_001.fastq.gz suffix:
Code Block |
---|
|
# This is how to get all 4 full pathnames:
find /stor/work/CCBB_Workshops_1/bash_scripting/fastq -name "*.fastq.gz"
# In a for loop, strip off the directory and the common _R1_001.fastq.gz file suffix
for path in $( find /stor/work/CCBB_Workshops_1/bash_scripting/fastq -name "*.fastq.gz" ); do
  pfx=$( basename "$path" )
  pfx=${pfx%%_R1_001.fastq.gz}
  echo "$pfx"
done |
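To try the same loop without access to the workshop directory, the sketch below builds a small throwaway directory tree first. The sample directories and file names are hypothetical stand-ins that only assume the common _R1_001.fastq.gz suffix; they are not the workshop's actual files.

```shell
# Build a temporary directory with empty stand-in files (hypothetical names)
tmp=$( mktemp -d )
mkdir -p "$tmp/sampleA" "$tmp/sampleB"
touch "$tmp/sampleA/WT-1_S5_L001_R1_001.fastq.gz"
touch "$tmp/sampleB/WT-2_S6_L002_R1_001.fastq.gz"

# Same prefix extraction as above; sort makes the output order predictable
prefixes=$(
  for path in $( find "$tmp" -name "*.fastq.gz" | sort ); do
    pfx=$( basename "$path" )        # strip the directory
    pfx=${pfx%%_R1_001.fastq.gz}     # strip the common suffix
    echo "$pfx"
  done
)
echo "$prefixes"

# Clean up the temporary directory
rm -rf "$tmp"
```

Note that looping over $( find ... ) splits on whitespace, so this form assumes the paths contain no spaces.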
Now shorten the lane numbers by removing "00", and remove the _S\d+ sample number (which is not part of the user's sample name):
Code Block |
---|
|
# This is how to get all 4 full pathnames:
find /stor/work/CCBB_Workshops_1/bash_scripting/fastq -name "*.fastq.gz"
# In a for loop, strip off the directory and the common _R1_001.fastq.gz file suffix
for path in $( find /stor/work/CCBB_Workshops_1/bash_scripting/fastq -name "*.fastq.gz" ); do
  pfx=$( basename "$path" )
  pfx=${pfx%%_R1_001.fastq.gz}
  pfx=$( echo "$pfx" | sed 's/00//' | perl -pe '~s/_S\d+//' )
  echo "$pfx"
done |