work with some simple bash scripting from the command line (for loops) to generate multiple fastqc reports simultaneously and look at 272 plasmid samples.
work with MultiQC to make decisions about read preprocessing.
identify outlier files that are clearly different from the group as a whole and determine how to deal with these files.

Get some data and load fastqc

Copy the plasmid sequencing files found in the BioITeam directory gva_course/plasmid_qc/ to a new directory named GVA_multiqc. There are 2 main ways to do this, which is worth knowing since there are so many files (544 total).

Code Block

language	bash
title	Click here for help with copying the files recursively in a single step
collapse	true

cp -r $BI/gva_course/plasmid_qc/ $SCRATCH/GVA_multiqc

Code Block

language	bash
title	Click here for help with copying the files using a wildcard after making a new directory
collapse	true

mkdir $SCRATCH/GVA_multiqc 
cp $BI/gva_course/plasmid_qc/* $SCRATCH/GVA_multiqc
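
With this many files it can be worth confirming that the copy finished. A minimal sketch, assuming the files sit directly inside the new directory; count_entries is a hypothetical helper name, not part of the course materials:

```bash
# Hypothetical helper (not from the course): count the entries in a directory.
count_entries() {
    # ls -1 prints one entry per line; wc -l counts those lines.
    # tr strips the padding some versions of wc add around the number.
    ls -1 "$1" | wc -l | tr -d ' '
}
```

Running count_entries $SCRATCH/GVA_multiqc should report 544 if every file mentioned above was copied.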

Code Block

language	bash
title	You may remember from the first tutorial on read preprocessing that fastqc is a module you can load
collapse	true

module load fastqc

Use a bash for loop on the command line to generate a fastQC command for all plasmid samples

We are going to construct a single commands file with 544 lines that will launch all commands without having to know the name of any single file. To do so we will use the bash 'for' command.

For loops on the command line have 3 parts:

1. A list of something to deal with 1 at a time, followed by a ';'. In the following example: for f in *.gz;
2. Something to do with each item in the list. This must start with the word 'do'. In the following example: do echo "fastqc -o fastqc_output $f &";
3. The word 'done', so bash knows to stop looking for more commands. In the following example: done, plus a final redirect (>) so that rather than printing to the screen the output goes to a file (fastqc_commands in this case).
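
The three parts can be tried out on a toy list before touching any sequencing files. A minimal sketch; the three words and the temporary output file are placeholders chosen for illustration:

```bash
# Toy example: the words and the temp file are placeholders, not tutorial data.
toy_file=$(mktemp)

# Part 1: the list (alpha beta gamma), ended with ';'
# Part 2: 'do' plus the command to run on each item
# Part 3: 'done', followed by a redirect so the output lands in a file
for word in alpha beta gamma; do echo "processing $word"; done > "$toy_file"

cat "$toy_file"
```

The file should contain one "processing ..." line per word, in the order the words were listed.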

Code Block

language	bash
title	Putting it all together
collapse	true

for f in *.gz; do echo "fastqc -o fastqc_output $f &" ;done  > fastqc_commands

Use the Linux commands head and wc -l to see what the output looks like and how many lines it contains.
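
As a dry run of this check, the sketch below builds a throwaway directory containing two empty placeholder files (hypothetical names, not real plasmid samples), generates the commands file with the same loop, and inspects it. Nothing here runs fastqc; the loop only echoes text.

```bash
# Dry run in a throwaway directory; the two .gz names are made-up placeholders.
demo_dir=$(mktemp -d)
touch "$demo_dir/placeholder_R1.fastq.gz" "$demo_dir/placeholder_R2.fastq.gz"

# Run in a subshell so the current directory of your session is unchanged.
(
    cd "$demo_dir" || exit 1
    for f in *.gz; do echo "fastqc -o fastqc_output $f &"; done > fastqc_commands
    head fastqc_commands    # shows the first (up to 10) generated commands
    wc -l fastqc_commands   # line count; here 2, one per placeholder file
)
```

In the real GVA_multiqc directory the same two commands should show fastqc lines naming the plasmid files, and a line count of 544.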

Next we need to make the output directory where all the fastqc reports will go, and make the fastqc_commands file executable.

Code Block

language	bash
title	Do the analysis

mkdir fastqc_output
chmod +x fastqc_commands

Run MultiQC tool on all fastQC output

...
