
...

MultiQC produces neat, interactive plots in an HTML file.
- So it can be used as a basic plotting tool for many kinds of reports and data, not just those produced by NGS tools!

Tip

I recommend using Chrome to view MultiQC reports.

The HTML reports generated by MultiQC rely heavily on JavaScript and other dynamic web content, and not all browsers support them equally well.

Code Workshop

ATAC-seq is a transposon-insertion sequencing method where an engineered, activated transposon inserts into accessible ("open") chromatin. It is considered a much simpler protocol than standard DNase-seq, and it requires less starting material as well.

...

Setup to follow along

Log in to ls5 or stampede at TACC. Execute these commands to set up access to the multiqc binary:

Code Block

language	bash
title	lonestar5 setup for multiqc

module load python
export PATH="/work/projects/BioITeam/ls5/binopt/multiqc-1.0:$PATH"
export PYTHONPATH="/work/projects/BioITeam/ls5/lib/python2.7/annab-packages:$PYTHONPATH"
 
# make sure it is working...
multiqc --help

...

Code Block

language	bash
title	stampede setup for multiqc

module load python
export PATH="/work/projects/BioITeam/stampede/opt/multiqc-1.0:$PATH"
export PYTHONPATH="/work/projects/BioITeam/stampede/lib/python2.7/annab-packages:$PYTHONPATH"
 
# make sure it is working...
multiqc --help

Produce a consolidated FastQC report

The FastQC tool is great for producing detailed reports for every individual fastq file. For example, for Igor's 2 PE datasets, 4 reports are produced from running FastQC (http://web.corral.tacc.utexas.edu/iyer/igor/fastqc/).

The shortcoming is that you have to browse through all the individual reports one at a time, which can be tedious for large experiments.

This is where MultiQC's power comes in. You can point MultiQC to a directory where FastQC has been run and it will magically produce a consolidated report.

For example, logged in to ls5 at TACC, first stage a directory where FastQC has been run:

Code Block

language	bash

mkdir -p $SCRATCH/byteclub/multiqc
cd $SCRATCH/byteclub/multiqc
ln -s -f /work/projects/BioITeam/projects/byteclub/multiqc/fastqc

Now this is all it takes to produce a basic MultiQC report:

Code Block

language	bash

cd $SCRATCH/byteclub/multiqc
multiqc .

When this completes you'll see a new file and directory:

...

Catch up

To catch up, just stage Anna's pre-made files:

Code Block

language	bash

mkdir -p $SCRATCH/byteclub/multiqc/
cd $SCRATCH/byteclub/multiqc/
rsync -avrP --delete /work/projects/BioITeam/projects/byteclub/multiqc/0102_fastq/ .

After saving this file, remove the previous MultiQC outputs and re-run the program:

...

Always use spaces (not tabs!) in the multiqc_config.yaml file.
Make sure the file is saved with Unix line endings (not Windows or Mac).
Pay attention to the output when running multiqc. It will tell you if there are issues parsing the config file.
Always delete any previous MultiQC output files before running multiqc.
- While their documentation says existing files will just be updated, I have seen MultiQC get confused when previous reports exist.
It is a good idea to change the names of the MultiQC output files.
- If output files with those names are not created, something went wrong!
Consult example config files.
- An example multiqc_config.yaml file: https://github.com/ewels/MultiQC/blob/master/multiqc_config_example.yaml
- All multiqc_config.yaml defaults: https://github.com/ewels/MultiQC/blob/master/multiqc/utils/config_defaults.yaml
Avoid running multiqc on large complex directory trees.
- Instead, create a separate directory (or directory tree) only for MultiQC
  - Copy or link the files you want MultiQC to look for there, and use it as MultiQC's target directory.
- MultiQC will run much faster and have fewer confusions.
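A few of the tips above can be partly automated. Below is a minimal bash sketch, not part of MultiQC or the original workflow: `check_mqc_config` flags tab characters and carriage returns (non-Unix line endings) in a config file, and `stage_for_multiqc` symlinks only the files matching the glob patterns you give it into a dedicated directory. Both function names and their argument conventions are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of MultiQC) for the config and directory tips.

# check_mqc_config FILE
# Warns and returns non-zero if FILE contains tab characters or
# carriage returns (Windows / old Mac line endings).
check_mqc_config() {
  local cfg=$1 status=0
  if grep -q $'\t' "$cfg"; then
    echo "WARNING: $cfg contains tab characters; use spaces instead" >&2
    status=1
  fi
  if grep -q $'\r' "$cfg"; then
    echo "WARNING: $cfg does not have Unix line endings" >&2
    status=1
  fi
  return $status
}

# stage_for_multiqc SRC_DIR DEST_DIR PATTERN...
# Symlinks the files in SRC_DIR that match each glob PATTERN into DEST_DIR,
# so MultiQC can be pointed at a small, dedicated directory.
stage_for_multiqc() {
  local src dest pat f
  src=$(cd "$1" && pwd) || return 1   # absolute path so the links resolve
  dest=$2
  shift 2
  mkdir -p "$dest"
  for pat in "$@"; do
    for f in "$src"/$pat; do
      [ -e "$f" ] || continue          # pattern matched nothing
      ln -sf "$f" "$dest/"
    done
  done
}
```

For example, `check_mqc_config multiqc_config.yaml && stage_for_multiqc fastqc for_multiqc '*_fastqc.zip'` would check the config and then link only the FastQC zip files (assuming that naming) into `for_multiqc/`. Absolute link targets are used so the links still resolve no matter where `DEST_DIR` lives.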

...

Code Block

language	bash

cd $SCRATCH/byteclub/multiqc
rsync -avrP /work/projects/BioITeam/projects/byteclub/multiqc/bowtie2/ bowtie2/

...

<prefix>.flagstat.txt - output from running samtools flagstat
<prefix>.idxstats.txt - output from running samtools idxstats
<prefix>.dupinfo.txt - output from running Picard MarkDuplicates
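Since each sample should have all three of these files, a quick check before running MultiQC is to look for prefixes that are missing one of them. This is a minimal sketch that assumes the `<prefix>.flagstat.txt` / `.idxstats.txt` / `.dupinfo.txt` naming above; the function name `check_bam_stats` is hypothetical.

```shell
#!/usr/bin/env bash
# check_bam_stats DIR
# For every <prefix>.flagstat.txt in DIR, reports any missing companion
# <prefix>.idxstats.txt or <prefix>.dupinfo.txt file.
# Returns non-zero if anything is missing.
check_bam_stats() {
  local dir=$1 f pfx sfx missing=0
  for f in "$dir"/*.flagstat.txt; do
    [ -e "$f" ] || continue            # no flagstat files at all
    pfx=${f%.flagstat.txt}
    for sfx in idxstats dupinfo; do
      if [ ! -e "$pfx.$sfx.txt" ]; then
        echo "missing: $pfx.$sfx.txt" >&2
        missing=1
      fi
    done
  done
  return $missing
}
```

For example, `check_bam_stats bowtie2` would check the staged `bowtie2/` directory, assuming the three files for each sample live there together.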

Catch up

To catch up, just use Anna's pre-made files:

Code Block

language	bash

mkdir -p $SCRATCH/byteclub/multiqc/
cd $SCRATCH/byteclub/multiqc/
rsync -avrP --delete /work/projects/BioITeam/projects/byteclub/multiqc/03_bowtie/ .

...

Catch up

To catch up, just use Anna's pre-made files:

Code Block

language	bash

mkdir -p $SCRATCH/byteclub/multiqc
cd $SCRATCH/byteclub/multiqc
rsync -avrP --delete /work/projects/BioITeam/projects/byteclub/multiqc/04_picard_fixed/ .

...

Catch up

To catch up, just use Anna's pre-made files:

Code Block

language	bash

mkdir -p $SCRATCH/byteclub/multiqc
cd $SCRATCH/byteclub/multiqc
rsync -avrP --delete /work/projects/BioITeam/projects/byteclub/multiqc/05_section_order/ .

...

Code Block

language	bash

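mkdir -p $SCRATCH/byteclub/multiqc/for_multiqc
cd $SCRATCH/byteclub/multiqc/for_multiqc
for f in ../bowtie2/*.insertsz.txt; do
  bn=$(basename "$f")
  pfx=${bn%%.insertsz.txt}
  echo "$f - $pfx"
  # skip the header line, drop lines starting with '-', keep columns 1 and 3
  tail -n +2 "$f" | grep -v -P '^-' | cut -f 1,3 > "${pfx}.bowtie2_isizes.tsv"
done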
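To see what the pipeline inside the loop does, here is a toy run on a made-up three-column, tab-separated file. The column layout is purely illustrative (the real `.insertsz.txt` format is not shown here); the point is only that the header line and any line starting with `-` are dropped, and columns 1 and 3 are kept. Plain `grep -v '^-'` is used because this pattern needs no Perl-regex features.

```shell
#!/usr/bin/env bash
# Toy input: a header plus tab-separated rows (illustrative columns only).
printf 'size\tcol2\tcount\n-150\tx\t3\n100\tx\t42\n150\tx\t57\n' > demo.insertsz.txt

# Same steps as the loop: drop the header, drop lines starting with '-',
# keep columns 1 and 3 (cut splits on tabs by default).
tail -n +2 demo.insertsz.txt | grep -v '^-' | cut -f 1,3 > demo.bowtie2_isizes.tsv

# Prints two tab-separated lines: "100 42" and "150 57".
cat demo.bowtie2_isizes.tsv
```

This writes two small demo files in the current directory, so run it somewhere disposable.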

Next we edit the multiqc_config.yaml configuration file to add appropriate custom data sections:

Code Block

title	multiqc_config.yaml

# Titles to use for the report.
title: "ATAC-Seq QC Reports"
subtitle: null
intro_text: "MultiQC reports for Igor's ATAC-Seq proof-of-concept project."
report_header_info:
    - Sequenced by: 'GSAF'
    - Job: 'JA17277'
    - Run: 'SA17121'
    - Setup: '2x150'

# Change the output filenames
output_fn_name: mqc_report.html
data_dir_name: mqc_report_data

# Ignore these files / directories / paths when searching for reports
fn_ignore_files:
    - '*.dupinfo.txt'

# Modules that should come at the top of the report
top_modules:
    - 'generalstats'
    - 'fastqc'
    - 'samtools'
    - 'picard'

# --------------------------------
# Custom data
# --------------------------------

custom_data:
    bowtie2_isize:
        id: 'bowtie2_isize_section'
        section_name: 'Bowtie2 insert size'
        description: 'distribution for alignments (bowtie2 --local -X2000 --no-mixed --no-discordant)'
        file_format: 'tsv'
        plot_type: 'linegraph'
        pconfig:
            id: 'bowtie2_isize_plot'
            title: 'Insert sizes for proper pairs'
            xlab: 'Insert size'
            ylab: 'Count'
sp:
    bowtie2_isize_section:
        fn: '*.bowtie2_isizes.tsv'
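Before re-running, it can help to confirm that the `fn` pattern actually matches your converted files. As far as I know, MultiQC's `fn` search patterns use Unix shell-style wildcards, so a shell glob is a reasonable approximation, though not a guarantee of what MultiQC will pick up. The function name `count_matches` is hypothetical.

```shell
#!/usr/bin/env bash
# count_matches DIR PATTERN
# Lists the files in DIR whose names match the shell glob PATTERN and
# prints how many matched. Returns non-zero if nothing matched.
count_matches() {
  local dir=$1 pat=$2 f n=0
  for f in "$dir"/$pat; do
    [ -e "$f" ] || continue            # glob matched nothing
    echo "  $(basename "$f")"
    n=$((n + 1))
  done
  echo "$n file(s) match '$pat'"
  [ "$n" -gt 0 ]
}
```

For example, `count_matches for_multiqc '*.bowtie2_isizes.tsv'` should list one converted file per sample.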

Catch up

To catch up, just use Anna's pre-made files:

Code Block

language	bash

mkdir -p $SCRATCH/byteclub/multiqc
cd $SCRATCH/byteclub/multiqc
rsync -avrP --delete /work/projects/BioITeam/projects/byteclub/multiqc/06_custom_linegraph/ .

Then the usual...

Code Block

language	bash

cd $SCRATCH/byteclub/multiqc; rm -rf mqc_report*; multiqc .

The result is a report that includes our insert size distribution data in the custom data section we configured, as a new section called Bowtie2 insert size: http://web.corral.tacc.utexas.edu/iyer/byteclub/multiqc/06_custom_linegraph.mqc_report.html

What's cool is that this "sawtooth" insert size distribution occurs because of the way transposons insert into the major groove of DNA at regular intervals. So this graph shows Igor that his ATAC-seq proof-of-concept experiment worked!
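Since each step in this workshop ends with the same cleanup-and-rerun command (`rm -rf mqc_report*; multiqc .`), and the tips recommend checking that the renamed outputs actually appear, the two could be wrapped in a small function. This is a sketch: `rerun_mqc` and the `MULTIQC` override variable are hypothetical names, and the expected `mqc_report.html` / `mqc_report_data` names assume the `output_fn_name` and `data_dir_name` settings in the config above.

```shell
#!/usr/bin/env bash
# rerun_mqc [TARGET]
# Deletes previous mqc_report* outputs in the current directory, runs
# MultiQC on TARGET (default: .), then checks that the renamed outputs exist.
# Set MULTIQC to override the command that is run (default: multiqc).
rerun_mqc() {
  local target=${1:-.}
  rm -rf mqc_report*
  "${MULTIQC:-multiqc}" "$target" || return 1
  if [ -f mqc_report.html ] && [ -d mqc_report_data ]; then
    echo "OK: mqc_report.html and mqc_report_data/ created"
  else
    echo "ERROR: expected outputs missing; check the multiqc messages above" >&2
    return 1
  fi
}
```

Run it from `$SCRATCH/byteclub/multiqc` as `rerun_mqc` (or `rerun_mqc for_multiqc` to target a subdirectory). A missing report usually means MultiQC did not pick up the config file, so its console output is the first place to look.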

Adding custom bargraphs

Here we'll create two custom bargraph reports, one for bowtie2 mapping qualities and a second showing genome coverage of the alignments.

...

Code Block

language	bash

cd $SCRATCH/byteclub/multiqc
cp /work/projects/BioITeam/projects/byteclub/multiqc/07_custom_bargraph/for_multiqc/*mapq*      for_multiqc/
cp /work/projects/BioITeam/projects/byteclub/multiqc/07_custom_bargraph/for_multiqc/*genomecov* for_multiqc/

...

There is just one data file for genome coverage. Unlike the per-sample files, it has a header line, with an arbitrary tag for the dataset names in the 1st column followed by the category names in subsequent columns. Each data line then holds a dataset name followed by its count for each category. (I've re-formatted the data below for readability, but remember that all .tsv file data must be tab-separated.)

Code Block

title	combined_genomecov.tsv

sample      none        1-2        3-10       11-50     51+
5k_nuclei   2140984435  237947623  308665107  38729079  4545530
50k_nuclei  2175228345  351105871  186361275  17356704  819579
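Because the table above is space-aligned for display, one way to get a tab-separated copy is to squeeze each run of spaces into a single tab. The sketch below does that and then sums each data row with awk. Both rows total 2,730,871,774, which is what you would expect if every genome position falls into exactly one coverage category. The `demo_genomecov` file names are hypothetical, chosen so the real `combined_genomecov.tsv` is not overwritten.

```shell
#!/usr/bin/env bash
# Space-aligned copy of the table above (display format only).
cat > demo_genomecov.txt <<'EOF'
sample      none        1-2        3-10       11-50     51+
5k_nuclei   2140984435  237947623  308665107  38729079  4545530
50k_nuclei  2175228345  351105871  186361275  17356704  819579
EOF

# Translate spaces to tabs and squeeze each run into a single tab.
tr -s ' ' '\t' < demo_genomecov.txt > demo_genomecov.tsv

# Sum the counts on each data row (the header line is skipped).
# %.0f avoids overflow in awk implementations whose %d is 32-bit.
awk -F'\t' 'NR > 1 { s = 0; for (i = 2; i <= NF; i++) s += $i; printf "%s\t%.0f\n", $1, s }' demo_genomecov.tsv
```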

Here we edit the multiqc_config.yaml configuration file to add appropriate custom data sections:

...

Catch up

To catch up, just use Anna's pre-made files:

Code Block

language	bash

mkdir -p $SCRATCH/byteclub/multiqc
cd $SCRATCH/byteclub/multiqc
rsync -avrP --delete /work/projects/BioITeam/projects/byteclub/multiqc/07_custom_bargraph/ .

...

Catch up

To catch up, just use Anna's pre-made files:

Code Block

language	bash

mkdir -p $SCRATCH/byteclub/multiqc
cd $SCRATCH/byteclub/multiqc
rsync -avrP --delete /work/projects/BioITeam/projects/byteclub/multiqc/08_final/ .

Run MultiQC again, but this time point it only at the for_multiqc directory:

Code Block

language	bash

cd $SCRATCH/byteclub/multiqc
rm -rf mqc_report*
multiqc for_multiqc

Alternatively, you could exclude the bowtie2 directory entirely via a fn_ignore_dirs section list item in multiqc_config.yaml, like this:

Code Block
fn_ignore_dirs:
    - 'bowtie2'

...


References

Main MultiQC links

...

Below are descriptions of two projects I've assisted with lately using MultiQC to help pull together visualizations assessing experiment quality.

...

I recommend using Chrome to view MultiQC reports.

The HTML reports generated by MultiQC rely heavily on JavaScript and other dynamic web content, and not all browsers support them equally well.

The example MultiQC reports below were generated by running the multiqc binary on a command line.
After inspecting them locally (by just opening them as files in a web browser), I copied them to a web-accessible location to share with others. Here, that location is the Iyer Lab's web-accessible directory on corral.

Igor Ponomarev ATAC-seq data

...
