...
- MultiQC produces neat, interactive plots in an HTML file.
- So it can be used as a basic plotting tool for many kinds of reports and data, not just those produced by NGS tools!
References
Main MultiQC links
...
Code Workshop
ATAC-seq is a transposon-insertion sequencing method in which an engineered, active transposon inserts into accessible ("open") chromatin. It is considered a much simpler protocol than standard DNase-seq, and it requires less starting material as well.
For data, we will use some ATAC-seq datasets produced in Igor Ponomarev's lab in WCAAR. As a proof-of-concept for future work, they performed the ATAC-seq protocol on 5k and 50k cell nuclei from mouse brain, producing 2 paired-end datasets.
Setup to follow along
Log in to ls5 at TACC. Execute these commands to set up access to the multiqc binary:

    module load python
    export PATH="/work/projects/BioITeam/ls5/bin/multiqc-1.0:$PATH"
    export PYTHONPATH="/work/projects/BioITeam/ls5/lib/python2.7/annab-packages:$PYTHONPATH"
    # make sure it is working...
    multiqc --help

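If the `--help` check fails, it is usually because the `PATH` export did not take effect. A slightly more defensive check can be sketched as follows; `have_tool` is a hypothetical helper name introduced here for illustration, not part of MultiQC or of the TACC environment.

```bash
# Hypothetical helper (an assumption): succeeds if the named program is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool multiqc; then
    # Show which multiqc binary will be used and its version
    echo "multiqc found at: $(command -v multiqc)"
    multiqc --version
else
    echo "multiqc not found; re-check the PATH and PYTHONPATH exports" >&2
fi
```

The path printed should start with the BioITeam directory added to `PATH`; if it points somewhere else, a different MultiQC installation is being picked up first.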
Produce a consolidated FastQC report
The FastQC tool is great for producing detailed reports for every individual fastq file. For example, for Igor's 2 PE datasets, 4 reports are produced from running fastqc (http://web.corral.tacc.utexas.edu/iyer/igor/fastqc/).
The shortcoming is that you have to browse through all the individual reports one at a time, which can be tedious for large experiments.
This is where MultiQC's power comes in. You can point MultiQC at a directory where FastQC has been run, and it will find the FastQC results and produce a single consolidated report.
For example, logged in to ls5 at TACC, first stage a directory where FastQC has been run:
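
    # create a working directory on scratch and move into it
    mkdir -p $SCRATCH/byteclub/multiqc/01_fastq
    cd $SCRATCH/byteclub/multiqc/01_fastq
    # link to the directory where FastQC has already been run
    ln -s -f /work/01063/abattenh/projects/byteclub/multiqc/fastqc
    # convenience link from your home directory to your scratch area
    ln -s -f $SCRATCH ~/scratch
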
Now this is all it takes to produce a basic MultiQC report:

    cd $SCRATCH/byteclub/multiqc/01_fastq
    multiqc .

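When a report is regenerated several times, it can help to name it explicitly and allow overwriting. The sketch below uses MultiQC's `--filename` and `--force` options; the report name is only an illustrative choice (an assumption), and the guard simply skips the run if `multiqc` is not on `PATH`.

```bash
# Illustrative report name (an assumption); pick any name you like.
report_name="01_basic.multiqc_report.html"

if command -v multiqc >/dev/null 2>&1; then
    # --filename sets the report file name; --force overwrites a previous report
    multiqc --force --filename "$report_name" .
else
    echo "multiqc not on PATH; run the setup commands first" >&2
fi
```

With a custom file name, the accompanying data directory is typically named to match the report rather than `multiqc_data`.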
When this completes you'll see a new file and directory:
- multiqc_report.html – the MultiQC HTML report with its default name
- multiqc_data – directory with text files containing MultiQC data used in the report as well as a log file
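To get a feel for what is in the data directory, something like the following can be used. This is a sketch: `show_multiqc_data` is a hypothetical helper, and the file name `multiqc_fastqc.txt` is an assumption about what this MultiQC version writes for FastQC input, so the function only prints it if it exists.

```bash
# Hypothetical helper (an assumption): list a MultiQC data directory and
# preview the per-sample FastQC table if it is present.
show_multiqc_data() {
    data_dir="${1:-multiqc_data}"
    if [ ! -d "$data_dir" ]; then
        echo "no $data_dir directory; run multiqc first" >&2
        return 1
    fi
    # One file name per line
    ls -1 "$data_dir"
    if [ -f "$data_dir/multiqc_fastqc.txt" ]; then
        # Tab-delimited text: show the header and the first few samples
        head -n 5 "$data_dir/multiqc_fastqc.txt"
    fi
}

# Run against the default directory; do not abort if it is missing
show_multiqc_data multiqc_data || true
```

Because these are plain text files, they can be loaded into a spreadsheet or R for further analysis without re-parsing the original tool outputs.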
Here's what this basic FastQC report looks like: http://web.corral.tacc.utexas.edu/iyer/byteclub/multiqc/01_basic.multiqc_report.html
To view the report you created in a web browser, it must be copied somewhere a browser can open it. An easy way to do this is to copy it to your laptop, for example with scp, changing the user name from abattenh and the scratch path as appropriate.
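One possible form of that copy command is sketched below. It only prints the command so it can be checked before use. The host name and the remote path are assumptions: the path relies on the `~/scratch` symbolic link created earlier and on the default report name, so adjust all three values to your own account.

```bash
# All three values are placeholders to edit (assumptions, not confirmed settings).
tacc_user="abattenh"              # change to your TACC user name
tacc_host="ls5.tacc.utexas.edu"   # assumed ls5 login host name
# Path relative to your TACC home directory, via the ~/scratch symbolic link
remote_path="scratch/byteclub/multiqc/01_fastq/multiqc_report.html"

# Build the command and print it instead of running it
scp_cmd="scp ${tacc_user}@${tacc_host}:${remote_path} ."
echo "Run this in a terminal on your laptop (not on ls5):"
echo "  $scp_cmd"
```

After the copy completes, open the local `multiqc_report.html` file in a browser; the report is a single self-contained HTML file.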
References
Main MultiQC links
- Website: http://multiqc.info/
- Documentation: http://multiqc.info/docs
- MultiQC Github repo: https://github.com/ewels/MultiQC
- MultiQC test data repo: https://github.com/ewels/MultiQC_TestData
MultiQC configuration files
- an example multiqc_config.yaml file: https://github.com/ewels/MultiQC/blob/master/multiqc_config_example.yaml
- all multiqc_config.yaml defaults: https://github.com/ewels/MultiQC/blob/master/multiqc/utils/config_defaults.yaml
MultiQC custom data support
- structure of the custom data area of multiqc_config.yaml:
- available plot types:
- http://multiqc.info/docs/#plotting-functions
- while this section is written for Python programming, the options listed in each plot type's "config" block can be specified declaratively in any plot's pconfig section in the multiqc_config.yaml.
- example custom data files from their test data repo:
...
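To make the declarative pconfig idea concrete, here is a minimal sketch that writes a small multiqc_config.yaml with one custom data section. The key names follow the custom content layout described in the MultiQC documentation as best understood here; the directory name, section id, titles, sample names, and numbers are all made up for illustration and do not come from any of the datasets on this page.

```bash
# Sketch only: every name and number in the YAML below is invented (assumption).
mkdir -p custom_demo
cat > custom_demo/multiqc_config.yaml <<'EOF'
custom_data:
    demo_counts:
        id: "demo_counts"
        section_name: "Demo counts"
        description: "Made-up numbers, for illustration only"
        plot_type: "bargraph"
        pconfig:
            id: "demo_counts_plot"
            title: "Demo counts per sample"
            ylab: "Count"
        data:
            sample_A:
                passed: 90
                failed: 10
            sample_B:
                passed: 75
                failed: 25
EOF

# MultiQC reads multiqc_config.yaml from the directory it is run in.
if command -v multiqc >/dev/null 2>&1; then
    (cd custom_demo && multiqc --force .) || echo "multiqc run failed" >&2
else
    echo "multiqc not on PATH; wrote custom_demo/multiqc_config.yaml only" >&2
fi
```

The entries under `pconfig` are the same options a Python plotting call would take in its config block, which is what lets a plot be described without writing any Python.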
- These example MultiQC reports below were generated by running the multiqc binary on a command line.
- After inspecting them locally (by just opening them as files in a web browser), they were copied to a web-accessible location to share with others. Here, that location is Iyer Lab's web-accessible directory on corral.
Igor Ponomarev ATAC-seq data
...
The Marcotte lab is working on a deep mutational screening project of a human gene transformed into yeast as an amplicon on a plasmid. Here, the gene is MVK, a gene in the yeast cholesterol biosynthesis pathway. The hsMVK gene is amplified with an error-prone polymerase to produce point mutations. Both the native yeast gene and the human ortholog (with which it shares no sequence similarity) are under on/off promoter control. The idea is to compare the mutations that accumulate in the active hsMVK gene, after many growth cycles, with a background in which the hsMVK gene is present but not active (the yeast MVK is doing the work) to see which mutations are favored or disfavored. As part of this project, Riddhiman Garge produced 19 datasets.
- basic FastQC report
- report on BWA mem alignments of the datasets to hsMVK amplicon and plasmid backbone contigs
- http://web.corral.tacc.utexas.edu/iyer/mvk/mvk_mqc_report.bwa.html
- standard reports from samtools flagstat, samtools idxstats, Picard MarkDuplicates
- custom data reports from bedtools genomecov and from insert size distribution data Anna computed
- report using custom data from a specialized deep mutational screening tool from the Jesse Bloom lab
- http://web.corral.tacc.utexas.edu/iyer/mvk/mvk_mqc_report.jbloom.html
- this tool looks only at the overlapping portions of paired-end R1 and R2 reads
...