...
Next we edit the multiqc_config.yaml configuration file to add appropriate custom data sections:
...
title | multiqc_config.yaml |
---|
...
Then the usual...
The resulting report includes our insert size distribution data in the custom data section we configured, shown as a new section called Bowtie2 insert size: http://web.corral.tacc.utexas.edu/iyer/byteclub/multiqc/06_custom_linegraph.mqc_report.html
What's cool is that this "sawtooth" insert size distribution occurs because of the way transposons insert into the major groove of DNA at regular intervals. So this graph shows Igor that his ATAC-seq proof-of-concept experiment worked!
Adding a custom bargraph
Here we'll create two custom bargraph reports, one for bowtie2 mapping qualities and a second showing genome coverage of the alignments.
The data files for both reports are pretty simple, but it took a bit of scripting to create them. So let's just use pre-made copies:
Code Block | ||
---|---|---|
| ||
cd $SCRATCH/byteclub/multiqc/02_bowtie
cp /work/01063/abattenh/projects/byteclub/multiqc/07_custom_bargraph/for_multiqc/*mapq* for_multiqc/
cp /work/01063/abattenh/projects/byteclub/multiqc/07_custom_bargraph/for_multiqc/*genomecov* for_multiqc/ |
There is one mapping quality histogram for each dataset, with category names in the 1st column and counts in the 2nd. The 50k dataset file looks like this:
Code Block | ||
---|---|---|
| ||
q0 137354
1-9 671546
10-19 1081868
20-29 1945926
30-39 1508496
40+ 12930272 |
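The scripting used to build these histogram files is not shown here. As a rough sketch only, a file in this layout could be produced by binning the MAPQ field (column 5) of SAM records. The function name mapq_hist, the tab separator, and the file names in the usage comment are assumptions for illustration, not the original script:

```shell
#!/usr/bin/env bash
# Bin MAPQ values (column 5 of SAM records read from stdin) into the
# categories used in the histogram file above, printing "category<TAB>count".
# SAM header lines (starting with "@") are skipped.
mapq_hist() {
  awk -F'\t' '
    BEGIN { n = split("q0 1-9 10-19 20-29 30-39 40+", cat, " ") }
    /^@/ { next }
    {
      q = $5 + 0
      if (q == 0)      bin = "q0"
      else if (q < 10) bin = "1-9"
      else if (q < 20) bin = "10-19"
      else if (q < 30) bin = "20-29"
      else if (q < 40) bin = "30-39"
      else             bin = "40+"
      count[bin]++
    }
    # Always print every category, in a fixed order, even when its count is 0.
    END { for (i = 1; i <= n; i++) printf "%s\t%d\n", cat[i], count[cat[i]] }
  '
}

# Hypothetical usage (requires samtools; file names are assumptions):
#   samtools view sample.bam | mapq_hist > for_multiqc/sample.mapq.tsv
```

Printing the categories in a fixed order keeps the rows aligned across datasets, which makes the per-sample files easier to compare by eye.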
There is just one data file for genome coverage. Unlike the per-sample files, it has a header, with an arbitrary tag for the categories in the 1st column, then dataset names and their counts in subsequent columns:
Code Block | ||
---|---|---|
| ||
count 5k_nuclei 50k_nuclei
none 2140984435 2175228345
1-2 237947623 351105871
3-10 308665107 186361275
11-50 38729079 17356704
51-100 3473642 780078
100+ 1071888 39501 |
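One way to arrive at a multi-column table like this is to merge per-sample two-column count files. The sketch below assumes tab-separated input files named after each dataset; the function name merge_counts, the suffix argument, and the file names in the usage comment are hypothetical:

```shell
#!/usr/bin/env bash
# Merge per-sample two-column files (category<TAB>count) into one table with a
# header row. Arguments: the tag for the first header column, the file name
# suffix to strip when deriving dataset names, then one or more files.
# Categories are taken, in order, from the first file.
merge_counts() (
  tag="$1"; suffix="$2"; shift 2
  # Header row: the tag, then one dataset name per input file.
  header="$tag"
  for f in "$@"; do
    header="$header$(printf '\t')$(basename "$f" "$suffix")"
  done
  printf '%s\n' "$header"
  # Body: one row per category, one count column per input file.
  awk -F'\t' '
    BEGIN { OFS = "\t" }
    FNR == 1 { nfiles++ }
    nfiles == 1 { order[++ncat] = $1 }
    { count[$1, nfiles] = $2 }
    END {
      for (c = 1; c <= ncat; c++) {
        row = order[c]
        for (f = 1; f <= nfiles; f++) {
          # Counts are kept as text so large values are printed unchanged.
          v = ((order[c], f) in count) ? count[order[c], f] : 0
          row = row OFS v
        }
        print row
      }
    }
  ' "$@"
)

# Hypothetical usage (file names are assumptions):
#   merge_counts count .genomecov.tsv 5k_nuclei.genomecov.tsv 50k_nuclei.genomecov.tsv
```

The sketch assumes every input file is non-empty; a category missing from a later file is reported as 0.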
Next we edit the multiqc_config.yaml configuration file to add appropriate custom data sections:
Code Block | ||
---|---|---|
| ||
# Titles to use for the report.
title: "ATAC-Seq QC Reports"
subtitle: null
intro_text: "MultiQC reports for Igor's ATAC-Seq proof-of-concept project."
report_header_info:
  - Sequenced by: 'GSAF'
  - Job: 'JA17277'
  - Run: 'SA17121'
  - Setup: '2x150'
# Change the output filenames
output_fn_name: mqc_report.html
data_dir_name: mqc_report_data
# Ignore these files / directories / paths when searching for reports
fn_ignore_files:
  - '*.dupinfo.txt'
# Modules that should come at the top of the report
top_modules:
  - 'generalstats'
  - 'fastqc'
  - 'samtools'
  - 'picard'
# --------------------------------
# Custom data
# --------------------------------
custom_content:
  order:
    - bowtie2_isize_section
custom_data:
  bowtie2_isize:
    id: 'bowtie2_isize_section'
    section_name: 'Bowtie2 insert size'
    description: 'distribution for alignments (bowtie2 --local -X2000 --no-mixed --no-discordant)'
    file_format: 'tsv'
    plot_type: 'linegraph'
    pconfig:
      id: 'bowtie2_isize_plot'
      title: 'Insert sizes for proper pairs'
      xlab: 'Insert size'
      ylab: 'Count'
sp:
  bowtie2_isize_section:
    fn: '*.bowtie2_isizes.tsv' |
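A bargraph section would presumably follow the same pattern as the linegraph section, with plot_type set to 'bargraph' and a search pattern that matches the new data files. The sketch below writes such a fragment to a file so it can be inspected and merged by hand into the existing custom_data and sp sections. Every id, section name, title, and the '*mapq*' glob is an illustrative assumption, not the configuration actually used for these reports:

```shell
#!/usr/bin/env bash
# Write a hypothetical bargraph fragment for multiqc_config.yaml to the given
# path. All ids, section names and file globs below are illustrative
# assumptions modeled on the linegraph section, not the real settings.
write_bargraph_sketch() {
  cat > "$1" <<'EOF'
custom_data:
  bowtie2_mapq:
    id: 'bowtie2_mapq_section'
    section_name: 'Bowtie2 mapping quality'
    file_format: 'tsv'
    plot_type: 'bargraph'
    pconfig:
      id: 'bowtie2_mapq_plot'
      title: 'Mapping quality of alignments'
      ylab: 'Count'
sp:
  bowtie2_mapq_section:
    fn: '*mapq*'
EOF
}

# Hypothetical usage: write the fragment, then review it before merging.
#   write_bargraph_sketch bargraph_fragment.yaml
```

To control where the new section appears in the report, its id would also need to be added to the custom_content order list.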
Then the usual...
Code Block | ||
---|---|---|
| ||
cd $SCRATCH/byteclub/multiqc/02_bowtie; rm -rf mqc_report*; multiqc . |
Making MultiQC run faster and be less confused
By default, MultiQC scans all files in the analysis directory you specify. This can take quite a while for complex directory hierarchies with many files that will not be used by MultiQC.
Additionally, MultiQC can get confused when the same (or similar) data is found in different files, or in different directories.
To address these issues, it is a good practice to copy everything you want MultiQC to process into a single directory, then either specify just that directory on the multiqc command line (e.g. multiqc for_multiqc), or exclude other directories in the multiqc_config.yaml file.
For example, here we can stage all the reports we want MultiQC to process in our for_multiqc directory:
Code Block | ||
---|---|---|
| ||
cd ~/playtime/multiqc/atacseq/for_multiqc
ln -s -f ../fastqc
cp -p ../bowtie2/*.flagstat.txt .
cp -p ../bowtie2/*.idxstats.txt . |
Your for_multiqc directory should now contain:
Code Block |
---|
brain_50k_nuclei.bowtie2_isizes.tsv
brain_50k_nuclei.dupmetrics.txt
brain_50k_nuclei.flagstat.txt
brain_50k_nuclei.idxstats.txt
brain_5k_nuclei.bowtie2_isizes.tsv
brain_5k_nuclei.dupmetrics.txt
brain_5k_nuclei.flagstat.txt
brain_5k_nuclei.idxstats.txt
fastqc |
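Before running multiqc, it can be handy to confirm that the staging directory really holds everything listed above. The following is a minimal sketch of such a check; the function name check_staged is hypothetical, and the list of suffixes is simply read off the directory listing above:

```shell
#!/usr/bin/env bash
# Report which expected files are missing from a staging directory.
# Arguments: the directory, then one or more sample names. Prints one line per
# missing item and returns non-zero if anything is missing.
check_staged() (
  dir="$1"; shift
  missing=0
  for sample in "$@"; do
    for suffix in bowtie2_isizes.tsv dupmetrics.txt flagstat.txt idxstats.txt; do
      if [ ! -e "$dir/$sample.$suffix" ]; then
        echo "missing: $dir/$sample.$suffix"
        missing=1
      fi
    done
  done
  # The fastqc entry is a symlink to the FastQC output directory.
  if [ ! -e "$dir/fastqc" ]; then
    echo "missing: $dir/fastqc"
    missing=1
  fi
  [ "$missing" -eq 0 ]
)

# Usage with the sample names from the listing above:
#   check_staged for_multiqc brain_5k_nuclei brain_50k_nuclei && multiqc for_multiqc
```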
Then:
Code Block | ||
---|---|---|
| ||
cd ~/playtime/multiqc/atacseq; rm -rf mqc_report*
multiqc for_multiqc |
You can also exclude the bowtie2 directory entirely by adding it to the fn_ignore_dirs list. Our final multiqc_config.yaml file then looks like this:
Code Block | ||
---|---|---|
| ||
# Titles to use for the report.
title: "ATAC-Seq QC Reports"
subtitle: null
intro_text: "MultiQC reports for Igor's ATAC-Seq proof-of-concept project."
report_header_info:
  - Sequenced by: 'GSAF'
  - Job: 'JA17277'
  - Run: 'SA17121'
  - Setup: '2x150'
# Change the output filenames
output_fn_name: mqc_report.html
data_dir_name: mqc_report_data
# Ignore these files / directories / paths when searching for reports
fn_ignore_files:
  - '*.dupinfo.txt'
fn_ignore_dirs:
  - bowtie2
# Modules that should come at the top of the report
top_modules:
  - 'generalstats'
  - 'fastqc'
  - 'samtools'
  - 'picard'
# --------------------------------
# Custom data
# --------------------------------
custom_content:
  order:
    - bowtie2_isize_section
    - iyer_seq_history_section
custom_data:
  bowtie2_isize:
    id: 'bowtie2_isize_section'
    section_name: 'Bowtie2 insert size'
    description: 'distribution for alignments (bowtie2 --local -X2000 --no-mixed --no-discordant)'
    file_format: 'tsv'
    plot_type: 'linegraph'
    pconfig:
      id: 'bowtie2_isize_plot'
      title: 'Insert sizes for proper pairs'
      xlab: 'Insert size'
      ylab: 'Count'
  iyer_seq_history:
    id: 'iyer_seq_history_section'
    section_name: 'Iyer lab sequencing'
    description: '- history of alignments by type'
    file_format: 'tsv'
    plot_type: 'bargraph'
    # ...
    pconfig:
      id: 'iyer_seq_history_plot'
sp:
  bowtie2_isize_section:
    fn: '*.bowtie2_isizes.tsv'
  iyer_seq_history_section:
    fn: 'iyer_sequencing_history.tsv' |
References
Main MultiQC links
...
- The example MultiQC reports below were generated by running the multiqc binary on the command line.
- After being inspected locally (by opening them as files in a web browser), they were copied to a web-accessible location to share with others. Here, that location is the Iyer Lab's web-accessible directory on corral.
Igor Ponomarev ATAC-seq data
...