This RNA-Seq analysis pipeline uses an annotated genome and consists of the following steps:

Quality Assessment

 *Deliverables: Reports generated by FastQC.

 Tools Used:

*FastQC: (Andrews 2010) used to generate quality summaries of the data:

  • Per base sequence quality report: useful for deciding whether trimming is necessary.
  • Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high-coverage RNA-Seq data.
  • Overrepresented sequences: evaluation of adapter contamination.
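
As an illustration of what the per base sequence quality report summarizes, the sketch below averages Phred scores at each read position. It is not FastQC's implementation; the function name `per_base_mean_quality` and the Phred+33 encoding are assumptions made for this example.

```python
from statistics import mean


def per_base_mean_quality(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    quality_strings: iterable of FASTQ quality lines (Phred+33 assumed).
    Reads may differ in length; each position averages only the reads
    that are long enough to reach it.
    """
    columns = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(columns):
                columns.append([])
            columns[i].append(ord(ch) - offset)
    return [mean(col) for col in columns]


# Three toy reads whose quality drops toward the 3' end.
quals = ["IIII#", "II5#", "I5#"]
profile = per_base_mean_quality(quals)
for position, score in enumerate(profile, start=1):
    print(position, round(score, 2))
```

A profile that falls steadily toward the end of the read is the usual signal that 3' trimming is worth considering.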

Fastq Preprocessing

If required, preprocessing of fastq files is performed.

*Deliverables: Trimmed/filtered fastq files.

Tools Used:

*Fastx-toolkit: Used to preprocess fastq files.

  • Fastq quality trimmer: Trimming reads based on quality.
  • Fastq quality filter: Filtering reads based on quality.
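
The two operations above can be sketched in a few lines. This is a simplified illustration, not the Fastx-toolkit code: the function names, the default thresholds, and the Phred+33 encoding are assumptions chosen for the example.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(seq, qual, threshold=20, min_length=1):
    """Trim low-quality bases from the 3' end of a read.

    Returns (sequence, quality) after trimming, or None when the
    trimmed read is shorter than min_length and should be discarded.
    """
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_length:
        return None
    return seq[:end], qual[:end]


def passes_filter(qual, min_quality=20, min_percent=80):
    """Keep a read only if enough of its bases reach min_quality."""
    scores = phred_scores(qual)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_quality)
    return 100 * good / len(scores) >= min_percent


print(trim_3prime("ACGTAC", "IIII##"))  # last two bases are low quality
print(passes_filter("IIII#"))           # 4 of 5 bases reach Q20
print(passes_filter("III##"))           # only 3 of 5 bases reach Q20
```

Trimming shortens a read while keeping it; filtering drops the whole read. Running the trimmer first usually lets more reads survive the filter.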

*Cutadapt: Used to remove adapter sequences from reads.
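
To show the idea behind 3' adapter removal, here is a minimal sketch that uses exact string matching only. Cutadapt itself performs error-tolerant alignment, so this is a deliberate simplification; the function name, the `min_overlap` parameter, and the adapter sequence used in the example are assumptions for illustration.

```python
def remove_3prime_adapter(seq, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching.

    A full adapter occurrence anywhere in the read removes the adapter
    and everything after it. Otherwise, a prefix of the adapter of at
    least min_overlap bases at the very end of the read is removed.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Try the longest partial match first, then shorter ones.
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k]
    return seq


ADAPTER = "AGATCGGAAGAGC"  # example adapter sequence (assumption)

print(remove_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # full adapter
print(remove_3prime_adapter("ACGTACGTAGATC", ADAPTER))             # partial adapter
print(remove_3prime_adapter("ACGTACGT", ADAPTER))                  # no adapter
```

In a real FASTQ record the quality string would have to be cut at the same position as the sequence.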


Mapping

Reads are mapped to the reference genome using BWA-mem or Tophat.

*Deliverables: Mapping results as bam files, and mapping statistics.

Tools Used:

*BWA-mem: (Li 2013) primary aligner used to generate read alignments.

*Tophat: (Kim 2011) aligner used to generate read alignments in a splice-aware manner and identify novel junctions.

*Samtools: (Li 2009) used to generate mapping statistics.
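
As a rough illustration of what a mapping statistic is, the sketch below derives a mapped fraction from the FLAG column of SAM records. It is not how Samtools tabulates its reports: it counts primary records only, and the function name and the toy records are assumptions for this example. The flag bits used (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM format.

```python
def mapping_stats(sam_lines):
    """Summarize mapped and unmapped primary records from SAM text lines."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:
            mapped += 1
    fraction = mapped / total if total else 0.0
    return {
        "total": total,
        "mapped": mapped,
        "unmapped": total - mapped,
        "mapped_fraction": fraction,
    }


records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
stats = mapping_stats(records)
print(stats)
```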

Gene/Transcript Counting

Counting the number of reads mapping to annotated intervals to obtain abundance of genes/transcripts.

*Deliverables: Raw gene/transcript counts.

Tools Used:

*HTSeq-count: (Anders 2014) used to count reads overlapping gene intervals.
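
The counting step can be pictured with the following sketch, which assigns each read to a gene when it overlaps exactly one gene interval. It is a simplified illustration rather than HTSeq-count's algorithm: strand, spliced alignments, and mapping quality are ignored, coordinates are assumed to be half-open, and the function name and toy data are assumptions for this example.

```python
from collections import Counter


def count_reads(genes, reads):
    """Count reads per gene.

    genes: dict mapping gene name -> (chrom, start, end)
    reads: iterable of (chrom, start, end)
    Reads that overlap no gene or more than one gene are tallied
    under separate special keys instead of being assigned.
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, r_start, r_end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and r_start < g_end and g_start < r_end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [
    ("chr1", 110, 150),  # geneA only
    ("chr1", 120, 160),  # geneA only
    ("chr1", 190, 195),  # overlaps geneA and geneB
    ("chr1", 250, 290),  # geneB only
    ("chr2", 100, 150),  # no annotated gene
]
counts = count_reads(genes, reads)
print(dict(counts))
```

The linear scan over all genes is fine for a toy example; a real annotation would call for an interval index.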

DEG Identification

Normalization and statistical testing to identify differentially expressed genes.

*Deliverables: DEG summary, a master file containing fold changes and p-values for every gene, and MA plots.

Tools Used:

*DESeq2: (Love 2014) used to perform normalization and test for differential expression using the negative binomial distribution.
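
To make the normalization step and the MA plot axes concrete, the sketch below computes median-of-ratios size factors and, per gene, the mean normalized count (A) and a log2 fold change (M). It is only an illustration under stated assumptions: DESeq2 estimates fold changes and p-values by fitting a negative binomial model, which is not reproduced here. The function names, the pseudocount, and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped because their
    geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def ma_point(row, factors, control, treated, pseudocount=0.5):
    """Return (A, M): mean normalized count and log2 fold change."""
    norm = [c / f for c, f in zip(row, factors)]
    mean_control = sum(norm[i] for i in control) / len(control)
    mean_treated = sum(norm[i] for i in treated) / len(treated)
    m = math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))
    a = sum(norm) / len(norm)
    return a, m


# Toy data: samples 0-1 are control, 2-3 are treated.
# Samples 1 and 3 were sequenced twice as deeply as samples 0 and 2.
counts = {
    "g1": [10, 20, 10, 20],
    "g2": [100, 200, 100, 200],
    "g3": [50, 100, 200, 400],
    "g4": [30, 60, 30, 60],
    "g5": [0, 0, 8, 16],
}
factors = size_factors(counts)
for gene, row in counts.items():
    a, m = ma_point(row, factors, control=[0, 1], treated=[2, 3])
    print(gene, round(a, 2), round(m, 2))
```

In this toy data the depth difference is absorbed by the size factors, so g1, g2 and g4 come out with a fold change near zero while g3 stands out.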

 
