This pipeline uses an annotated genome to identify differentially expressed genes/transcripts. Billing is a 15 hour minimum ($1470 internal, $1860 external) per project.

1. Quality Assessment

Data quality is assessed with FastQC, and the results are evaluated before any downstream analysis.

  • Deliverables:
    • Reports generated by FastQC.
  • Tools Used:
    • FastQC (Andrews 2010): used to generate quality summaries of the data, including:
      • Per base sequence quality: useful for deciding whether trimming is necessary.
      • Sequence duplication levels: evaluation of library complexity. Higher duplication levels may be expected for high-coverage RNA-Seq data.
      • Overrepresented sequences: evaluation of adapter contamination.
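The sketch below gives a rough sense of what the per base quality and duplication modules summarize. It computes the mean Phred score at each read position and a simple duplicate fraction from FASTQ text. It assumes 4-line FASTQ records with Phred+33 quality encoding. The function names are hypothetical, and FastQC's own duplication metric is calculated differently.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> List[Record]:
    """Parse 4-line FASTQ records (blank lines ignored)."""
    clean = [line.rstrip("\n") for line in lines if line.strip()]
    if len(clean) % 4:
        raise ValueError("FASTQ input is not a multiple of 4 lines")
    records = []
    for i in range(0, len(clean), 4):
        header, seq, plus, qual = clean[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        records.append((header[1:], seq, qual))
    return records


def per_base_mean_quality(records: List[Record], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (assumes Phred+33 by default)."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in records:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def duplicate_fraction(records: List[Record]) -> float:
    """Fraction of reads whose exact sequence was already seen (simplified metric)."""
    if not records:
        return 0.0
    unique = {seq for _, seq, _ in records}
    return 1 - len(unique) / len(records)


example = """@read1
ACGT
+
IIII
@read2
ACGT
+
II##
"""
recs = parse_fastq(example.splitlines())
print(per_base_mean_quality(recs))  # [40.0, 40.0, 21.0, 21.0]
print(duplicate_fraction(recs))     # 0.5
```

A falling mean quality toward the 3' end is the kind of pattern that would motivate the trimming step in the next section.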

2. Fastq Preprocessing

The quality assessment results are used to decide whether the raw data require preprocessing; if so, preprocessing is performed.

  • Deliverables:
    • Trimmed/filtered fastq files.
  • Tools Used:
    • Fastx-toolkit: used to preprocess fastq files.
      • Fastq quality trimmer: trims reads based on quality.
      • Fastq quality filter: filters reads based on quality.
    • Cutadapt: used to remove adapter sequences from reads.
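The three preprocessing operations listed above can be sketched in a few lines. This is a simplified illustration, not the Fastx-toolkit or Cutadapt implementation. The thresholds (Phred 20, 80% of bases) are hypothetical defaults, and Phred+33 encoding is assumed. The adapter matcher finds exact matches only, whereas Cutadapt tolerates mismatches and indels. The example adapter is the common Illumina adapter prefix AGATCGGAAGAGC; the actual adapter depends on the library kit.

```python
from typing import Tuple


def quality_trim_3prime(seq: str, qual: str, threshold: int = 20,
                        offset: int = 33) -> Tuple[str, str]:
    """Trim bases from the 3' end while their Phred score is below threshold."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


def passes_quality_filter(qual: str, min_quality: int = 20,
                          min_percent: float = 80.0, offset: int = 33) -> bool:
    """Keep a read only if >= min_percent of its bases have Phred >= min_quality."""
    if not qual:
        return False
    good = sum(1 for char in qual if ord(char) - offset >= min_quality)
    return 100.0 * good / len(qual) >= min_percent


def trim_3prime_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter by exact matching (simplified; unlike Cutadapt).

    Only the sequence is trimmed; a full implementation would also cut the
    quality string to the same length.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Case 1: the full adapter occurs in the read; cut at its first occurrence.
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Case 2: the read ends with a partial adapter (a prefix of the adapter).
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix (assumption; use the kit's adapter)
print(quality_trim_3prime("ACGTAC", "IIII##"))                    # ('ACGT', 'IIII')
print(passes_quality_filter("IIII##"))                           # False
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGATC", ADAPTER))             # ACGTACGT
```

The `min_overlap` guard matters for the partial-match case. Very short adapter prefixes match by chance often enough that trimming them would remove genuine sequence.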

3. Mapping

Reads are mapped either to a transcriptome reference using the Kallisto pseudoaligner or to a genome reference using HISAT2.

  • Deliverables:
    • Mapping results, as bam files (when mapped using HISAT2), and mapping statistics.
  • Tools Used:
    • Kallisto (Bray 2016): pseudoaligner and RNA-Seq quantification tool.
    • HISAT2 (Kim 2015): aligner used to generate read alignments in a splice-aware manner and identify novel junctions.
    • Samtools (Li 2009): used to generate mapping statistics.
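A headline mapping statistic is the fraction of reads that aligned. The sketch below computes it directly from SAM records using the standard FLAG bits: 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). Each read is counted once through its primary record. It is an illustration only, and the function name is hypothetical; `samtools flagstat` reports a much fuller breakdown.

```python
from typing import Iterable

# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rate(sam_lines: Iterable[str]) -> float:
    """Fraction of primary SAM records that are mapped."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0


sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t500\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t2064\tchr2\t900\t60\t2S2M\t*\t0\t0\tACGT\tIIII",  # supplementary
    "r1\t256\tchr3\t300\t0\t4M\t*\t0\t0\t*\t*",            # secondary
]
print(round(mapping_rate(sam), 3))  # 0.667
```

For paired-end data each mate is its own primary record, so this rate is per mate rather than per fragment.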

4. Gene/Transcript Counting

Reads mapping to annotated intervals are counted to estimate the abundance of each gene/transcript.

  • Deliverables:
    • Raw gene/transcript counts
    • Variance stabilized gene/transcript counts
  • Tools Used:
    • Kallisto (Bray 2016): pseudoaligner and RNA-Seq quantification tool.
    • HTSeq-count (Anders 2014): used to count reads overlapping gene intervals.
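The counting rule can be illustrated with a simplified version of the "union" overlap mode, which HTSeq-count uses by default. A read is assigned to a gene only if every feature it overlaps belongs to that one gene. Otherwise it is tallied as ambiguous, or as having no feature.

This sketch makes several simplifying assumptions:
  • Reads are unspliced and unstranded, given as half-open (chrom, start, end) intervals.
  • Gene models are plain lists of exon intervals.

The real tool reads a GTF annotation and BAM alignments, and it also handles spliced reads, strandedness and multimapping reads.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def count_union(genes: Dict[str, List[Interval]], reads: List[Interval]) -> Counter:
    """Count reads per gene using a simplified 'union' overlap rule."""
    counts: Counter = Counter({gene_id: 0 for gene_id in genes})
    for chrom, r_start, r_end in reads:
        # Set of genes with at least one exon overlapping the read.
        hits = {
            gene_id
            for gene_id, exons in genes.items()
            for c, e_start, e_end in exons
            if c == chrom and e_start < r_end and r_start < e_end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


genes = {
    "geneA": [("chr1", 100, 200)],
    "geneB": [("chr1", 150, 300)],
}
reads = [
    ("chr1", 110, 140),  # only geneA
    ("chr1", 160, 190),  # geneA and geneB -> ambiguous
    ("chr1", 250, 280),  # only geneB
    ("chr1", 400, 450),  # no annotated feature
    ("chr2", 100, 150),  # different chromosome
]
print(dict(count_union(genes, reads)))
# {'geneA': 1, 'geneB': 1, '__ambiguous': 1, '__no_feature': 2}
```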

5. DEG Identification

Counts are normalized and statistically tested to identify differentially expressed genes (DEGs).

  • Deliverables:
    • DEG summary and master file containing fold changes and p-values for every gene
    • MA plots
  • Tools Used:
    • DESeq2 (Love 2014): used to normalize counts and test for differential expression with a negative binomial generalized linear model.
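Two pieces of the DESeq2 workflow are simple enough to show directly: the median-of-ratios size factors used for normalization, and the Benjamini-Hochberg adjustment that DESeq2 applies to p-values by default. The sketch below reimplements both in plain Python for illustration only. It omits dispersion estimation and the negative binomial test itself. The function names are hypothetical and are not DESeq2's API.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors; counts[i][j] = reads for gene i in sample j."""
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a geometric mean of 0 and are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has at least one zero count")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


counts = [[10, 20], [100, 200], [5, 10], [0, 3]]  # sample 2 sequenced ~2x deeper
print([round(f, 3) for f in size_factors(counts)])  # [0.707, 1.414]
print([round(p, 4) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.5])])
# [0.04, 0.0533, 0.0533, 0.5]
```

Dividing each sample's counts by its size factor puts the samples on a common scale. The ratio of the two factors here (2.0) recovers the twofold sequencing-depth difference.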

6. Visualizations

Standard visualizations of the RNA-Seq data are generated using in-house R scripts.

  • Deliverables:
    • Sample dendrogram
    • Sample-sample correlation plot
    • Pair plot: matrix of scatter plots showing the relationship of every sample metadata variable to every other variable.
    • Expression heatmap with clustering of samples
    • Volcano plot: scatter plot of fold change versus significance.
    • Box plots of the top 10 upregulated and top 10 downregulated genes.
    • PCA plot: orthogonal transformation of the data to reveal its underlying structure.
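The volcano plot and the top-10 box plots both start from the same per-gene table of log2 fold changes and adjusted p-values. The Python sketch below shows one way to compute the volcano coordinates and pick the genes to plot; it is not the in-house R scripts.

It rests on a few assumptions:
  • Significance cutoffs are adjusted p < 0.05 and |log2 fold change| >= 1.
  • Top genes are ranked by fold change among the significant genes.
  • Genes with a missing adjusted p-value (reported as NA by DESeq2) are skipped.

```python
import math
from typing import List, Optional, Tuple

Result = Tuple[str, float, Optional[float]]  # (gene, log2 fold change, adjusted p)
Point = Tuple[str, float, float, str]        # (gene, x, y, category)


def volcano_points(results: List[Result], padj_cutoff: float = 0.05,
                   lfc_cutoff: float = 1.0) -> List[Point]:
    """x = log2 fold change, y = -log10(adjusted p); classify up/down/ns."""
    points = []
    for gene, lfc, padj in results:
        if padj is None:
            continue  # missing adjusted p-value; leave the gene off the plot
        y = -math.log10(padj) if padj > 0 else math.inf
        significant = padj < padj_cutoff
        if significant and lfc >= lfc_cutoff:
            category = "up"
        elif significant and lfc <= -lfc_cutoff:
            category = "down"
        else:
            category = "ns"
        points.append((gene, lfc, y, category))
    return points


def top_genes(points: List[Point], n: int = 10) -> Tuple[List[str], List[str]]:
    """Top-n up- and downregulated genes, ranked by fold change (an assumption)."""
    up = sorted((p for p in points if p[3] == "up"), key=lambda p: p[1], reverse=True)
    down = sorted((p for p in points if p[3] == "down"), key=lambda p: p[1])
    return [p[0] for p in up[:n]], [p[0] for p in down[:n]]


results = [
    ("g1", 2.5, 0.001),
    ("g2", -1.5, 0.01),
    ("g3", 0.2, 0.5),
    ("g4", 3.0, 0.2),    # large change but not significant
    ("g5", 1.2, 0.04),
    ("g6", -4.0, None),  # missing adjusted p-value
]
pts = volcano_points(results)
print([(g, cat) for g, _, _, cat in pts])
# [('g1', 'up'), ('g2', 'down'), ('g3', 'ns'), ('g4', 'ns'), ('g5', 'up')]
print(top_genes(pts))  # (['g1', 'g5'], ['g2'])
```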