This RNA-Seq analysis pipeline uses an annotated genome to identify differentially expressed genes/transcripts. 15 hour minimum ($1470 internal, $1860 external) per project. It consists of the following steps:

1. Quality Assessment

Quality of data assessed by FastQC; results of quality assessment will be evaluated prior to downstream analysis.

  • Deliverables:
    • Reports generated by FastQC
  • Tools Used:
    • FastQC: (Andrews 2010) used to generate quality summaries of data:
      • Per base sequence quality report: useful for deciding if trimming is necessary.
      • Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high coverage RNA-Seq data.
      • Overrepresented sequences: evaluation of adapter contamination.
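To make the per base sequence quality report more concrete, the following is a minimal Python sketch of the statistic it summarizes: the mean Phred score at each read position. It is an illustration only, not FastQC code; the function names and the Phred+33 encoding are assumptions.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(records) - 3, 4):
        header, seq, _plus, qual = records[i:i + 4]
        yield header[1:], seq, qual


def per_base_mean_quality(quality_strings, offset=33):
    """Mean Phred score at each read position (Phred+33 encoding assumed)."""
    longest = max(len(q) for q in quality_strings)
    means = []
    for pos in range(longest):
        # Only reads long enough to cover this position contribute.
        scores = [ord(q[pos]) - offset for q in quality_strings if len(q) > pos]
        means.append(mean(scores))
    return means


# Two made-up reads; the second one degrades towards its 3' end.
example = """@read1
ACGT
+
IIII
@read2
ACGT
+
II5#
""".splitlines()

quals = [qual for _id, _seq, qual in parse_fastq(example)]
print(per_base_mean_quality(quals))
```

A drop in the mean towards the end of the reads, as in this toy input, is the kind of pattern that would suggest quality trimming.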

2. Fastq Preprocessing

Quality assessment is used to decide whether any preprocessing of the raw data is required; if so, preprocessing is performed.

  • Deliverables:
    • Trimmed/filtered fastq files.
  • Tools Used:
    • Fastx-toolkit: used to preprocess fastq files.
      • Fastq quality trimmer: trimming reads based on quality.
      • Fastq quality filter: filtering reads based on quality.
    • Cutadapt: used to remove adapters from reads.
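The difference between quality trimming and quality filtering can be sketched in a few lines of Python. This is a simplified illustration of the two ideas, not the Fastx-toolkit implementation; the thresholds (Phred 20, 80% of bases, minimum length 3) and the function names are assumptions chosen for the example.

```python
def trim_3prime(seq, qual, threshold=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below threshold."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(qual, min_quality=20, min_percent=80, offset=33):
    """True if at least min_percent of bases have Phred >= min_quality."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_quality)
    return 100 * good / len(qual) >= min_percent


def preprocess(reads, min_length=3):
    """Trim each (seq, qual) pair, then drop reads that are short or low quality."""
    kept = []
    for seq, qual in reads:
        seq, qual = trim_3prime(seq, qual)
        if len(seq) >= min_length and passes_filter(qual):
            kept.append((seq, qual))
    return kept


reads = [("ACGTAC", "IIII##"), ("ACGT", "####")]
print(preprocess(reads))
```

Trimming shortens a read but keeps it; filtering discards the read entirely. Adapter removal is a separate step and is not covered by this sketch.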

3. Mapping

Mapping to a transcriptome reference is performed using the Kallisto pseudomapper, or mapping to a genome reference is performed using HISAT2.

  • Deliverables:
    • Mapping results, as bam files (when mapped using HISAT2) and mapping statistics.
  • Tools Used:
    • Kallisto: (Bray 2016) pseudoaligner and RNA-Seq quantification tool.
    • HISAT2: (Kim 2015) aligner used to generate read alignments in a splice-aware manner and identify novel junctions.
    • Samtools: (Li 2009) used to generate mapping statistics.
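As an illustration of where mapping statistics come from, the sketch below derives a mapping rate from the FLAG field of SAM records (the text form of bam files). It is not Samtools output and the function name is hypothetical; counting primary records only is one convention among several, so the numbers need not match a Samtools report exactly.

```python
def mapping_stats(sam_lines):
    """Count primary alignment records and how many of them are mapped."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    rate = mapped / total if total else 0.0
    return {"total": total, "mapped": mapped, "mapping_rate": rate}


# Made-up records: two mapped reads, one unmapped read, one secondary alignment.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(mapping_stats(sam))
```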

4. Gene/Transcript Counting

Counting the number of reads mapping to annotated intervals to obtain abundance of genes/transcripts.

  • Deliverables:
    • Raw gene/transcript counts
    • Variance stabilized gene/transcript counts
  • Tools Used:
    • Kallisto: (Bray 2016) pseudoaligner and RNA-Seq quantification tool.
    • HTSeq-count: (Anders 2014) used to count reads overlapping gene intervals.
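Counting reads over annotated intervals can be sketched as follows. The logic is loosely modeled on the idea behind HTSeq-count (a read is assigned only when it overlaps exactly one gene), but it is a simplification: strand and exon structure are ignored, and the data layout and function name are assumptions made for this example.

```python
from collections import Counter


def count_reads(genes, reads):
    """Assign each read to the single gene it overlaps.

    genes: {gene_id: (chrom, start, end)} with closed, 1-based intervals.
    reads: iterable of (chrom, start, end) alignment intervals.
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    for chrom, start, end in reads:
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1  # overlaps no annotated gene
        else:
            counts["__ambiguous"] += 1  # overlaps more than one gene
    return dict(counts)


genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [
    ("chr1", 110, 150),  # geneA only
    ("chr1", 190, 195),  # overlaps geneA and geneB
    ("chr1", 250, 290),  # geneB only
    ("chr2", 10, 50),    # no annotated gene
]
print(count_reads(genes, reads))
```

The linear scan over all genes is fine for a toy example; real annotations would need an interval index.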

5. DEG Identification

Normalization and statistical testing to identify differentially expressed genes.

  • Deliverables:
    • DEG summary and master file containing fold changes and p values for every gene.
  • Tools Used:
    • DESeq2: (Love 2014) used to perform normalization and test for differential expression using the negative binomial distribution.
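The normalization step can be illustrated with the median-of-ratios approach that DESeq2 uses by default to estimate per-sample size factors. The Python sketch below is a simplified re-implementation for illustration, not DESeq2 code, and it does not cover the negative binomial testing; the function names and the toy counts are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: {gene: [raw count per sample]}; returns one factor per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if any(v == 0 for v in values):
            continue  # a zero count makes the geometric mean zero; skip the gene
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(ratios)) for ratios in log_ratios]


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return {
        gene: [v / f for v, f in zip(values, factors)]
        for gene, values in counts.items()
    }


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
counts = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
print(size_factors(counts))
print(normalize(counts)["g1"])
```

After normalization the two samples agree for g1, g2 and g3, so the twofold difference in sequencing depth is not mistaken for differential expression.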

6. Visualizations

Standard visualizations of the RNA-Seq data using in-house R scripts.

  • Deliverables:
    • Sample dendrogram
    • Sample-sample correlation plot
    • Pair plot: matrix of scatter plots showing the relationship of every sample metadata variable to every other variable.
    • Expression heatmap with clustering of samples
    • Volcano plot: scatter plot of fold change versus significance
    • Box plots of the top 10 upregulated and top 10 downregulated genes.
    • PCA plot: orthogonal transformation of the data to look at its underlying structure.
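As an example of what one of these plots encodes, the sketch below computes volcano plot coordinates: log2 fold change on the x axis and -log10 of the p value on the y axis. It is written in Python purely for illustration and is not one of the in-house R scripts; the cutoffs (absolute log2 fold change of 1, p value below 0.05), the function name, and the input values are assumptions.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, alpha=0.05):
    """Return (gene, x, y, label) tuples for a volcano plot.

    results: {gene: (log2_fold_change, p_value)}; p values must be > 0.
    """
    points = []
    for gene, (lfc, p_value) in results.items():
        y = -math.log10(p_value)  # larger y means more significant
        if p_value < alpha and lfc >= lfc_cutoff:
            label = "up"
        elif p_value < alpha and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # not significant or fold change too small
        points.append((gene, lfc, y, label))
    return points


# Made-up results for three genes.
results = {"g1": (2.0, 0.001), "g2": (-1.5, 0.01), "g3": (0.2, 0.5)}
for point in volcano_points(results):
    print(point)
```

Genes that are both strongly changed and significant end up in the upper left and upper right corners of the plot.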