Differential expression with splice variant analysis aug2012

Differential expression with splice variant analysis at the same time: the Tuxedo pipeline

The Tuxedo pipeline is a suite of tools for RNA-seq analysis, also known as the Tophat/Cufflinks workflow. It can be run in a variety of ways, optionally including de novo splice variant discovery. If an adequate set of known splice variants is already available, it can instead be run without splice variant detection to perform simple differential gene expression analysis.

Resources

Useful RNA-seq resources are summarized on our Resources tool list, in the Transcriptome analysis section. The most important of these resources for Tuxedo are:

  1. the original Nature Protocols article describing the RNA-seq analysis protocol using Tuxedo,
  2. the URL for Tuxedo resource bundles for selected organisms (gff annotations, pre-built bowtie references, etc.), and
  3. the example data we'll use for this tutorial, which came from this experiment; the raw fastq data is in the SRA.

Objectives

In this lab, you will explore a fairly typical RNA-seq analysis workflow using the Tuxedo pipeline. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). This simulated data has already been run through a basic RNA-seq analysis workflow. We will look at:

  1. How the workflow was run and what steps are involved.
  2. Which genes and isoforms are significantly differentially expressed.

Introduction

Overall Workflow Diagram

This workflow diagram gives an overview of the Tophat/Cufflinks (Tuxedo) pipeline. We have annotated the image from the original paper to include the important file types at each stage, and to note the steps skipped in the "fast path" (no de novo junction assembly).

This is the full workflow that includes de novo splice variant detection. For simple differential gene expression, Steps 2 (cufflinks) and 3-4 (cuffmerge) can be omitted.
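
The difference between the full workflow and the fast path can be sketched in a few lines of Python. This is only an illustration: the function name, the output directory names, and the assemblies.txt manifest are hypothetical, and the command-line options shown (-o, -G, -g, -s, -L) are the commonly documented ones, so check them against the help output of your installed tools. The function only builds the command lines; it does not run anything.

```python
def tuxedo_commands(samples, bowtie_index, genome_fa, known_gtf=None, de_novo=True):
    """Build (but do not run) the command lines for one pass through the workflow.

    samples maps a condition label to a list of (name, reads_1, reads_2) tuples.
    All output directory names below are illustrative assumptions.
    """
    if not de_novo and known_gtf is None:
        raise ValueError("the fast path needs a known annotation (GTF/GFF) file")

    commands = []
    bams = {}  # condition label -> list of alignment files

    # Step 1: align each sample with tophat.
    for condition, replicates in samples.items():
        for name, reads_1, reads_2 in replicates:
            out_dir = f"tophat_out/{name}"
            cmd = ["tophat", "-o", out_dir]
            if known_gtf:
                cmd += ["-G", known_gtf]
            cmd += [bowtie_index, reads_1, reads_2]
            commands.append(cmd)
            bams.setdefault(condition, []).append(f"{out_dir}/accepted_hits.bam")

    if de_novo:
        # Step 2: assemble transcripts per sample with cufflinks.
        for replicates in samples.values():
            for name, _, _ in replicates:
                commands.append(["cufflinks", "-o", f"cufflinks_out/{name}",
                                 f"tophat_out/{name}/accepted_hits.bam"])
        # Steps 3-4: merge the assemblies with cuffmerge. assemblies.txt is a
        # hypothetical manifest listing each sample's transcripts.gtf; it has
        # to be written before this command is run.
        merge = ["cuffmerge", "-s", genome_fa, "-o", "merged_asm"]
        if known_gtf:
            merge += ["-g", known_gtf]
        commands.append(merge + ["assemblies.txt"])
        transcripts = "merged_asm/merged.gtf"
    else:
        # Fast path: skip cufflinks and cuffmerge, use the known annotation.
        transcripts = known_gtf

    # Final step: test for differential expression with cuffdiff.
    labels = ",".join(samples)
    groups = [",".join(bams[condition]) for condition in samples]
    commands.append(["cuffdiff", "-o", "cuffdiff_out", "-L", labels, transcripts] + groups)
    return commands
```

Calling the function with de_novo=False drops the cufflinks and cuffmerge commands and passes the known annotation file straight to cuffdiff, which is the only structural difference between the two routes.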

Tuxedo data requirements

What is required for this pipeline?

  • One or more datasets (best with at least two biological replicates and at least two conditions) in fastq format
  • A reference genome, indexed for the Bowtie or Bowtie2 aligner (see this Tophat Resource Bundles page)
  • Optionally, a set of known splice variants in the form of a GTF (gene transfer format) or GFF (general feature format) file. These are also packaged as part of the resource bundles.
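
As a small illustration of the first requirement, the following Python sketch checks an experimental design against the "at least two conditions, at least two replicates" recommendation. The function name and the sample labels are hypothetical; the thresholds come from the list above and are treated as warnings rather than hard errors, since the pipeline only works best with replicates.

```python
def check_design(samples):
    """Return a list of warnings for a design given as {condition: [fastq names]}."""
    warnings = []
    if len(samples) < 2:
        warnings.append("fewer than two conditions: nothing to compare")
    for condition, fastqs in samples.items():
        if len(fastqs) < 2:
            warnings.append(f"condition {condition!r} has fewer than two replicates")
    return warnings


# Hypothetical labels for a design with two groups and three replicates each.
design = {
    "group1": ["g1_rep1.fastq", "g1_rep2.fastq", "g1_rep3.fastq"],
    "group2": ["g2_rep1.fastq", "g2_rep2.fastq", "g2_rep3.fastq"],
}
for message in check_design(design):
    print("warning:", message)
```

A design like the one used in this lab (two groups, three replicates per group) produces no warnings.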

Paths through the Tuxedo workflow

There are three major paths through this w