Differential expression with splice variant analysis aug2012
Differential expression with splice variant analysis at the same time: the Tuxedo pipeline
The Tuxedo Pipeline is a suite of tools for RNA-seq analysis, also known as the Tophat/Cufflinks workflow. It can be run in a variety of ways, optionally including de novo splice variant discovery. If an adequate set of splice variants is also available, it can also be run without splice variant detection to perform simple differential gene expression.
Resources
Useful RNA-seq resources are summarized on our Resources tool list, Transcriptome analaysis section. The most important of these resources for Tuxedo are:
- the original RNAseq analysis protocol using Tuxedo article in Nature Protocols, and
- the URL for Tuxedo resource bundles for selected organisms (gff annotations, pre-built bowtie references, etc.)
- the example data we'll use for this tutorial came from this experiment which has the raw fastq data in the SRA.
Objectives
In this lab, you will explore a fairly typical RNA-seq analysis workflow using the Tuxedo pipeline. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). This simulated data has already been run through a basic RNA-seq analysis workflow. We will look at:
- How the workflow was run and what steps are involved.
- What genes and isoforms are significantly differentially expressed
Introduction
Overall Workflow Diagram
This overview of tophat cufflinks workflow Diagram outlines the Tuxedo pipeline. We have annotated the image from the original paper to include the important file types at each stage, and to note the steps skippin in the "fast path" (no de novo junction assembly).
This is the full workflow that includes de novo splice variant detection. For simple differential gene expression, Steps 2 (cufflinks) and 3-4 (cuffmerge) can be omitted.
Tuxedo data requirements
What is required for this pipeline?
- One or more datasets (best with at least two biological replicates and at least two conditions) in fastq format
- A reference genome, indexed for the Bowtie or Bowtie2 aligner (see this Tophat Resource Bundles page)
- Optionally, a set of known splice variants in the form of a GTF (gene transfer format) or GFF (gene feature format) file. These are also packaged as part of the resource bundles.
Paths through the Tuxedo workflow
There are three major paths through this w
Welcome to the University Wiki Service! Please use your IID (yourEID@eid.utexas.edu) when prompted for your email address during login or click here to enter your EID. If you are experiencing any issues loading content on pages, please try these steps to clear your browser cache. If you require further assistance, please email wikihelp@utexas.edu.