Overview
In this lab, we'll look at how to use cummeRbund to visualize our gene expression results from cuffdiff. CummeRbund is part of the tuxedo pipeline and it is an R package that is capable of plotting the results generated by cuffdiff.
What can you do with cummeRbund?
Get your data
cds cd my_rnaseq_course cp -r /corral-repl/utexas/BioITeam/rnaseq_course/cuffdiff_results . #We need the cuffdiff results because that is the input to cummeRbund.
A) First load R and enter R environment
#module load R will not work for us because we need the latest version of R. So, we'll load a local version. $BI/bin/R What is $BI here?
B) Within R environment, install package cummeRbund
source("http://bioconductor.org/biocLite.R") biocLite("cummeRbund")
C) Load cummeRbund library and read in the differential expression results. If you save and exit the R environment and return, these commands must be executed again.
library(cummeRbund) cuff_data <- readCufflinks('cuffdiff_results')
D) Use cummeRbund to visualize the differential expression results.
NOTE: Any graphic outputs will be automatically saved as "Rplots.pdf" which can create problems when you want to create multiple plots with different names in the same process. To save different plots with different names, preface any of the commands below with the command:
pdf(file="myPlotName.pdf")
And after executing the necessary commands, add the line:
dev.off()
Exercise 1: Generate a scatterplot comparing genes across conditions C1 vs C2.
pdf(file="scatterplot.pdf") csScatter(genes(cuff_data), 'C1', 'C2') dev.off()
Let's look at our scatterplot here.
Exercise 2: Pull out from your data, significantly differentially expressed genes and isoforms.
gene_diff_data <- diffData(genes(cuff_data)) sig_gene_data <- subset(gene_diff_data, (significant == 'yes')) #Count how many we have nrow(sig_gene_data) isoform_diff_data <-diffData(isoforms(cuff_data)) sig_isoform_data <- subset(isoform_diff_data, (significant == 'yes')) #Count how many we have nrow(sig_isoform_data)
Exercise 3: For a gene, regucalcin, plot gene level expression.
pdf(file="regucalcin.pdf") mygene1 <- getGene(cuff_data,'regucalcin') expressionBarplot(mygene1) dev.off()
expressionBarplot(isoforms(mygene1))
pdf(file="rala.pdf") mygene2 <- getGene(cuff_data, 'Rala') expressionBarplot(mygene2) expressionBarplot(isoforms(mygene2)) dev.off()
Take cummeRbund for a spin...
CummeRbund is powerful package with many different functions. Above was an illustration of a few of them. Try any of the suggested exercises below to further explore the differential expression results with different cummeRbund functions.
If you would rather just look at the resulting graphs, they are at the URL: http://loving.corral.tacc.utexas.edu/bioiteam/tophat_cufflinks/ as exercise5_Rplots.pdf, exercise6_Rplots.pdf, and exercise7_Rplots.pdf.
You can refer to the cummeRbund manual for more help and remember that ?functionName will provide syntax information for different functions.
http://compbio.mit.edu/cummeRbund/manual.html
You may need to redo Step C) when you reopen an R session.
Exercise 5: Visualize the distribution of fpkm values across the two different conditions using a boxplot.
Exercise 6: Visualize the significant vs non-significant genes using a volcano plot.
Exercise 7: Pull out a subset of the genes using a ln_fold_change greater than 1.5. Generate expression bar plots for just those genes.
If you have trouble sourcing cummeRbund, try this:
module swap gcc intel
Reenter R and redo above steps.