Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

 

Code Block
titleStarting R and loading modules after they are installed
login1$ R
> library("DESeq")
> library("edgeR")

These commands will work for any Bioconductor package!

DESeq

Input:

DESeq takes as input count data in table form, with each column representing a biological replicate/biological condition.  The count data must be raw counts of sequencing reads, not already normalized data. 

Example:

                  untreated1        untreated2        untreated3        untreated4       treated1         treated2       treated3

        FBgn0000003    0                       0                       0                       0                       0                    0                  1
        FBgn0000008    92                     161                   76                     70                     140                88                70
        FBgn0000014    5                       1                       0                       0                       4                    0                  0

Our gene_counts.gff file is almost there- we can remove some extra gene related information in the counts file to make it less complex.

Code Block
titleUsing DESeq
login1$ R
> library("DESeq")
> counts = read.delim("gene_counts.htseq.final.gff", header=F, row.names=1)
> head(counts)
> colnames(counts) = c("C11","C12","C13","C21","C22","C23")
> head(counts)
> my.design <- data.frame(
  row.names = colnames( counts ),
  condition = c( "C1", "C1", "C1", "C2", "C2", "C2"),
  libType = c( "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end" )
)
> my.design
> conds <- factor(my.design$condition)
> cds <- newCountDataSet( counts, conds )
> cds

> cds <- estimateSizeFactors( cds )
> sizeFactors( cds )
> cds <- estimateDispersions( cds )

> result <- nbinomTest( cds, "C1", "C2" )
> head(result)
> result = result[order(result$pval), ]
> head(result)
> write.csv(result, "DESeq-C1-vs-C2.csv")


> pdf("DESeq-MA-plot.pdf")
> plot(
  result$baseMean,
  result$log2FoldChange,
  log="x", pch=20, cex=.3,
  col = ifelse( result$padj < .1, "red", "black" ) )
> dev.off()

> q()
Save workspace image? [y/n/c]: n
login1$ head DESeq-wt-vs-mut.csv

edgeR

These commands use the negative binomial model, calculate the false discovery rate (FDR ~ adjusted p-value), and make a plot similar to the one from DESeq.

Code Block
titleUsing edgeR
login1$ R
...
> library("edgeR")
> counts = read.delim("gene_counts.htseq.gff", header=F, row.names=1)
> colnames(counts) = c("C11", "C12", "C13", "C21", "C22", "C23")
> head(counts)
> group <- factor(c("C1", "C1", "C1", "C2", "C2", "C2"))
> dge = DGEList(counts=counts,group=group)
> dge <- estimateCommonDisp(dge)
> dge <- estimateTagwiseDisp(dge)
> et <- exactTest(dge)
> etp <- topTags(et, n=100000)
> etp$table$logFC = -etp$table$logFC

> pdf("edgeR-MA-plot.pdf")
> plot(
  etp$table$logCPM,
  etp$table$logFC,
  xlim=c(-3, 20), ylim=c(-12, 12), pch=20, cex=.3,
  col = ifelse( etp$table$FDR < .1, "red", "black" ) )
> dev.off()

> write.csv(etp$table, "edgeR-wt-vs-mut.csv")
> q()
Save workspace image? [y/n/c]: y
login1$ head edgeR-wt-vs-mut.csv

Note that the "FC" fold change calculated is initially the reverse of that for the DESeq example for the output here. It is wt relative to mut. To fix this, we put a negative in there for the log fold change.

DEXSeq

This package is meant for finding differential exon usage between samples from different conditions.

Relative usage of an exon  =   transcripts from the gene that contain this exon / all transcripts from the gene 

For each exon (or part of an exon) and each sample :

  1. count how many reads map to this exon
  2. count how many reads map to other exons of the same gene.
  3. calculate ratio of 1 to 2. 
  4. Look for changes in this ratio across conditions
  5. Look for statistically significant changes in this ratio across conditions, by using replicates.

This lets you identify changes in alternative splicing, changes in usage of alternative transcript start sites.

Cuffdiff

Cuffdiff (a part of the tuxedo suite) is a popular tool for testing for differential expression.  We will cover this along with the rest of the tuxedo suite tomorrow. 

 

The graphs we generated are located at:

http://web.corral.tacc.utexas.edu/BioITeam/rnaseq_course/DESeq-MA-plot.pdf

http://web.corral.tacc.utexas.edu/BioITeam/rnaseq_course/edgeR-MA-plot.pdf