Page Comparison

...

Code Block

title	Starting R and loading modules after they are installed

login1$ R
> library("DESeq")
> library("edgeR")

These commands will work for any Bioconductor package!

DESeq

Input:

DESeq takes as input count data in table form, with each column representing a biological replicate/biological condition. The count data must be raw counts of sequencing reads, not already normalized data.

Example:

untreated1 untreated2 untreated3 untreated4 treated1 treated2 treated3

FBgn0000003 0 0 0 0 0 0 1
FBgn0000008 92 161 76 70 140 88 70
FBgn0000014 5 1 0 0 4 0 0

Our gene_counts.gff file is almost there- we can remove some extra gene related information in the counts file to make it less complex.

Code Block

title	Using DESeq

login1$ R
#LOAD LIBRARY, READ IN GENE COUNTS, ADD METADATA
> library("DESeq")
> counts = read.delim("gene_counts.htseq.final.gff", header=F, row.names=1)
> head(counts)
> colnames(counts) = c("C11","C12","C13","C21","C22","C23")
> head(counts)
> my.design <- data.frame(
  row.names = colnames( counts ),
  condition = c( "C1", "C1", "C1", "C2", "C2", "C2"),
  libType = c( "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end" )
)
> my.design
> conds <- factor(my.design$condition)
 
 
#CREATE A DESEQ DATA OBJECT FROM COUNTS
> cds <- newCountDataSet( counts, conds )
> cds

 
#NORMALIZATION: ESTIMATE SIZE FACTORS
> cds <- estimateSizeFactors( cds )
> sizeFactors( cds )
> head( counts( cds, normalized=TRUE ) )

 
#ESTIMATE DISPERSION/VARIANCE
> cds <- estimateDispersions( cds )


#DO TEST FOR DIFFERENTIAL EXPRESSION AND WRITE RESULTS INTO FILE
> result <- nbinomTest( cds, "C1", "C2" )
> head(result)
> result = result[order(result$pval), ]
> head(result)
> write.csv(result, "DESeq-C1-vs-C2.csv")


#GENERATE MA PLOT
> pdf("DESeq-MA-plot.pdf")
> plot(
  result$baseMean,
  result$log2FoldChange,
  log="x", pch=20, cex=.3,
  col = ifelse( result$padj < .1, "red", "black" ) )
> dev.off()
> q()

Save workspace image? [y/n/c]: n
login1$ head DESeq-wt-vs-mut.csv

edgeR

These commands use the negative binomial model, calculate the false discovery rate (FDR ~ adjusted p-value), and make a plot similar to the one from DESeq.

Code Block

title	Using edgeR

login1$ R

#LOAD LIBRARY, READ IN COUNTS, ADD SOME METADATA
> library("edgeR")
> counts = read.delim("gene_counts.htseq.gff", header=F, row.names=1)
> colnames(counts) = c("C11", "C12", "C13", "C21", "C22", "C23")
> head(counts)
> group <- factor(c("C1", "C1", "C1", "C2", "C2", "C2"))
 
 
#CREATE EDGER SPECIFIC DATA OBJECT FROM COUNTS DATA
> dge = DGEList(counts=counts,group=group)
 
 
#ESTIMATE DISPERSIONS(VARIANCE)
> dge <- estimateCommonDisp(dge)
> dge <- estimateTagwiseDisp(dge)
 
 
#DO TEST FOR DIFFERENTIAL EXPRESSION
> et <- exactTest(dge)
> etp <- topTags(et, n=100000)

 
#GENERATE MA PLOT
> pdf("edgeR-MA-plot.pdf")
> plot(
  etp$table$logCPM,
  etp$table$logFC,
  xlim=c(-3, 20), ylim=c(-12, 12), pch=20, cex=.3,
  col = ifelse( etp$table$FDR < .1, "red", "black" ) )
> dev.off()

 
#WRITE OUT GENE RESULTS
> write.csv(etp$table, "edgeR-wt-vs-mut.csv")
> q()
Save workspace image? [y/n/c]: y
login1$ head edgeR-wt-vs-mut.csv

DEXSeq

This package is meant for finding differential exon usage between samples from different conditions.

Relative usage of an exon = transcripts from the gene that contain this exon / all transcripts from the gene

For each exon (or part of an exon) and each sample :

count how many reads map to this exon
count how many reads map to other exons of the same gene.
calculate ratio of 1 to 2.
Look for changes in this ratio across conditions
Look for statistically significant changes in this ratio across conditions, by using replicates.

This lets you identify changes in alternative splicing, changes in usage of alternative transcript start sites.

Cuffdiff

Cuffdiff (a part of the tuxedo suite) is a popular tool for testing for differential expression. We will cover this along with the rest of the tuxedo suite tomorrow.

Lets look at our results

Code Block

title	Look at results

head DESeq-wt-vs-mut.csv
head edgeR-wt-vs-mut.csv

Find the top 10 upregulated genes in both sets

Code Block

title	Find the top 10 upregulated genes

login1$#DESeq Rresults
> headsed 's/,/\t/g' DESeq-wtC1-vs-mutC2.csv|sort >-n head-r edgeR-wt-vs-mut.csv

-k6,6|cut -f 2,6|head

#edgeR results
sed 's/,/\t/g' edgeR-wt-vs-mut.csv |sort -n -r -k2,2|cut -f 1,2|head

2. Select DEGs with following cut offs- Fold Change > 2 (this means log 2 fold change > 1) and p value < 0.05 and count how many DEGs we have.

Code Block

title	Count the number of DEG

#We will use log 2 fold change instead of fold change because edgeR doesnt report fold change
#DESeq results
sed 's/,/\t/g' DESeq-C1-vs-C2.csv|awk '{if (($7>=1)&&($8<=0.05)) print $1,$7,$8}'|head
sed 's/,/\t/g' DESeq-C1-vs-C2.csv|awk '{if (($7>=1)&&($8<=0.05)) print $1,$7,$8}'|wc -l

#edgeR results
sed 's/,/\t/g' edgeR-wt-vs-mut.csv |awk '{if (($2>=1)&&($3<=0.05)) print $1,$2,$3}'|head
sed 's/,/\t/g' edgeR-wt-vs-mut.csv |awk '{if (($2>=1)&&($3<=0.05)) print $1,$2,$3}'|wc -l

The graphs we generated are located at:

http://web.corral.tacc.utexas.edu/BioITeam/rnaseq_course/DESeq-MA-plot.pdf

http://web.corral.tacc.utexas.edu/BioITeam/rnaseq_course/edgeR-MA-plot.pdf

Versions Compared

Old Version 29

New Version 30

Key

DESeq

edgeR

DEXSeq

Cuffdiff

Lets look at our results