Let's recap what we learned on Day 3:

Part 1. Annotated genes/transcripts

...

  • When mapping to the genome (using a tool like hisat2), use a tool (e.g. bedtools, htseq, stringtie) to get gene/transcript counts from the mapping results. Transcript counting is tricky: it involves assigning each read to a specific transcript of a gene and counting it towards that transcript.
  • When mapping to the transcriptome (using kallisto), you do not need an extra gene/transcript counting step.
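
To make the counting step concrete, here is a minimal Python sketch of gene-level counting by coordinate overlap. It is a toy illustration only, not how bedtools, htseq, or stringtie are implemented: the gene coordinates, the reads, and the function name count_reads_per_gene are all made up for this example, and real tools also handle strand, exons, spliced reads, and multi-mapping reads.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

# Hypothetical aligned reads as (chromosome, start, end).
READS = [
    ("chr1", 120, 170),  # inside geneA
    ("chr1", 480, 530),  # partially overlaps geneA
    ("chr1", 600, 650),  # intergenic, overlaps no gene
    ("chr1", 900, 950),  # inside geneB
    ("chr2", 900, 950),  # different chromosome, overlaps no gene
]


def count_reads_per_gene(reads, genes):
    """Count reads that overlap exactly one gene; skip ambiguous or unassigned reads."""
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if chrom == g_chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return dict(counts)


gene_counts = count_reads_per_gene(READS, GENES)
print(gene_counts)
```

Transcript-level counting is harder than this sketch suggests, because transcripts of the same gene share exons and a single read can be compatible with several of them.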

...

  • DESeq2 takes raw count data as input, along with sample metadata. It has convenient readers for kallisto and htseq counts, but you can also read in your own gene expression matrix.
  • The design formula tells DESeq2 what condition you are testing: ~condition would test for differences among conditions; ~batch + condition would test for differences in condition while controlling for batch. The DESeq2 vignette gives details on how you can tweak this design formula.
  • DESeq2 will perform normalization, estimate dispersion, and test for differential expression. The results will contain log2 fold changes, p values, and adjusted p values for every gene.
  • Impose cutoffs on fold change and adjusted p value to get significantly differentially expressed genes.
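
The cutoff step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DESeq2 code: the rows below are invented, the column names log2FoldChange and padj simply mirror the names DESeq2 uses in its results table, and the thresholds (absolute log2 fold change of at least 1, adjusted p value below 0.05) are common choices rather than values prescribed by the course.

```python
import math

# Hypothetical rows mimicking a differential expression results table.
results = [
    {"gene": "gene1", "log2FoldChange": 2.5, "padj": 0.001},
    {"gene": "gene2", "log2FoldChange": -1.8, "padj": 0.03},
    {"gene": "gene3", "log2FoldChange": 0.4, "padj": 0.0001},   # significant but small change
    {"gene": "gene4", "log2FoldChange": 3.1, "padj": 0.2},      # large change but not significant
    {"gene": "gene5", "log2FoldChange": -2.2, "padj": float("nan")},  # missing adjusted p value
]


def significant_genes(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return names of genes passing both the fold change and adjusted p value cutoffs."""
    kept = []
    for row in rows:
        padj = row["padj"]
        if math.isnan(padj):
            # Genes without an adjusted p value cannot be called significant.
            continue
        if abs(row["log2FoldChange"]) >= lfc_cutoff and padj < padj_cutoff:
            kept.append(row["gene"])
    return kept


de_genes = significant_genes(results)
print(de_genes)
```

Taking the absolute value of the log2 fold change keeps both up-regulated and down-regulated genes. Missing adjusted p values are handled explicitly because DESeq2 can report NA for some genes.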

...

  • MA plots, heat maps, and PCA are all good ways to visualize your gene expression data. Make sure to use normalized, log-transformed data for these visualizations.
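
As a rough illustration of what "normalized, log-transformed" means, the Python sketch below computes median-of-ratios style size factors (similar in spirit to DESeq2's normalization, but not its actual implementation), divides the counts by them, and applies log2 with a pseudocount of 1. The count matrix, the function name size_factors, and the pseudocount are assumptions made for this example. In practice you would normally use the transformations that DESeq2 itself provides.

```python
import math
from statistics import median

# Hypothetical raw counts: gene -> [count in sample 1, count in sample 2].
# Sample 2 was sequenced roughly twice as deeply as sample 1.
counts = {
    "gene1": [100, 200],
    "gene2": [50, 100],
    "gene3": [10, 20],
    "gene4": [0, 5],
}


def size_factors(count_table):
    """Median-of-ratios size factors, one per sample (genes with a zero count are skipped)."""
    n_samples = len(next(iter(count_table.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in count_table.values():
        if min(values) == 0:
            continue  # the geometric mean would be zero
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


factors = size_factors(counts)

# Divide each count by its sample's size factor, then log2-transform with a pseudocount of 1.
log_normalized = {
    gene: [math.log2(v / f + 1) for v, f in zip(values, factors)]
    for gene, values in counts.items()
}

print(factors)
print(log_normalized["gene1"])
```

After normalization, genes whose raw counts differed only because of sequencing depth (gene1 to gene3 here) have matching values across samples, which is what you want before drawing a heat map or running PCA.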

Part 2. Novel transcripts

Use a pipeline of hisat2 (mapping to genome), stringtie (transcript assembly, quantification), and ballgown (differential expression testing):

  • If you want to identify novel transcripts in your particular samples
  • If you want to look for differential expression in these novel transcripts


