Day 2 Take Away Points

Let's recap what we learned yesterday:

We looked at how to find differentially expressed genes among known (annotated) genes, when discovering novel genes is not a goal.

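Spliced mapping: TopHat. We used TopHat to map reads from two conditions, C1 and C2, to our genome.
1. Spliced mapping is better suited to RNA-seq data, because reads from mature transcripts can span exon-exon junctions.
2. Spliced alignments differ from unspliced alignments in their CIGAR strings: a spliced alignment contains an "N" operation, which marks a skipped region of the reference (an intron).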
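
To make the "N" point concrete, here is a minimal Python sketch that splits a CIGAR string into its operations and checks for an "N". The function names and the two example CIGAR strings are made up for illustration; they are not from the course data.

```python
import re

# One CIGAR operation is a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]

def is_spliced(cigar):
    """Return True if the alignment contains an 'N' (skipped region) operation."""
    return any(op == "N" for _, op in parse_cigar(cigar))

# Hypothetical CIGAR strings, invented for this example.
unspliced = "50M"         # 50 aligned bases, no gap
spliced = "30M1200N20M"   # 30 bases, a 1200-base skipped region, then 20 bases

print(is_spliced(unspliced))  # False
print(is_spliced(spliced))    # True
```

In a real SAM file the CIGAR string is the sixth tab-separated column of each alignment line, so the same check could be applied line by line.
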
How do we convert mapping results (spliced or unspliced) to gene counts? We looked at two tools that count overlaps between reads and known genes.
1. Inputs: mapping output (SAM or BAM file) and annotation file (GFF/GTF file).
2. bedtools: for simple counting. Any time a read overlaps a gene, it is counted towards that gene.
3. HTSeq: for fine-tuned counting. You can choose how to count reads that map only partially to a gene, that map to multiple genes, and so on.
4. Output: gene ID, followed by the raw counts for that gene.
  Output file example:
```
FBgn0000008 304 311 273 264 296 296
FBgn0000014 47 40 39 36 63 43
FBgn0000015 41 35 28 22 35 35
```
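
The simple counting rule (any overlap counts) can be sketched in a few lines of Python. This is only an illustration of the idea, not how bedtools or HTSeq are implemented: the gene IDs, coordinates, and reads below are hypothetical, and each read is treated as a single interval, ignoring splicing.

```python
from collections import Counter

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals [start, end] share at least one base."""
    return a_start <= b_end and b_start <= a_end

def count_reads_per_gene(reads, genes):
    """Count every read towards every gene it overlaps, even partially.

    reads: list of (chrom, start, end)
    genes: list of (gene_id, chrom, start, end)
    """
    # Start every gene at zero so genes without reads still appear in the output.
    counts = Counter({gene_id: 0 for gene_id, _, _, _ in genes})
    for r_chrom, r_start, r_end in reads:
        for gene_id, g_chrom, g_start, g_end in genes:
            if r_chrom == g_chrom and overlaps(r_start, r_end, g_start, g_end):
                counts[gene_id] += 1
    return counts

# Hypothetical annotation and reads, invented for this example.
genes = [("geneA", "chr1", 100, 500), ("geneB", "chr1", 450, 900)]
reads = [
    ("chr1", 120, 170),   # inside geneA
    ("chr1", 460, 510),   # overlaps both geneA and geneB
    ("chr1", 950, 1000),  # overlaps no gene
    ("chr2", 120, 170),   # different chromosome
]

for gene_id, n in count_reads_per_gene(reads, genes).items():
    print(gene_id, n)
```

Note that the second read is counted for both genes here. Deciding what to do with such reads is exactly the kind of choice the fine-tuned counting approach lets you make.
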
How do we take the gene counts for different conditions and compare them to identify differentially expressed genes?
1. Tools: DESeq, edgeR, DEXSeq.
2. Steps: normalize the counts, estimate the variance, run a statistical test, and output genes along with fold change, p-value, and FDR.
3. R: how to install packages, run R commands, and plot graphs.
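
As a rough illustration of the normalization and fold change steps only, here is a Python sketch using a median-of-ratios style size factor, which is the idea behind DESeq's normalization. It is not the packages' actual code, and it leaves out variance estimation, the statistical test, and FDR correction, which are the reasons to use DESeq or edgeR in the first place. The counts are the three example rows above. Treating columns 1-3 as C1 and columns 4-6 as C2 is an assumption made for this example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios style size factor per sample.

    counts: dict gene_id -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

def normalize(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}

def log2_fold_change(row, group1, group2):
    """log2(mean of group2 / mean of group1); assumes both means are non-zero."""
    mean1 = sum(row[i] for i in group1) / len(group1)
    mean2 = sum(row[i] for i in group2) / len(group2)
    return math.log2(mean2 / mean1)

raw = {
    "FBgn0000008": [304, 311, 273, 264, 296, 296],
    "FBgn0000014": [47, 40, 39, 36, 63, 43],
    "FBgn0000015": [41, 35, 28, 22, 35, 35],
}
c1, c2 = [0, 1, 2], [3, 4, 5]  # assumed column-to-condition assignment

for gene, row in normalize(raw).items():
    print(gene, round(log2_fold_change(row, c1, c2), 3))
```

With only three genes the size factors are not meaningful. Real data sets have thousands of genes, which is what makes the median a stable estimate.
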
We learned some Unix as well!
1. Very useful commands such as sed, grep, awk, cut, and wc.