
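  1. Spliced mapping: TopHat. We used TopHat to map reads from two conditions, C1 and C2, to our genome.
    1. Spliced mapping is better suited to RNA-seq data, because reads can span exon-exon junctions.
    2. Spliced alignments differ from unspliced alignments in their CIGAR strings: a spliced alignment contains an "N" operation, which marks a skipped region (an intron) in the reference.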
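
To make the CIGAR point concrete, here is a minimal Python sketch that checks a CIGAR string for the "N" operation. The function names and the example CIGAR strings are made up for illustration; they are not TopHat output from the course.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '30M1200N46M' into (length, op) pairs."""
    if cigar == "*":  # SAM uses '*' when no CIGAR is available
        return []
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Reject strings that contain anything other than well-formed operations.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def is_spliced(cigar):
    """True if the alignment skips a stretch of the reference (an 'N' op)."""
    return any(op == "N" for _, op in parse_cigar(cigar))


def intron_lengths(cigar):
    """Lengths of the skipped reference regions, in alignment order."""
    return [length for length, op in parse_cigar(cigar) if op == "N"]


if __name__ == "__main__":
    # Hypothetical examples: an unspliced read and a read spanning one intron.
    for cigar in ("76M", "30M1200N46M"):
        print(cigar, is_spliced(cigar), intron_lengths(cigar))
```

In a SAM file the CIGAR string is the sixth tab-separated column, so the same check can be applied to each alignment line.
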
  2. How do we convert mapping results (spliced or unspliced) to gene counts? We looked at two tools that count overlaps of reads with known genes.
    1. Inputs: mapping output (SAM or BAM file) and annotation file (GFF/GTF file).
    2. bedtools: for simple counting. Any time a read overlaps a gene, it is counted towards that gene.
    3. HTSeq: for fine-tuned counting. You can choose how to count reads that map only partially to a gene, that map to multiple genes, etc.

      Output File Example:
      FBgn0000008 304 311 273 264 296 296
      FBgn0000014 47 40 39 36 63 43
      FBgn0000015 41 35 28 22 35 35
    4. Output: gene ID, followed by the raw counts for that gene.
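
The difference between the two counting styles can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the algorithm of either tool: reads and genes are plain closed intervals, strand and spliced blocks are ignored, and the `skip_ambiguous` switch, gene names and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals [start, end] share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_reads(reads, genes, skip_ambiguous=False):
    """Count reads per gene.

    reads: list of (chrom, start, end)
    genes: list of (gene_id, chrom, start, end)
    skip_ambiguous=False: every overlap counts (the simple style).
    skip_ambiguous=True: a read overlapping several genes is not counted
    for any of them (one possible stricter policy).
    """
    counts = {gene_id: 0 for gene_id, _, _, _ in genes}
    for chrom, start, end in reads:
        hits = [
            gene_id
            for gene_id, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
        if skip_ambiguous and len(hits) > 1:
            continue  # the read cannot be assigned to a single gene
        for gene_id in hits:
            counts[gene_id] += 1
    return counts


# Hypothetical annotation with two overlapping genes, and three reads.
genes = [("geneA", "chr1", 100, 500), ("geneB", "chr1", 450, 900)]
reads = [("chr1", 90, 120), ("chr1", 460, 490), ("chr1", 1000, 1050)]

simple = count_reads(reads, genes)
strict = count_reads(reads, genes, skip_ambiguous=True)
for gene_id in simple:
    # Same layout as the output file: gene ID followed by counts.
    print(gene_id, simple[gene_id], strict[gene_id])
```

The second read overlaps both genes, so it is counted twice in the simple style and dropped in the stricter one; this is the kind of decision the fine-tuned counting options let you control.
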

  3. How do we take the gene counts for different conditions and compare them to identify genes that are differentially expressed?

    1. Lots of R packages do this, such as DESeq, edgeR and DEXSeq; DESeq2 and edgeR are the most commonly used.
    2. Typical steps: normalize the counts, estimate the variance, run a statistical test, and output genes along with their fold change, p-value and FDR.
    3. R: how to install packages, run R commands and plot graphs.
    4. DESeq2 run: we read in the HTSeq output, specified the design (i.e. the conditions we want to compare and the levels within those conditions), made a DESeq2 object, ran the negative binomial test, and got back a CSV file with log2 fold changes, p-values and adjusted p-values for each gene in our input list.
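
To illustrate the normalization and fold-change steps, here is a rough Python sketch. It uses median-of-ratios size factors (the normalization idea behind DESeq) and then takes a plain log2 ratio of group means. It does not estimate variance or run any statistical test, so it is not a substitute for DESeq2. The split of the six columns into C1 (first three) and C2 (last three) is assumed purely for illustration, and the pseudocount of 1 is an arbitrary choice.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: dict mapping gene ID -> list of raw counts (one per sample).
    """
    n_samples = len(next(iter(counts.values())))
    # Genes with a zero count have no usable geometric mean, so skip them.
    usable = [row for row in counts.values() if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_fold_changes(counts, group_a, group_b, pseudocount=1.0):
    """log2(mean of group_b / mean of group_a) on normalized counts."""
    factors = size_factors(counts)
    result = {}
    for gene, row in counts.items():
        norm = [c / f for c, f in zip(row, factors)]
        mean_a = sum(norm[j] for j in group_a) / len(group_a)
        mean_b = sum(norm[j] for j in group_b) / len(group_b)
        result[gene] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return result


# Rows from the output file example; the column-to-condition split is assumed.
example = {
    "FBgn0000008": [304, 311, 273, 264, 296, 296],
    "FBgn0000014": [47, 40, 39, 36, 63, 43],
    "FBgn0000015": [41, 35, 28, 22, 35, 35],
}
c1_columns, c2_columns = [0, 1, 2], [3, 4, 5]

for gene, lfc in log2_fold_changes(example, c1_columns, c2_columns).items():
    print(gene, round(lfc, 3))
```

With only three genes the size factors are not meaningful; the point is just to show where normalization sits before the fold change is computed.
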

  4. We learned some Unix as well!

    1. Very useful commands like sed, grep, awk, cut and wc.
