  1. Based on what you learned about the t-test (that is, using terms associated with a t-test), explain what criteria you might use to decide that it is "invalid" to combine the multiple raw sequence data files from the samples. (5 points)
  2. Outline the steps needed to reduce the raw data to numbers suitable for evaluation of your criteria in question #1. (5 points)
  3. Perform the steps you outlined in #2 and state whether it was valid to combine the data files. (20 points)
  4. Starting with the raw "count" data, explore the effect on PCA of NOT normalizing. Turn in a printout of the new PCA plot. (10 points)
  5. Although we did not explore this in class, DNA mutations were automatically tallied during our mapping process. These results are in the files ending in ".bcf". Using the tool "bcftools" to view these results, test the hypothesis that transitions are more common than transversions. Support your answer with data from this experiment. (20 points)
  6. Continuing with the mutation analysis, examine whether the mutation frequency in this sample set differs between protein coding and non-protein coding regions of the genome. Support your answer with data from this experiment. (30 points)
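
As a reminder of the terminology in question 5: a transition swaps a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T), and any other single-base change is a transversion. The Python sketch below shows one way such a tally might look, assuming the reference and alternate alleles have already been extracted from the variant files into (ref, alt) pairs. The function names and the example pairs are hypothetical and are not data from this experiment.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition', 'transversion', or None for non-SNP records."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    # Skip indels, multi-base alleles, ambiguous bases, and non-changes.
    if ref not in bases or alt not in bases or ref == alt:
        return None
    # Both bases in the same chemical class -> transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def tally(pairs):
    """Count transitions and transversions in an iterable of (ref, alt)."""
    counts = Counter()
    for ref, alt in pairs:
        kind = classify_snp(ref, alt)
        if kind is not None:
            counts[kind] += 1
    return counts


# Made-up illustrative pairs (assumption), NOT results from the .bcf files.
example_pairs = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("AT", "A")]
counts = tally(example_pairs)
print(counts["transition"], counts["transversion"])
```

When interpreting real counts, keep in mind that each base has one possible transition but two possible transversions, so equal counts would not be the expectation if all substitutions were equally likely.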

Email your answers/PDFs to shunickesmith <at> gmail.com, cc: Prof. Matouschek, no later than Tuesday 12/2, 10:00 am (BETTER: before Thanksgiving break).