
This brief tutorial will walk you through data analysis of an RNA-seq experiment.

In this experiment, E. coli was inoculated into culture and the culture was then sampled at 4 hours and 24 hours post inoculation.  The experiment was run in triplicate.

RNA was extracted from the 6 samples, fragmented, and sequenced. All sequencing runs were paired-end 2x100: each RNA fragment was read from both ends, 100 bp from each end.

Here is a table showing the data we have:

Sample | Condition | Replicate | Sequencing Runs | Data Files
MURI_17 | 4 hr | 1 | SA13172 | MURI_17_SA13172_ATGTCA_L007
MURI_26 | 4 hr | 2 | SA14027 | MURI_26_SA14027_TTAGGC_L006
MURI_98 | 4 hr | 3 | SA14008 | MURI_98_SA14008_TTAGGC_L005, MURI_98_SA14008_TTAGGC_L006
MURI_21 | 24 hr | 1 | SA13172 | MURI_21_SA13172_GTGGCC_L007
MURI_30 | 24 hr | 2 | SA14027 | MURI_30_SA14027_CAGATC_L006
MURI_102 | 24 hr | 3 | SA14008, SA14032 | MURI_102_SA14008_CAGATC_L005, MURI_102_SA14008_CAGATC_L006, MURI_102_SA14032_CAGATC_L006
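
Because some samples have more than one data file, it can help to group the files by sample before doing anything else. The following is a minimal Python sketch that splits each data-file name from the table into its parts. The sample and sequencing-run fields match the table columns; reading the last two fields as a barcode and a lane is an assumption based on the naming pattern, and the function name is hypothetical.

```python
from collections import defaultdict

def parse_data_file(name):
    """Split a name such as 'MURI_17_SA13172_ATGTCA_L007' into its parts."""
    prefix, sample_no, run, barcode, lane = name.split("_")
    return {
        "sample": f"{prefix}_{sample_no}",
        "run": run,
        "barcode": barcode,  # assumed meaning of the fourth field
        "lane": lane,        # assumed meaning of the fifth field
    }

# Data-file names copied from the table above.
data_files = [
    "MURI_17_SA13172_ATGTCA_L007",
    "MURI_26_SA14027_TTAGGC_L006",
    "MURI_98_SA14008_TTAGGC_L005",
    "MURI_98_SA14008_TTAGGC_L006",
    "MURI_21_SA13172_GTGGCC_L007",
    "MURI_30_SA14027_CAGATC_L006",
    "MURI_102_SA14008_CAGATC_L005",
    "MURI_102_SA14008_CAGATC_L006",
    "MURI_102_SA14032_CAGATC_L006",
]

# Group the files by sample.
files_by_sample = defaultdict(list)
for name in data_files:
    files_by_sample[parse_data_file(name)["sample"]].append(name)

# Samples with more than one data file.
multi_file_samples = {s: f for s, f in files_by_sample.items() if len(f) > 1}

for sample, files in multi_file_samples.items():
    runs = sorted({parse_data_file(f)["run"] for f in files})
    print(sample, len(files), "files from runs", runs)
```

Only MURI_98 and MURI_102 have multiple data files, and only MURI_102 spans two sequencing runs.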

 

In class, we will explore and characterize the raw data.  Here are some elements (programs & techniques) we may use (you will need some of these for the homework):

 Assessing the raw data...

Is this E. coli?

 

Is this E. coli RNA?

 

Does this look like RNA-seq data?

 

Assessing the quality of the raw data:
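
One way to look at raw read quality is to decode the per-base quality string of each FASTQ record. The sketch below is illustrative only and is not necessarily the tool used in class. It assumes well-formed four-line FASTQ records and the common Phred+33 encoding; the function names and the example reads are made up.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_line]

def mean_quality(quality_line, offset=33):
    """Mean Phred score of one read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)

def read_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual

# Two made-up records: one high quality, one with two poor bases.
example = ["@read1", "ACGT", "+", "IIII",
           "@read2", "ACGT", "+", "!!II"]

for name, seq, qual in read_fastq(example):
    print(name, len(seq), mean_quality(qual))
```

With a real file, the same functions can be fed an open file handle in place of the `example` list.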

 

 Mapping data to a reference genome...

To map a pair of data files to the reference E. coli genome:

 

 

 

 

 

 Assessing the read mapping...

Assessing the read-mapping:
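
If the mapper writes SAM output, a quick summary of the mapping can be computed from the FLAG field (column 2) of each record. This is a hedged sketch, not the course's actual procedure: it assumes plain-text SAM lines, and the function name, read names, and reference name are hypothetical. The bit values used (0x2 properly paired, 0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM format.

```python
def mapping_summary(sam_lines):
    """Count primary records, mapped records, and properly paired records."""
    total = mapped = properly_paired = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:
            mapped += 1
            if flag & 0x2:
                properly_paired += 1
    return {
        "total": total,
        "mapped": mapped,
        "properly_paired": properly_paired,
        "fraction_mapped": mapped / total if total else 0.0,
    }

# Made-up records: one properly paired mapped pair and one unmapped pair.
example_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tref\t100\t60\t100M\t=\t300\t300\t*\t*",
    "r1\t147\tref\t300\t60\t100M\t=\t100\t-300\t*\t*",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

print(mapping_summary(example_sam))
```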

 

 Counting reads per gene... (also called "count" data)

Counting the number of sequence reads within a gene, for all genes in the genome:
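
The idea behind count data can be sketched in a few lines of Python: for each mapped read, find the gene whose coordinates contain it and add one to that gene's count. This is a simplified illustration under stated assumptions. Each read is reduced to a single mapped position, strand and paired-end fragments are ignored, and the gene names and coordinates are made up; dedicated counting programs handle these cases.

```python
def count_reads_per_gene(genes, read_positions):
    """Count reads whose mapped position falls inside each gene.

    genes: list of (name, start, end) with inclusive coordinates.
    read_positions: list of mapped positions, one per read.
    """
    counts = {name: 0 for name, _, _ in genes}
    for pos in read_positions:
        for name, start, end in genes:
            if start <= pos <= end:
                counts[name] += 1
    return counts

# Hypothetical genes and read positions.
genes = [("geneA", 1, 100), ("geneB", 200, 300)]
read_positions = [5, 50, 250, 150]  # the read at 150 falls between genes

print(count_reads_per_gene(genes, read_positions))
```

The nested loop is fine for a toy example; for a whole genome and millions of reads, a sorted or indexed lookup would be needed.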

 Analyze the count data

Normalize (only for mapped reads in this case) 

Check Principal Component Analysis (PCA) and box plots

Calculate fold-change

Log transform

Volcano plot
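
Several of these steps can be sketched with the Python standard library. The example below is a minimal illustration under assumptions, not the method used in class: normalization is taken to mean scaling each sample to counts per million mapped reads, a pseudocount of 1 is added before the log2 transform, and a Welch t statistic stands in for the significance axis of a volcano plot. All function names, gene names, and counts are made up. PCA and the plots themselves are left out because they need additional libraries.

```python
import math
from statistics import mean, variance

def counts_per_million(counts):
    """Scale one sample's gene counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def log2_transform(values, pseudocount=1.0):
    """log2(value + pseudocount) for every gene in one sample."""
    return {gene: math.log2(v + pseudocount) for gene, v in values.items()}

def log2_fold_change(group_a, group_b, gene):
    """Mean log2 value in group_b minus mean log2 value in group_a."""
    return mean(s[gene] for s in group_b) - mean(s[gene] for s in group_a)

def welch_t(xs, ys):
    """Welch t statistic for two groups of replicate values."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(ys) - mean(xs)) / se

# Made-up raw counts: three replicates per condition, two hypothetical genes.
raw_4hr = [{"geneA": 100, "geneB": 900},
           {"geneA": 120, "geneB": 880},
           {"geneA": 90, "geneB": 910}]
raw_24hr = [{"geneA": 400, "geneB": 600},
            {"geneA": 420, "geneB": 580},
            {"geneA": 380, "geneB": 620}]

norm_4hr = [log2_transform(counts_per_million(s)) for s in raw_4hr]
norm_24hr = [log2_transform(counts_per_million(s)) for s in raw_24hr]

for gene in ("geneA", "geneB"):
    lfc = log2_fold_change(norm_4hr, norm_24hr, gene)
    t = welch_t([s[gene] for s in norm_4hr], [s[gene] for s in norm_24hr])
    print(gene, round(lfc, 2), round(t, 2))
```

A volcano plot would place the log2 fold-change on the x-axis and a significance measure derived from the test statistic on the y-axis, one point per gene.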

 

 

 

For your homework, you will investigate the validity of combining data files from different sequencing runs.  Only a few of these questions require working at a computer keyboard, but I encourage you to work in groups to solve the entire set of questions.

  1. Based on what you learned about the t-test (that is, using terms associated with a t-test), explain what criteria you might use to consider it "invalid" to combine the multiple raw sequence data files from a sample.
  2. Outline the steps needed to reduce the raw data to numbers suitable for evaluating your criteria from question #1.
  3. Perform the steps you outlined in #2 and state whether or not it was valid to combine the data files.
  4. Starting with the raw "count" data, explore the effect on PCA of NOT normalizing.

 
