...
The sorted.bam and sorted.bam.bai files can be used (along with the reference of course) with a mapping visualization program. There are several, but I'll be showing you the Broad Institute's Integrative Genomics Viewer (IGV). Steps (I'm showing you how to do this from scratch on your own LOCAL computer - there are other ways...):
OK, now we have reads aligned to a genome. How can we tell which genes those reads belonged to? We need to use the gene annotations for that genome of course! That's the REL606.gff file. One way to do this is to use a program which simply intersects features in the gene annotation file (i.e. genes) with the piled-up reads. We'll use bedtools to do this.
OK, but I'm lazy - I want the computer to do this on all 6 files please...
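Here's one way that loop could look. It's only a sketch: the ".sorted.bam" suffix, the ".counts.gff" output names, the `count_all` function name and the `DRY_RUN` switch are my assumptions, and I'm using `bedtools intersect -c` (which appends the number of overlapping reads to each annotation line) as the counting step; the command used in class may have been different.

```shell
# Sketch: count reads overlapping each annotated feature, for every sorted BAM in a directory.
# Assumed (hypothetical) naming: inputs end in ".sorted.bam", outputs in ".counts.gff".
count_all() {
    local bam_dir="$1" gff="$2" bam sample
    for bam in "$bam_dir"/*.sorted.bam; do
        [ -e "$bam" ] || continue                  # skip if the glob matched nothing
        sample="$(basename "$bam" .sorted.bam)"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Preview mode: print the command instead of running it
            echo "bedtools intersect -a $gff -b $bam -c > $bam_dir/$sample.counts.gff"
        else
            # One output line per GFF feature, with the overlap count as the last column
            bedtools intersect -a "$gff" -b "$bam" -c > "$bam_dir/$sample.counts.gff"
        fi
    done
}
```

Running `DRY_RUN=1 count_all . REL606.gff` first prints the six commands so you can check them; `count_all . REL606.gff` then actually runs them.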
That's pretty good, but that output file isn't really what we want for concise gene expression measures. We'd like to skinny it down - remove columns we don't need and only look at CDS elements:
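A minimal sketch of that trimming step, assuming the per-feature file has the 9 standard GFF columns plus a read count in column 10 (the layout `bedtools intersect -c` would give). The file names, gene names and numbers below are made up just so the example runs on its own.

```shell
# Hypothetical sample: 9 GFF columns + read count in column 10 (values invented for illustration)
workdir="$(mktemp -d)"
printf 'REL606\tGenBank\tgene\t190\t255\t.\t+\t.\tName=geneA\t12\n'  >  "$workdir/sample1.counts.gff"
printf 'REL606\tGenBank\tCDS\t190\t255\t.\t+\t0\tName=geneA\t12\n'   >> "$workdir/sample1.counts.gff"
printf 'REL606\tGenBank\tCDS\t336\t2798\t.\t+\t0\tName=geneB\t340\n' >> "$workdir/sample1.counts.gff"

# Keep only CDS rows, and only the attribute (column 9) and count (column 10) columns
awk -F'\t' 'BEGIN { OFS = "\t" } $3 == "CDS" { print $9, $10 }' \
    "$workdir/sample1.counts.gff" > "$workdir/sample1.cds.txt"

cat "$workdir/sample1.cds.txt"
```

If your counting command wrote extra columns (for example coverage fractions), adjust the column numbers in the `print` statement to match.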
And now I'd like to just have one table (Excel!!) with all the gene expression values. Here are some tricks to do that pretty efficiently:
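One possible version of those tricks, using `cut` and `paste`. This assumes each sample already has a two-column file (gene, count) with the genes in the same order in every file; the file names and counts here are hypothetical stand-ins, not the class data.

```shell
workdir="$(mktemp -d)"
# Hypothetical two-column files (gene, count), one per sample, rows in the same gene order
printf 'geneA\t12\ngeneB\t340\n' > "$workdir/sample1.cds.txt"
printf 'geneA\t7\ngeneB\t410\n'  > "$workdir/sample2.cds.txt"

# Gene names come from the first file; each sample contributes only its count column
cut -f1 "$workdir/sample1.cds.txt" > "$workdir/genes.txt"
for f in "$workdir"/sample*.cds.txt; do
    cut -f2 "$f" > "$f.counts"
done

# Glue the gene names and the per-sample count columns side by side (tab-separated)
paste "$workdir/genes.txt" "$workdir"/sample*.cds.txt.counts > "$workdir/all_counts.txt"
cat "$workdir/all_counts.txt"
```

The resulting tab-separated file opens directly in Excel. Note that `paste` just lines rows up by position, so this only works if every file lists the same genes in the same order.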
Normalize (only for mapped reads in this case)
Now, we'll switch from running bash commands to running commands within the R statistical package. Move into the "finaldata" directory and start R like this:
You should now see a ">" prompt instead of your Linux prompt, telling you that you are now in an R shell, not a bash shell. (Type "q()" to exit the R shell.) Load some libraries and the raw data, and do some basic transforms of the raw data to get it ready for analysis in R.
Check Principal Components Analysis (PCA)
To view the plot you just created, go to the "Data" tab in Appsoma, navigate to your scratch/finaldata area, and download the Rplots.pdf file.
Check a box plot
That's not very interesting or useful - we're plotting gene expression data on a linear scale! Let's go to log scale, fixing some issues with the raw data that would throw off the log calculation:
For your homework, you will investigate the validity of combining data files from different sequencing runs. Only a few of these questions require working at a computer keyboard, but I encourage you to work in groups to solve the entire set of questions.
...