Exercise time

DO NOT RUN THESE IN IDEV MODE, BECAUSE JOB SUBMISSION IS INVOLVED.
Let's put some of those new skills you've learned to the test. You may have to go back to prior lessons to dig out some commands. You may also have to consult external sites for documentation on the tools.

 

Get set up
cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course/exercises .
 
 
cd exercises

 

A) We have the fastq file, test.fastq.  Can you find out how many reads we have in this fastq file? Can you think of multiple ways to do this?
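One possible approach, sketched under the assumption that test.fastq uses the usual four lines per read with no wrapped sequence or quality lines. The function names and the tiny demo file are made up for illustration, so the sketch can be tried without the course data.

```shell
# Two ways to count reads in a 4-line-per-record FASTQ file.

# 1) Total line count divided by 4.
count_reads_wc() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# 2) Count only the header line of each record (lines 1, 5, 9, ...).
count_reads_awk() {
    awk 'NR % 4 == 1' "$1" | wc -l | tr -d ' '
}

# Made-up two-read file; note the second quality line starts with "@".
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\n@III\n' > demo.fastq

count_reads_wc demo.fastq
count_reads_awk demo.fastq
```

Counting lines that start with "@" (for example with grep -c '^@') looks simpler, but it can overcount, because a quality line may also begin with "@", as the second read in the demo file shows.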

 

B) The instructions asked you to copy the directory exercises. But I left out one file. Can you copy that over to your exercises directory?  

The file is at 

/corral-repl/utexas/BioITeam/rnaseq_course/C
 Hint

Do not copy the whole directory over, because you will overwrite the current directory.

 

C) We are concerned that a weird artifact sequence, ACTACCGATCCA, may be in our data. Can you find out what proportion of our reads contain this artifact?
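A minimal sketch of one way to get this proportion, assuming a four-line-per-record FASTQ and searching only for the sequence as written (not its reverse complement). The function name and the demo file are hypothetical.

```shell
# Report how many reads contain a given motif in their sequence line.
artifact_fraction() {
    # $1 = FASTQ file, $2 = motif to search for
    awk -v motif="$2" '
        NR % 4 == 2 {                      # sequence lines only
            total++
            if (index($0, motif) > 0) hits++
        }
        END {
            if (total == 0) { print "no reads found"; exit 1 }
            printf "%d of %d reads (%.2f%%)\n", hits, total, 100 * hits / total
        }
    ' "$1"
}

# Made-up two-read file: only the first read carries the artifact.
printf '@r1\nGGACTACCGATCCATT\n+\nIIIIIIIIIIIIIIII\n@r2\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n' > demo_artifact.fastq

artifact_fraction demo_artifact.fastq ACTACCGATCCA
```

Restricting the search to sequence lines avoids accidental matches in read names or quality strings, which a plain grep over the whole file would not rule out.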

 

D) We want to trim this sequence out of our data. Can you do that?

 Hint

Use Fastx_toolkit! Here's the page where we covered that: FASTQ MANIPULATION TOOLS


E) OK, we've mapped the data using TopHat. We have a bam file (test.bam) and an annotation file (genes.formatted.gtf), and we want to assemble novel and annotated transcripts using cufflinks. Can you submit a cufflinks job to lonestar? Of course, we are not utilizing lonestar fully because we are submitting just one job, but this is for practice.

 Hint

Having trouble constructing the cufflinks command to use? Load the module and type cufflinks to see the options or look at commands we've used before: Tuxedo Suite For Splice Variant Analysis and Identifying Novel Transcripts II 2014

module load cufflinks

cufflinks

Having trouble submitting jobs to lonestar? Remember the three steps:

1. Create a commands file.

2. Create a launcher using launcher_creator.py.

3. Submit the job using qsub.
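The three steps might look roughly like the sketch below. Only the first step is executed here. The launcher_creator.py flags, the job name, queue, time limit, and the launcher file name are assumptions and should be checked against the script's own help and the earlier lessons. The output directory name cufflinks_out is arbitrary.

```shell
# Step 1: write the commands file (one command per line).
# -g supplies the annotation as a guide while still allowing novel transcripts.
echo "cufflinks -o cufflinks_out -g genes.formatted.gtf test.bam" > commands

# Step 2: build the launcher script (flag names and values are assumptions;
# run launcher_creator.py -h to confirm them).
# launcher_creator.py -n cufflinks_test -q normal -t 01:00:00 -j commands

# Step 3: load the module and submit (launcher file name is assumed).
# module load cufflinks
# qsub launcher.sge

# Show what will be run.
cat commands
```

Using -g rather than -G is a deliberate choice for this exercise: -G restricts cufflinks to the annotated transcripts, while -g reports annotated transcripts together with any novel ones it assembles.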


F) Changing direction a bit, let's parse our alignment output file, test.bam. I'm concerned about the region on chromosome 2L between positions 7620 and 7700. Could you pull out just the alignments in this region from the bam file?

 

 Hint

Samtools has everything needed to parse sam and bam files. You may need to consult the samtools manual (http://samtools.sourceforge.net/samtools.shtml) to figure out how to do this.

 

G) I can't really see what's going on just by looking at the alignments. Can you view this region in IGV to get a clue about what may be happening? This data is from the dm3 genome.

 Hint

You'll need to transfer files. Review the first day's lessons if you don't remember how.


If you want to venture even further...

H) OK, we've got some results from a differential expression tool: DESeq_output.csv. Can you pull out the top 10 changing genes from this file? (The criteria are pvalue <= 0.05 and up- or downregulation with abs(log2 fold change) >= 1.)

 Hint

Using grep and awk on the columns that represent log2 fold change and significance will give you the result.
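A sketch of the awk route, assuming a plain comma-separated file with one header row. The column layout of DESeq_output.csv is not given here, so the column numbers are passed as arguments. The function name, the demo file, and its header are made up. Rows with non-numeric values (such as NA or Inf) are skipped, which is also an assumption about how they should be treated.

```shell
# List the 10 genes with the largest absolute log2 fold change among those
# passing p-value <= 0.05 and |log2 fold change| >= 1.
top_changing_genes() {
    # $1 = CSV file, $2 = log2 fold change column, $3 = p-value column
    awk -F',' -v lfc="$2" -v pv="$3" '
        NR == 1 { next }                                   # skip header row
        $lfc !~ /^-?[0-9.]+([eE][-+]?[0-9]+)?$/ { next }   # skip NA, Inf, etc.
        $pv  !~ /^[0-9.]+([eE][-+]?[0-9]+)?$/   { next }
        {
            a = ($lfc < 0) ? -$lfc : $lfc                  # absolute value
            if ($pv + 0 <= 0.05 && a >= 1)
                printf "%.6f,%s\n", a, $0                  # prepend sort key
        }
    ' "$1" | sort -t',' -k1,1nr | head -n 10 | cut -d',' -f2-
}

# Made-up example table; the real file may have different columns.
printf 'id,baseMean,log2FoldChange,pval\ngeneA,100,2.5,0.001\ngeneB,50,-3.2,0.01\ngeneC,80,0.4,0.0001\ngeneD,10,1.8,0.2\ngeneE,5,NA,NA\n' > demo_deseq.csv

top_changing_genes demo_deseq.csv 3 4
```

Here "top" is taken to mean the largest absolute fold change. Ranking by p-value instead would only require sorting on that column rather than on the prepended key.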