...
- Map the raw data back and examine the result from both the perspective of the assembly (distribution of coverage, gaps, "variants" (which should agree with the ploidy of your organism), etc.) and the perspective of the raw data (e.g. if only 10% of your raw data maps to your assembled transcripts, you should not be feeling particularly proud).
- Similarity searching: to proteins of similar species, to the Conserved Domain Database (which uses rpsblast, available on Lonestar), to nr (which then opens up gene ontology matching), etc. Note that nr and Cdd can be found on Lonestar at /corral-repl/utexas/BioITeam/blastdb.
- Boot up IGV and poke around:
  - "Import genome" and load the final transcripts.fa file
  - Load your BLAST results by first converting them to GFF files as explained on this page (be sure to expand text...)
  - Load the raw data mapping results
- Do some differential expression (DE) analysis by simply counting the number of reads hitting each transcript. You'd be wise to make sure your mapper does something useful with reads that hit multiple transcripts. I like to run the analysis twice: once with all non-uniquely mapping reads placed everywhere possible, and once with all of them removed completely. If the DE results agree between the two runs, you can be reasonably comfortable that multi-mapping reads haven't fooled you.
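
The two counting policies in the last step can be sketched in a few lines of Python. This is only an illustration, not the pipeline used on this page: it assumes the mapping results have already been reduced to (read, transcript) pairs, and the function name `count_reads`, the read names, and the transcript names are all hypothetical.

```python
from collections import Counter, defaultdict


def count_reads(alignments, keep_multimappers):
    """Count reads per transcript from (read_id, transcript_id) pairs.

    keep_multimappers=True  -> a read counts once for every transcript it hits.
    keep_multimappers=False -> reads hitting more than one transcript are dropped.
    """
    # Collect the set of distinct transcripts each read aligns to, so several
    # alignments of one read to the same transcript are counted only once.
    hits = defaultdict(set)
    for read_id, transcript_id in alignments:
        hits[read_id].add(transcript_id)

    counts = Counter()
    for transcripts in hits.values():
        if len(transcripts) > 1 and not keep_multimappers:
            continue  # non-unique read, removed completely under this policy
        counts.update(transcripts)
    return counts


# Hypothetical toy alignments: read r3 hits both transcripts.
toy = [
    ("r1", "tx_A"),
    ("r2", "tx_A"),
    ("r3", "tx_A"),
    ("r3", "tx_B"),
    ("r4", "tx_B"),
]

everywhere = count_reads(toy, keep_multimappers=True)
unique_only = count_reads(toy, keep_multimappers=False)

# Print both counts side by side for each transcript.
for tx in sorted(set(everywhere) | set(unique_only)):
    print(tx, everywhere[tx], unique_only[tx])
```

On real data you would build one such pair of count tables per sample and feed each set to your DE tool of choice. Transcripts whose DE call flips between the two policies are the ones to treat with suspicion.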