Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Learning Objectives

  • Run velvet to perform de novo assembly on fragment, paired-end, and mate-paired data.
  • Use contig_stats.pl to display assembly statistics.
  • Find proteins of interest in an assembly using Blast.

...

  1. Get a better assembly: maybe add a different library size, or go into a detailed genome completion project (commonly called "finishing") using a sequence assembly editor like consed or gap4 or AMOS. (Be careful though, the amount of data in NGS data sets can be very difficult for these programs to deal with, since many were designed for Sanger sequencing reads.) You may have a lot of PCR products to make to close gaps and/or to order and orient scaffolds. consed in particular makes this pretty easy, but it may still consume a lot more time and money than the initial shotgun assembly. You can identify some misassemblies by mapping the original reads to the assembly and then viewing them in IGV to look for discordant mate pairs, for example.
  2. Look for things: If you're just after a few homologs, an operon, etc. you're probably done. Most assemblers will be able to take 2x100 data and give you full gene sequences since these are non-repetitive and so assemble well. You can turn the contigs.fa into a blast database (formatdb or makeblastdb depending on which version of blast you have) and start blasting away.

...