GS De Novo Assembler

Summary

Performs de novo assembly of reads and generates contigs. Current version: 2.5.3. See the full Roche manual for details.

Input options

  • SFF files
  • FASTA files
  • Converted Sanger data: FASTA files and their corresponding quality files

Output options

  • Consensus sequence (contigs)
  • Corresponding quality scores
  • ACE files
  • Assembly metrics files
  • Pairwise alignments
  • Read status file
  • Alignment views - GUI only
  • Flowgrams - GUI only
  • Scaffold files (paired-end data only)

Running GS De Novo Assembler

GUI Assembler

  • Launch it by typing gsAssembler at the command line.

Command-line Assembler

  • runAssembly -o /data/filename /data/R_/D_
  • For paired-end data: runAssembly -o /data/filename -p /data/R_/D_
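
The two invocations above can also be assembled programmatically, which helps when scripting many runs. The following is a minimal Python sketch, not part of the Roche software: the function name build_run_assembly_command and the example paths /data/assembly_out, /data/reads.sff and /data/paired.sff are hypothetical. It uses only the flags mentioned on this page (-o, -p and -cdna). Placing -cdna directly after the output option is an assumption; check the Roche manual for the exact ordering rules.

```python
import shlex


def build_run_assembly_command(output_dir, read_inputs=(), paired_end_inputs=(), cdna=False):
    """Return an argument list for runAssembly using only flags shown on this page."""
    command = ["runAssembly", "-o", output_dir]
    if cdna:
        # Assumption: -cdna is accepted before the input files.
        command.append("-cdna")
    # Plain (non-paired) inputs are listed as-is.
    command.extend(read_inputs)
    # Each paired-end input is preceded by -p, as in the example above.
    for path in paired_end_inputs:
        command.extend(["-p", path])
    return command


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    command = build_run_assembly_command(
        "/data/assembly_out",
        read_inputs=["/data/reads.sff"],
        paired_end_inputs=["/data/paired.sff"],
    )
    # Print a shell-safe command line; to execute it, pass the list to
    # subprocess.run(command, check=True) on a machine with runAssembly installed.
    print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.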

Some options

  • Incremental de novo assembly: lets you add more data to an existing assembly when needed.
  • Large or complex genomes: use this option for genomes larger than 15 Mb.
  • Trimming database file: a FASTA file of sequences (such as vectors) to be trimmed from the reads.
  • Screening database file: a file of contaminant sequences to screen the reads against.
  • cDNA assembly: use the -cdna option.

Things to remember

  • Reads shorter than 50 bp are removed by default.
  • The assembler is more powerful and produces better assemblies from SFF files than from FASTA files alone, because the flowgrams in the SFF files are used when computing signals.
  • It is a good idea to run RepeatMasker to handle repeats before assembly.
  • The current assembler version uses 3 to 4 bytes of memory per base and runs on a single processor only. If there is not enough memory for an assembly, try the incremental de novo assembly option.
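
The 50 bp cutoff and the 3 to 4 bytes per base figure above allow a rough check before starting a run. The following is a minimal Python sketch under stated assumptions, not a Roche tool: the function names read_lengths and preflight are hypothetical, the input is assumed to be plain FASTA text, and the memory estimate is assumed to scale with the bases in the retained reads only.

```python
def read_lengths(fasta_lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def preflight(fasta_lines, min_length=50, bytes_per_base=(3, 4)):
    """Count reads kept or dropped by the length cutoff and estimate memory use."""
    kept = dropped = bases = 0
    for n in read_lengths(fasta_lines):
        if n < min_length:
            dropped += 1
        else:
            kept += 1
            bases += n
    low, high = bytes_per_base
    return {
        "kept_reads": kept,
        "dropped_reads": dropped,
        "kept_bases": bases,
        # Assumption: memory scales linearly with the retained bases.
        "memory_bytes_low": bases * low,
        "memory_bytes_high": bases * high,
    }


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file handle instead.
    example = [">read1", "A" * 60, ">read2", "ACGT"]
    print(preflight(example))
```

By the same arithmetic, 500 million bases of input would need roughly 1.5 to 2 GB of memory, which indicates whether the incremental option is worth considering.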