GS De Novo Assembler
Summary
Performs de novo assembly of reads and generates contigs. Current version: 2.5.3. See the full Roche manual for details.
Input options
- Sff files
- Fasta files
- Converted Sanger data - Fasta files and corresponding quality files
Output options
- Consensus sequence (contigs)
- Corresponding quality scores
- ACE files
- Assembly metrics files
- Pairwise alignments
- Read status file
- Alignment views - GUI only
- Flowgrams - GUI only
- For paired end data, scaffold files
Running GS De Novo Assembler
GUI Assembler -
- Start it by typing gsAssembler at the command line
Commandline Assembler -
- runAssembly -o /data/filename /data/R_/D_
- For paired end data: runAssembly -o /data/filename -p /data/R_/D_
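The two command lines above can be assembled programmatically, which helps when scripting several assemblies. The following is a minimal Python sketch, not part of the Roche software: the function name build_run_assembly_command and its parameters are hypothetical, and it uses only the -o and -p options shown above. It prints the command rather than running it, since runAssembly may not be installed where the script is tested.

```python
import shlex


def build_run_assembly_command(output_dir, read_paths=(), paired_end_paths=()):
    """Return the runAssembly argument list (hypothetical helper).

    output_dir       -- value passed to -o, as in the examples above
    read_paths       -- input files or directories listed without a flag
    paired_end_paths -- inputs that are each preceded by -p
    """
    command = ["runAssembly", "-o", output_dir]
    command.extend(read_paths)
    for path in paired_end_paths:
        # Each paired end input gets its own -p, mirroring the example above.
        command.extend(["-p", path])
    return command


def format_command(command):
    """Quote each argument so the line can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in command)


# The placeholder paths are the ones used on this page.
single = build_run_assembly_command("/data/filename", ["/data/R_/D_"])
paired = build_run_assembly_command("/data/filename", paired_end_paths=["/data/R_/D_"])
print(format_command(single))
print(format_command(paired))
```

To actually launch the assembly, the returned list could be passed to subprocess.run(command, check=True) on a machine where runAssembly is on the PATH.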
Some options
- Incremental de novo assembly - will allow you to add more data to the assembly when needed.
- Large or complex genomes - for genomes larger than 15 Mb, use this option.
- Trimming database file - Provide a fasta file of sequences (such as vectors) that should be trimmed from the reads.
- Screening database file - Provide a file containing contamination sequences for screening.
- cDNA assembly - use the -cdna option.
Things to remember
- Reads shorter than 50 bp are removed by default.
- The assembler produces better assemblies from sff files than from fasta files alone, because sff files carry the flowgram signals that the assembler uses in its computations.
- It is a good idea to use RepeatMasker to handle repeats before assembly.
- The current assembler version uses 3 to 4 bytes of memory per base and runs on a single processor only. If there is not enough memory for an assembly, try the incremental de novo assembly option.
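The memory figure above lends itself to a quick back-of-the-envelope check before starting a run. The sketch below is only an illustration: the function name estimate_memory_bytes is hypothetical, and it assumes that "per base" refers to the total number of bases in the input reads, which this page does not state explicitly.

```python
def estimate_memory_bytes(total_bases, low_bytes_per_base=3, high_bytes_per_base=4):
    """Return a (low, high) memory estimate in bytes.

    Uses the 3 to 4 bytes per base rule of thumb quoted above.
    Assumption: total_bases is the sum of all read lengths in the input.
    """
    if total_bases < 0:
        raise ValueError("total_bases must be non-negative")
    return total_bases * low_bytes_per_base, total_bases * high_bytes_per_base


# Example: 500 million bases of input reads.
low, high = estimate_memory_bytes(500_000_000)
print(f"Estimated memory: {low / 1e9:.1f} to {high / 1e9:.1f} GB")
# prints "Estimated memory: 1.5 to 2.0 GB"
```

If the upper estimate exceeds the memory available on the machine, that is a signal to consider the incremental de novo assembly option mentioned above.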