Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

...

Code Block
titleFor a "commands" file - to run four velvet assemblies in parallel. If you copy and paste, be sure that there are ONLY four lines in your file.
velveth single_out 61 -fastq single_end_100_c_50.fastq && velvetg single_out -exp_cov auto -amos_file yes
velveth pairedc20_out 61 -fastq -shortPaired paired_end_2x100_ins_3000_c_20.fastq paired_end_2x100_ins_1500_c_20.fastq paired_end_2x100_ins_400_c_20.fastq && velvetg pairedc20_out -exp_cov auto -amos_file yes
velveth pairedc25_out 61 -fastq -shortPaired paired_end_2x100_ins_3000_c_25.fastq paired_end_2x100_ins_400_c_25.fastq && velvetg pairedc25_out -exp_cov auto -amos_file yes
velveth pairedc50_out 61 -fastq -shortPaired paired_end_2x100_ins_400_c_50.fastq && velvetg pairedc50_out -exp_cov auto -amos_file yes

...

Expand
The results...
The results...
Code Block
titleSingle Set
:
Final graph has 9748 nodes and n50 of 191, max 1427, total 1865207, using 281499/2314900 reads

Median coverage depth = 2.657895
Final graph has 9748 nodes and n50 of 191, max 1427, total 1865207, using 281499/2314900 reads
Code Block
titleSet with one group of reads at 50 coverage
:
Final graph has 271 nodes and n50 of 127086, max 397281, total 4555586, using 1464199/2314900 reads

Median coverage depth = 11.131337
Final graph has 265 nodes and n50 of 127102, max 397974, total 4558511, using 1464201/2314900 reads
Code Block
titleSet with 2 groups of reads both at 25 coverage each
:
Final graph has 203 nodes and n50 of 698134, max 1032531, total 4585717, using 1465818/2314900 reads

Median coverage depth = 11.109244
Final graph has 203 nodes and n50 of 698134, max 1032531, total 4585717, using 1465818/2314900 reads
Code Block
titleSet with 3 groups of reads all at 20 coverage each
:
Final graph has 202 nodes and n50 of 698626, max 1139610, total 4602729, using 1758595/2777880 reads

Median coverage depth = 13.353287
Final graph has 202 nodes and n50 of 698626, max 1139610, total 4602729, using 1758595/2777880 reads

With better read pairs that link more distant locations in the genome, there are fewer contigs, and contigs are are longer, giving us a more complete picture of linkage across the genome.

The complete E. coli genome is about 4.6 Mb. Why weren't we able to assemble it, even with this "perfect" data?

Expand
One possibility Possibilities... One possibility
Possibilities...
  1. Sometimes errors in reads lead to dead-ends in the graphs that are trimmed when they should not be.
  2. There are 7 nearly identical ribosomal RNA operons in E. coli spaced throughout the chromosome. Since each is >3000 bases, contigs cannot be connected across them using this data.

More assembly statistics: contig_stats.pl

...