The results...
Single Set:
Final graph has 9748 nodes and n50 of 191, max 1427, total 1865207, using 281499/2314900 reads

Median coverage depth = 2.657895
Final graph has 9748 nodes and n50 of 191, max 1427, total 1865207, using 281499/2314900 reads
Set with one group of reads at 50x coverage:
Final graph has 271 nodes and n50 of 127086, max 397281, total 4555586, using 1464199/2314900 reads

Median coverage depth = 11.131337
Final graph has 265 nodes and n50 of 127102, max 397974, total 4558511, using 1464201/2314900 reads
Set with 2 groups of reads at 25x coverage each:
Final graph has 203 nodes and n50 of 698134, max 1032531, total 4585717, using 1465818/2314900 reads

Median coverage depth = 11.109244
Final graph has 203 nodes and n50 of 698134, max 1032531, total 4585717, using 1465818/2314900 reads
Set with 3 groups of reads at 20x coverage each:
Final graph has 202 nodes and n50 of 698626, max 1139610, total 4602729, using 1758595/2777880 reads

Median coverage depth = 13.353287
Final graph has 202 nodes and n50 of 698626, max 1139610, total 4602729, using 1758595/2777880 reads
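
To compare runs side by side, the summary lines above can be read programmatically. The following is a minimal sketch, assuming the "Final graph has ..." line always has the layout shown in these results. The names `SUMMARY` and `parse_summary` are hypothetical helpers, not part of the assembler.

```python
import re

# Pattern for the summary line as it appears in the results above.
# Assumption: the layout of this line does not vary between runs.
SUMMARY = re.compile(
    r"Final graph has (?P<nodes>\d+) nodes and n50 of (?P<n50>\d+), "
    r"max (?P<max>\d+), total (?P<total>\d+), "
    r"using (?P<used>\d+)/(?P<reads>\d+) reads"
)


def parse_summary(line):
    """Return the numbers from one summary line as a dict of ints,
    plus the fraction of reads that were used in the assembly."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a 'Final graph has ...' summary line")
    stats = {key: int(value) for key, value in match.groupdict().items()}
    stats["fraction_used"] = stats["used"] / stats["reads"]
    return stats


if __name__ == "__main__":
    # Two summary lines copied from the results above.
    runs = {
        "single": "Final graph has 9748 nodes and n50 of 191, max 1427, "
                  "total 1865207, using 281499/2314900 reads",
        "2 groups": "Final graph has 203 nodes and n50 of 698134, max 1032531, "
                    "total 4585717, using 1465818/2314900 reads",
    }
    for name, line in runs.items():
        stats = parse_summary(line)
        print(name, stats["nodes"], stats["n50"], round(stats["fraction_used"], 3))
```

Printing the node count, n50, and fraction of reads used for each run makes the trend easy to see: fewer nodes and a larger n50 go together with a larger share of reads being used.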

With better read pairs that link more distant locations in the genome, there are fewer contigs, and the contigs are longer, giving us a more complete picture of linkage across the genome.

The complete E. coli genome is about 4.6 Mb. Why weren't we able to assemble it, even with this "perfect" data?

Possibilities...
  1. Sometimes errors in reads lead to dead-ends in the graphs that are trimmed when they should not be.
  2. There are 7 nearly identical ribosomal RNA operons in E. coli spaced throughout the chromosome. Since each is >3000 bases, contigs cannot be connected across them using this data.

More assembly statistics: contig_stats.pl
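
As a rough illustration of the kind of statistic reported above, here is a minimal sketch of an N50 calculation under the usual definition: the length of the contig at which the running total, counted from the longest contig down, first reaches half of the summed length. The function name `n50` and the example lengths are made up for illustration. This is not the implementation used by contig_stats.pl, and the assembler may count lengths in different units.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare with integers only, to avoid rounding half of an odd total.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, not taken from the assemblies above.
    example = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(example))
```

A few long contigs dominate this statistic, which is why merging contigs across repeats raises N50 sharply even when the total assembled length barely changes.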
