Overview

SPAdes is a De Bruijn graph assembler that has become the preferred assembler in numerous labs and workflows. In this tutorial we will use SPAdes to assemble an E. coli genome from simulated Illumina reads. Genome assembly is quite difficult (though as Oxford Nanopore lowers its error rate, assembly will likely get much easier and involve new tools). Genome assembly should only be used when you cannot find a reference genome that is close to your own, when you are engaged in metagenomic projects where you don't know what organisms may be present, or when you believe you may have novel sequence insertions into a genome of interest. Note that in this last case, however, you would actually want to grab the reads that do not map to your reference genome (and their pairs, in the case of paired-end and mate-pair sequencing) and assemble those, rather than assembling the fastq files you get from the raw sequencing.

Learning Objectives

  • Run SPAdes to perform de novo assembly on fragment, paired-end, and mate-pair data.
  • Use contig_stats.pl to display assembly statistics.
  • Find proteins of interest in an assembly using BLAST.

Table of Contents

Installing SPAdes

Unfortunately, SPAdes does not exist as a module for loading on TACC, nor is it available in the BioITeam materials. However, it is available as binaries from the SPAdes website, is well supported, and doesn't require complex dependencies, which makes it easy to install yourself.

Expand
titleIf SPAdes is so common a tool, why doesn't the BioITeam install it for everyone?

In my opinion there are a few reasons:

  1. Generally speaking, while SPAdes is commonly used for assemblies, assemblies themselves are not very common: once you have an assembled genome, you use that genome for future analysis rather than redoing the assembly.
  2. Since it is easily installed, it doesn't save people much work to install it for them.

First, navigate to the SPAdes home page http://cab.spbu.ru/software/spades/ and download the Linux binary distribution, either directly to TACC using wget, or by first downloading it to your laptop and then transferring it to TACC using SCP. While you could put the file anywhere on Lonestar (and can easily move it around with the mv command once it is there), I suggest downloading or transferring the file to a 'src' folder on $WORK.

Code Block
languagebash
titleMaking a DIRectory named SRC in $WORK (the capital letters are your clues)
collapsetrue
mkdir -p "$WORK/src"   # -p: no error if the src directory already exists

Try to use 'wget -h' before clicking below. When using wget it is often helpful to right click on a link and select 'copy link address' when the file you want is available through a download link.

Code Block
languagebash
titleHow to use wget to download directly to TACC
collapsetrue
cd $WORK/src
wget http://cab.spbu.ru/files/release3.13.0/SPAdes-3.13.0-Linux.tar.gz

Code Block
languagebash
titleHow to use SCP to transfer the downloaded file to TACC from your laptop
collapsetrue
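No SCP command is shown here, so what follows is only a minimal sketch of what the transfer could look like, to be run on your laptop rather than on TACC. The username, host name, and remote directory are all placeholders (assumptions, not values from this tutorial): substitute your own TACC username, the login host you normally ssh to, and the path that `echo $WORK/src` prints when you are logged in to TACC. Your laptop does not know what $WORK is, so the full path has to be typed out.

```bash
# Run on your laptop, from the folder that holds the downloaded tarball.
# All three values below are placeholders (assumptions); replace them.
TACC_USER="your_username"
TACC_HOST="your.tacc.login.host"
REMOTE_DIR="/full/path/to/your/work/src"   # what `echo $WORK/src` prints on TACC

# Build the command as a string first so it can be inspected before use.
SCP_CMD="scp SPAdes-3.13.0-Linux.tar.gz ${TACC_USER}@${TACC_HOST}:${REMOTE_DIR}/"
echo "$SCP_CMD"

# Once the printed command looks right, copy and paste it to run it.
```

Printing the command instead of running it straight away is a deliberate choice: a mistyped remote path is the most common way for a transfer to fail or to land somewhere unexpected.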

Data

This tutorial assumes that you are on an idev node. If you are not sure, please ask for help.

Code Block
titleMove to scratch, copy the raw data, and change into this directory for the tutorial
cds
mkdir GVA_SPAdes_tutorial
cp $BI/ngs_course/velvet/data/*/* GVA_SPAdes_tutorial
cd GVA_SPAdes_tutorial

Now we have a bunch of Illumina reads. These are simulated reads. If you'd ever like to simulate some on your own, you might try using Mason.
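Before assembling anything, it can be useful to see how many reads each file actually holds. The sketch below is one way to do that. It assumes the copied files end in .fastq and that every record is exactly 4 lines (no wrapped sequence lines). The helper name count_reads is made up for this example.

```bash
# Count reads in one FASTQ file.
# Assumption: each record is exactly 4 lines (header, sequence, '+', qualities).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Report the read count for every FASTQ file in the current directory.
# Assumption: the tutorial files use the .fastq extension.
for f in *.fastq; do
    [ -e "$f" ] || continue   # skip cleanly if the glob matched nothing
    printf '%s\t%s\n' "$f" "$(count_reads "$f")"
done
```

For paired-end and mate-pair data, the two files of a pair should report the same number of reads; if they do not, the copy may be incomplete.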

...