...

SPAdes is a de Bruijn graph assembler which has become the preferred assembler in numerous labs and workflows. In this tutorial we will use SPAdes to assemble an E. coli genome from simulated Illumina reads. Genome assembly is quite difficult (though if Oxford Nanopore lowers its error rate, assembly will likely get much easier and involve new tools). Genome assembly should only be used when you cannot find a reference genome that is close to your own, when you are engaged in metagenomic projects where you don't know what organisms may be present, or when you believe you may have novel sequence insertions in a genome of interest. Note that in this last case you would actually want to assemble only the reads that do not map to your reference genome (and their mates, in the case of paired-end and mate-pair sequencing) rather than assembling the raw fastq files you get from the sequencer.

Note
A note about read preprocessing

While not explicitly covered here, adapter sequences left on reads can significantly complicate assembly and degrade the result. If you use this tutorial on your own samples, make sure you are working with the best data possible: in this case, reads with the adapters removed.

For those looking for a real challenge, go through the MultiQC tutorial and the Trimmomatic tutorial, and use the information provided here to compare assemblies of some of the same samples before and after adapter trimming.


Learning Objectives

  • Run SPAdes to perform de novo assembly on fragment, paired-end, and mate-pair data.
  • Use contig_stats.pl to display assembly statistics.
  • Find proteins of interest in an assembly using BLAST.

...

Code Block (bash)
Hint: eXtracting a .tar.gz file is the opposite of Creating one (the hints are in the capital letters)
cd $WORK/src
tar -xvzf SPAdes-3.13.0-Linux.tar.gz
# from the help file:
  # x = Extract
  # v = verbose
  # z = file is also gzipped
  # f = file (the name of the archive file follows)
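To make the Create/eXtract symmetry from the hint concrete, here is a minimal round-trip sketch. It does not touch the SPAdes download: it works in a temporary directory, and the `demo-tool` directory and its `run.sh` file are hypothetical stand-ins invented for illustration.

```shell
# Work in a throwaway directory so nothing in $WORK is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Build a small stand-in for a software download (hypothetical names).
mkdir -p demo-tool/bin
echo 'echo "hello from demo-tool"' > demo-tool/bin/run.sh

# c = Create, z = gzip the archive, f = the archive file name follows
tar -czf demo-tool.tar.gz demo-tool

# Delete the original directory, then restore it from the archive.
rm -r demo-tool
# x = eXtract (the opposite of c), same z and f as before
tar -xzf demo-tool.tar.gz

# t = lisT the archive contents without extracting anything
listing=$(tar -tzf demo-tool.tar.gz)
echo "$listing"

# Read back the restored file to confirm it survived the round trip.
restored=$(cat demo-tool/bin/run.sh)
```

Listing an archive with `t` before extracting is a cheap way to see which top-level directory it will create, which is useful when an unfamiliar download might unpack into the current directory.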

...

Code Block (bash)
Copy executables to somewhere already in your path (for example, $HOME/local/bin)
mkdir -p $HOME/local/bin $HOME/local/share # note: the space between the two paths deliberately creates 2 folders with one command
cp $WORK/src/SPAdes-3.13.0-Linux/bin/* $HOME/local/bin  # Note that because both the source files and the destination are given as full paths, this command can be run from anywhere on TACC.
cp -r $WORK/src/SPAdes-3.13.0-Linux/share/spades $HOME/local/share
Code Block (bash)
Suggested line to add to your .bashrc file to directly access executables
export PATH=$WORK/src/SPAdes-3.13.0-Linux/bin:$PATH
# Add this line to the .bashrc file found in your $HOME directory, in section 2. I typically add each new modification after the previous ones, so the most recent addition is the last line in that section but, because it is prepended to PATH, is searched first when looking for commands.
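The point about search order can be checked without installing anything. The sketch below creates a temporary directory containing a hypothetical executable called `demo_assembler` (not a real tool), prepends that directory to PATH in the same way as the export line above, and asks the shell which file it would run.

```shell
# Throwaway directory standing in for an install location.
demo_bin=$(mktemp -d)

# A hypothetical executable named demo_assembler (for illustration only).
printf '#!/bin/sh\necho "demo_assembler 0.0"\n' > "$demo_bin/demo_assembler"
chmod +x "$demo_bin/demo_assembler"

# Prepend, so this directory is searched before everything already in PATH.
PATH="$demo_bin:$PATH"
export PATH

# command -v prints the path of the file the shell would actually run.
found=$(command -v demo_assembler)
echo "$found"

# Running the command by bare name now works from any directory.
version=$(demo_assembler)
echo "$version"
```

After editing your own .bashrc and opening a new shell, running `command -v spades.py` in the same way should show whether the SPAdes directory was picked up; if it prints nothing, the PATH change did not take effect.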

...