Annotating plasmids with pLannotate

Overview

Plasmids are used extensively in both synthetic biology and academic settings. They are often passed between labs and modified slightly to suit a new purpose, then handed off to another researcher who modifies them again for yet another purpose. While repositories are getting better about fully sequencing all plasmids, plasmids were used for decades without significant sequencing efforts, followed by decades in which only key inserts were sequenced, typically by Sanger sequencing. Even in my own work, it has not been uncommon to receive a functioning plasmid without a reference sequence. Recently the Barrick lab developed pLannotate, a program capable of quickly annotating and visualizing plasmid sequences.

Learning Objectives

Using the assembled plasmid you identified in the SPAdes genome assembly tutorial or the novel DNA identification tutorial or a plasmid of your own:

  1. Use the pLannotate website to annotate the sequence.
  2. Compare the results with any existing annotations you are aware of.
  3. Consider how any partial artifacts found on the plasmid could be affecting its known behavior.

Installing pLannotate

Like with most other programs we have worked with pLannotate has both:

For this tutorial we will also be making use of something new: a dedicated web server capable of doing the analysis for us, at http://plannotate.barricklab.org/. While installation instructions are present on the GitHub page, and include an unsurprising conda installation, the use of mamba appears to be required for installation on Stampede2, for reasons that are not currently known.

mamba create -n plannotate -c conda-forge -c bioconda plannotate

If you decide to install mamba as discussed in Friday's review tutorial, consider installing plannotate yourself, and using the command line tools. The information on command line use is sufficient for you to figure out how to run the program on whatever plasmid sequence you have at this point in the class.

Get Some Data

The data source is either the product of another tutorial or your own plasmid

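As mentioned above, the reference sequence(s) you will annotate will come from one (or more) of the following: the SPAdes genome assembly tutorial, the novel DNA identification tutorial, or a plasmid of your own.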

If you use a product of one of the tutorials, you can either transfer the plasmid sequence (in FASTA or GenBank format) back to your laptop, possibly with the help of the scp tutorial, or use cat/more/less to display the sequence, then highlight all of it, copy it, and paste it in the next step.
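If copying by hand from a terminal feels error prone (it is easy to miss a line or pick up the header), a small script can pull out just the sequence for pasting. The following is a minimal sketch, not part of pLannotate: the function name `read_single_fasta` and the choice to accept only A, C, G, T and N are assumptions made for illustration, and it reads only the first record in the file.

```python
from pathlib import Path

# Assumption for this sketch: only these characters count as valid sequence.
VALID_BASES = set("ACGTN")


def read_single_fasta(path):
    """Return (header, sequence) for the first record in a FASTA file."""
    header = None
    chunks = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first
            header = line[1:]
        elif header is not None:
            chunks.append(line.upper())
    if header is None:
        raise ValueError(f"{path} does not look like FASTA (no '>' header line)")
    sequence = "".join(chunks)
    unexpected = set(sequence) - VALID_BASES
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {sorted(unexpected)}")
    return header, sequence


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        name, seq = read_single_fasta(sys.argv[1])
        # Report the length on stderr so stdout holds only the sequence.
        print(f"{name}: {len(seq)} bp", file=sys.stderr)
        print(seq)
```

Saved as, for example, `fasta_to_seq.py` (a hypothetical file name), it could be run with the path to your FASTA file as its only argument; the sequence is printed as one unbroken line that can be copied in a single selection.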

Running pLannotate

As we have not installed the program locally, we will instead use the program's web server to annotate the plasmid. Navigate to http://plannotate.barricklab.org/ and upload the file, or paste the sequence into the appropriate section.

Evaluating output

pLannotate will generate three key things:

An interactive graphic of your plasmid 

This allows you to see the location of the genes and gene fragments identified, as well as popup information about each.

An annotated GenBank file

Like the GenBank files we have worked with throughout the class, this file can be downloaded to your local computer (or is produced automatically if you are instead using the program via the command line).

A CSV file

The CSV file contains the same information in a generic format that can be useful if you are working with multiple plasmids and, say, want to identify any plasmid that has a certain antibiotic resistance gene.
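As an illustration of that multi-plasmid use case, the sketch below scans several downloaded CSV files and reports which ones mention a given feature. It is a rough example under stated assumptions: the column names `Feature` and `Description` are guesses at the CSV header and should be checked against the first line of your own file, and the function name `plasmids_with_feature` is made up for this example.

```python
import csv
from pathlib import Path


def plasmids_with_feature(csv_paths, keyword, columns=("Feature", "Description")):
    """Return the names of CSV files in which any listed column mentions keyword.

    The default column names are assumptions; pass your own via `columns`
    if the header in your file differs.
    """
    keyword = keyword.lower()
    hits = []
    for path in map(Path, csv_paths):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Join the chosen columns and match case-insensitively.
                text = " ".join((row.get(col) or "") for col in columns).lower()
                if keyword in text:
                    hits.append(path.name)
                    break  # one hit per plasmid is enough
    return hits
```

For example, `plasmids_with_feature(["pA.csv", "pB.csv"], "AmpR")` would list whichever of those two (hypothetical) files contain a row mentioning AmpR. The match is a plain substring search, so a short keyword can also match inside longer, unrelated feature names.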

Next steps and optional exercises 

As mentioned above, if you install mamba after consulting the review tutorial, you can install plannotate on Stampede2 and perform the same analysis via the command line.
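If you go the command line route with more than one plasmid, a short wrapper can save retyping. The sketch below builds one `plannotate batch` command per input file. The subcommand and the `-i`, `--output`, `--html` and `--csv` options are taken from my reading of the project's README and should be treated as assumptions: confirm them with `plannotate batch --help` before relying on this. The function names are invented for this example, and by default nothing is executed; the commands are only printed.

```python
import shutil
import subprocess


def build_plannotate_command(fasta_path, output_dir):
    """Build the argument list for one plasmid (flags are assumptions, see above)."""
    return [
        "plannotate", "batch",
        "-i", str(fasta_path),
        "--output", str(output_dir),
        "--html",  # assumed flag: also write the interactive plasmid map
        "--csv",   # assumed flag: also write the CSV table
    ]


def annotate_all(fasta_paths, output_dir, dry_run=True):
    """Print each command; run it only when dry_run is False and plannotate is on PATH."""
    commands = [build_plannotate_command(path, output_dir) for path in fasta_paths]
    for command in commands:
        if dry_run or shutil.which("plannotate") is None:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands
```

Calling `annotate_all(["p1.fasta", "p2.fasta"], "plannotate_out")` (hypothetical file and directory names) prints the two commands so they can be checked against the help text first; passing `dry_run=False` inside the activated plannotate environment would then run them.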

