...
Glimmer3 has been installed manually in the BioITeam bin $BI/bin since it is not offered as a module on TACC.
Wiki Markup |
---|
{hidden-data} Remember from day 2 that there are three general steps for installing a Linux tool.
1. Download and uncompress the Glimmer3 source code
{code}
login1$ cdh
login1$ wget http://www.cbcb.umd.edu/software/glimmer/glimmer302.tar.gz
login1$ tar xzf glimmer302.tar.gz
{code}
2. Compile the Glimmer3 programs and copy them to {{\~/local/bin/}}
{code}
login1$ cd glimmer3.02/src/
login1$ make
login1$ cd ..
login1$ cp bin/* ~/local/bin/
{code}
3. Your $PATH variable should have been set up on day 2 to look in {{\~/local/bin/}} for executables. If not, update your {{$HOME/.profile_user}} file with the following line.
{code}
export PATH="$HOME/local/bin:$PATH"
{code}
{hidden-data} |
Running Glimmer3
Running Glimmer3 is a two-step process. First, a probability model of coding sequences, called an interpolated context model or ICM, must be built. Once that has been built, the glimmer3 program itself is run to analyze the assembled genome and make gene predictions.
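The two steps can be sketched as the command sequence below. This is only an illustration modeled on the g3-from-scratch.csh script that ships with Glimmer 3.02: the function name `print_glimmer_steps` is hypothetical, the option values are that script's defaults as we recall them (treat them as assumptions and check your own copy), and the function prints the commands instead of running them.

```shell
#!/bin/bash
# Print (do not run) the commands behind the two-step Glimmer3 workflow.
# Assumption: flags mirror the defaults in g3-from-scratch.csh (Glimmer 3.02).
print_glimmer_steps() {
    local genome="$1"   # FASTA file with the assembled sequence
    local tag="$2"      # prefix shared by all output files

    # Step 1: build the interpolated context model (ICM) from long ORFs
    echo "long-orfs -n -t 1.15 $genome $tag.longorfs"
    echo "extract -t $genome $tag.longorfs > $tag.train"
    echo "build-icm -r $tag.icm < $tag.train"

    # Step 2: predict genes in the genome using the trained model
    echo "glimmer3 -o50 -g110 -t30 $genome $tag.icm $tag"
}

# Example call with placeholder file names
print_glimmer_steps genome.fa run1
```

Printing the commands first makes it easy to see which intermediate files (`.longorfs`, `.train`, `.icm`) belong to the model-building step and which call performs the actual prediction.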
Wiki Markup |
---|
{hidden-data}
Fortunately, Glimmer3 comes with several C-shell scripts that automate the whole process. This tutorial will take advantage of those. The scripts require some minor editing, but we have already done that for you.
{hidden-data} |
We'll run Glimmer on a de novo assembly of the bacterium Acinetobacter baumannii. First copy the contigs.fa file that velvet produced, and then execute the run_glimmer.sh script. This script preprocesses the contigs.fa file and calls the g3-from-scratch.csh script that was prepackaged with glimmer3. Several files will be created, but the one containing the predicted genes is called contigs.fa.glimmer.predict.genes.
Wiki Markup |
---|
{hidden-data}
The C-shell scripts that come with Glimmer3 must be slightly edited first. At the top of each script are the directory paths to the Glimmer executables and Awk scripts (the lines beginning with *set glimmerpath* and *set awkpath*). You will need to change these entries to the directories where these files were installed. The Glimmer executables are in {{\~/local/bin/}} and the Awk scripts can be found in {{\~/glimmer3.02/scripts}}. You only have to make these changes in the {{g3-from-scratch.csh}} script for now. After editing it, lines 28 and 29 should look something like the following.
{code:title=Edit lines 28 and 29 of ~/glimmer3.02/scripts/g3-from-scratch.csh}
set awkpath = $HOME/glimmer3.02/scripts
set glimmerpath = $HOME/local/bin
{code}
We will actually use another script ({{run_glimmer.sh}}) to preprocess the contigs.fa file (produced by velvet) and to call {{g3-from-scratch.csh}}.
{code}
cdw
mkdir glimmer_example
cd glimmer_example
cp /corral-repl/utexas/BioITeam/sphsmith/run_glimmer.sh .
{code}
{{run_glimmer.sh}} must also be edited so that it knows where to look for the {{g3-from-scratch.csh}} script. After editing, line 11 should look something like the following.
{code:title=Edit line 11 of ./run_glimmer.sh}
$HOME/glimmer3.02/scripts/g3-from-scratch.csh $infile.cat $infile.glimmer
{code}
The scripts are now ready to be used. We'll run Glimmer on a _de novo_ assembly of the bacterium _Acinetobacter baumannii_. First copy the {{contigs.fa}} file that velvet produced, and then execute the {{run_glimmer.sh}} script. Several files will be created, but the one containing the predicted genes is called contigs.fa.glimmer.predict.genes.
{hidden-data} |
Code Block |
---|
|
cdw
mkdir glimmer_example
cd glimmer_example
cp $BI/ngs_course/velvet/real_set/contigs.fa .
run_glimmer.sh contigs.fa
|
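Once the run finishes, a quick sanity check is to count how many genes were predicted. This is a minimal sketch, assuming contigs.fa.glimmer.predict.genes is a FASTA file with one `>` header per predicted gene (it is used as a blastx query later on, which suggests this). The helper name `count_fasta_records` is hypothetical.

```shell
#!/bin/bash
# Count FASTA records by counting the header lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report the count only if the Glimmer output exists in the current directory.
if [ -f contigs.fa.glimmer.predict.genes ]; then
    echo "Predicted genes: $(count_fasta_records contigs.fa.glimmer.predict.genes)"
fi
```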
...
Code Block |
---|
title | Command line to run blastx of the predicted genes against the nr database |
---|
|
blastx -query contigs.fa.glimmer.predict.genes -out nr_result.txt -db /corral-repl/utexas/BioITeam/blastdb/nr -outfmt 6
|
...
Expand |
---|
| Need help? Click here to see how to run standard blastx... |
---|
|
Code Block |
---|
title | Commands to blast contigs against nr |
---|
|
echo "blastx -query contigs.fa.glimmer.predict.genes -out nr_result.txt -db /corral-repl/utexas/BioITeam/blastdb/nr -outfmt 6" > commands
module load blast
launcher_creator.py -n blastx -q normal -t 24:00:00 -j commands
qsub launcher.sge
|
|
Expand |
---|
| or click here to see how to run Benni's greatly sped-up blastx... |
---|
|
Code Block |
---|
title | Commands to blast contigs against nr using Benni's script split_blast |
---|
|
split_blast -N 6 -a 20130520NGS-FAC -t 1:00:00 blastx -outfmt 6 -db /corral-repl/utexas/BioITeam/blastdb/nr -max_target_seqs 1 -query contigs.fa.glimmer.predict.genes -out contigs.fa.glimmer.predict.genes.nr.blastx.out
|
|
Upon completion, the blast results can be converted to GFF format and viewed in IGV. Instead of waiting for blastx to finish, you can copy our partial search results.
Code Block |
---|
cp $BI/ngs_course/nr_result.txt .
bl2gff.pl nr_result.txt > nr_result.gff
|
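The tabular file can also be inspected directly. With `-outfmt 6`, each line holds twelve tab-separated columns: query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, and bit score. The sketch below keeps only the highest-scoring hit for each query. `best_hits` is a hypothetical helper written for illustration and is not part of bl2gff.pl.

```shell
#!/bin/bash
# Keep the single best hit (highest bit score, column 12) for each query
# (column 1) in BLAST tabular (-outfmt 6) output.
best_hits() {
    awk -F'\t' '
        # Remember the line with the highest bit score seen for each query
        !($1 in best) || $12 + 0 > score[$1] { best[$1] = $0; score[$1] = $12 + 0 }
        # Record the order in which queries first appear
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$1"
}

# Show the first few best hits if the results file is present.
if [ -f nr_result.txt ]; then
    best_hits nr_result.txt | head
fi
```

Queries are printed in the order they first appear in the input, so the output stays aligned with the order of the predicted genes.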
...
Other pipelines for automated annotation
The RAST webserver (registration required) provides on-demand annotation of genes in microbial or organellar genomes.
The NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) streamlines the whole annotation process for you. The pipeline is currently under development, but a standalone package is available here.
...