...

Because breseq is an "all-in-one" tool that maps reads, calls variants, sifts signal from noise, and provides basic visualization, you may think it is the only program you need for the analysis of appropriate samples. Do not forget that read QC and read processing take place before running breseq. A full environment might therefore contain fastqc, fastp, and breseq (this is what I use in my own analysis). Nicely, all 3 programs are in the bioconda channel. Using what you have learned so far, you should know how to create a new environment with these 3 programs. Unfortunately, there is currently an issue wherein installing breseq does not pull in one program that is needed to draw some of the graphs it can normally produce. One way of fixing this is to include "libjpeg-turbo" in the environment.

You can name your new environment anything you want; my suggestion would be GVA-breseq so you remember both that it was part of this class and what is in it:
conda create --name GVA-breseq -c bioconda fastqc fastp breseq libjpeg-turbo -c conda-forge
conda activate GVA-breseq

Because we are including multiple different packages, the installation will take a few extra minutes to complete. If you just include breseq and libjpeg-turbo the installation will be faster. This is one of the reasons that having 1 environment with all programs installed in it is not always the best idea.


Using what you know so far, see if you can figure out which versions of the 3 programs you have installed.

...

Expected answer:

(GVA-breseq) tacc:/scratch/0004/train402/GVA_samtools_tutorial$ fastqc --version
FastQC v0.11.9
(GVA-breseq) tacc:/scratch/0004/train402/GVA_samtools_tutorial$ fastp --version 
fastp 0.23.2
(GVA-breseq) tacc:/scratch/0004/train402/GVA_samtools_tutorial$ breseq --version
breseq 0.36.1
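
If you would rather check all 3 programs in one go, a small loop can do it. This is only a sketch: the function name report_versions is my own invention, and it assumes each program accepts --version as shown in the expected answer above. Your version numbers may differ from the ones listed.

```bash
#!/usr/bin/env bash
# report_versions is a hypothetical helper name, not part of conda or breseq.
# For each tool named on the command line, print its version line,
# or say that the tool is missing from PATH.
report_versions() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Some tools print their version to stderr, so merge both streams
            # and keep only the first line.
            "$tool" --version 2>&1 | head -n 1
        else
            echo "$tool: not found on PATH (is the GVA-breseq environment active?)"
        fi
    done
}

report_versions fastqc fastp breseq
```

If any program is reported as not found, the most likely cause is that the environment you installed it into is not the active one.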


...

The absolute minimum that breseq needs to do anything is a reference file and a fastq file. We've added the -j option to use more processors to speed things up a bit. When you executed the command without any options you saw more options, and if you use breseq --help you will see more still. This will finish very quickly (~3 minutes) with a final line of "+++   SUCCESSFULLY COMPLETED". If you instead see something different as the last line before getting your prompt back, get my attention.
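
If you capture what breseq prints to a file, the "last line" check described above can be scripted. The sketch below makes 2 assumptions that are mine and not part of the tutorial: that you redirected both stdout and stderr of the breseq command into a file (called breseq.log here), and that the completion message is the last non-empty line of that file. The function name breseq_succeeded is hypothetical.

```bash
#!/usr/bin/env bash
# breseq_succeeded is a hypothetical helper; breseq.log is an assumed file name.
# Succeeds (exit 0) if the last non-empty line of the log contains the
# completion message, fails (exit 1) if it does not, and returns 2 if the
# log file cannot be read.
breseq_succeeded() {
    local log_file="$1"
    [ -r "$log_file" ] || return 2
    grep -v '^[[:space:]]*$' "$log_file" | tail -n 1 | grep -q 'SUCCESSFULLY COMPLETED'
}

# Only run the check if a captured log actually exists in this directory.
if [ -f breseq.log ]; then
    if breseq_succeeded breseq.log; then
        echo "breseq finished cleanly"
    else
        echo "last line is not the completion message, take a look at breseq.log"
    fi
fi
```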

...

Copy command:
cp /corral-repl/utexas/BioITeam/gva_course/GVA2022.launcher.slurm .

Because the file provided is generic, you will need to make a few changes to it. Note that most of these lines have additional text to the right of the line. This commented text is there to remind you what goes on each line; leaving it alone will not hurt anything, while removing it may make it harder to remember the purpose of the line.

Line number   As is                               To be
16            #SBATCH -J jobName                  #SBATCH -J D3Breseq
21            #SBATCH -t 12:00:00                 #SBATCH -t 4:00:00
22            ##SBATCH --mail-user=ADD            #SBATCH --mail-user=<YourEmailAddress>
23            ##SBATCH --mail-type=all            #SBATCH --mail-type=all
27            conda activate GVA2021              conda activate GVA-breseq
30            export LAUNCHER_JOB_FILE=commands   export LAUNCHER_JOB_FILE=breseq_commands

The changes to lines 22 and 23 are optional but will give you an idea of what types of email you could expect from TACC if you choose to use these options. Just be sure that these 2 lines start with a single # symbol after editing them. Line 27 assumes that you installed breseq into an environment named "GVA-breseq" earlier in this tutorial. If you used a different name, you will need to substitute it here.

Again, use ctrl-o and ctrl-x to save the file and exit.
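
If you ever need to make the same edits to many copies of the slurm file, they can also be scripted with sed rather than typed in nano. The following is a sketch under assumptions: the function name apply_slurm_edits is made up, and the patterns assume that each "As is" value sits at the very start of its line with single spaces exactly as written in the table. It matches on text rather than line number, keeps the reminder comments to the right of each line, and leaves a .bak copy of the original next to the edited file.

```bash
#!/usr/bin/env bash
# apply_slurm_edits is a hypothetical helper, not something provided by TACC.
# Usage: apply_slurm_edits <slurm file> <email address>
# Caveat: an email address containing / or & would need escaping for sed.
apply_slurm_edits() {
    local slurm_file="$1" email="$2"
    [ -f "$slurm_file" ] || { echo "no such file: $slurm_file" >&2; return 1; }
    [ -n "$email" ] || { echo "an email address is required" >&2; return 1; }
    # Patterns are anchored at the start of the line only, so any trailing
    # comment text on the line is preserved.
    sed -i.bak \
        -e 's/^#SBATCH -J jobName/#SBATCH -J D3Breseq/' \
        -e 's/^#SBATCH -t 12:00:00/#SBATCH -t 4:00:00/' \
        -e "s/^##SBATCH --mail-user=ADD/#SBATCH --mail-user=${email}/" \
        -e 's/^##SBATCH --mail-type=all/#SBATCH --mail-type=all/' \
        -e 's/^conda activate GVA2021/conda activate GVA-breseq/' \
        -e 's/^export LAUNCHER_JOB_FILE=commands/export LAUNCHER_JOB_FILE=breseq_commands/' \
        "$slurm_file"
}
```

You would call it with the name of your copied slurm file and your own email address as the 2 arguments. For a single file in this class, editing by hand in nano is just as good and lets you see what each line does.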

...

Now that we have the 2 things that the job queue system needs (a commands file and a slurm file to control the computer), all that is left is to submit the job using the sbatch command. It is important to make sure you have the correct conda environment active before submitting the job so that information gets forwarded to the compute node by the queue manager.
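
If you want a safety net for the "correct environment" step, you can check the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment. This is a sketch, not a required part of the tutorial; the function name require_conda_env is hypothetical.

```bash
#!/usr/bin/env bash
# require_conda_env is a hypothetical helper name.
# Succeeds only if the currently active conda environment has the given name.
require_conda_env() {
    local wanted="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
        return 0
    fi
    echo "active environment is '${CONDA_DEFAULT_ENV:-none}', expected '$wanted'" >&2
    return 1
}
```

Chaining it in front of your submission with && (require_conda_env GVA-breseq && your sbatch command) means nothing is submitted unless the environment matches.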

sbatch command for submitting your job:
conda activate GVA-breseq
sbatch GVA2022.launcher.slurm

Your output should be similar to this:

...

The showq command can be used to check the status of any jobs you have submitted. Executing the command without any options will show you the status of all jobs currently on stampede2 (which can be interesting at least once to get a sense of the volume of work that TACC deals with). More useful is to provide the '-u' option so that you only see jobs that are related to your userid.

Initially your job will be listed in the "WAITING JOBS------------------------" section, though depending on how busy stampede2 is when you submit your job, it may move directly to the "ACTIVE JOBS--------------------" section. Once the breseq command finishes running, the job will move down into a 3rd section labeled something similar to "COMPLETING JOBS--------------------".
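
If you would like the section name without reading through the full listing, the output can be filtered. Treat the following as a rough sketch: the function name job_section is made up, and it assumes that section headings look like the ones quoted above (some words, then "JOBS", then a run of dashes) and that your job name appears somewhere on the job's line. The real showq layout may differ.

```bash
#!/usr/bin/env bash
# job_section is a hypothetical helper name.
# Reads showq-style text on stdin and prints the heading of the section
# in which the given job name first appears. Exits 1 if it is not found.
job_section() {
    local job_name="$1"
    awk -v job="$job_name" '
        # A heading line: remember it with the trailing dashes stripped off.
        /JOBS-+/ { section = $0; sub(/-+.*$/, "", section); next }
        # First line mentioning the job: report the current section and stop.
        index($0, job) > 0 { print section; found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# Only try this where showq exists (for example on a TACC login node).
if command -v showq >/dev/null 2>&1; then
    showq -u | job_section D3Breseq || echo "D3Breseq not listed"
fi
```

D3Breseq is the job name set on line 16 of the slurm file; substitute your own if you chose a different one.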

...

This tutorial was substantially reformatted from the most recent version found here. Our thanks to the previous instructors.

Return to the GVA2022 page