...

Genome assembly is an important part of analysis, but it builds a reference file that will be used many times, so it makes more sense to install SPAdes in its own environment. Other tools that could live in the same environment are read preprocessing tools, in particular adapter removal tools such as Trimmomatic.

Code Block (bash)
conda create --name GVA-SPAdes -c bioconda spades
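
If you do decide to keep a read preprocessing tool alongside SPAdes, a minimal sketch of adding it to the same environment might look like the following. It assumes the environment name GVA-SPAdes from the command above and that the bioconda trimmomatic package is the one you want; the extra conda-forge channel is an assumption about where its dependencies will be found.

```shell
# Add an adapter removal tool to the existing assembly environment
# (assumes the GVA-SPAdes environment has already been created).
conda install --name GVA-SPAdes -c conda-forge -c bioconda trimmomatic

# Activate the environment and confirm both tools are on the PATH.
conda activate GVA-SPAdes
spades.py --version
command -v trimmomatic
```

Keeping the assembler and its preprocessing tools together means a single conda activate line in a job script covers the whole assembly workflow.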


Testing SPAdes installation

...

Again, while in nano you will edit most of the same lines you edited in the breseq tutorial. Note that most of these lines have additional text to the right of the line. This commented text is there to remind you what goes on each line; leaving it alone will not hurt anything, while removing it may make it harder to remember the purpose of the line.

| Line number | As is | To be |
| --- | --- | --- |
| 16 | #SBATCH -J jobName | #SBATCH -J spades |
| 17 | #SBATCH -n 1 | #SBATCH -n 4 |
| 18 | #SBATCH -N 1 | #SBATCH -N 4 |
| 21 | #SBATCH -t 12:00:00 | #SBATCH -t 4:30:00 |
| 22 | ##SBATCH --mail-user=ADD | #SBATCH --mail-user=<YourEmailAddress> |
| 23 | ##SBATCH --mail-type=all | #SBATCH --mail-type=all |
| 27 | conda activate GVA2021 | conda activate GVA-SPAdes |
| 31 | export LAUNCHER_JOB_FILE=commands | export LAUNCHER_JOB_FILE=spades_commands |

The changes to lines 22 and 23 are optional, but they will give you an idea of what types of email you can expect from TACC if you choose to use these options. If you do edit them, make sure both lines start with a single # symbol afterward; a line that still starts with ## is treated as an ordinary comment and the option is ignored.
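
One way to double-check the single # versus ## distinction is to search the file for each prefix after you save it. The sketch below runs against a throwaway stand-in file created with mktemp, not the tutorial's actual slurm file, and the three directives in it are only sample content; point the two grep commands at your own file to use it for real.

```shell
# Create a small stand-in job file (hypothetical content for illustration).
slurm_file="$(mktemp)"
cat > "$slurm_file" <<'EOF'
#SBATCH -J spades
##SBATCH --mail-user=ADD
#SBATCH --mail-type=all
EOF

# Lines starting with exactly one "#" followed by SBATCH are active directives.
active=$(grep -c '^#SBATCH' "$slurm_file" || true)

# Lines starting with "##SBATCH" are plain comments, so the option is skipped.
disabled=$(grep -c '^##SBATCH' "$slurm_file" || true)

echo "active directives:   $active"
echo "disabled directives: $disabled"

# Show which lines are still disabled, with their line numbers.
grep -n '^##SBATCH' "$slurm_file" || true
```

In this sample the mail-user line is still disabled, which is the kind of half-finished edit the check is meant to catch.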

...

Code Block (bash): Example grep commands
# Count the total number of contigs:
grep -c "^>" single_end/contigs.fasta

# Determine the length of the 5 largest contigs
# (SPAdes lists contigs longest first and writes the length in each header):
grep "^>" single_end/contigs.fasta | head -n 5

# Determine the length of the 20 smallest contigs:
grep "^>" single_end/contigs.fasta | tail -n 20

# Determine the length of the 100th through 110th contigs
# (that range is 11 headers, so tail needs -n 11):
grep "^>" single_end/contigs.fasta | head -n 110 | tail -n 11
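
The grep commands above print whole header lines and leave you to read the lengths by eye. As a rough sketch, the lengths can also be pulled out as numbers, assuming headers follow the SPAdes pattern NODE_<number>_length_<length>_cov_<coverage>. The three-contig FASTA below is made-up demo data so the example runs anywhere; replace the temporary file with single_end/contigs.fasta to use it on a real assembly.

```shell
# Build a tiny stand-in FASTA (made-up contigs, for illustration only).
demo_fasta="$(mktemp)"
cat > "$demo_fasta" <<'EOF'
>NODE_1_length_5000_cov_20.5
ACGT
>NODE_2_length_3000_cov_18.2
ACGT
>NODE_3_length_500_cov_2.1
ACGT
EOF

# Field 4 of the underscore-separated header is the contig length.
lengths=$(grep "^>" "$demo_fasta" | cut -d "_" -f 4)

# Sum the lengths to get the total assembly size.
total=$(echo "$lengths" | awk '{ sum += $1 } END { print sum + 0 }')

# Count the contigs that are at least 1000 bp long.
long_contigs=$(echo "$lengths" | awk '$1 >= 1000 { n++ } END { print n + 0 }')

echo "total assembled bases: $total"
echo "contigs >= 1000 bp:    $long_contigs"
```

The 1000 bp cutoff is an arbitrary choice for the demo; pick whatever threshold makes sense for your genome.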

Since you ran multiple different combinations of reads for the simulated data, how did the insert size affect the number of contigs? The length of the largest contigs? Why might larger insert sizes not help things very much?
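
To compare several assemblies side by side, one option is to loop over every output directory and print the contig count and largest contig length for each. This is only a sketch: it builds two small demo directories with made-up contigs so it can run anywhere. The name paired_end_demo is hypothetical, and the header layout NODE_<number>_length_<length>_cov_<coverage> is assumed. To use it on real output, set workdir to the directory that holds your SPAdes output folders.

```shell
# Create two stand-in assembly directories (made-up data, hypothetical names).
workdir="$(mktemp -d)"
mkdir -p "$workdir/single_end" "$workdir/paired_end_demo"
printf '>NODE_1_length_900_cov_5.0\nACGT\n>NODE_2_length_400_cov_4.0\nACGT\n>NODE_3_length_300_cov_4.2\nACGT\n' \
  > "$workdir/single_end/contigs.fasta"
printf '>NODE_1_length_1600_cov_9.8\nACGT\n' \
  > "$workdir/paired_end_demo/contigs.fasta"

# One tab-separated row per assembly: name, contig count, largest contig length.
summary=$(
  for fasta in "$workdir"/*/contigs.fasta; do
    name=$(basename "$(dirname "$fasta")")
    count=$(grep -c "^>" "$fasta")
    largest=$(grep "^>" "$fasta" | cut -d "_" -f 4 | sort -n | tail -n 1)
    printf '%s\t%s\t%s\n' "$name" "$count" "$largest"
  done
)

echo "$summary"
```

Sorting the lengths numerically before taking the last one avoids relying on the order of contigs within the file.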

...