...

Genome assembly is an important part of the analysis, but because it builds a reference file that will be used many times, it makes more sense to install SPAdes in its own environment. Other tools that could go in the same environment are read preprocessing tools, in particular adapter removal tools such as trimmomatic.

Code Block
languagebash
conda create --name GVA-SPAdes -c bioconda spades


Testing SPAdes installation

...

Line number   As is                               To be
16            #SBATCH -J jobName                  #SBATCH -J spades
17            #SBATCH -n 1                        #SBATCH -n 4
18            #SBATCH -N 1                        #SBATCH -N 4
21            #SBATCH -t 12:00:00                 #SBATCH -t 104:0030:00
22            ##SBATCH --mail-user=ADD            #SBATCH --mail-user=<YourEmailAddress>
23            ##SBATCH --mail-type=all            #SBATCH --mail-type=all
27            conda activate GVA2021              conda activate GVA-SPAdes
31            export LAUNCHER_JOB_FILE=commands   export LAUNCHER_JOB_FILE=spades_commands
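
If you would rather script these changes than make them by hand in a text editor, the sketch below shows one possible way to apply the text-only substitutions with sed. The function name apply_spades_edits and its three arguments are hypothetical, and it assumes the template lines match the "As is" column exactly. It deliberately leaves the -n, -N and -t lines alone so you can set those yourself.

```shell
#!/usr/bin/env bash
# Sketch only: apply the unambiguous "As is" -> "To be" substitutions.
# apply_spades_edits is a hypothetical helper name, not part of any tool.
# It writes a new file and leaves the template untouched.
# Assumes the email address contains no "/" or "&" characters.
apply_spades_edits() {
    local template="$1" output="$2" email="$3"
    sed -e 's/^#SBATCH -J jobName$/#SBATCH -J spades/' \
        -e "s/^##SBATCH --mail-user=ADD\$/#SBATCH --mail-user=${email}/" \
        -e 's/^##SBATCH --mail-type=all$/#SBATCH --mail-type=all/' \
        -e 's/^conda activate GVA2021$/conda activate GVA-SPAdes/' \
        -e 's/^export LAUNCHER_JOB_FILE=commands$/export LAUNCHER_JOB_FILE=spades_commands/' \
        "$template" > "$output"
}
```

Call it with the path to your copy of the template, the name of the file to write, and your email address, then open the result and check it against the table before submitting.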

...

Code Block
languagebash
titleExample grep commands
# Count the total number of contigs:
grep -c "^>" single_end/contigs.fasta

# Determine the length of the 5 largest contigs:
grep "^>" single_end/contigs.fasta | head -n 5

# Determine the length of the 20 smallest contigs:
grep "^>" single_end/contigs.fasta | tail -n 20

# Determine the length of the 100th through 110th contigs (11 headers):
grep "^>" single_end/contigs.fasta | head -n 110 | tail -n 11

Since you ran multiple different combinations of reads for the simulated data, how did the insert size affect the number of contigs? The length of the largest contigs? Why might larger insert sizes not help things very much?
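
To make that comparison easier, the grep commands can be wrapped in a small helper that prints one summary line per assembly. This is a minimal sketch: summarize_contigs is a hypothetical function name, and it assumes the contig headers follow the SPAdes pattern >NODE_1_length_5000_cov_12.3, so that the length is the fourth underscore-separated field.

```shell
#!/usr/bin/env bash
# Sketch only: summarize one or more contigs.fasta files.
# summarize_contigs is a hypothetical helper name, not part of SPAdes.
# Assumes headers look like ">NODE_1_length_5000_cov_12.3", so the
# contig length is the 4th field when the header is split on "_".
summarize_contigs() {
    local fasta
    for fasta in "$@"; do
        # Keep only header lines, then count them and track the lengths.
        grep "^>" "$fasta" | awk -F_ -v name="$fasta" '
            { n++; total += $4; if ($4 > max) max = $4 }
            END { printf "%s\tcontigs=%d\tlargest=%d\ttotal_bp=%d\n", name, n, max, total }'
    done
}
```

For example, summarize_contigs single_end/contigs.fasta prints the contig count, the largest contig and the total assembled length for that run. Pass the contigs.fasta files from your other output directories as extra arguments to see them side by side.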

...