...
Genome assembly is an important part of the analysis, but it builds a reference file that will be used many times, so it makes more sense to install the assembler in its own environment. Other tools that could reasonably share this environment are read preprocessing tools, in particular adapter removal tools such as trimmomatic.
Code Block | ||
---|---|---|
| ||
conda create --name GVA-SPAdes -c bioconda spades |
Testing SPAdes installation
...
Again, while in nano you will edit most of the same lines you edited in the breseq tutorial. Note that most of these lines have additional text to the right of the line. This commented text is there to remind you what goes on each line: leaving it alone will not hurt anything, while removing it may make it more difficult to remember what the purpose of the line is.
Line number | As is | To be |
---|---|---|
16 | #SBATCH -J jobName | #SBATCH -J spades |
17 | #SBATCH -n 1 | #SBATCH -n 4 |
18 | #SBATCH -N 1 | #SBATCH -N 4 |
21 | #SBATCH -t 12:00:00 | #SBATCH -t 4:30:00 |
22 | ##SBATCH --mail-user=ADD | #SBATCH --mail-user=<YourEmailAddress> |
23 | ##SBATCH --mail-type=all | #SBATCH --mail-type=all |
27 | conda activate GVA2021 | conda activate GVA-SPAdes |
31 | export LAUNCHER_JOB_FILE=commands | export LAUNCHER_JOB_FILE=spades_commands |
The changes to lines 22 and 23 are optional, but they will give you an idea of what types of email you can expect from TACC if you choose to use these options. If you do edit them, make sure both lines start with a single # symbol afterward.
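Because Slurm reads only lines that begin with a single `#SBATCH` and treats `##SBATCH` as an ordinary comment, a quick check after leaving nano can confirm the edits took effect. The following is a minimal sketch, not part of the original tutorial; the function names and the filename `spades.slurm` are hypothetical, so substitute the name of your own copy of the script.

```shell
# Sketch only: report which #SBATCH lines Slurm will read and which are
# still commented out. Function names here are illustrative assumptions.

active_sbatch() {
    # Lines beginning with exactly one "#" followed by SBATCH (read by Slurm).
    grep "^#SBATCH" "$1"
}

disabled_sbatch() {
    # Lines still beginning with "##SBATCH" (ignored by Slurm).
    grep "^##SBATCH" "$1"
}

# Example usage; spades.slurm is a hypothetical filename:
# active_sbatch spades.slurm
# disabled_sbatch spades.slurm
```

If you chose to enable the email options, the two mail lines should appear in the output of `active_sbatch` and not in the output of `disabled_sbatch`.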
...
Code Block | ||||
---|---|---|---|---|
| ||||
# Count the total number of contigs:
grep -c "^>" single_end/contigs.fasta
# Determine the length of the 5 largest contigs:
grep "^>" single_end/contigs.fasta | head -n 5
# Determine the length of the 20 smallest contigs:
grep "^>" single_end/contigs.fasta | tail -n 20
# Determine the length of the 100th through 110th contigs:
grep "^>" single_end/contigs.fasta | head -n 110 | tail -n 11
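The grep commands above print whole header lines and leave you to read the length out of each name by eye. As a rough sketch, the lengths can also be pulled out and summarized directly. This assumes the headers follow the usual SPAdes naming pattern, for example `>NODE_1_length_5000_cov_12.5`, so that the length is the fourth field when the line is split on underscores. The function names are hypothetical and are not part of SPAdes.

```shell
# Sketch only: summarize contig lengths from SPAdes-style FASTA headers.
# Assumes headers look like ">NODE_1_length_5000_cov_12.5".

contig_lengths() {
    # Print one length per contig, in file order.
    grep "^>" "$1" | awk -F'_' '{print $4}'
}

contig_summary() {
    # Print the contig count, total bases, and largest contig length.
    contig_lengths "$1" | awk '
        { n++; total += $1; if ($1 > max) max = $1 }
        END { printf "contigs=%d total_bp=%d largest=%d\n", n, total, max }'
}

# Example usage (path taken from the commands above):
# contig_summary single_end/contigs.fasta
```

Running `contig_summary` on each output directory gives one line per assembly, which makes it easier to compare read combinations side by side.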
Since you ran multiple different combinations of reads for the simulated data, how did the insert size affect the number of contigs? The length of the largest contigs? Why might larger insert sizes not help things very much?
...