...
Warning: Do not run on head node.

Use showq -u to verify you are still on the idev node. If you are not on one, use this command to restart an idev session:

    idev -m 120 -r CCBB_5.23.17PM -A UT-2015-05-18 -N 2 -n 8
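Commands to be executed in order:

    # Convert SAM to compressed binary BAM (-S is ignored by samtools 1.x, which auto-detects the input format)
    samtools view -b -S -o SRR030257.bam SRR030257.sam
    # Sort alignments by reference coordinate (samtools 1.3+ syntax: -o names the output file)
    samtools sort -o SRR030257.sorted.bam SRR030257.bam
    # Build the index (SRR030257.sorted.bam.bai) that tools such as IGV need for fast random access
    samtools index SRR030257.sorted.bam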
Tip: This is a really common sequence of commands, so you might want to add it to your personal cheat sheet.
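If you keep this sequence on your cheat sheet, it can be handy to wrap it in a small shell function that you paste into your shell or your ~/.bashrc. This is a minimal sketch, assuming samtools 1.3 or later (where sort -o names the output file); the function name sam_to_sorted_bam and its single PREFIX argument are hypothetical, not part of the tutorial.

```shell
# Hypothetical helper (not from the tutorial): turn PREFIX.sam into a
# coordinate-sorted, indexed PREFIX.sorted.bam. Assumes samtools >= 1.3.
sam_to_sorted_bam() {
    local prefix="$1"
    if [ -z "$prefix" ]; then
        echo "usage: sam_to_sorted_bam PREFIX" >&2
        return 1
    fi
    # SAM -> BAM, sort by coordinate, then build the .bai index.
    # Each && means a later step runs only if the previous one succeeded.
    samtools view -b -S -o "${prefix}.bam" "${prefix}.sam" &&
    samtools sort -o "${prefix}.sorted.bam" "${prefix}.bam" &&
    samtools index "${prefix}.sorted.bam"
}
```

For example, sam_to_sorted_bam SRR030257 should leave SRR030257.bam, SRR030257.sorted.bam, and the index SRR030257.sorted.bam.bai in the current directory.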
...
Optional: For the data we are dealing with, predictions with an allele frequency not equal to 1 are not really applicable. (The reference genome is haploid. There aren't any heterozygotes.) How can we remove these lines from the file?

Try looking at grep --help to see what you can come up with.
Hint (here for answer):

    grep -v PATTERN   # -v inverts the match, printing every line that does NOT match PATTERN
Answers:

    grep 'AF1=1' SRR030257.vcf > SRR030257.filtered.vcf

is not practical. The VCF header lines (those beginning with #) do not contain AF1=1, so they are dropped, and the output is no longer valid VCF; tools that require that formatting may not be able to read the file later.
    grep -v 'AF1=0' SRR030257.vcf > SRR030257.filtered.vcf

preserves every line that does not contain 'AF1=0' (including the header lines) and is one way of doing this. If you look closely at the non-filtered file, you will see that frequencies below 1 are written as AF1=0.###, so filtering out lines that contain 'AF1=0' removes every frequency that is not 1, including, say, 'AF1=0.99'. How would you change this to keep only variants with a frequency of at least 90%?
    sed -i '/AF1=0/ d' SRR030257.vcf

does the same filtering in place, without creating another file. (But it overwrites your existing file!)
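The grep and sed answers match on text, so a different cutoff, such as the 90% question above, means rewriting the pattern. A sketch that compares AF1 numerically instead, assuming (as in this samtools output) that the INFO column, column 8, carries an AF1= key; the function name filter_af1 and its arguments are hypothetical.

```shell
# Hypothetical helper: keep VCF header lines plus records whose AF1 value in
# the INFO column (column 8) is >= MIN_AF. Records without AF1 are dropped.
# Usage: filter_af1 MIN_AF IN.vcf > OUT.vcf
filter_af1() {
    local min_af="$1" vcf="$2"
    awk -v min="$min_af" 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }                  # header lines pass through unchanged
        {
            n = split($8, info, ";")          # INFO is a ;-separated key=value list
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^AF1=/) {      # exact key, not e.g. XAF1=
                    # "AF1=" is 4 characters, so the value starts at position 5
                    if (substr(info[i], 5) + 0 >= min + 0) print
                    break
                }
            }
        }' "$vcf"
}
```

For example, filter_af1 0.9 SRR030257.vcf > SRR030257.af90.vcf keeps variants with AF1 of at least 0.9 (the output name is only an example). With a threshold of 1 it behaves much like the grep -v 'AF1=0' answer, except that records with no AF1 key are dropped rather than kept.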
...
- Which mapper finds more variants?
- Can you figure out how to filter the VCF files on various criteria, like coverage, quality, etc.?
- How many high-quality mutations are there in these E. coli samples relative to the reference genome?
- Look at how the reads supporting these variants were aligned to the reference genome in the Integrative Genomics Viewer (IGV). This will be a separate tutorial for tomorrow.
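For the mapper-comparison and high-quality-mutation questions, a quick first step is simply counting records. A minimal sketch, assuming QUAL is column 6 as in the VCF specification; the function name count_variants and the optional MIN_QUAL argument are hypothetical.

```shell
# Hypothetical helper: count variant records (non-header lines) in a VCF,
# optionally requiring QUAL (column 6) >= MIN_QUAL. A missing QUAL (".")
# evaluates to 0, so it is counted only when no positive threshold is given.
# Usage: count_variants IN.vcf [MIN_QUAL]
count_variants() {
    local vcf="$1" min_qual="${2:-0}"
    awk -v min="$min_qual" 'BEGIN { FS = "\t"; n = 0 }
        !/^#/ && $6 + 0 >= min + 0 { n++ }
        END { print n }' "$vcf"
}
```

For example, count_variants SRR030257.vcf counts all calls, and count_variants SRR030257.vcf 20 counts only calls with QUAL of at least 20 (the cutoff of 20 is an arbitrary illustration). Running it on each mapper's VCF gives a quick comparison.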
...
As suggested in the initial introduction, the point of this optional tutorial is to work through getting a different version of samtools to work. In version 0.1.18, the command-line expectations, flags, and subcommands (for example, bcftools call) were not what they are now. To make sure you are starting in the right place:
...