...

Warning: Do not run on head node

Use showq -u to verify you are still on the idev node.

Code Block: Use this command to restart an idev session if you are not on one
idev -m 120 -r CCBB_5.23.17PM -A UT-2015-05-18 -N 2 -n 8

Code Block (bash): Commands to be executed in order...
samtools view -b -S -o SRR030257.bam SRR030257.sam
samtools sort SRR030257.bam -o SRR030257.sorted.bam
samtools index SRR030257.sorted.bam
Tip
This is a really common sequence of commands, so you might want to add it to your personal cheat sheet.
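
For a cheat sheet, the three commands above could be wrapped in a small shell function. This is only a sketch: the function name `sam_to_sorted_bam` is hypothetical, and it assumes the input is named `<prefix>.sam` in the current directory. The samtools options are exactly those used above, chained with `&&` so a failed step stops the sequence.

```shell
# Hypothetical cheat-sheet helper: convert SAM to BAM, then sort and index it.
sam_to_sorted_bam() {
    # $1 = sample prefix, e.g. SRR030257 (assumes "$1.sam" exists here)
    samtools view -b -S -o "$1.bam" "$1.sam" &&
    samtools sort "$1.bam" -o "$1.sorted.bam" &&
    samtools index "$1.sorted.bam"
}
```

Running `sam_to_sorted_bam SRR030257` would then reproduce the sequence above for this sample.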

...

Expand (optional): For the data we are dealing with, predictions with an allele frequency not equal to 1 are not really applicable. (The reference genome is haploid. There aren't any heterozygotes.) How can we remove these lines from the file?

Try looking at grep --help to see what you can come up with.

Code Block (bash): Here for answer
grep -v *something*  # The -v flag inverts the match, effectively showing you everything that does not match your input

Expand: Going farther
Code Block
cat SRR030257.vcf | grep AF1=1 > SRR030257.filtered.vcf

This is not practical: because it keeps only lines containing 'AF1=1', we lose vital VCF formatting (such as the header lines) and may not be able to use this file later with tools that require that formatting.

Code Block
cat SRR030257.vcf | grep -v AF1=0 > SRR030257.filtered.vcf

This preserves all lines that don't have an 'AF1=0' value on the line, and is one way of doing this. If you look closely at the non-filtered file you will see that the frequencies are given as AF1=0.###, so by filtering out lines that have 'AF1=0' in them we get rid of all frequencies that are not 1, including, say, 'AF1=0.99'. How would you change this to keep variants that have a frequency of at least 90%?
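
One possible answer to the 90% question, as a minimal sketch rather than the only way: compare the number after `AF1=` against a threshold with awk instead of matching text with grep. The function name `filter_af1` is hypothetical, and the sketch assumes each variant line carries a single `AF1=<number>` entry. Header lines (starting with `#`) are always kept so the VCF formatting survives.

```shell
# Hypothetical helper: keep header lines and variants with AF1 >= a threshold.
filter_af1() {
    # $1 = minimum allele frequency (e.g. 0.9), $2 = input VCF file
    awk -v min="$1" '
        /^#/ { print; next }                  # always keep VCF header lines
        match($0, /AF1=[0-9.eE+-]+/) {        # locate the AF1=<number> entry
            af = substr($0, RSTART + 4, RLENGTH - 4)   # text after "AF1="
            if (af + 0 >= min + 0) print      # numeric comparison
        }
    ' "$2"
}
```

For example, `filter_af1 0.9 SRR030257.vcf > SRR030257.filtered.vcf` would write the header plus every variant with a frequency of at least 90% to a new file, leaving the original untouched.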

Code Block
sed -i '/AF1=0/ d' SRR030257.vcf

This is a way of doing it in place, without making another file. (But it overwrites your existing file!)
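
If overwriting the only copy is a concern, sed can keep a backup when a suffix is attached to `-i`. The sketch below wraps that in a function. The name `filter_in_place` and the `.bak` suffix are illustrative choices, not part of the original exercise.

```shell
# Hypothetical helper: delete lines containing AF1=0 in place, keeping a backup.
filter_in_place() {
    # $1 = VCF file to edit; the unmodified original is saved as "$1.bak"
    sed -i.bak '/AF1=0/d' "$1"
}
```

After `filter_in_place SRR030257.vcf`, the filtered result is in SRR030257.vcf and the original contents are in SRR030257.vcf.bak.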

...