...

Code Block
languagebash
titlesuggested directory set up
cds
cp -r $BI/gva_course/structural_variation/data sv_tutorial
cd sv_tutorial

...

Warning
titleDo not run on head node

Many of the commands past this point are computationally intensive. You should run them through an idev shell or by qsub. We recommend idev for the tutorial, but you could also qsub any command that takes more than a few seconds to complete.

Code Block
titleExample command to start an idev shell
idev -m 60 -q development -A "UT-2015-05-18"
Code Block
bowtie2-build NC_012967.1.fasta NC_012967.1
bowtie2 -p 12 -X 5000 --rf -x NC_012967.1 -1 61FTVAAXX_2_1.fastq -2 61FTVAAXX_2_2.fastq -S 61FTVAAXX.sam

...

SVDetect demonstrates a strategy common among programs with complex input: instead of taking many options on the command line, it reads a simple text file that sets all of the required options. Let's look at how to create a configuration file:

Note
You'll need to substitute your own paths for /full/path/to/61FTVAAXX.ab.sam and /full/path/to/NC_012967.1.lengths.
Code Block
titleCreate the file svdetect.conf with this text
linenumberstrue
<general>
input_format=sam
sv_type=all
mates_orientation=RF
read1_length=35
read2_length=35
mates_file=/full/path/to/61FTVAAXX.ab.sam
cmap_file=/full/path/to/NC_012967.1.lengths
num_threads=1
</general>

<detection>
split_mate_file=0
window_size=2000
step_length=1000
</detection>

<filtering>
split_link_file=0
nb_pairs_threshold=3
strand_filtering=1
</filtering>

<bed>
  <colorcode>
    255,0,0=1,4
    0,255,0=5,10
    0,0,255=11,100000
  </colorcode>
</bed>
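The path substitution mentioned in the note above can be scripted instead of edited by hand. This is only a minimal sketch: it assumes svdetect.conf still contains the literal /full/path/to placeholder and that the .sam and .lengths files sit in the same directory. The function name fill_svdetect_paths is made up for illustration and is not part of SVDetect.

```bash
# Hypothetical helper: print a copy of an SVDetect config with the
# /full/path/to placeholder replaced by a real directory.
# Assumes the directory name contains no '|', '&' or backslash characters.
fill_svdetect_paths() {
    local conf="$1"   # config file that still contains the placeholder paths
    local dir="$2"    # directory that holds the .sam and .lengths files
    sed "s|/full/path/to|${dir}|g" "$conf"
}

# Example use from inside the tutorial directory:
#   fill_svdetect_paths svdetect.conf "$PWD" > svdetect.filled.conf
```

Writing to a new file rather than editing in place keeps the template intact, so you can regenerate the config if you move the data.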
Note
You also need to create a tab-delimited file of chromosome lengths. Use the tab key rather than writing out <tab>!
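If you would rather not type the lengths file by hand, it can be generated from the reference FASTA. The sketch below assumes a three-column layout (numeric index, sequence name, length), which you should check against the SVDetect manual. It takes the sequence name from the first word of each FASTA header, and that name needs to match the reference name used in your SAM file. The function name fasta_lengths is hypothetical.

```bash
# Hypothetical helper: print "index<TAB>name<TAB>length" for every
# sequence in a FASTA file. The three-column layout is an assumption;
# check it against the SVDetect manual before relying on it.
fasta_lengths() {
    awk '
        /^>/ {
            # A new header: report the previous sequence, if there was one.
            if (name != "") printf "%d\t%s\t%d\n", ++n, name, len
            name = substr($1, 2)   # first word of the header, without ">"
            len = 0
            next
        }
        { len += length($0) }      # sequence lines add to the running total
        END { if (name != "") printf "%d\t%s\t%d\n", ++n, name, len }
    ' "$1"
}

# Example use:
#   fasta_lengths NC_012967.1.fasta > NC_012967.1.lengths
```

Because awk writes real tab characters, this also avoids the problem of a literal <tab> ending up in the file.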

...

You'll want to submit the first two commands to the TACC queue or do them in an idev shell. They take a while. While they run, consult the manual for a full description of what these commands and options do.

Code Block
titleCommands to run SVDetect
SVDetect linking -conf svdetect.conf
SVDetect filtering -conf svdetect.conf
SVDetect links2SV -conf svdetect.conf

Take a look at the resulting file: 61FTVAAXX.ab.sam.links.filtered.sv.txt.

We've highlighted a few lines below:

Code Block
chr_type        SV_type BAL_type        chromosome1     start1-end1     average_dist    chromosome2     start2-end2     nb_pairs        score_strand_filtering  score_order_filtering   score_insert_size_filtering     final_score     breakpoint1_start1-end1 breakpoint2_start2-end2
...
INTRA   NORMAL_SENSE    -       chrNC_012967    599566-601025   -       chrNC_012967    663036-664898   430     100%    -       -       1       -       -
...
INTRA   NORMAL_SENSE    -       chrNC_012967    3-2025  -       chrNC_012967    4627019-4628998 288     100%    -       -       1       -       -
...
INTRA   REVERSE_SENSE   -       chrNC_012967    16999-19033     -       chrNC_012967    2775082-2777014 274     100%    -       -       1       -       -
Expand
titleAny idea what sorts of mutations produced these three structural variants?

...

Expand
Answers...

1. This is a tandem head-to-tail duplication of the region from approximately 600000 to 663000.
2. This is just the origin of the circular chromosome, connecting its end to the beginning!
3. This is a big chromosomal inversion mediated by recombination between repeated IS elements in the genome. It would not have been detected if the insert size of the library were not greater than ~1,500 bp!

... Many of the others are due to new insertions of transposable elements.
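To browse the remaining links yourself, it can help to rank them by the number of supporting read pairs. The following is a rough sketch, assuming the .sv.txt file is tab-delimited with a single header row and the column order shown in the header above (2 = SV_type, 5 = start1-end1, 8 = start2-end2, 9 = nb_pairs). The function name rank_svs is made up for illustration.

```bash
# Hypothetical helper: list links from an SVDetect *.sv.txt table,
# best-supported first. Assumes a tab-delimited file with one header row
# and nb_pairs in column 9, as in the header shown above.
rank_svs() {
    local sv_file="$1"
    local top="${2:-10}"        # how many rows to print (default 10)
    # Keep nb_pairs, SV_type and both coordinate ranges; skip the header.
    awk -F'\t' 'NR > 1 { print $9 "\t" $2 "\t" $5 "\t" $8 }' "$sv_file" |
        sort -k1,1nr |          # numeric sort on nb_pairs, descending
        head -n "$top"
}

# Example use:
#   rank_svs 61FTVAAXX.ab.sam.links.filtered.sv.txt 20
```

Links with many supporting pairs and coordinates near known IS elements are good candidates for the transposable element insertions mentioned above.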

...