...
Code Block |
---|
language | bash |
---|
title | Suggested directory setup |
---|
|
cds
cp -r $BI/gva_course/structural_variation/data sv_tutorial
cd sv_tutorial |
...
Warning |
---|
title | Do not run on head node |
---|
|
Many of the commands past this point are computationally intensive. Run them in an idev shell or submit them with qsub rather than on the head node. We recommend idev for the tutorial, but you could also qsub any command that takes more than a few seconds to complete.
Code Block |
---|
title | Example command to start an idev shell |
---|
| idev -m 60 -q development -A CCBB"UT-2015-05-18" |
|
Code Block |
---|
bowtie2-build NC_012967.1.fasta NC_012967.1
bowtie2 -p 12 -X 5000 --rf -x NC_012967.1 -1 61FTVAAXX_2_1.fastq -2 61FTVAAXX_2_2.fastq -S 61FTVAAXX.sam |
...
SVDetect demonstrates a strategy common among programs with complex input: instead of taking many options on the command line, it reads a simple text file that sets all of the required options. Let's look at how to create a configuration file:
Note |
---|
You'll need to substitute your own paths for /full/path/to/61FTVAAXX.ab.sam and /full/path/to/NC_012967.1.lengths on lines 7 and 8 below. |
Code Block |
---|
title | Create the file svdetect.conf with this text |
---|
linenumbers | true |
---|
|
<general>
input_format=sam
sv_type=all
mates_orientation=RF
read1_length=35
read2_length=35
mates_file=/full/path/to/61FTVAAXX.ab.sam
cmap_file=/full/path/to/NC_012967.1.lengths
num_threads=1
</general>
<detection>
split_mate_file=0
window_size=2000
step_length=1000
</detection>
<filtering>
split_link_file=0
nb_pairs_threshold=3
strand_filtering=1
</filtering>
<bed>
<colorcode>
255,0,0=1,4
0,255,0=5,10
0,0,255=11,100000
</colorcode>
</bed>
|
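Because a wrong path in the configuration file is an easy mistake to make, a quick check before running anything can save a failed job. The following is only a sketch: the function name `check_conf_paths` is ours, not part of SVDetect, and it assumes each setting sits on its own line as `key=value` with no leading spaces, as in the file above.

```bash
# Hypothetical helper (not part of SVDetect): report whether the files
# named by mates_file and cmap_file in a config file actually exist.
check_conf_paths() {
    local conf=$1 key path status=0
    for key in mates_file cmap_file; do
        # Take everything after the first "=" on the matching line.
        path=$(grep -m1 "^${key}=" "$conf" | cut -d= -f2-)
        if [ -f "$path" ]; then
            echo "ok: $key=$path"
        else
            echo "missing: $key=$path"
            status=1
        fi
    done
    return $status
}
```

Running `check_conf_paths svdetect.conf` prints one line per setting and returns a non-zero status if either file is missing.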
You also need to create a tab-delimited file of chromosome lengths. Press the Tab key between columns rather than typing out <tab>.
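If you would rather not type the file by hand, the lengths can be computed from the reference FASTA, which also sidesteps the tab problem. This is a sketch under assumptions: the function name `fasta_lengths` is ours, and the three-column layout it writes (index, sequence name, length) is our assumption about what SVDetect expects, so check it against the manual. The name is taken as the first word of each FASTA header.

```bash
# Hypothetical helper: print "index<TAB>name<TAB>length" for each FASTA record.
# The column layout is an assumption; confirm it in the SVDetect manual.
fasta_lengths() {
    awk '
        /^>/ {
            # A new header: emit the previous record, if there was one.
            if (name != "") printf "%d\t%s\t%d\n", ++n, name, len
            name = substr($1, 2)   # first word of the header, without ">"
            len = 0
            next
        }
        { len += length($0) }      # add up sequence characters on each line
        END { if (name != "") printf "%d\t%s\t%d\n", ++n, name, len }
    ' "$1"
}
```

With the tutorial's file names, this would be run as `fasta_lengths NC_012967.1.fasta > NC_012967.1.lengths`.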
...
You'll want to submit the first two commands to the TACC queue or run them in an idev shell, since they take a while. While they run, consult the manual for a full description of what these commands and options do.
Code Block |
---|
title | Commands to run SVDetect |
---|
|
SVDetect linking -conf svdetect.conf
SVDetect filtering -conf svdetect.conf
SVDetect links2SV -conf svdetect.conf
|
Take a look at the resulting file, 61FTVAAXX.ab.sam.links.filtered.sv.txt. We've highlighted a few lines below:
Code Block |
---|
chr_type SV_type BAL_type chromosome1 start1-end1 average_dist chromosome2 start2-end2 nb_pairs score_strand_filtering score_order_filtering score_insert_size_filtering final_score breakpoint1_start1-end1 breakpoint2_start2-end2
...
INTRA NORMAL_SENSE - chrNC_012967 599566-601025 - chrNC_012967 663036-664898 430 100% - - 1 - -
...
INTRA NORMAL_SENSE - chrNC_012967 3-2025 - chrNC_012967 4627019-4628998 288 100% - - 1 - -
...
INTRA REVERSE_SENSE - chrNC_012967 16999-19033 - chrNC_012967 2775082-2777014 274 100% - - 1 - -
|
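To pick out the best-supported links yourself, it can help to rank the file by the number of read pairs behind each link. The sketch below assumes the output is tab-delimited with a single header line and that nb_pairs is the ninth column, as in the header shown above; the function name `top_links` is ours.

```bash
# Hypothetical helper: list links by supporting read pairs, highest first.
# Assumes a tab-delimited file with one header line and nb_pairs in column 9.
top_links() {
    tail -n +2 "$1" |                # skip the header line
        sort -t $'\t' -k9,9nr |      # numeric sort on nb_pairs, descending
        cut -f2,5,8,9                # SV_type, start1-end1, start2-end2, nb_pairs
}
```

For example, `top_links 61FTVAAXX.ab.sam.links.filtered.sv.txt | head` would show the ten links with the most supporting pairs.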
Expand |
---|
title | Any idea what sorts of mutations produced these three structural variants? |
---|
|
...
Expand |
---|
|
1. This is a tandem head-to-tail duplication of the region from approximately 600000 to 663000.
2. This is just the origin of the circular chromosome, connecting its end to the beginning!
3. This is a big chromosomal inversion mediated by recombination between repeated IS elements in the genome. It would not have been detected if the insert size of the library wasn't > ~1,500 bp!
...
Many of the others are due to new insertions of transposable elements. |
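One way to sanity-check these interpretations is to work out how much of the chromosome each link spans, from the start of the first window to the end of the second. The helper below is a minimal sketch with a name of our own choosing (`link_span`); it takes the start1-end1 and start2-end2 fields exactly as they appear in the output file.

```bash
# Hypothetical helper: distance from the start of window 1 to the end of window 2.
link_span() {
    local start1=${1%%-*}   # text before the first "-" in start1-end1
    local end2=${2##*-}     # text after the last "-" in start2-end2
    echo $(( end2 - start1 ))
}

link_span 599566-601025 663036-664898   # the candidate duplication
link_span 3-2025 4627019-4628998        # the link across the origin
```

The first call prints 65332, consistent with a duplicated region of roughly 63 kb plus the windows on either side. The second prints 4628995: the two windows sit at opposite ends of the reference sequence, which is what you would expect for read pairs that straddle the origin of a circular chromosome.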
...