...
Here's an E. coli genome re-sequencing sample where a key mutation producing a new structural variant was responsible for a new phenotype involving citrate, one of Dacia's favorite topics.
Code Block | ||||
---|---|---|---|---|
| ||||
cds
cp -r $BI/gva_course/structural_variation/data BDIB_sv_tutorial
cd BDIB_sv_tutorial |
...
File Name | Description | Sample |
---|---|---|
| Paired-end Illumina, First of mate-pair, FASTQ format | Re-sequenced E. coli genome |
| Paired-end Illumina, Second of mate-pair, FASTQ format | Re-sequenced E. coli genome |
| Reference Genome in FASTA format | E. coli B strain REL606 |
NC_012967.1.lengths | Simple tab-delimited file giving the size of the reference, which SVDetect needs; provided so you don't have to create it yourself |
Map data using bowtie2
First we need to (surprise!) map the data. This will hopefully reinforce the bowtie2 tutorial you just completed, but if you are feeling adventurous you could use BWA as optional reinforcement.
Warning | |||||||
---|---|---|---|---|---|---|---|
| |||||||
Make sure you are on an idev node. You can check using the command: showq -u
|
Code Block |
---|
bowtie2-build NC_012967.1.fasta NC_012967.1
bowtie2 -p 48 -X 5000 --rf -x NC_012967.1 -1 61FTVAAXX_2_1.fastq -2 61FTVAAXX_2_2.fastq -S 61FTVAAXX.sam |
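Once mapping finishes, a quick sanity check on the SAM file can save you from feeding an empty or truncated file to the next step. The function below is only a sketch and is not part of bowtie2 or SVDetect; the name sam_summary is made up for this example. It relies on the SAM convention that header lines start with '@' and every other line is one alignment record. By default bowtie2 also writes records for reads that failed to align, so the record count should be roughly the number of reads in both FASTQ files combined.

```shell
# sam_summary FILE: count header lines (starting with '@') and alignment records.
# Hypothetical helper for this tutorial, not a bowtie2 or SVDetect command.
sam_summary() {
    awk '/^@/ { header++; next }   # SAM header lines begin with @
         { records++ }             # everything else is one alignment record
         END { printf "header_lines=%d alignment_records=%d\n", header, records }' "$1"
}
```

You would call it as sam_summary 61FTVAAXX.sam from the directory where you ran bowtie2.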
...
Code Block | ||||
---|---|---|---|---|
| ||||
<general>
input_format=sam
sv_type=all
mates_orientation=RF
read1_length=35
read2_length=35
mates_file=/full/path/to/61FTVAAXX.ab.sam
cmap_file=/full/path/to/NC_012967.1.lengths
num_threads=48
</general>
<detection>
split_mate_file=0
window_size=2000
step_length=1000
</detection>
<filtering>
split_link_file=0
nb_pairs_threshold=3
strand_filtering=1
</filtering>
<bed>
<colorcode>
255,0,0=1,4
0,255,0=5,10
0,0,255=11,100000
</colorcode>
</bed> |
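The /full/path/to placeholders in mates_file and cmap_file have to be replaced with real absolute paths. If you saved the configuration in the same directory as the SAM and .lengths files, one possible shortcut is to let sed substitute the current directory for you. This is a sketch under assumptions: the helper name fill_paths and the config file name used in the usage note are made up for illustration, and it relies on the in-place -i option as implemented by GNU sed.

```shell
# fill_paths CONFIG: replace the /full/path/to placeholder with the absolute
# path of the current directory. Hypothetical helper, not part of SVDetect.
# Run it from the directory that holds the SAM and .lengths files.
fill_paths() {
    # '|' is used as the sed delimiter because the paths themselves contain '/'.
    sed -i "s|/full/path/to|$PWD|g" "$1"
    # Print the two settings that should now hold real paths.
    grep -oE '(mates_file|cmap_file)=[^ ]+' "$1"
}
```

For example, if you named the configuration svdetect.conf (a hypothetical name), you would run fill_paths svdetect.conf and check that both printed paths point at files that exist.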
You also need to make sure you have a copy of the tab-delimited file of chromosome lengths named NC_012967.1.lengths. YOU CAN NOT COPY AND PASTE THE LINE BELOW into a new file, for 2 reasons! The first is that the tab characters don't translate correctly. The second is that the nano text editor is not available on the compute nodes, and learning to use the vim editor for a single line is way more work than it is worth. Make sure the NC_012967.1.lengths file has the following structure up to the comment, and that each <tab> is replaced with an actual tab character.
Code Block | ||
---|---|---|
| ||
1<tab>NC_012967<tab>4629812 # Use the tab key rather than writing out <tab>!! |
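If you ever do need to recreate this file, one way to sidestep both problems is to let the shell write the tab characters for you. The sketch below assumes a bash shell, where printf expands \t into a real tab. It writes the single line shown above (without the comment) and then uses awk, splitting on tabs only, to confirm that the line has exactly three fields. Note that it overwrites any existing NC_012967.1.lengths in the current directory, so skip the first command if you just want to check the copy you were given.

```shell
# Write the one-line chromosome lengths file; printf turns each \t into a real tab.
printf '1\tNC_012967\t4629812\n' > NC_012967.1.lengths

# Sanity check: split on tabs only and report the number of fields on each line.
# A correctly formatted file reports 3 fields per line.
awk -F'\t' '{ print "line " NR ": " NF " fields" }' NC_012967.1.lengths
```

If awk reports 1 field instead of 3, the columns are separated by spaces rather than tabs.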
...
Expand | |||||||||
---|---|---|---|---|---|---|---|---|---|
| |||||||||
Optional: Install SVDetect
We have installed SVDetect for you already, as installation is a bit difficult (though still much easier than the alternatives listed in the introduction). You can verify its location using which SVDetect.
Install SVDetect scripts
Navigate to the SVDetect project page
More information:
Download the code onto TACC.
Move the Perl scripts and make them executable
Install required Perl modules
SVDetect requires a few Perl modules to be installed. In the default TACC environment, you can use the cpan shell to install most well-behaved Perl modules (with the exception of some complicated ones that require other libraries to be installed or things to compile). Here's how:
|