denovo 2bRAD

For full documentation of the 2bRAD de novo pipeline see the github page

The pipeline is very similar to that performed by Stacks (Catchen et al. 2011):

De novo locus generation
#navigate to the directory
cd denovo_2bRAD


#look at starting trimmed fastq files
ls *_trimmed.fastq
 
 
#run uniquerOne.pl
#(this is analogous to making 'stacks' in STACKS (Fig1A Catchen et al. (2011))
#finds the unique RAD tags from each fastq
 
uniquerOne.pl sampleA_trimmed.fastq > sampleA.uni
uniquerOne.pl sampleB_trimmed.fastq > sampleB.uni
uniquerOne.pl sampleC_trimmed.fastq > sampleC.uni
 
# merging uniqued files
#(Fig1B Catchen et al. (2011))
mergeUniq.pl uni minDP=2 >mydataMerged.uniq
 
#generates a merged set of unique tags:
        mergedUniqTags.fasta
 
# clustering allowing for up to 3 mismatches (-c 0.91); the most abundant sequence becomes reference
#This is equivalent to calling loci (Fig1C-D Catchen et al. (2011))
cd-hit-est -i mergedUniqTags.fasta -o cdh_alltags.fas -aL 1 -aS 1 -g 1 -c 0.91 -M 0 -T 0

#now we have called de novo loci based on the tags (70758 total)
#assemble them into an artificial reference for re-mapping and genotyping
concatFasta.pl fasta=cdh_alltags.fas num=8
 
#index the artificial reference with bowtie
module load bowtie
bowtie2-build cdh_alltags_cc.fasta cdh_alltags_cc.fasta
 
#now create mapping commands using a for loop
>mappingCommands
for file in *_trimmed.fastq
 do echo "bowtie2 --no-unal -x cdh_alltags_cc.fasta -U $file -S ${file/_trimmed.fastq/}.sam -p 12 &">>mappingCommands
done


#look at the mapping commands
cat mappingCommands


#returns:
	bowtie2 --no-unal -x cdh_alltags_cc.fasta -U sampleC_trimmed.fastq -S sampleC.sam -p 12 &
	bowtie2 --no-unal -x cdh_alltags_cc.fasta -U sampleB_trimmed.fastq -S sampleB.sam -p 12 &
	bowtie2 --no-unal -x cdh_alltags_cc.fasta -U sampleA_trimmed.fastq -S sampleA.sam -p 12 &


#These will use bowtie to map the reads back to the clustered loci we generated from them
#Double-check you're in an idev session before executing!
#copy the commands and paste them to execute
#The resulting alignment files can now be used for whichever genotyping method you prefer

Genotyping

Return to ddRAD data processing and alter the commands in the mpileup and quality filtering code blocks to genotype these samples

expected results
#Raw calls should have 3 individuals and 14909 sites


#Qaulity filtering these to 'QUAL < 30' reduces to 9689 sites


#Filter for biallelic sites variants, genotyped in 75% of samples, with at least two reads covering the site:
6631 sites