Demultiplexing ddRAD

Demultiplexing ddRAD data with Stacks exercise:

Check out the reads to make sure they make sense:


#navigate to the exercise directory
cd intro_to_rad_2018/demultiplexing/process_ddRAD_stacks


#we have two fastq files: Lib01_R1.fastq and Lib01_R2.fastq
ls *.fastq
 
#These files contain paired-end reads (two separate reads from either end of the same DNA fragment)
#Double-check that the files have the same number of reads (FASTQ files generally have 4 lines per read)
expr $(cat Lib01_R1.fastq | wc -l) / 4
expr $(cat Lib01_R2.fastq | wc -l) / 4
	#both should print 53042
#(Note: these files are for demo purposes and contain far fewer reads than a real library would)
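The two expr lines above can also be wrapped in a small helper that compares the counts directly. This is a minimal sketch assuming standard 4-line FASTQ records (no wrapped sequence lines); the function names count_reads and check_pairs are hypothetical helpers, not part of Stacks.

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Print the read count of both paired-end files and return non-zero if they differ
check_pairs() {
  local n1 n2
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  echo "$1: $n1 reads"
  echo "$2: $n2 reads"
  if [ "$n1" -eq "$n2" ]; then
    echo "read counts match"
  else
    echo "WARNING: read counts differ" >&2
    return 1
  fi
}
```

Usage: check_pairs Lib01_R1.fastq Lib01_R2.fastq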

#Looking up NlaIII, we see its cut site:
#     CATG'
#    'GTAC
 
#check if we see this cut site in the forward reads
grep CATG Lib01_R1.fastq | wc -l
 
#compare this with the total number of reads we got above
 
#do these numbers make sense?
 
#look at where in the reads the cut site occurs (it may help to paste the output into a text editor and search for CATG)
grep CATG Lib01_R1.fastq | head -n 30
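Rather than searching by eye in a text editor, awk can report where CATG first appears in each sequence line and tabulate those positions. A sketch assuming 4-line FASTQ records; catg_positions is a hypothetical helper name. Positions are 1-based, and 0 means the read contains no CATG. Because only the first occurrence is reported, an internal CATG after the cut site is not counted separately.

```shell
# Tabulate the position of the first CATG (NlaIII site) in each read.
# Only line 2 of each 4-line record (the sequence) is examined.
# Output columns: number of reads, position (0 = not found), most common first.
catg_positions() {
  awk 'NR % 4 == 2 { print index($0, "CATG") }' "$1" \
    | sort -n | uniq -c | sort -k1,1nr
}
```

Usage: catg_positions Lib01_R1.fastq | head. If most reads share one position, that presumably reflects the cut site sitting right after an inline barcode of fixed length; a spread of positions may mean the inline barcodes differ in length.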

Based on the library preparation, we expect that our forward reads were cut with the restriction enzyme NlaIII (NLA3).

Check that the forward reads fit this expectation.

Hint: look up the enzyme's restriction site and use grep.

Solution

#Double-check that the files have the same number of reads (FASTQ files generally have 4 lines per read)
expr $(cat Lib01_R1.fastq | wc -l) / 4
expr $(cat Lib01_R2.fastq | wc -l) / 4
	#both should print 53042
#(Note: these files are for demo purposes and contain far fewer reads than a real library would)

#Looking up NlaIII, we see its cut site:
#     CATG'
#    'GTAC
 
#check if we see this cut site in the forward reads
grep CATG Lib01_R1.fastq | wc -l
 
#compare this with the total number of reads we got above
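grep CATG | wc -l counts every matching line, so in principle a header or quality line that happened to contain CATG would be counted too. A slightly stricter sketch (hypothetical helper catg_fraction, assuming 4-line records) looks only at sequence lines and reports the hits as a percentage of all reads, which makes the comparison with the total explicit.

```shell
# Count reads whose sequence line (line 2 of each 4-line record) contains CATG
# and report them as a fraction of all reads in the file.
catg_fraction() {
  awk 'NR % 4 == 2 { total++; if (index($0, "CATG")) hits++ }
       END {
         printf "%d of %d reads contain CATG (%.1f%%)\n",
                hits + 0, total + 0, (total ? 100 * hits / total : 0)
       }' "$1"
}
```

Usage: catg_fraction Lib01_R1.fastq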

Process the reads using process_radtags (part of the Stacks package):


#make a directory to put the resulting sample fastq files into
mkdir sample_fastqs

#look at the documentation for process_radtags
./process_radtags -h
 
#execute the command to process the rad data
./process_radtags -i 'fastq' -1 Lib01_R1.fastq -2 Lib01_R2.fastq -o ./sample_fastqs/ -b barcodes_Lib1.tsv --inline_index -e 'nlaIII' -r --disable_rad_check

Check the results:


#how many barcode combinations did we have?
#First look at the barcodes file:
cat barcodes_Lib1.tsv 
 
#then count the lines
cat barcodes_Lib1.tsv | wc -l
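wc -l counts newline characters, so it also counts blank lines and undercounts by one if the last line has no trailing newline. The sketch below uses hypothetical helper names and assumes a tab-separated file with the inline barcode in column 1 and the index barcode in column 2, as the --inline_index option implies. It counts non-blank lines, tallies how many columns each line has, and prints any barcode combination listed more than once.

```shell
# Count barcode combinations, skipping blank lines (e.g. an empty last line)
count_barcodes() {
  grep -c -v '^[[:space:]]*$' "$1"
}

# Tally the number of tab-separated columns per non-blank line;
# with --inline_index we expect at least 2 (inline barcode, index barcode)
barcode_columns() {
  grep -v '^[[:space:]]*$' "$1" | awk -F'\t' '{ print NF }' | sort -n | uniq -c
}

# Print any inline/index barcode pair that appears more than once
# (no output means every combination is unique)
duplicate_barcodes() {
  grep -v '^[[:space:]]*$' "$1" | cut -f1,2 | sort | uniq -d
}
```

Usage: count_barcodes barcodes_Lib1.tsv; barcode_columns barcodes_Lib1.tsv; duplicate_barcodes barcodes_Lib1.tsv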
 
#how many R1 and R2 fastq files did we output? (ignore the 'rem' files)
ls sample_fastqs/*AGCGAC.1.fq
ls sample_fastqs/*AGCGAC.2.fq
ls sample_fastqs/*AGCGAC.1.fq | wc -l
ls sample_fastqs/*AGCGAC.2.fq | wc -l
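The globs above only match file names ending in AGCGAC (the second barcode in a name like sample_CATAT-AGCGAC). To count files for every sample, a sketch like the one below matches all .1.fq and .2.fq files and filters out the remainder files, which process_radtags uses for reads whose mate was discarded. It assumes remainder files are named like sample_CATAT-AGCGAC.rem.1.fq; count_sample_files is a hypothetical helper name.

```shell
# Count per-sample R1 and R2 files in a directory, excluding .rem. remainder files.
# grep -vc prints the number of lines that do NOT match the remainder pattern.
count_sample_files() {
  local dir=$1 n1 n2
  n1=$(ls "$dir"/*.1.fq 2>/dev/null | grep -vc '\.rem\.1\.fq$')
  n2=$(ls "$dir"/*.2.fq 2>/dev/null | grep -vc '\.rem\.2\.fq$')
  echo "R1 files: $n1"
  echo "R2 files: $n2"
}
```

Usage: count_sample_files sample_fastqs. Both numbers should equal the number of barcode combinations counted above.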
 
#are the paired-end files for a sample still the same length (same number of lines, and therefore reads)?
cat sample_fastqs/sample_CATAT-AGCGAC.1.fq | wc -l
cat sample_fastqs/sample_CATAT-AGCGAC.2.fq | wc -l
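The two commands above check a single sample. A sketch that repeats the check for every demultiplexed sample is shown below; it assumes the .1.fq/.2.fq naming seen above, skips .rem. remainder files, and uses the hypothetical helper name compare_sample_pairs. Each output line gives the sample name, R1 reads, R2 reads, and OK or MISMATCH.

```shell
# For each sample, compare read counts (lines / 4) in its R1 and R2 files.
compare_sample_pairs() {
  local dir=$1 r1 r2 n1 n2 status
  for r1 in "$dir"/*.1.fq; do
    [ -e "$r1" ] || continue                  # glob matched nothing
    case $r1 in *.rem.1.fq) continue ;; esac  # skip remainder files
    r2=${r1%.1.fq}.2.fq                       # matching R2 file name
    n1=$(( $(wc -l < "$r1") / 4 ))
    if [ -e "$r2" ]; then
      n2=$(( $(wc -l < "$r2") / 4 ))
    else
      n2=missing
    fi
    if [ "$n1" = "$n2" ]; then status=OK; else status=MISMATCH; fi
    printf '%s\t%s\t%s\t%s\n' "$(basename "${r1%.1.fq}")" "$n1" "$n2" "$status"
  done
}
```

Usage: compare_sample_pairs sample_fastqs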