Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Investigate reads. We paid the sequencing facility for 50,000 paired end reads.

1. How can be confirm we got this?

...

Paired end read files should have matching names read for read.

2. Double-check this is case for your reads

Code Block
titlesolution
collapsetrue
#this can be checked many ways, here is one option:


#search for lines that start with @ symbol (the read definition lines in our fastqs)
#and save the top 10 of them as a text file.
#Repeat for the R2 reads.
#Then line them up


grep "^@" Lib01_R1.fastq | head > first_10_R1_read_names.txt
grep "^@" Lib01_R2.fastq | head > first_10_R2_read_names.txt
paste first_10_R1_read_names.txt first_10_R2_read_names.txt



Based on the library preparation, we expect that our forward reads were cut with the restriction enzyme nlaIII (NLA3).

We expect that the vast majority, (say at least 96%) of our reads should have the recognition sequence in them.

3. How can we check that our reads largely fit this expectation? (Hint: we'll need to know the restriction enzymes recognition sequence)

Code Block
titlesolution
collapsetrue
#Looking up nlaIII, we see it's cut site:
#     CATG'
#    'GTAC
  
#check if we see this cut site in the forward reads
grep CATG Lib01_R1.fastq | wc -l


#compare with total read count