Overview:

This section provides directions for trimming molecular indexes from raw fastq files and generating SSCS (Single Strand Consensus Sequence) reads.

Learning Objectives:

  1. Use flexbar to trim molecular indexes from duplex seq libraries.
  2. Use a Python script to generate SSCS reads.

Tutorial:

For this tutorial, we will be working with flexbar, which, like breseq, is installed in the BioITeam directory because it is not a TACC module. Some additional modules must be loaded for it to work correctly, and the LD_LIBRARY_PATH variable must be modified as listed below.

module swap intel gcc
export LD_LIBRARY_PATH=/corral-repl/utexas/BioITeam/flexbar_v2.23_linux64:$LD_LIBRARY_PATH

For this tutorial, it is sufficient to simply type these commands out. If this becomes something you want to do more often, or want to submit as a job, you should add these lines to your .profile so they are loaded each time you log in.
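If you do add these lines to your .profile, a slightly more defensive version can be sketched as follows. The guard around module and the duplicate check on LD_LIBRARY_PATH are additions for illustration only, and the flexbar_dir variable name is hypothetical; the two underlying commands are the same as above.

```shell
# Sketch of lines for ~/.profile (assumption: a POSIX-compatible login shell).

# Only call `module` where it is defined; on systems without it this step is skipped.
if command -v module >/dev/null 2>&1; then
    module swap intel gcc
fi

# flexbar_dir is a hypothetical helper variable holding the path used above.
flexbar_dir=/corral-repl/utexas/BioITeam/flexbar_v2.23_linux64

# Prepend the flexbar directory only if it is not already on LD_LIBRARY_PATH,
# so re-sourcing the file does not add duplicate entries.
case ":${LD_LIBRARY_PATH:-}:" in
    *":${flexbar_dir}:"*) ;;
    *) export LD_LIBRARY_PATH="${flexbar_dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
esac
```

Because of the duplicate check, sourcing the file a second time in the same session leaves LD_LIBRARY_PATH unchanged.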

If the above commands were executed properly, typing flexbar -h should display a lengthy list of optional arguments that can be used for a variety of purposes. For this tutorial, we will focus only on trimming the first 16 bases off each read, which represent the 12 bases of the molecular index and a 4-base constant region. See if you can figure out what the command is based on the help output, paying special attention to the -t option.
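To make the 16-base figure concrete, the sketch below splits a single made-up read into the 12-base molecular index, the 4-base constant region, and the sequence that remains after trimming. The read sequence and variable names are hypothetical and serve only as illustration; this uses cut on one string and is not how flexbar itself is invoked.

```shell
# Hypothetical 30-base read: 12-base index + 4-base constant region + 14 remaining bases.
read_seq="ACGTTGCAAGCTCAGTGGATCCGATTACAG"

# Characters 1-12: the molecular index.
molecular_index=$(printf '%s\n' "$read_seq" | cut -c1-12)
# Characters 13-16: the constant region.
constant_region=$(printf '%s\n' "$read_seq" | cut -c13-16)
# Character 17 onward: what is left after trimming the first 16 bases.
trimmed_read=$(printf '%s\n' "$read_seq" | cut -c17-)

echo "index:    $molecular_index"
echo "constant: $constant_region"
echo "trimmed:  $trimmed_read (${#trimmed_read} bases)"
```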

Hint (without the answer):

The following arguments are the ones that are needed to successfully trim the first 16 bases of the sequence:

    -u, --max-uncalled NUM
          Allowed uncalled bases (N or .) in reads, default: 0
    -x, --pre-trim-left NUM
          Trim specified number of bases on 5' end of reads before alignment
    -t, --target STR
          Prefix for output file names
    -s, --source FILE
          Input file with reads, that may contain barcodes
    -p, --source2 FILE
          Second input file for paired read scenario
    -f, --format STR
          Input format of reads: csfasta, csfastq, fasta, fastq, fastq-sanger, fastq-solexa, fastq-i1.3, fastq-i1.5,
          fastq-i1.8 (illumina 1.8+)
Answer, if you are still stuck:
flexbar -u 100 -x 16 -t trimmed -s DED110_CATGGC_L006_R1_001.fastq -p DED110_CATGGC_L006_R2_001.fastq -f fastq

In an idev shell, this should take less than 5 minutes to complete. Once it finishes, there should be 6 new files, all of which begin with "trimmed" if you used the answer above, or with whatever string you entered for the -t argument if you did not. These 6 files contain the trimmed reads, the length distributions, and any errors. Using the head command, see if you can figure out which file is which.
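One way to look at all of the new files at once is a small loop around head. This is only a sketch: the function name peek_outputs is hypothetical, and it assumes "trimmed" was used as the -t prefix, so pass a different prefix if you chose one.

```shell
# Print the name, line count, and first 4 lines of every file starting with a prefix.
peek_outputs() {
    prefix="$1"
    for f in "$prefix"*; do
        # Skip the unexpanded pattern when no matching files exist.
        [ -e "$f" ] || continue
        echo "== $f ($(wc -l < "$f") lines)"
        head -n 4 "$f"
    done
}

# Assumption: the flexbar output prefix was "trimmed".
peek_outputs trimmed
```

Four lines is one complete fastq record, so for the fastq files this shows exactly the first read.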

Answer to which file is which:

trimmed_1.fastq and trimmed_2.fastq are the trimmed fastq files.

trimmed_1.fastq.lengthdist and trimmed_2.fastq.lengthdist are the length distribution files (each should have 2 lines: a header line and a line showing that all of the reads are now 85 bp long).

trimmed_1_single.fastq is the error file, which should be empty.
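The read lengths can also be tallied directly from a fastq file as a cross-check against the lengthdist file. The sketch below assumes standard four-line fastq records, with the sequence on the second line of each record, and runs on a tiny made-up file so that it is self-contained. The function name read_length_counts and the demo file are hypothetical; on the real data you would pass trimmed_1.fastq or trimmed_2.fastq instead.

```shell
# Print "length count" pairs for the sequence lines of a four-line-per-record fastq file.
read_length_counts() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (len in n) print len, n[len] }' "$1" | sort -n
}

# Tiny made-up fastq file with two 8-base reads, for illustration only.
demo_fastq=$(mktemp)
cat > "$demo_fastq" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGGCCAA
+
IIIIIIII
EOF

# Prints one line per distinct read length: here "8 2" (length 8, seen twice).
read_length_counts "$demo_fastq"
```

On the trimmed files from this tutorial, a single output line reporting length 85 would agree with the lengthdist file.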

Next we want to generate SSCS reads. 
