...

Warning

When following along here, please start an idev session for running any example commands:

Code Block
idev -m 60 -q development

Illumina sequence data format (FASTQ)

GSAF delivers paired-end sequencing data as two matching FASTQ files, one containing the reads for each end sequenced: for example, Sample_ABC_L005_R1.cat.fastq and Sample_ABC_L005_R2.cat.fastq. Each read end sequenced is represented by a 4-line entry in the FASTQ file.
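To make the 4-line entry concrete, here is a minimal sketch that builds a tiny FASTQ file and prints its first entry. The read names, sequences, and quality strings are made up for illustration (they are not from the course data), and `tmp_fastq` is just a scratch file.

```shell
# Build a tiny two-read FASTQ file (hypothetical data, for illustration only)
tmp_fastq=$(mktemp)
printf '%s\n' \
  '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read2' 'TTGGCCAA' '+' 'HHHHHHHH' > "$tmp_fastq"

# One entry = 4 lines: @header, sequence, '+' separator, quality string
first_entry=$(head -4 "$tmp_fastq")
echo "$first_entry"
```

In a real pair of files, the entry at a given position in the R1 file and the entry at the same position in the R2 file describe the two ends of the same fragment.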

...

Code Block
Get set up for the exercises
cds
mkdir my_rnaseq_course #this is where you'll be doing all the course exercises
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course_2015/fastqc_exercise .
cd fastqc_exercise
 
ls data

...

Answer

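The wc -l command reports 16,000,000 lines. FASTQ files have 4 lines per sequence, so the file contains 16,000,000 / 4 = 4,000,000 sequences. You can also count the sequences directly by counting the header lines, which in this file start with @HWI: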

Code Block
grep '^@HWI' data/Sample1_R1.fastq |wc -l
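The two counting approaches can be cross-checked against each other. The sketch below does this on a small made-up file rather than the course data; the read names and the variable names (`n_lines`, `n_reads`, `n_headers`) are assumptions for illustration.

```shell
# Tiny three-read FASTQ file (hypothetical data); note that the second
# read's quality line starts with '@', which is legal in FASTQ
tmp_fastq=$(mktemp)
printf '%s\n' \
  '@HWI-read1' 'ACGT' '+' 'IIII' \
  '@HWI-read2' 'TTGG' '+' '@@@@' \
  '@HWI-read3' 'CCAA' '+' 'HHHH' > "$tmp_fastq"

# Approach 1: total line count divided by 4 lines per entry
n_lines=$(wc -l < "$tmp_fastq")
n_reads=$(( n_lines / 4 ))

# Approach 2: count header lines; grep -c counts matching lines
n_headers=$(grep -c '^@HWI' "$tmp_fastq")

echo "reads by line count: $n_reads"
echo "reads by header count: $n_headers"
```

Matching on the full '^@HWI' prefix rather than just '^@' matters because quality strings may also begin with '@', which would inflate the count.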

 

Let's move on to assessing the quality of this data...