...
Often, the first thing you (or your boss) want to know about your sequencing run is simply, "how many reads are there?". For the $BI/gva_course/mapping/data/SRR030257_1.fastq file, the answer is 3,800,180. How can we figure that out?
The grep (Global Regular Expression Print) command can be used to find, and with its -c option count, the lines that match a pattern, as shown above. Above, we used it to search for:
- any one of the characters A, C, T, G, or N, with the [] marking them as a group
- repeated any number of times (including zero) with *
- starting from the beginning of the line with ^
- continuing to the end of the line with $
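To see the pattern described in the list above in action, here is a small sketch. It builds a tiny, made-up two-read fastq file (the name `toy.fastq` and its reads are hypothetical) and counts the lines made up only of A, C, T, G, and N, which here are the sequence lines. Because `*` also allows zero repeats, an empty line would match the pattern too.

```shell
#!/usr/bin/env bash
# Hypothetical two-read fastq file, written only for this illustration.
cat > toy.fastq <<'EOF'
@read1
ACGTNACGTA
+
IIIIIIIIII
@read2
TTGCA
+
IIIII
EOF

# Count lines consisting entirely of A, C, T, G or N (from ^ to $),
# i.e. the sequence lines of this toy file.
seq_lines=$(grep -c "^[ACTGN]*$" toy.fastq)
echo "sequence lines: $seq_lines"
```

The header lines start with @, the separator lines are +, and the quality lines are all I, so only the two sequence lines match.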
Here, since we are only interested in the number of reads, we can rely on the fact that the 3rd line of each record in this fastq file is a + and only a +. Combined with grep's -c option, which reports the number of matching lines instead of printing them, this gives the number of reads in the file.
```
grep -c "^+$" $BI/gva_course/mapping/data/SRR030257_1.fastq
```
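As a cross-check on the read count, the sketch below compares the `^+$` count with two alternatives on a made-up three-read file (the name `toy_reads.fastq` and its contents are hypothetical). The third read's quality line deliberately begins with `+` (a valid quality character) to show why both anchors matter: an unanchored-at-the-end `^+` also matches that quality line. The `wc -l` divided by 4 method assumes every record occupies exactly four lines, with no wrapped sequence or quality lines.

```shell
#!/usr/bin/env bash
# Hypothetical three-read fastq file used only for this illustration.
# The last quality line starts with "+", which is a legal quality character.
cat > toy_reads.fastq <<'EOF'
@r1
ACGT
+
IIII
@r2
GGCA
+
II#I
@r3
NNAC
+
+III
EOF

# Method 1: lines that are exactly "+" (the command used above).
plus_count=$(grep -c "^+$" toy_reads.fastq)

# Anchoring only the start also catches the quality line "+III".
loose_count=$(grep -c "^+" toy_reads.fastq)

# Method 2: total lines divided by 4, assuming strict 4-line records.
line_count=$(wc -l < toy_reads.fastq)
quarter_count=$(( line_count / 4 ))

echo "^+\$: $plus_count  ^+: $loose_count  lines/4: $quarter_count"
```

If the two methods disagree on a real file, that is a hint that the file has wrapped lines or a separator line that repeats the read name.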
...
You may recall from today's first tutorial that we used the conda system to install fastqc in preparation for this tutorial. If you did not complete that step, please go back and do so now, and don't hesitate to ask a question if you are having difficulties. Interactive GUI versions of fastqc are also available for Windows and Macintosh and can be downloaded from the Babraham Bioinformatics web site. We don't want to clutter up our work space, so copy the SRR030257_1.fastq file to a new directory named GVA_fastqc_tutorial on scratch, and use fastqc's help option to figure out how to run the program. Once the program has finished, use scp to copy the important file (the HTML report) back to your local machine. (The bold words are key words that may give you a hint of what steps to take next.)
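One way the steps in the paragraph above could look is sketched below. This is a hedged sketch, not the course's official answer: it assumes `$SCRATCH` points at your scratch directory (as on TACC systems) and that `$BI` is set as elsewhere in the course. The variable names `workdir` and `fastq` are hypothetical. The guards print a message instead of failing when the input file or the fastqc program cannot be found.

```shell
#!/usr/bin/env bash
# Assumption: $SCRATCH is your scratch space (as on TACC systems);
# if it is unset, this sketch falls back to the current directory.
workdir="${SCRATCH:-$PWD}/GVA_fastqc_tutorial"
mkdir -p "$workdir"
cd "$workdir" || exit 1

# Copy the reads into the new directory so the output stays out of the shared data folder.
fastq="$BI/gva_course/mapping/data/SRR030257_1.fastq"
if [ -f "$fastq" ]; then
    cp "$fastq" .
else
    echo "input not found: $fastq" >&2
fi

# Only run fastqc if the conda-installed program is on your PATH.
if command -v fastqc >/dev/null 2>&1; then
    fastqc --help
    # Without -o/--outdir, fastqc writes its report next to the input file.
    [ -f SRR030257_1.fastq ] && fastqc SRR030257_1.fastq
else
    echo "fastqc not found; check your conda environment" >&2
fi
```

When it finishes, the directory should contain `SRR030257_1_fastqc.html` (along with a `.zip` of the same name), and running `pwd` there gives the path you need for the scp step.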
...
```
# on the TACC terminal, in the directory containing the fastqc output
pwd
# on a new terminal on your local computer
scp <username>@stampede2.tacc.utexas.edu:<pwd_results_from_other_window>/SRR030257_1_fastqc.html ~/Desktop
# open the newly transferred file from the desktop and see how the data looks
```
...