

...

Often, the first thing you (or your boss) want to know about your sequencing run is simply, "how many reads are there?". For the $BI/gva_course/mapping/data/SRR030257_1.fastq file, the answer is 3,800,180. How can we figure that out?

The grep (Global Regular Expression Print) command can be used to determine the number of lines that match some criteria, as shown above. Above, we used it to search for:

  1. any character from the group ACTGN, with the [] marking them as a group ([ACTGN])
  2. matching any number of times (*)
  3. from the beginning of the line (^)
  4. to the end of the line ($)
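Put together, those four pieces form the pattern ^[ACTGN]*$. The sketch below wraps that idea in a reusable function. It is a minimal sketch, not the exact command used above: the function name count_sequence_lines is hypothetical, and it swaps * for + (with grep -E) because * also matches zero characters, so blank lines would otherwise be counted. It also assumes no quality line happens to consist only of the letters A, C, T, G and N.

```bash
# Hypothetical helper: count lines made up only of the base calls A, C, T, G, N.
# Uses grep -E with "+" (one or more) instead of "*" so blank lines are not counted.
count_sequence_lines() {
    grep -c -E '^[ACTGN]+$' "$1"
}

# usage (file from this tutorial):
# count_sequence_lines $BI/gva_course/mapping/data/SRR030257_1.fastq
```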

Here, since we are only interested in the number of reads, we can take advantage of two things: the 3rd line of each read in this fastq file is a + and only a +, and grep's -c option reports the number of matching lines instead of printing them. Together, these give the number of reads in the file.

Code Block
languagebash
titleCan you use the information above to write a grep command to count the number of reads in the same file?
collapsetrue
grep -c "^+$" $BI/gva_course/mapping/data/SRR030257_1.fastq
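A second, independent way to get the same number: each read in a FASTQ file takes exactly four lines (header, sequence, +, quality), so the total line count divided by 4 is the read count. This is a minimal sketch assuming every record is exactly four lines (no wrapped sequence lines) and the file is uncompressed; both function names are hypothetical. If the two counts disagree, the file is probably truncated or not laid out as expected.

```bash
# Hypothetical helpers: two independent read counts for an uncompressed FASTQ file.
# Both assume 4 lines per record with a bare "+" on the third line.
count_reads_by_lines() {
    local lines
    lines=$(wc -l < "$1" | tr -d ' ')   # strip padding some wc versions add
    echo $(( lines / 4 ))               # integer division: 4 lines per read
}

count_reads_by_plus() {
    grep -c '^+$' "$1"                  # same grep command as above, wrapped in a function
}

# usage:
# f=$BI/gva_course/mapping/data/SRR030257_1.fastq
# count_reads_by_lines "$f"; count_reads_by_plus "$f"
```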

...

You may recall from today's first tutorial that we used the conda system to install fastqc in preparation for this tutorial. If you did not complete that, please go back and do so now, and don't hesitate to ask a question if you are having difficulties. Interactive GUI versions are also available for Windows and Macintosh and can be downloaded from the Babraham Bioinformatics web site. We don't want to clutter up our work space, so copy the SRR030257_1.fastq file to a new directory named GVA_fastqc_tutorial on scratch, and once fastqc is installed, use its help option to figure out how to run the program. Once the program has finished, use scp to copy the important file back to your local machine. (The key words in this paragraph, such as copy, new directory, scratch, help option, and scp, hint at what steps to take next.)

Code Block
languagebash
titleRunning FastQC example
collapsetrue
mkdir $SCRATCH/GVA_fastqc_tutorial  # new directory on scratch to keep the work space tidy
cd $SCRATCH/GVA_fastqc_tutorial
cp $BI/gva_course/mapping/data/SRR030257_1.fastq .  # copy the reads into the new directory
 
fastqc -h  # examine program options
fastqc SRR030257_1.fastq  # run the program
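The scp step has to be run from a terminal on your local machine, not on TACC, which means typing the full remote path. One way to avoid typos is to have the remote side print the command for you. This is a minimal sketch: the function name scp_hint is hypothetical, and the host name you pass (for example ls6.tacc.utexas.edu if you are working on Lonestar6) is an assumption; use whichever system you logged in to.

```bash
# Hypothetical helper: print an scp command that copies a file from the current
# remote directory into the current directory of your local machine.
# $1 = file name relative to the current directory, $2 = remote host name (assumed).
scp_hint() {
    local file="$1" host="$2"
    printf 'scp %s@%s:%s/%s .\n' "${USER:-$(id -un)}" "$host" "$PWD" "$file"
}

# usage, run on TACC inside $SCRATCH/GVA_fastqc_tutorial, then paste the output locally:
# scp_hint SRR030257_1_fastqc.html ls6.tacc.utexas.edu
```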


Note
titlePotential error with fastqc program execution

As noted during Monday's class, several people were experiencing an error that included the following text:


No Format
Can't locate warnings.pm:   /corral-repl/utexas/BioITeam//local/share/perl5/warnings.pm: Permission denied at /home1/08965/vramirez/miniconda3/envs/fastqc-test/bin/fastqc line 2.
BEGIN failed--compilation aborted at /home1/08965/vramirez/miniconda3/envs/fastqc-test/bin/fastqc line 2.

This is believed to be related to the occasional "Permission denied" error some people were getting when trying to access the BioITeam contents. If you have circled back to this and are experiencing this error, please get my attention. I have been unable to troubleshoot this fully because I have not had the problem myself, but I believe the answer will be found in the output of:

Code Block
languagebash
titlecommands for troubleshooting fastqc error
perl -V  # the @INC section at the bottom lists the directories perl searches for modules
perldoc -l warnings  # prints the path of the warnings module perl is trying to load
echo $PERL5LIB  # extra module directories perl searches before its defaults

The solution will likely be to modify $PERL5LIB so that it no longer points at the BioITeam contents, which should force perl to use the module versions from the conda environment.
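One way to test that idea is to drop the BioITeam entries from $PERL5LIB for the current terminal session only and then rerun fastqc. This is a sketch of the hypothesis above, not a confirmed fix; the function name strip_perl5lib is hypothetical, and it does not edit your login files, so opening a new terminal undoes it.

```bash
# Hypothetical helper: drop every $PERL5LIB entry whose path contains $1.
# The hope is that perl then loads modules from the active conda environment.
# Only affects the current shell session.
strip_perl5lib() {
    local pattern="$1" entry new=""
    local IFS=':'                          # PERL5LIB entries are colon-separated
    for entry in ${PERL5LIB:-}; do
        case "$entry" in
            *"$pattern"*) continue ;;      # skip directories matching the pattern
        esac
        new="${new:+$new:}$entry"          # rebuild the list without a leading colon
    done
    if [ -n "$new" ]; then
        export PERL5LIB="$new"
    else
        unset PERL5LIB                     # nothing left: remove the variable entirely
    fi
}

# usage: strip_perl5lib BioITeam && fastqc -h
```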



Expand
titleWhat results did you get from your FastQC analysis?


No Format
titlels -l shows something like this
-rwxr-xr-x 1 ded G-802740 498588268 Jun 13 12:06 SRR030257_1.fastq
-rw-r--r-- 1 ded G-802740    291714 Jun 13 12:07 SRR030257_1_fastqc.html
-rw-r--r-- 1 ded G-802740    455677 Jun 13 12:07 SRR030257_1_fastqc.zip

The SRR030257_1.fastq file is what we analyzed, so FastQC created the other two items. SRR030257_1_fastqc.html contains the results in a file viewable in a web browser. SRR030257_1_fastqc.zip is a zipped (compressed) version of the results.
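If you want a quick look before copying anything, the zip file contains a folder named after the input (here SRR030257_1_fastqc) that includes a summary.txt file with one tab-separated line per analysis module: a PASS, WARN, or FAIL status, the module name, and the file name. This minimal sketch assumes that layout and that unzip is installed; the function name flagged_modules is hypothetical.

```bash
# Hypothetical helper: list the FastQC modules that did not PASS.
# Reads a FastQC summary.txt (tab-separated: status, module name, file name) on stdin.
flagged_modules() {
    awk -F'\t' '$1 != "PASS" { print $1 ": " $2 }'
}

# usage:
# unzip -p SRR030257_1_fastqc.zip SRR030257_1_fastqc/summary.txt | flagged_modules
```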

...

Info

In our first tutorial we mentioned how important it can be to know what version of a program you are using. When we installed the cutadapt package, we didn't specify what version to install. Can you figure out what version you installed, and what the most recent version of the program is?

Expand
titleHow to figure out the currently installed version

Try using the program's help options, or conda's list command.

Code Block
languagebash
titleStill not sure?
collapsetrue
cutadapt --version
conda list
conda list cutadapt

Note that all three of the above commands will give you the same answer: 1.18.

Figuring out the most recent version is a little more complicated. Unlike programs on your computer such as Microsoft Office or your internet browser, there is nothing in an installed program that tells you whether you have the newest version, or even what the newest version is. If you go to the program's website (easily found with Google or this link), the changes section lists all the versions that have been released, with v4.1 released on the 7th of this month.
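Once you know the newest release number, comparing it to your installed version as plain text can mislead: as text, "1.9" sorts after "1.18". A version-aware comparison avoids that. This is a minimal sketch assuming GNU sort with the -V (version sort) option, as found on typical Linux systems; the function name version_lt is hypothetical. If you have network access, conda search -c bioconda cutadapt will also list the versions the bioconda channel offers.

```bash
# Hypothetical helper: exit 0 if version $1 is strictly older than version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# usage: compare the installed cutadapt to the newest release listed on its website
# installed=$(cutadapt --version)
# version_lt "$installed" 4.1 && echo "cutadapt $installed is older than 4.1"
```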

Expand
titleclick here for information regarding why there is such a large discrepancy

If you were to look at the labels section of https://anaconda.org/bioconda/cutadapt you would see that both v4.1 and v1.18 are available. Since we didn't specify a version, conda tried to figure out what would work best. If you were to experiment with removing the cutadapt package and forcing v4.1 to be installed, you would eventually find that there is a conflict with the linux-64 glibc package required by v4.1. Cutadapt version 1.18, however, does not have such requirements and was therefore installed as the only available option.

To install version 4.1 of cutadapt via conda, we would have to explicitly specify both the version of cutadapt that we wanted (4.1) and a compatible version of glibc. The point of using conda is that it is supposed to make installing programs easier, and having to hunt through error messages is anything but easy. This is simplified by specifying an additional conda channel, "conda-forge". Given that version 1.18 did the job, it is probably not worth the effort to update cutadapt, but if you wanted to:

Code Block
languagebash
titleEasiest known solution to installing version 4.1 of cutadapt
conda install -c bioconda -c conda-forge cutadapt=4.1

We will discuss conda-forge further in later tutorials.

This won't be the last time we mention different program versions.

...