Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

While you could write your own sequence converter, hopefully it jumps out at you that this is something someone else must have done before. In situations like this, you can often spend a few minutes on google finding a stackoverlow question/answer that deals with something like this. Some will be in reference to how to code such things, but the particularly useful ones will be the ones that point to a program or repository where someone has already done this for you. 

In our case on TACC, the BioITeam has uploaded such a conversion script to one of the shared spaces. Using the which command, can you figure out where the bp_seqconvert.pl script is located?this case the bp_seqconvert.pl perl script is included as part of the bioperl module package. Rather than attempt to find it as part of a conda package, or in some other repository we will use the module version. If needing this script in the future outside of TACC, https://metacpan.org/dist/BioPerl/view/bin/bp_seqconvert

Code Block
languagebash
titleRecall that we have used the which command to determine where executable files are located, and only take 2 pieces of information.
module load bioperl/1.007002
which -a bp_seqconvert.pl

By the end of the class, hopefully you will recognize the "If you run the which command above outside of an idev session you should see 2 results. 1 that points to the BioITeam near where you keep finding your data (/corral-repl/utexas/BioITeam/" ) part of the answer as the BioITeam. The "bin" folder specifically is full of binary or (typically small) bash/python/perl/R scripts that someone has written to help the TACC community.  The bp_seqconvert.pl script is a common script written in bioperl that is a helpful utility for converting between many common sequence formats. The other is in a folder specifically associated with the bioperl module.

If you try to run the BioITeam version of the script howeverfrom the head node, you get the following error message:

...

Info
titleDeciphering error messages

The above error message is pretty helpful, but much less so if you are not familiar with perl. As I doubt anyone in the class is more familiar with perl than I am, and I am not familiar with perl hardly at all, this is a great opportunity to explain how I would tackle the error message to figure out what is going on.

  1. "compilation aborted at /corral-repl/utexas/BioITeam/bin/bp_seqconvert.pl line 8." 
    1. The last line here actually tells us that the script did not get very far, only to line 8.
    2. My experience with other programing language tells me that the beginning of scripts is all about checking that the script has access to all the things it needs to do what it is intended to do, so this has me thinking some kind of package might be missing.
  2. "(@INC contains: ..."
    1. This reads like the PATH variable, but is locations I don't recognize as being in my path, suggesting this is not some external binary or other program.
    2. Many of the individual pathways list "lib" in one form or another. This reinforces the idea from above that some kind of package is missing.
  3. "Can't locate Bio/SeqIO.pm in @INC"
    1. "Can't locate" reads like a plain text version of something being missing, and like something generic that is not related to my system/environment (like all the listed directories), and not related to the details of the script I am trying to run (like the last line that details the name of the script we tried to envoke)
    2. This is what should be googled for help solving the problem. 
      1.  the google results list similar error messages associated with different repositories/programs (github issues) suggesting some kind of common underlying problem.
      2. The 3rd result https://www.biostars.org/p/345331/ reads like a gneric generic problem and sure enough the answers detail needing to have the Bio library installed from cpan (perl's package management system)

...

We get this error message because on TACCstampede2, while perl is installed, but the bioperl library isn'trequired bioperl module is not loaded by default. Rather than having to actually install the library yourself (as we will do for several libraries in the SVDetect tutorial later today), we are lucky that TACC has a bioperl module. So, you must have the bioperl module loaded before the script will run. As it is fairly rare that you need to convert sequence files between different format, bioperl is actually not listed as one of the modules on your .bashrc file in your $HOME directory that you set up yesterday. Additionally it gives an opportunity to have you working with the module system. 

...