
Special stampede login

Before we start, log into ls5 like you did yesterday, but use this special hostname:

login5.ls5.tacc.utexas.edu

Normally you should not perform significant computation on login nodes, since they are shared by all users in the TACC community. There are a few exceptions, however, and login5.ls5.tacc.utexas.edu is one of them. It is a dedicated login node owned by CSSB and CBRS, and we have given you access to it for the duration of this course. This will let us do a few things at the command line that would normally set off alarm bells with the TACC folks if we all did them on a standard login node.

Data staging

First, log in to stampede2 like you did yesterday.

Next, set ourselves up to process some yeast data in $SCRATCH, using some of the best practices for organizing our workflow.

...

Exercise: What character in the quality score string in the FASTQ entry above represents the best base quality? Roughly what is the error probability estimated by the sequencer?

Answer

J is the best base quality score character (Q=41)

It represents a probability of error of < 1/10^4 or 1/10,000
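
The arithmetic behind this answer can be checked at the command line. The following is a minimal sketch, assuming the common Phred+33 encoding (quality = ASCII code minus 33); the variable names are made up for illustration.

```bash
# The character to decode; "J" is the answer from the exercise above.
qual_char="J"

# printf with a leading single quote prints the character's ASCII code.
ascii=$(printf '%d' "'$qual_char")

# Assumption: Phred+33 encoding, so quality = ASCII code - 33.
q=$((ascii - 33))

# Error probability is 10^(-Q/10).
p_err=$(awk -v q="$q" 'BEGIN { printf "%.2e", 10^(-q/10) }')

echo "char=$qual_char ascii=$ascii Q=$q p_error=$p_err"
```

For J this reports Q=41 and an error probability of about 7.94e-05, which is just under 1/10,000.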

About compressed files

Sequencing data files can be very large - from a few megabytes to gigabytes. And with NGS giving us longer reads and deeper sequencing at decreasing price points, it's not hard to run out of storage space. As a result, most sequencing facilities will give you compressed sequencing data files.

...

Answer

The FASTQ files are ~149 MB
Compressed, they are ~50 MB
This is about 3x compression
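
A compression ratio like this can be measured directly. The sketch below is only an illustration: it builds a small, made-up FASTQ-like file (demo.fq is a hypothetical name, not one of the course files) in a temporary directory, compresses it, and compares the byte counts.

```bash
# Work in a throwaway directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a hypothetical FASTQ-like file: one made-up record repeated 5000 times.
for i in $(seq 1 5000); do
  printf '@read%d\nACGTACGTACGTACGTACGT\n+\nJJJJJJJJJJJJJJJJJJJJ\n' "$i"
done > demo.fq

# gzip -c writes compressed data to standard output and keeps the original file.
gzip -c demo.fq > demo.fq.gz

# wc -c reports size in bytes; tr strips any padding spaces.
orig_bytes=$(wc -c < demo.fq | tr -d ' ')
gz_bytes=$(wc -c < demo.fq.gz | tr -d ' ')

# Ratio of uncompressed size to compressed size.
ratio=$(awk -v a="$orig_bytes" -v b="$gz_bytes" 'BEGIN { printf "%.1f", a / b }')
echo "original=$orig_bytes compressed=$gz_bytes ratio=${ratio}x"
```

Because the demo records are nearly identical, this file compresses far better than real sequencing data would, so the ratio it prints should not be compared with the ~3x figure above.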

...

If you start less with the -N option, it will display line numbers.

Exercise: What line of small.fq contains the read name with grid coordinates 2316:10009:100563?

...

Using the head command
# shows 1st 10 lines
head small.fq

# shows 1st 100 lines -- might want to pipe this to more to see a bit at a time
head -100 small.fq | more

So what if you want to see line numbers on your head or tail output? Neither command seems to have an option to do this.

Hint
less -N small.fq
# once less is running, type the search pattern below and press Enter
/2316:10009:100563
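
For a non-interactive way to get line numbers, grep -n and cat -n are two standard options. The sketch below assumes a tiny made-up file (demo.fq, with hypothetical read names) standing in for small.fq, so the line numbers it reports apply only to this demo.

```bash
workdir=$(mktemp -d)

# Hypothetical stand-in for small.fq: three made-up 4-line FASTQ records.
cat > "$workdir/demo.fq" <<'EOF'
@demo:1:2316:10001:100100
ACGTACGT
+
JJJJJJJJ
@demo:1:2316:10009:100563
TTGACCAT
+
JJJJJJJJ
@demo:1:2316:10020:100777
GGCATTAC
+
JJJJJJJJ
EOF

# grep -n prefixes each matching line with its line number and a colon.
match=$(grep -n '2316:10009:100563' "$workdir/demo.fq")
echo "$match"

# Everything before the first colon is the line number.
line_no=${match%%:*}
echo "found on line $line_no"

# cat -n numbers every line; piping into tail keeps the original numbering.
last_two=$(cat -n "$workdir/demo.fq" | tail -2)
echo "$last_two"
```

Numbering the lines before piping them to head or tail means the numbers refer to positions in the whole file, not positions within the excerpt.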

Piping

So what is that vertical bar ( | ) all about? It is the pipe symbol!

...