This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.

...

  • Macs and Linux have a Terminal program built in
  • Windows options:

Use ssh (secure shell) to log in to a remote computer.

Code Block
languagebash
titleSSH to a remote computer
# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu

...

  • The built-in history command lists the commands you've entered, each with a number.
    • You can re-execute any command in the history by typing an exclamation point ( ! ) then the number
    • e.g. !15 re-executes the 15th command in your history.

...

On most modern Linux shells you use Tab completion by pressing:

  • single Tab – completes file or directory name up to any ambiguous part
    • if nothing shows up, there is no unambiguous match
  • Tab twice – display all possible completions
    • you then decide where to go next
  • shell completion works for commands too (like python)

...

Tip

While it is possible to create file and directory names that have embedded spaces, that creates problems when manipulating them.

To avoid headaches, it is best not to create file/directory names with embedded spaces, or with special characters such as + & # ( )
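
To see why embedded spaces cause trouble, here is a small sketch you can try in a scratch directory. The file name "my data.txt" and the workdir variable are made up for illustration; the point is only that an unquoted name containing a space is split into two separate words by the shell.

```shell
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir"

# Create a file whose name contains a space (the quotes are required)
touch "my data.txt"

# Unquoted: the shell passes two arguments, "my" and "data.txt",
# neither of which exists, so ls fails
ls my data.txt 2>/dev/null || echo "unquoted name was split into two words"

# Quoted or backslash-escaped: the name is passed as one argument
listing=$(ls "my data.txt")
echo "$listing"
ls my\ data.txt
```

Every command that touches such a file needs the same quoting or escaping, which is why names without spaces are easier to live with.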

Pathname wildcards

...

...

The shell has shorthand to refer to groups of files by allowing wildcards in file names.
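
As a quick illustration, the sketch below creates a few empty files in a scratch directory and matches them with the standard shell wildcards. The file names are invented for the example, and the order in which matches are listed can vary with your locale settings.

```shell
# Work in a throwaway directory with a few made-up file names
workdir=$(mktemp -d)
cd "$workdir"
touch sample1.fastq sample2.fastq sample10.fastq notes.txt

# * matches any number of characters (including none)
echo *.fastq

# ? matches exactly one character, so sample10.fastq is not matched
echo sample?.fastq

# [ ] matches any one of the characters listed inside the brackets
echo sample[12].fastq

# The shell expands the wildcard before the command runs,
# so ls simply receives the list of matching names
all_fastq=$(ls *.fastq | wc -l)
one_digit=$(ls sample?.fastq | wc -l)
echo "$all_fastq files match *.fastq; $one_digit match sample?.fastq"
```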

...

Code Block
languagebash
titlePipe uncompressed output to a pager
# zcat is like cat, except that it understands the gz compressed format,
# and uncompresses the data before writing it to standard output.
# So, like cat, you need to be sure to pipe the output to a pager if
# the file is large.
zcat big.fq.gz | more

Piping a histogram

...

But the real power of piping comes when you stitch together a string of commands with pipes – it's incredibly flexible, and fun once you get the hang of it.

...

  • samtools view converts the binary small.bam file to text and writes alignment record lines one at a time to standard output.
    • -F 0x4 option says to filter out any records where the 0x4 flag bit is 1 (set)
    • since the 0x4 flag bit is set (1) for unmapped records, this says to only report records where the query sequence did map to the reference
  • | head -1000
    • the pipe connects the standard output of samtools view to the standard input of head
    • the -1000 option says to only write the first 1000 lines of input to standard output
  • | cut -f 5
    • the pipe connects the standard output of head to the standard input of cut
    • the -f 5 option says to only write the 5th field of each input line to standard output (input fields are tab-delimited by default)
      • the 5th field of an alignment record is an integer representing the alignment mapping quality
      • the resulting output will have 1000 lines, with one integer per line
  • | sort -n
    • the pipe connects the standard output of cut to the standard input of sort
    • the -n option says to sort input lines according to numeric sort order
    • the resulting output will be 1000 numeric values, one per line, sorted from lowest to highest
  • | uniq -c
    • the pipe connects the standard output of sort to the standard input of uniq
    • the -c option says to just count groups of adjacent lines with the same value (that's why they must be sorted) and report the total for each group
    • the resulting output will be one line for each group that uniq sees
    • each line will have a count of the lines in the group, followed by the text for the group (here the unique mapping quality values)
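
The cut, sort and uniq stages above can be tried without a BAM file. The sketch below feeds four made-up, tab-delimited records through the same stages; the records are only stand-ins for alignment lines, and the names records and histogram are chosen for this example.

```shell
# Four made-up tab-delimited records; only the 5th field
# (standing in for mapping quality) matters here
records=$(printf 'read1\t0\tchrI\t100\t60\nread2\t16\tchrI\t250\t0\nread3\t0\tchrII\t75\t60\nread4\t16\tchrII\t900\t37\n')

# Same stages as above: extract field 5, sort numerically, count each group
histogram=$(echo "$records" | cut -f 5 | sort -n | uniq -c)

# Each output line is a count followed by the value it counts
echo "$histogram"
```

Here the value 60 appears twice, while 0 and 37 each appear once, so the histogram has three lines.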

Environment variables

...

Environment variables are just like variables in a programming language (in fact, bash is a complete programming language): they are "pointers" that reference data assigned to them. In bash, you assign an environment variable as shown below:

...

Tip

Careful – do not put spaces around the equals sign when assigning environment variable values.

Also, always surround the value with double quotes ( " " ) if it contains (or might contain) spaces.

You set environment variables using the bare name (varname above).

You then refer to or evaluate an environment variable using a dollar sign ( $ ) evaluation operator before the name:

Code Block
languagebash
titleRefer to an environment variable
echo $varname
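
Putting assignment and evaluation together, here is a minimal sketch. The variable names my_dir and prefix and their values are invented for illustration.

```shell
# Assign: bare name, no spaces around the equals sign,
# double quotes because the value contains spaces
my_dir="core ngs data"

# Evaluate: put a dollar sign before the name
echo $my_dir

# Quoting the evaluation keeps the value together as a single word
echo "$my_dir"

# Curly braces mark where the variable name ends, which is needed
# when other text follows immediately
prefix="sample"
fastq_name="${prefix}_R1.fastq"
echo "$fastq_name"
```

Without the braces, bash would look for a variable named prefix_R1, which is not set here, and substitute an empty string.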

...

Tip

To display a metacharacter as a literal inside double quotes, use the backslash ( \ ) character to escape the following character.

Code Block
languagebash
# Inside double quotes, use a backslash ( \ ) to escape the dollar sign ( $ ) metacharacter
echo "the environment variable storing my account name is \$USER"


You can also use either single or double quotes to enclose text you want to appear on multiple lines:

Code Block
languagebash
# this expression will output 4 lines of text
echo "1
2
3
4"
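
The quoting behaviors above can be compared side by side in a small sketch. The variable course and its value are made up for the example.

```shell
course="core_ngs"

# Double quotes: the $ metacharacter is evaluated
in_double="the course is $course"

# Single quotes: everything is taken literally
in_single='the course is $course'

# Backslash inside double quotes: the $ is taken literally
escaped="the course is \$course"

echo "$in_double"
echo "$in_single"
echo "$escaped"

# Quoted text can span lines; the newlines become part of the value
multi='first
second'
line_count=$(echo "$multi" | wc -l)
echo "multi has $line_count lines"
```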

Backtick quoting and sub-shell evaluation

...

An example, using the date command, which just writes the current date and time to standard output (and so appears on your Terminal):

Code Block
languagebash
date          # Calling the date command just displays date/time information
echo date     # Here "date" is treated as a literal word, and written to standard output
echo `date`   # The date command is evaluated and its standard output replaces the `date` expression

A slightly different syntax, called sub-shell evaluation, also evaluates the expression inside $( ) and replaces it with the expression's standard output.
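
A short sketch comparing the two forms follows. The path is hypothetical and is used only as text; nothing needs to exist at that location.

```shell
# Both forms run the command and substitute its standard output
year_backtick=`date +%Y`
year_subshell=$(date +%Y)
echo "backticks: $year_backtick, sub-shell: $year_subshell"

# One practical difference: $( ) expressions nest without extra escaping.
# Here dirname strips the file name from a made-up path, and basename
# then keeps only the last directory of what remains.
parent=$(basename "$(dirname /scratch/project/fastq/reads.fq)")
echo "$parent"
```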

...

Here num is the name I gave the variable that will be assigned a different number each time through the loop (called the loop's formal argument). The set of such numbers is generated by the seq 4 command. Each number is then referenced as $num inside the loop.
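
As a variation on the same idea, this sketch loops over the output of seq 4 and also keeps a running total. The total variable is an addition for this example.

```shell
total=0
for num in $(seq 4); do
    # num takes the values 1, 2, 3 and 4 in turn
    echo "The number is $num"
    # $(( )) performs integer arithmetic
    total=$((total + num))
done
echo "The sum is $total"
```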

...

Code Block
languagebash
titleFor loop to count sequences in multiple FASTQs
for fname in *.gz; do
   echo "$fname has $((`zcat $fname | wc -l` / 4)) sequences"
done
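
To try this loop without real data, the sketch below first builds a tiny compressed FASTQ file in a scratch directory. The file name and its two records are made up, and the loop is written with $( ) and a quoted "$fname", which is one possible variant of the form above.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Two made-up FASTQ records, 4 lines each, compressed with gzip
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip > tiny.fastq.gz

for fname in *.gz; do
    # Line count divided by 4 gives the number of sequences
    # (on macOS, gunzip -c is the usual equivalent of zcat)
    nseq=$(( $(zcat "$fname" | wc -l) / 4 ))
    echo "$fname has $nseq sequences"
done
```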

...

Copying files between TACC and your laptop

...

  • Open a Terminal window on your local computer
  • cd to the directory where you want the files
  • Type something like the following, substituting your user name and absolute path:

Code Block
languagebash
titleExecute this on your laptop
scp abattenh@ls6.tacc.utexas.edu:/scratch/01063/abattenh/core_ngs/fastq_prep/small_fastqc.html .

Windows users can use the free WinSCP program (https://winscp.net/eng/index.php) if their Windows version does not support scp.

Editing files

There are several options for editing files at TACC. These fall into three categories:

...