This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.

...

  • Macs and Linux have a built-in Terminal program
  • Windows options:

Use ssh (secure shell) to log in to a remote computer.

Code Block
languagebash
titleSSH to a remote computer
# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu

...

Of course, Google works for 3rd-party tools too (e.g. search for bwa manual)

Terminal input

Literal characters and metacharacters

...

Code Block
languagebash
# You mistype a command name, or type a command that is not installed on your system
ls6:~$ lz
Command 'lz' not found, but can be installed with:
apt install mtools
Please ask your administrator.

# You enter something that is close to an existing, or known, command
ls6:~$ catt
Command 'catt' not found, did you mean:
  command 'cat' from deb coreutils (8.30-3ubuntu2)
  command 'catty' from deb node-catty (0.0.8-1)
  command 'ratt' from deb ratt (0.0~git20180127.c44413c-2)
Try: apt install <deb name>

# You try to use an unsupported option
ls6:~$ ls -z
ls: invalid option -- 'z'
Try 'ls --help' for more information.

# You specify the name of a file that does not exist
ls6:~$ ls xxx
ls: cannot access 'xxx': No such file or directory

# You try to access a file or directory you don't have permissions for
ls6:~$ cat /etc/sudoers
cat: /etc/sudoers: Permission denied
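
The "not found" errors above can also be anticipated before running a command. The following is a minimal sketch, assuming the bash builtin command -v is used to look a name up on the PATH; have_command is a hypothetical helper name chosen for illustration, not a standard tool.

```bash
# Report whether a command can be found on the current PATH.
# have_command is a hypothetical helper name, not a standard tool.
have_command() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

have_command ls                    # present on essentially every Linux system
have_command no_such_command_xyz   # a made-up name that should not exist

# A failing command also leaves a non-zero exit status in $?
# (the exact number differs between systems, so it is only printed here)
ls xxx 2> /dev/null || echo "ls failed with exit status $?"
```

Checking first is mostly useful in scripts, where a missing program is easier to report up front than to debug halfway through a run.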

Getting around in the shell

...

  • Right arrow and Left arrow move the cursor forward or backward on the current command line.
  • Use Ctrl-a (holding down the Control key and a) to jump the cursor to the start of the line.
  • Use Ctrl-e to jump the cursor to the end of the line.
  • Arrow keys are also modified by Ctrl- (Windows) or Option- (Mac)
    • Ctrl-right-arrow (Windows) or Option-right-arrow (Mac) will skip by "word" forward
    • Ctrl-left-arrow (Windows) or Option-left-arrow (Mac) will skip by "word" backward

...

  • single Tab – completes file or directory name up to any ambiguous part
    • if nothing shows up, there is no unambiguous match
  • Tab twice – display all possible completions
    • you then decide where to go next
  • shell completion works for commands too (like python or bowtie)

Absolute and relative pathname syntax

...

  • ls *.bam – lists all files in the current directory that end in .bam
  • ls [A-Z]*.bam – does the same, but only if the first character of the file name is a capital letter
  • ls [ABcd]*.bam – lists all .bam files whose 1st letter is A, B, c or d.
  • ls *.{fastq,fq}.gz – lists all .fastq.gz and .fq.gz files.
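
The patterns above can be tried safely in a scratch directory. This is a small sketch, assuming bash and a handful of empty files whose names are invented purely for the demonstration; echo is used to preview what a pattern expands to.

```bash
# Work in a throwaway directory so no real files are touched
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Hypothetical file names, created empty just for this demonstration
touch Alpha.bam Beta.bam carrot.bam delta.bam zebra.bam
touch reads.fastq.gz reads.fq.gz notes.txt

ls *.bam               # all five .bam files
ls [ABcd]*.bam         # every .bam file except zebra.bam
ls *.{fastq,fq}.gz     # reads.fastq.gz and reads.fq.gz

# echo shows what the shell expands a pattern to before any program runs
matched=$(echo [ABcd]*.bam)
echo "$matched"

# Clean up the scratch directory
cd - > /dev/null
rm -r "$demo_dir"
```

Previewing with echo is a cheap habit before passing a pattern to a command that modifies or deletes files.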

...

Streams and Piping

Standard streams and redirection

...

  • samtools view converts the binary small.bam file to text and writes alignment record lines one at a time to standard output.
    • the -F 0x4 option says to filter out any records where the 0x4 flag bit is 1 (set)
    • since the 0x4 flag bit is set (1) for unmapped records, this says to only report records where the query sequence did map to the reference
  • | head -1000
    • the pipe connects the standard output of samtools view to the standard input of head
    • the -1000 option says to only write the first 1000 lines of input to standard output
  • | cut -f 5
    • the pipe connects the standard output of head to the standard input of cut
    • the -f 5 option says to only write the 5th field of each input line to standard output (input fields are tab-delimited by default)
      • the 5th field of an alignment record is an integer representing the alignment mapping quality
      • the resulting output will have one integer per line (and 1000 lines)
  • | sort -n
    • the pipe connects the standard output of cut to the standard input of sort
    • the -n option says to sort input lines according to numeric sort order
    • the resulting output will be 1000 numeric values, one per line, sorted from lowest to highest
  • | uniq -c
    • the pipe connects the standard output of sort to the standard input of uniq
    • the -c option says to count groups of adjacent lines with the same value (that's why the input must be sorted) and report the total for each group
    • the resulting output will be one line for each group that uniq sees
    • each line will have the text for the group (here the unique mapping quality values) and a count of lines in each group
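
The sort -n | uniq -c steps can be tried without a BAM file. The sketch below feeds in a few made-up mapping quality values in place of the samtools view, head and cut output; the numbers are invented for illustration and are not real alignment data. The trailing awk step is an addition here that only trims the leading padding uniq -c puts in front of each count.

```bash
# Made-up mapping quality values, one per line, standing in for the
# output of the cut -f 5 step (not real alignment data)
printf '%s\n' 60 0 60 37 0 60 60 37 | sort -n | uniq -c

# Same pipeline, with awk printing "count value" without the padding
counts=$(printf '%s\n' 60 0 60 37 0 60 60 37 | sort -n | uniq -c | awk '{print $1, $2}')
echo "$counts"

# Without the sort, uniq only merges lines that happen to be adjacent,
# so the same value is reported in several separate groups
unsorted_groups=$(printf '%s\n' 60 0 60 37 0 60 60 37 | uniq -c | wc -l)
echo "groups without sorting: $unsorted_groups"
```

With sorting there are three groups (0, 37 and 60); without it the same eight values fall into seven groups, which is why sort comes before uniq in the pipeline.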

More Linux concepts

Environment variables

Environment variables are just like variables in a programming language (in fact, bash is a complete programming language): they are "pointers" that reference data assigned to them. In bash, you assign an environment variable as shown below:

...

Code Block
languagebash
This text will be output
And this USER environment variable will be evaluated: student01
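
As a further sketch of assignment and evaluation, assuming bash: SAMPLE_NAME below is a hypothetical variable name used only for illustration. The example also shows how quoting affects evaluation and how export makes a variable visible to programs started from the shell.

```bash
# SAMPLE_NAME is a hypothetical variable name used only for illustration
SAMPLE_NAME="small"                 # no spaces around the = sign
echo "$SAMPLE_NAME"                 # the $ prefix asks bash for the value
echo "File: ${SAMPLE_NAME}.bam"     # braces mark where the name ends

# Double quotes allow evaluation; single quotes keep the text literal
echo "Sample is $SAMPLE_NAME"
echo 'Sample is $SAMPLE_NAME'

# export makes the variable visible to programs started from this shell
export SAMPLE_NAME
bash -c 'echo "child shell sees: $SAMPLE_NAME"'
```

Without the export line, the child bash process would see an empty value, because plain assignments stay local to the current shell.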

Arithmetic in bash

Arithmetic in bash is very weird: expressions have to be wrapped in $(( )) to be evaluated.

Code Block
languagebash
echo $(( 50 * 2 + 1 ))

n=0
n=$(( $n + 5 ))
echo $n

And it only returns integer values, after truncation.

Code Block
languagebash
echo $(( 4 / 2 ))
echo $(( 5 / 2 ))

echo $(( 24 / 5 ))
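
A short sketch of what truncation means in practice, assuming bash: the fractional part of a division is dropped, and the % operator gives the remainder that was left over.

```bash
quotient=$(( 24 / 5 ))      # 4, the fractional part is dropped
remainder=$(( 24 % 5 ))     # 4, what is left over
echo "24 / 5 = $quotient remainder $remainder"

# Variable names can be used without the $ inside $(( ))
n=7
n=$(( n + 5 ))
echo $n                     # 12
```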

As a result, if I need to do anything other than the simplest arithmetic, I use awk:

Code Block
languagebash
awk 'BEGIN{print 4/2}'
echo 3 2 | awk '{print ($1+$2)/2}'

You can also use the printf function in awk to control formatting. Just remember that a linefeed ( \n ) has to be included in the format string:

Code Block
languagebash
echo 3.1415926 | awk '{ printf("%.2f\n", $1) }'

You can even use it to convert a decimal number to hexadecimal using the %x printf format specifier. Note that the convention is to denote hexadecimal numbers with an initial 0x.

Code Block
languagebash
echo 65 | awk '{ printf("0x%x\n", $1) }'
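
For integers, bash can do the same conversion without awk. This is a small sketch, assuming bash: the printf builtin accepts %x, and $(( )) reads a 0x prefix as hexadecimal. The last two lines apply this to the 0x4 flag bit discussed earlier; 77 and 83 are example flag values chosen for illustration.

```bash
# The bash printf builtin accepts the same %x format specifier
hex=$(printf "0x%x" 65)
echo "$hex"                     # 0x41

# Going the other way: $(( )) reads a 0x prefix as hexadecimal
dec=$(( 0x41 ))
echo "$dec"                     # 65

# Bitwise AND (&) tests a single flag bit
echo $(( 77 & 0x4 ))            # 4: the 0x4 bit is set
echo $(( 83 & 0x4 ))            # 0: the 0x4 bit is not set
```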

Bash control flow

the bash for loop

...