Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.

...

  • Macs and Linux have a Terminal program built-in
  • Windows options:

Use ssh (secure shell) to login to a remote computers.

Code Block
languagebash
titleSSH to a remote computer
# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu

...

Of course Google works on 3rd party tools also (e.g. search for bwa manual)

Terminal input

Literal characters and metacharacters

...

You know the command line is ready for input when you see the command line prompt. It can be configured differently on different systems, but on our system it shows your account name, server name, current directory, then a dollar sign ($). Note the tilde character ( ~ ) signifies your Home directory.

The shell executes command line input when it sees a linefeed character (\n, also called a newline), which happens when you press Enter after entering the command.

...

Code Block
languagebash
# You mis-type a thecommand name, ofor a command that is not installed on your system
ls6:~$ lzcatt
 
Command 'lz'catt: command not found,
but
can# beYou installedtry with:to aptuse installan mtoolsunsupported Pleaseoption
askls6:~$ yourls administrator.-z
ls: # You enter something that is close to an existing, or known, command
ls6:~$ catt
Command 'catt' not found, did you mean:
  command 'cat' from deb coreutils (8.30-3ubuntu2)
  command 'catty' from deb node-catty (0.0.8-1)
  command 'ratt' from deb ratt (0.0~git20180127.c44413c-2)
Try: apt install <deb name>

# You try to use an unsupported option
ls6:~$ ls -z
ls: invalid option -- 'z'
Try 'ls --help' for more information.

# You specify the name of a file that does not exist
ls6:~$ ls xxx
ls: cannot access 'xxx': No such file or directory

Getting around in the shell

...

invalid option -- 'z'
Try 'ls --help' for more information.

# You specify the name of a file that does not exist
ls6:~$ ls xxx
ls: cannot access 'xxx': No such file or directory

# You try to access a file or directory you don't have permissions for
ls6:~$ cat /etc/sudoers
cat: /etc/sudoers: Permission denied

Getting around in the shell

Type as little and as accurately as possible by using keyboard shortcuts!

...

  • Right arrow and Left arrow move the cursor forward or backward on the current command line.
  • Use Ctrl-a (holding down the Control key and a) to jump the cursor to the beginning start of the line.
  • Use Ctrl-e to jump the cursor to the end of the line.
  • Arrow keys are also modified by Ctrl- (Windows) or Option- (Mac)
    • Ctrl-right-arrow (Windows) or Option-right-arrow (Mac) will skip by "word" forward
    • Ctrl-left-arrow (Windows) or Option-left-arrow (Mac) will skip by "word" backward

...

  • single Tab – completes file or directory name up to any ambiguous part
    • if nothing shows up, there is no unambiguous match
  • Tab twice – display all possible completions
    • you then decide where to go next
  • shell completion works for commands too (like python bowtie)

Absolute and relative pathname syntax

...

  • ls *.bam – lists all files in the current directory that end in .bam
  • ls [A-Z]*.bam – does the same, but only if the first character of the file is a capital letter
  • ls [ABcd]*.bam – lists all .bam files whose 1st letter is A, B, c or d.
  • ls *.{fastq,fq}.gz – lists all .fastq.gz and .fq.gz files.

...

Streams and Piping

Standard streams and redirection

...

  • samtools view converts the binary small.bam file to text and writes alignment record lines one at a time to standard output.
    • -F 0x4 option says to filter out any records where the 0x4 flag bit is 0 (not set)
    • since the 0x4 flag bit is set (1) for unmapped records, this says to only report records where the query sequence did map to the reference
  • | head -1000
    • the pipe connects the standard output of samtools view to the standard input of head
    • the -1000 option says to only write the first 1000 lines of input to standard output
  • | cut -f 5
    • the pipe connects the standard output of head to the standard input of cut
    • the -f 5 option says to only write the 5th field of each input line to standard output (input fields are tab-delimited by default)
      • the 5th field of an alignment record is an integer representing the alignment mapping quality
      •  the resulting output will have one integer per line (and 1000 lines)
  • | sort -n
    • the pipe connects the standard output of cut to the standard input of sort
    • the -n option says to sort input lines according to numeric sort order
    • the resulting output will be 1000 numeric values, one per line, sorted from lowest to highest
  • | uniq -c
    • the pipe connects the standard output of sort to the standard input of uniq
    • the -c option option says to just count groups of lines with the same value (that's why they must be sorted) and report the total for each group
    • the resulting output will be one line for each group that uniq sees
    • each line will have the text for the group (here the unique mapping quality values) and a count of lines in each group

More Linux concepts

Environment variables

Environment variables are just like variables in a programming language (in fact bash is a complete programming language), they are "pointers" that reference data assigned to them. In bash, you assign an environment variable as shown below:

...

Another method for writing multi-line text that can be useful for composing a large block of text in a script, is the heredoc syntax, where a block of text is specified between two user-supplied block delimiters, and that text block is sent to a command. The general form of a heredoc is:

Code Block
languagebash
COMMAND << DELIMITER
..text...
..text...
DELIMITER
Tip

The 2nd (ending) block delimiter you specify for a heredoc must appear at the start of a new line.

...

:

Code Block
languagebash
COMMAND << DELIMITER
..text...
..text...
DELIMITER


Tip

The 2nd (ending) block delimiter you specify for a heredoc must appear at the start of a new line.

For example, using the (arbitrary) delimiter EOF and the cat command:

Code Block
languagebash
cat << EOF
This text will be output
And this USER environment variable will be evaluated: $USER
EOF

Here the block of text provided to cat is just displayed on the Terminal. To write it to a file just use the 1> or > redirection syntax in the cat command:

Code Block
languagebash
cat 1> out.txt << EOF
This text will be output
And this USER environment variable will be evaluated: $USER
EOF

The out.txt file will then contain this text:

Code Block
languagebash
This text will be output
And this USER environment variable will be evaluated: student01

Arithemetic in bash

Arithmetic in bash is very weird:

Code Block
languagebash
echo $(( 50 * 2 + 1 ))

n=0
n=$(( $n + 5 ))
echo $n

And it only returns integer values, after truncation.

Code Block
languagebash
echo $(( 4 / 2 ))
echo $(( 5 / 2 ))

echo $(( 24 / 5 ))

As a result, if I need to do anything other than the simplest arithemetic, I use awk:

Code Block
languagebash
cat << EOF
This text will be output
And this USER environment variable will be evaluated: $USER
EOF

...

awk 'BEGIN{print 4/2}'
echo 3 2 | awk '{print ($1+$2)/2}'

You can also use the printf function in awk to control formatting. Just remember that a linefeed ( \n ) has to included in the format string:

Code Block
languagebash
cat 1> out.txt << EOF
This text will be output
And this USER environment variable will be evaluated: $USER
EOF

...

echo 3.1415926 | awk '{ printf("%.2f\n", $1) }'

You can even use it to convert a decimal number to hexadecimal using the %x printf format specifier. Note that the convention is to denote hexadecimal numbers with an initial 0x.

Code Block
languagebash
Thisecho text65 will| beawk output
And this USER environment variable will be evaluated: student01
'{ printf("0x%x\n", $1) }'

Bash control flow

the bash for loop

...