This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.
...
- Macs and Linux have a Terminal program built-in
- Windows options:
- Windows 10+
- Command Prompt and PowerShell programs have ssh and scp (may require latest Windows updates)
- Start menu → Search for Command
- Putty – http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- simple Terminal and file copy programs
- download either the Putty installer or just putty.exe (Terminal) and pscp.exe (secure copy client)
- Windows Subsystem for Linux – Windows 10 Professional includes an Ubuntu-like bash shell
- See https://docs.microsoft.com/en-us/windows/wsl/install-win10
- We recommend the Ubuntu Linux distribution, but any Linux distribution will have an SSH client
Use ssh (secure shell) to log in to a remote computer.
Code Block | ||||
---|---|---|---|---|
| ||||
# General form:
ssh <user_name>@<full_host_name>
# For example
ssh abattenh@ls6.tacc.utexas.edu |
...
Of course Google works on 3rd party tools also (e.g. search for bwa manual)
Terminal input
Literal characters and metacharacters
...
Code Block | ||
---|---|---|
| ||
# You mis-type a command name, or enter a command that is not installed on your system
ls6:~$ catt
catt: command not found
# You try to use an unsupported option
ls6:~$ ls -z
ls: invalid option -- 'z'
Try 'ls --help' for more information.
# You specify the name of a file that does not exist
ls6:~$ ls xxx
ls: cannot access 'xxx': No such file or directory
# You try to access a file or directory you don't have permissions for
ls6:~$ cat /etc/sudoers
cat: /etc/sudoers: Permission denied |
Getting around in the shell
...
- Right arrow and Left arrow move the cursor forward or backward on the current command line.
- Use Ctrl-a (holding down the Control key and a) to jump the cursor to the start of the line.
- Use Ctrl-e to jump the cursor to the end of the line.
- Arrow keys are also modified by Ctrl- (Windows) or Option- (Mac)
- Ctrl-right-arrow (Windows) or Option-right-arrow (Mac) will skip by "word" forward
- Ctrl-left-arrow (Windows) or Option-left-arrow (Mac) will skip by "word" backward
...
- single Tab – completes file or directory name up to any ambiguous part
- if nothing shows up, there is no unambiguous match
- Tab twice – display all possible completions
- you then decide where to go next
- shell completion works for commands too (such as python or bowtie)
Absolute and relative pathname syntax
...
- ls *.bam – lists all files in the current directory that end in .bam
- ls [A-Z]*.bam – does the same, but only if the first character of the file name is a capital letter
- ls [ABcd]*.bam – lists all .bam files whose 1st letter is A, B, c or d.
- ls *.{fastq,fq}.gz – lists all .fastq.gz and .fq.gz files.
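These patterns can be tried safely in a scratch directory. The sketch below is only an illustration: the file names are made up, and LC_ALL=C is set as an assumption so that [A-Z] matches exactly the capital letters and names sort with capitals first (in some locales, ranges and sort order behave differently).

```shell
# Assumption: use the plain C locale so [A-Z] means exactly the capital letters
export LC_ALL=C

# Work in a throwaway directory with made-up file names
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch Bad.bam Sample1.bam control.bam sample2.bam reads.fastq.gz reads.fq.gz notes.txt

# The shell expands each pattern before the command runs;
# echo is used here to show the expanded names on one line
all_bam=$(echo *.bam)
capital_bam=$(echo [A-Z]*.bam)
set_bam=$(echo [ABcd]*.bam)

echo "*.bam       -> $all_bam"
echo "[A-Z]*.bam  -> $capital_bam"
echo "[ABcd]*.bam -> $set_bam"

# Brace expansion is a bash feature: this becomes *.fastq.gz *.fq.gz
echo *.{fastq,fq}.gz
```

Because the shell does the expansion, the same patterns work with any command that takes file names, not just ls.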
...
Streams and Piping
Standard streams and redirection
...
- samtools view converts the binary small.bam file to text and writes alignment record lines one at a time to standard output.
- the -F 0x4 option says to exclude any record where the 0x4 flag bit is 1 (set), keeping only records where it is 0 (not set)
- since the 0x4 flag bit is set (1) for unmapped records, this says to only report records where the query sequence did map to the reference
- | head -1000
- the pipe connects the standard output of samtools view to the standard input of head
- the -1000 option says to only write the first 1000 lines of input to standard output
- | cut -f 5
- the pipe connects the standard output of head to the standard input of cut
- the -f 5 option says to only write the 5th field of each input line to standard output (input fields are tab-delimited by default)
- the 5th field of an alignment record is an integer representing the alignment mapping quality
- the resulting output will have one integer per line (and 1000 lines)
- | sort -n
- the pipe connects the standard output of cut to the standard input of sort
- the -n option says to sort input lines according to numeric sort order
- the resulting output will be 1000 numeric values, one per line, sorted from lowest to highest
- | uniq -c
- the pipe connects the standard output of sort to the standard input of uniq
- the -c option says to just count groups of adjacent lines with the same value (that's why they must be sorted) and report the total for each group
- the resulting output will be one line for each group that uniq sees
- each line will have a count of the lines in the group, followed by the text for the group (here a unique mapping quality value)
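The later stages of this pipeline can be tried without a BAM file. The sketch below feeds five made-up, tab-delimited records (not real samtools output) through the same head, cut, sort and uniq steps; only the 5th field, standing in for the mapping quality, matters here.

```shell
# Five made-up records; field 5 plays the role of the mapping quality
records=$(printf 'r1\t0\tchr1\t100\t60\nr2\t0\tchr1\t200\t0\nr3\t16\tchr1\t300\t60\nr4\t0\tchr2\t50\t9\nr5\t0\tchr2\t70\t60\n')

# Same stages as described above, minus samtools view
counts=$(printf '%s\n' "$records" | head -1000 | cut -f 5 | sort -n | uniq -c)

# Each output line is a count followed by the value it counts
printf '%s\n' "$counts"
```

With this input there are three output lines: quality 0 appears once, 9 once and 60 three times. The amount of leading whitespace before each count varies between uniq implementations.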
More Linux concepts
Environment variables
Environment variables are just like variables in a programming language (in fact, bash is a complete programming language): they are "pointers" that reference data assigned to them. In bash, you assign an environment variable as shown below:
...
Code Block | ||
---|---|---|
| ||
This text will be output
And this USER environment variable will be evaluated: student01
|
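One point worth illustrating is the difference between a variable that exists only in the current shell and one that has been exported to child processes. In this minimal sketch, DEMO_GREETING is a made-up name used only for illustration, and sh -c stands in for any program started from the shell.

```shell
# DEMO_GREETING is a hypothetical variable name, not one the course defines
DEMO_GREETING="hello"

# A child process (here another shell) does not see an un-exported variable
child_before=$(sh -c 'echo "${DEMO_GREETING:-not set}"')

# export places the variable in the environment passed to child processes
export DEMO_GREETING
child_after=$(sh -c 'echo "${DEMO_GREETING:-not set}"')

echo "current shell:       $DEMO_GREETING"
echo "child before export: $child_before"
echo "child after export:  $child_after"
```

The single quotes around the sh -c argument matter: they stop the current shell from evaluating the variable, so the child shell does the lookup itself.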
Arithmetic in bash
Arithmetic in bash is very weird. Expressions are only evaluated inside the $(( )) construct:
Code Block | ||
---|---|---|
| ||
echo $(( 50 * 2 + 1 ))
n=0
n=$(( $n + 5 ))
echo $n |
And it only returns integer values, after truncation.
Code Block | ||
---|---|---|
| ||
echo $(( 4 / 2 ))
echo $(( 5 / 2 ))
echo $(( 24 / 5 )) |
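To make the truncation concrete, the sketch below captures a few results in variables. It relies only on standard shell arithmetic; the variable names are illustrative.

```shell
even=$(( 4 / 2 ))        # divides exactly
half=$(( 5 / 2 ))        # 2.5 loses its fractional part
remainder=$(( 5 % 2 ))   # % gives what integer division discards
negative=$(( -5 / 2 ))   # truncation is toward zero, not rounding down

echo "4 / 2  = $even"        # 2
echo "5 / 2  = $half"        # 2
echo "5 % 2  = $remainder"   # 1
echo "-5 / 2 = $negative"    # -2
```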
As a result, if I need to do anything other than the simplest arithmetic, I use awk:
Code Block | ||
---|---|---|
| ||
awk 'BEGIN{print 4/2}'
echo 3 2 | awk '{print ($1+$2)/2}' |
You can also use the printf function in awk to control formatting. Just remember that a linefeed ( \n ) has to be included in the format string:
Code Block | ||
---|---|---|
| ||
echo 3.1415926 | awk '{ printf("%.2f\n", $1) }' |
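When the numbers live in shell variables, one common approach is to hand them to awk with its -v option and format the result with printf. A minimal sketch, with illustrative variable names:

```shell
total=24
count=5

# bash alone truncates the result to an integer
int_avg=$(( total / count ))

# -v copies each shell value into an awk variable before the program runs
avg=$(awk -v t="$total" -v c="$count" 'BEGIN { printf("%.2f\n", t / c) }')

echo "bash: $int_avg"   # 4
echo "awk:  $avg"       # 4.80
```

Using a BEGIN block means awk does not wait for any input, so no echo or pipe is needed.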
You can even use it to convert a decimal number to hexadecimal using the %x printf format specifier. Note that the convention is to denote hexadecimal numbers with an initial 0x.
Code Block | ||
---|---|---|
| ||
echo 65 | awk '{ printf("0x%x\n", $1) }' |
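The shell's own printf accepts the same %x specifier, and shell arithmetic understands the 0x prefix, so both directions of the conversion can be sketched without awk. The flag value 20 below is a made-up example.

```shell
# decimal -> hexadecimal with the shell's printf
hex=$(printf '0x%x' 65)

# hexadecimal -> decimal: arithmetic expansion accepts 0x-prefixed constants
dec=$(( 0x41 ))

# bitwise AND tests a single flag bit; 20 is 0x14, which has the 0x4 bit set
flag=20
bit=$(( flag & 0x4 ))

echo "$hex $dec $bit"   # 0x41 65 4
```

A non-zero result from the bitwise AND means the bit is set, which is the same test that flag-based filters such as samtools view -F 0x4 apply to each record.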
Bash control flow
the bash for loop
...