This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.
...
- Macs and Linux have a Terminal program built-in
- Windows options:
- Windows 10+
- Command Prompt and PowerShell programs have ssh and scp (may require latest Windows updates)
- Start menu → Search for Command
- Putty – http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- simple Terminal and file copy programs
- download either the Putty installer or just putty.exe (Terminal) and pscp.exe (secure copy client)
- Windows Subsystem for Linux – Windows 10 Professional includes an Ubuntu-like bash shell
- See https://docs.microsoft.com/en-us/windows/wsl/install-win10
- We recommend the Ubuntu Linux distribution, but any Linux distribution will have an SSH client
Use ssh (secure shell) to log in to a remote computer.
Code Block | ||||
---|---|---|---|---|
| ||||
# General form:
ssh <user_name>@<full_host_name>
# For example
ssh abattenh@ls6.tacc.utexas.edu |
...
Of course, Google also works for 3rd party tools (e.g. search for bwa manual).
Terminal input
Literal characters and metacharacters
...
You know the command line is ready for input when you see the command line prompt. It can be configured differently on different systems, but on our system it shows your account name, server name, current directory, then a dollar sign ($). Note the tilde character ( ~ ) signifies your Home directory.
The shell executes command line input when it sees a linefeed character (\n, also called a newline), which happens when you press Enter after entering the command.
...
Code Block | ||
---|---|---|
| ||
# You mis-type a command name, or a command that is not installed
ls6:~$ catt
catt: command not found
# You try to use an unsupported option
ls6:~$ ls -z
ls: invalid option -- 'z'
Try 'ls --help' for more information.
# You specify the name of a file that does not exist
ls6:~$ ls xxx
ls: cannot access 'xxx': No such file or directory
# You try to access a file or directory you don't have permissions for
ls6:~$ cat /etc/sudoers
cat: /etc/sudoers: Permission denied |
Getting around in the shell
Type as little and as accurately as possible by using keyboard shortcuts!
...
- Right arrow and Left arrow move the cursor forward or backward on the current command line.
- Use Ctrl-a (holding down the Control key and a) to jump the cursor to the start of the line.
- Use Ctrl-e to jump the cursor to the end of the line.
- Arrow keys are also modified by Ctrl- (Windows) or Option- (Mac)
- Ctrl-right-arrow (Windows) or Option-right-arrow (Mac) will skip by "word" forward
- Ctrl-left-arrow (Windows) or Option-left-arrow (Mac) will skip by "word" backward
...
- single Tab – completes file or directory name up to any ambiguous part
- if nothing shows up, there is no unambiguous match
- Tab twice – display all possible completions
- you then decide where to go next
- shell completion works for commands too (like python or bowtie)
Absolute and relative pathname syntax
...
- ls *.bam – lists all files in the current directory that end in .bam
- ls [A-Z]*.bam – does the same, but only if the first character of the file name is a capital letter
- ls [ABcd]*.bam – lists all .bam files whose 1st letter is A, B, c or d.
- ls *.{fastq,fq}.gz – lists all .fastq.gz and .fq.gz files.
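The wildcard patterns above can be tried safely in a scratch directory. This is a minimal sketch: the directory and all file names below are hypothetical, created only for illustration, and the brace form `{fastq,fq}` assumes the shell is bash.

```shell
# Create a throwaway directory with a few made-up file names
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch Alpha.bam beta.bam cell.bam reads.fastq.gz reads.fq.gz notes.txt

# Every file in the current directory ending in .bam
all_bams=$(ls *.bam)
echo "$all_bams"

# Only .bam files whose first letter is A, B, c or d
abcd_bams=$(ls [ABcd]*.bam)
echo "$abcd_bams"

# Brace expansion (bash): expands to *.fastq.gz and *.fq.gz before ls runs
ls *.{fastq,fq}.gz
```

Here `ls [ABcd]*.bam` should list Alpha.bam and cell.bam but not beta.bam, since b is not in the bracket set.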
...
Streams and Piping
Standard streams and redirection
...
- samtools view converts the binary small.bam file to text and writes alignment record lines one at a time to standard output.
- -F 0x4 option says to filter out any records where the 0x4 flag bit is 1 (set)
- since the 0x4 flag bit is set (1) for unmapped records, this says to only report records where the query sequence did map to the reference
- | head -1000
- the pipe connects the standard output of samtools view to the standard input of head
- the -1000 option says to only write the first 1000 lines of input to standard output
- | cut -f 5
- the pipe connects the standard output of head to the standard input of cut
- the -f 5 option says to only write the 5th field of each input line to standard output (input fields are tab-delimited by default)
- the 5th field of an alignment record is an integer representing the alignment mapping quality
- the resulting output will have one integer per line (and 1000 lines)
- | sort -n
- the pipe connects the standard output of cut to the standard input of sort
- the -n option says to sort input lines according to numeric sort order
- the resulting output will be 1000 numeric values, one per line, sorted from lowest to highest
- | uniq -c
- the pipe connects the standard output of sort to the standard input of uniq
- the -c option says to just count groups of adjacent lines with the same value (that's why they must be sorted) and report the total for each group
- the resulting output will be one line for each group that uniq sees
- each line will have the text for the group (here the unique mapping quality values) and a count of lines in each group
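The cut, sort and uniq stages described above can be tried without a BAM file. The sketch below is only an illustration: it replaces the samtools view step with four made-up, tab-delimited lines (read names, positions and qualities are all hypothetical) whose 5th field plays the role of the mapping quality.

```shell
# Stand-in for alignment records: tab-delimited lines, 5th field = mapping quality
mapq_counts=$(
  printf 'r1\t0\tchr1\t100\t60\nr2\t0\tchr1\t200\t0\nr3\t16\tchr2\t300\t60\nr4\t0\tchr2\t400\t30\n' \
    | head -1000 \
    | cut -f 5 \
    | sort -n \
    | uniq -c
)
# Each output line is a count followed by the mapping quality value it counts
echo "$mapq_counts"
```

With this input there are three groups: one record each with quality 0 and 30, and two records with quality 60.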
More Linux concepts
Environment variables
Environment variables are just like variables in a programming language (in fact bash is a complete programming language): they are "pointers" that reference data assigned to them. In bash, you assign an environment variable as shown below:
...
Another method for writing multi-line text, useful for composing a large block of text in a script, is the heredoc syntax. A block of text is specified between two user-supplied block delimiters, and that text block is sent to a command. The general form of a heredoc is:
Code Block | ||
---|---|---|
| ||
COMMAND << DELIMITER
..text...
..text...
DELIMITER |
Tip |
---|
The 2nd (ending) block delimiter you specify for a heredoc must appear at the start of a new line. |
For example, using the (arbitrary) delimiter EOF and the cat command:
Code Block | ||
---|---|---|
| ||
cat << EOF
This text will be output
And this USER environment variable will be evaluated: $USER
EOF |
Here the block of text provided to cat is just displayed on the Terminal. To write it to a file just use the 1> or > redirection syntax in the cat command:
Code Block | ||
---|---|---|
| ||
cat 1> out.txt << EOF
This text will be output
And this USER environment variable will be evaluated: $USER
EOF |
The out.txt file will then contain this text:
Code Block | ||
---|---|---|
| ||
This text will be output
And this USER environment variable will be evaluated: student01
|
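As the examples above show, variables inside an unquoted heredoc are evaluated. If you want the text taken literally instead, bash lets you quote the first delimiter. A minimal sketch, where the variable name is hypothetical and chosen only for illustration:

```shell
name="world"   # hypothetical variable, for illustration only

# Unquoted delimiter: $name is evaluated inside the text block
expanded=$(cat << EOF
Hello $name
EOF
)

# Quoted delimiter: the text block is passed through unchanged
literal=$(cat << 'EOF'
Hello $name
EOF
)

echo "$expanded"   # Hello world
echo "$literal"    # Hello $name
```

The quoted form is handy when the text block is itself a script or template containing dollar signs that should not be evaluated yet.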
Arithmetic in bash
Arithmetic in bash uses an unusual syntax, with the expression enclosed in $(( )):
Code Block | ||
---|---|---|
| ||
echo $(( 50 * 2 + 1 ))
n=0
n=$(( $n + 5 ))
echo $n |
It also returns only integer values; any fractional part is truncated.
Code Block | ||
---|---|---|
| ||
echo $(( 4 / 2 ))
echo $(( 5 / 2 ))
echo $(( 24 / 5 )) |
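To make the truncation concrete, here is a short sketch of the same divisions with their results noted, plus the remainder operator ( % ), which recovers what integer division discards. The variable names are illustrative only.

```shell
echo $(( 4 / 2 ))    # 2
echo $(( 5 / 2 ))    # 2  (2.5 with the fraction dropped)
echo $(( 24 / 5 ))   # 4  (4.8 with the fraction dropped)

# The remainder operator gives the part that division throws away
quotient=$(( 24 / 5 ))
remainder=$(( 24 % 5 ))
echo "24 = 5 * $quotient + $remainder"   # 24 = 5 * 4 + 4
```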
As a result, if I need to do anything other than the simplest arithmetic, I use awk:
Code Block | ||
---|---|---|
| ||
awk 'BEGIN{print 4/2}'
echo 3 2 | awk '{print ($1+$2)/2}' |
You can also use the printf function in awk to control formatting. Just remember that a linefeed ( \n ) has to be included in the format string:
Code Block | ||
---|---|---|
| ||
echo 3.1415926 | awk '{ printf("%.2f\n", $1) }' |
You can even use it to convert a decimal number to hexadecimal using the %x printf format specifier. Note that the convention is to denote hexadecimal numbers with an initial 0x.
Code Block | ||
---|---|---|
| ||
echo 65 | awk '{ printf("0x%x\n", $1) }' |
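A common way to use these awk one-liners in a script is to capture the result in a bash variable with $( ). The sketch below assumes nothing beyond a standard awk; the variable names and input values are illustrative only.

```shell
# Floating point division, which bash's $(( )) cannot do
half=$(awk 'BEGIN{ print 5/2 }')

# Round to two decimal places with printf
rounded=$(echo 3.1415926 | awk '{ printf("%.2f\n", $1) }')

# Decimal to hexadecimal, with the conventional 0x prefix
hex=$(echo 255 | awk '{ printf("0x%x\n", $1) }')

echo "$half $rounded $hex"   # 2.5 3.14 0xff
```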
Bash control flow
the bash for loop
...