This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.
...
- Macs and Linux have a Terminal program built-in
- Windows options:
  - Windows 10+
    - Command Prompt and PowerShell programs have ssh and scp (may require latest Windows updates)
      - Start menu → Search for Command
  - Putty – http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
    - simple Terminal and file copy programs
    - download either the Putty installer or just putty.exe (Terminal) and pscp.exe (secure copy client)
  - Windows Subsystem for Linux – Windows 10 Professional includes an Ubuntu-like bash shell
    - See https://docs.microsoft.com/en-us/windows/wsl/install-win10
    - We recommend the Ubuntu Linux distribution, but any Linux distribution will have an SSH client
Use ssh (secure shell) to log in to a remote computer.
```shell
# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu
```
...
- The built-in history command lists the commands you've entered, each with a number.
- You can re-execute any command in the history by typing an exclamation point ( ! ) then the number
- e.g. !15 re-executes the 15th command in your history.
...
On most modern Linux shells you use Tab completion by pressing:
- single Tab – completes file or directory name up to any ambiguous part
- if nothing shows up, there is no unambiguous match
- Tab twice – display all possible completions
- you then decide where to go next
- shell completion works for commands too (like python)
...
Tip |
---|
While it is possible to create file and directory names that have embedded spaces, that creates problems when manipulating them. To avoid headaches, it is best not to create file/directory names with embedded spaces, or with special characters such as + & # ( ) |
Pathname wildcards
...
...
The shell has shorthand to refer to groups of files by allowing wildcards in file names.
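As a small illustration of pathname wildcards, the sketch below creates a scratch directory with a few empty files and then matches them with the * wildcard (any run of characters) and the ? wildcard (exactly one character). The directory and file names are made up for this example.

```shell
# Work in a throwaway directory so nothing real is touched
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch sample1.fq sample2.fq sample10.fq notes.txt

# * matches any run of characters, so all three .fq files are listed
all_fq=$(echo *.fq)
echo "$all_fq"

# ? matches exactly one character, so sample10.fq is not included
one_digit=$(echo sample?.fq)
echo "$one_digit"
```

The shell expands the wildcard into the matching file names before the command (here echo) ever runs, so the same patterns work with any command that accepts file names.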
...
```shell
# zcat is like cat, except that it understands the gz compressed format,
# and uncompresses the data before writing it to standard output.
# So, like cat, you need to be sure to pipe the output to a pager if
# the file is large.
zcat big.fq.gz | wc -l
```
piping a histogram
...
But the real power of piping comes when you stitch together a string of commands with pipes – it's incredibly flexible, and fun once you get the hang of it.
...
- samtools view converts the binary small.bam file to text and writes alignment record lines one at a time to standard output.
  - -F 0x4 option says to filter out any records where the 0x4 flag bit is 1 (set)
    - since the 0x4 flag bit is set (1) for unmapped records, this says to only report records where the query sequence did map to the reference
- | head -1000
  - the pipe connects the standard output of samtools view to the standard input of head
  - the -1000 option says to only write the first 1000 lines of input to standard output
- | cut -f 5
  - the pipe connects the standard output of head to the standard input of cut
  - the -f 5 option says to only write the 5th field of each input line to standard output (input fields are tab-delimited by default)
    - the 5th field of an alignment record is an integer representing the alignment mapping quality
  - the resulting output will have one integer per line (and 1000 lines)
- | sort -n
  - the pipe connects the standard output of cut to the standard input of sort
  - the -n option says to sort input lines according to numeric sort order
  - the resulting output will be 1000 numeric values, one per line, sorted from lowest to highest
- | uniq -c
  - the pipe connects the standard output of sort to the standard input of uniq
  - the -c option says to just count groups of adjacent lines with the same value (that's why they must be sorted) and report the total for each group
  - the resulting output will be one line for each group that uniq sees
    - each line will have a count of the lines in the group, followed by the text for the group (here the unique mapping quality values)
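The sort and uniq stages can be tried on their own, without samtools or a BAM file. This sketch feeds a handful of made-up mapping quality values through the same sort -n | uniq -c stages; the numbers are invented for illustration and are not taken from small.bam.

```shell
# Made-up mapping quality values, one per line (not real alignment data)
histogram=$(printf '%s\n' 60 0 60 13 0 60 | sort -n | uniq -c)
echo "$histogram"

# uniq -c pads each count with leading spaces; awk rewrites each line
# as "count value" separated by a single space
compact=$(echo "$histogram" | awk '{print $1, $2}')
echo "$compact"
```

Each output line is a count followed by the value it counts, so the three distinct values here produce three lines.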
Environment variables
...
Environment variables are just like variables in a programming language (in fact, bash is a complete programming language). They are "pointers" that reference data assigned to them. In bash, you assign an environment variable as shown below:
...
Tip |
---|
Careful – do not put spaces around the equals sign when assigning environment variable values. Also, always surround the value with double quotes ( " " ) if it contains (or might contain) spaces. |
You set environment variables using the bare name (varname above).
You then refer to or evaluate an environment variable using a dollar sign ( $ ) evaluation operator before the name:
```shell
echo $varname
```
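A minimal sketch of assigning and then evaluating a variable. The name greeting and its value are placeholders chosen for this example.

```shell
# No spaces around the equals sign; the quotes protect the embedded space
greeting="hello world"

# With the dollar sign the variable is evaluated
evaluated=$(echo "$greeting")

# Without the dollar sign the bare name is just ordinary text
literal=$(echo greeting)

echo "$evaluated"
echo "$literal"
```

Quoting "$greeting" when evaluating it keeps the value together as a single argument even though it contains a space.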
...
Tip | |||||
---|---|---|---|---|---|
To display a metacharacter as a literal inside double quotes, use the backslash ( \ ) character to escape the following character.
|
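For instance, the sketch below escapes a dollar sign and a double quote inside double quotes so that both are written literally. The variable name and text are placeholders for illustration.

```shell
varname="some value"

# Without the backslash, $varname is evaluated
evaluated=$(echo "value: $varname")

# With the backslash, the dollar sign is written as a literal character
escaped=$(echo "name: \$varname")

# A double quote can be escaped the same way
quoted=$(echo "She said \"hi\"")

echo "$evaluated"
echo "$escaped"
echo "$quoted"
```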
You can also use either single or double quotes to enclose text you want to appear on multiple lines:
```shell
# this expression will output 4 lines of text
echo "1
2
3
4"
```
Backtick quoting and sub-shell evaluation
...
Here is an example using the date command, which just writes the current date and time to standard output (which appears on your Terminal).
```shell
date          # Calling the date command just displays date/time information
echo date     # Here "date" is treated as a literal word, and written to standard output
echo `date`   # The date command is evaluated and its standard output replaces the command `date`
```
A slightly different syntax, called sub-shell evaluation, also evaluates the expression inside $( ) and replaces it with the expression's standard output.
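A short sketch of the $( ) form. The commands inside it are ordinary ones chosen only for illustration.

```shell
# The command inside $( ) runs first; its standard output replaces the expression
line_count=$(printf 'a\nb\nc\n' | wc -l)
echo "The input has $line_count lines"

# Unlike backticks, $( ) can be nested without extra escaping
nested=$(echo "outer $(echo inner)")
echo "$nested"
```

The ability to nest is one practical reason to prefer $( ) over backticks in longer expressions.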
...
Here num is the name I gave the variable that will be assigned a different number each time through the loop (called the loop's formal argument). The set of such numbers is generated by the seq 4 command. Each number is then referenced as $num inside the loop.
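As an illustration of such a loop (a sketch, not necessarily the exact loop used in the course), this one uses seq 4 to generate the numbers and refers to each one as $num in the loop body. The total variable is an addition for this example.

```shell
# seq 4 writes the numbers 1 through 4; the loop assigns each to num in turn
total=0
for num in $(seq 4); do
  echo "The number is $num"
  total=$((total + num))
done

echo "Sum: $total"
```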
...
```shell
for fname in *.gz; do
  echo "$fname has $((`zcat $fname | wc -l` / 4)) sequences"
done
```
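To try this loop without real data, the sketch below builds a tiny gzipped FASTQ file in a scratch directory and runs the same calculation using the $( ) form. The file name and its two made-up records are assumptions for illustration, and it assumes a Linux zcat that reads .gz files.

```shell
# Build a 2-record FASTQ file (4 lines per record) in a throwaway directory
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$demo_dir/tiny.fq.gz"

for fname in "$demo_dir"/*.gz; do
  # Each FASTQ record is 4 lines, so sequences = lines / 4
  num_seqs=$(( $(zcat "$fname" | wc -l) / 4 ))
  echo "$(basename "$fname") has $num_seqs sequences"
done
```

Quoting "$fname" keeps the loop working even if a file name contains spaces.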
...
Copying files between TACC and your laptop
...
- Open a Terminal window on your local computer
- cd to the directory where you want the files
- Type something like the following, substituting your user name and absolute path:
```shell
scp abattenh@ls6.tacc.utexas.edu:/scratch/01063/abattenh/core_ngs/fastq_prep/small_fastqc.html .
```
Windows users can use the free WinSCP program (https://winscp.net/eng/index.php) if their Windows version does not support scp.
Editing files
There are several options for editing files at TACC. These fall into three categories:
...