This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.
See also this page, which provides lists of the most common Linux commands, by category, as well as their most useful options: Some Linux commands
Some Linux commands
...
- Macs and Linux have a Terminal program built-in
- Windows options:
- Windows 10+
- Command Prompt and PowerShell programs have ssh and scp (may require latest Windows updates)
- Start menu → Search for Command
- Putty – http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- simple Terminal and file copy programs
- download either the Putty installer or just putty.exe (Terminal) and pscp.exe (secure copy client)
- Windows Subsystem for Linux – Windows 10 Professional includes an Ubuntu-like bash shell
- See https://docs.microsoft.com/en-us/windows/wsl/install-win10
- We recommend the Ubuntu Linux distribution, but any Linux distribution will have an SSH client
Use ssh (secure shell) to log in to a remote computer.
```
# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu
```
...
When you type something in at a bash command-line prompt, it Reads the input, Evaluates it, then Prints the results, then does this over and over in a Loop. This behavior is called a REPL – a Read, Eval, Print Loop. The shell executes the command line input when it sees a linefeed, which happens when you press Enter after entering the command.
The input to the bash REPL is a command, which consists of:
- The command name (any of the built-in Linux/Unix commands, or the name of a user-written script or program)
- One or more (optional) options, usually noted with a leading dash (-) or double-dash (--).
- short (1-character) options can be provided separately, prefixed by a single dash (-)
- or several short options can be combined after a single dash (e.g. ls -lah)
- long (multi-character or "word") options are prefixed with a double dash (--) and must be supplied separately.
- Both long and short options can be assigned a value
- One or more command-line arguments, which are often (but not always) file names
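The option styles described above can be sketched with a small, self-contained example. It assumes GNU versions of ls and head (standard on Linux); the scratch directory and file names are made up for illustration.

```bash
# Work in a scratch directory so the example does not touch real files
# (the directory and file names here are hypothetical)
demo_dir=$(mktemp -d)
touch "$demo_dir/visible.txt" "$demo_dir/.hidden"

# Short options supplied separately, each with its own dash
separate=$(ls -a -1 "$demo_dir")

# The same short options combined after a single dash
combined=$(ls -a1 "$demo_dir")

# A long option is supplied on its own, prefixed with a double dash
long_form=$(ls --all -1 "$demo_dir")

# Options can take values: short form (-n 2) and long form (--lines=2)
printf 'a\nb\nc\n' > "$demo_dir/letters.txt"
short_val=$(head -n 2 "$demo_dir/letters.txt")
long_val=$(head --lines=2 "$demo_dir/letters.txt")

echo "$combined"
echo "$long_val"
```

All three ls invocations should produce the same listing, and both head invocations should print the same first two lines.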
...
- The arguments to ls are one or more file or directory names. If no arguments are provided, the contents of the current directory are listed.
- If an argument is a directory name, the contents of that directory are listed.
Some handy options for ls:
- -l shows a long listing, including file permissions, ownership, size and last modified date.
- -a shows all files, including dot files whose names start with a period ( . ) which are normally not listed
- -h says to show file sizes in human readable form (e.g. 12M instead of 12201749)
A good place to start learning built-in Linux commands and their options is on the Some Linux commands page.
Getting help
How do you find out what options and arguments a command uses?
- In the Terminal, type in the command name then the --help long option (e.g. ls --help)
- Works for most Linux commands; 3rd party tools may use -h or -? or even /? instead
- May produce a lot of output, so you may need to scroll up quite a bit, or pipe the output to a pager
- e.g. ls --help | more (a space advances the output by one screen/"page", and typing Ctrl-C will exit more)
- Use the built-in manual system (e.g. type man ls)
- This system uses the less pager described below
- For now, just know that a space advances the output by one screen/"page", and typing q will exit the display.
- Ask the Google, e.g. search for ls man page
- Can be easier to read
Literal characters and metacharacters
In the bash shell, and in most tools and programming environments, there are two kinds of input:
...
- e.g. alphanumeric characters A-Z, a-z, 0-9
...
- e.g. the pound sign ( # ) comment character that tells the shell to ignore everything after the #
There are many metacharacters in bash: # \ $ | ~ [ ] to name a few.
Pay attention to the different metacharacters and their usages – which can depend on the context where they're used.
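As a small illustration of the difference, the sketch below shows the pound sign acting as a metacharacter when it is left bare, and as a literal character when it is single-quoted or escaped with a backslash. The variable names are just for this example.

```bash
# Unquoted: the shell treats # as the start of a comment,
# so echo only ever sees the word "hello"
unquoted=$(
  echo hello # world
)

# Single-quoted: every character inside the quotes is literal
quoted=$(echo 'hello # world')

# Backslash-escaped: the backslash makes the next character literal
escaped=$(echo hello \# world)

echo "$unquoted"
echo "$quoted"
echo "$escaped"
```

The first echo prints only hello, while the other two print the full text including the pound sign.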
About command line input
You know the command line is ready for input when you see the command line prompt. The shell executes command line input when it sees a linefeed character (\n, also called a newline), which happens when you press Enter after entering the command.
Note: The Unix linefeed (\n) line delimiter is different from Windows, where the default line ending is carriage-return + linefeed (\r\n), and some Mac text editors that just use a carriage return (\r).
More than one command can be entered on a single line – just separate the commands with a semi-colon ( ; ).
```
cd; ls -lh
```
A single command can also be split across multiple lines by adding a backslash ( \ ) at the end of the line you want to continue, before pressing Enter.
```
ls6:~$ ls ~/.bashrc \
> ~/.profile
```
Notice that the shell indicates that it is not done with command-line input by displaying a greater than sign ( > ). You just enter more text then Enter when done.
Tip: At any time during command input, whether on the 1st command line prompt or at a > continuation, you can press Ctrl-c (Control key and the c key at the same time) to get back to the command prompt.
Text lines and the Terminal
Sometimes a line of text is longer than the width of your Terminal. In this case the text is wrapped. It can appear that the output is multiple lines, but it is not. For example, FASTQ files often have long lines:
```
head $CORENGS/misc/small.fq
```
Note that most Terminals let you increase/decrease the width/height of the Terminal window. But there will always be single lines too long for your Terminal width (and too many lines of text for its height).
So how many lines of output are there really? And how long is a line? The wc (word count) command can tell us this.
- wc -l reports the number of lines in its input
- wc -c reports the number of bytes in its input, which for plain ASCII text is the number of characters (including invisible linefeed characters)
And when you give wc -l multiple files, it reports the line count of each, then a total.
```
wc -l $CORENGS/misc/small.fq            # Reports the number of lines in the small.fq file
cat $CORENGS/misc/small.fq | wc -l      # Reports the number of lines on its standard input
wc -l $CORENGS/misc/*.fq                # Reports the number of lines in all matching *.fq files
tail -1 $CORENGS/misc/small.fq | wc -c  # Reports the number of characters of the last small.fq line
```
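If the course's $CORENGS files are not at hand, the same wc behavior can be tried on a tiny made-up file. This is only a sketch; the temporary file and its two sequence-like lines are invented for the example.

```bash
# Create a small two-line file to count (contents are made up)
tmp_file=$(mktemp)
printf 'ACGT\nTTGACA\n' > "$tmp_file"

# Number of lines in the file
line_count=$(wc -l < "$tmp_file")

# Number of bytes, including the linefeed at the end of each line
byte_count=$(wc -c < "$tmp_file")

# Length of the last line only, again counting its linefeed
last_len=$(tail -1 "$tmp_file" | wc -c)

echo "lines=$line_count bytes=$byte_count last_line=$last_len"
rm -f "$tmp_file"
```

Here the file has 2 lines and 12 bytes (4 + 1 for the first line, 6 + 1 for the second), and the last line is 7 bytes long once its linefeed is counted.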
Command input errors
You don't always type in commands, options and arguments correctly – you can misspell a command name, forget to type a space, specify an unsupported option or a non-existent file, or make all kinds of other mistakes.
What happens? The shell, or the command you ran, tries to work out what kind of error it is and reports an appropriate error message as best it can. Some examples:
```
# You type the name of a command that is not installed on your system
ls6:~$ lz
Command 'lz' not found, but can be installed with:
apt install mtools
Please ask your administrator.

# You enter something that is close to an existing, or known, command
ls6:~$ catt
Command 'catt' not found, did you mean:
  command 'cat' from deb coreutils (8.30-3ubuntu2)
  command 'catty' from deb node-catty (0.0.8-1)
  command 'ratt' from deb ratt (0.0~git20180127.c44413c-2)
Try: apt install <deb name>

# You try to use an unsupported option
ls6:~$ ls -z
ls: invalid option -- 'z'
Try 'ls --help' for more information.

# You specify the name of a file that does not exist
ls6:~$ ls xxx
ls: cannot access 'xxx': No such file or directory
```
Advanced commands
cut, sort, uniq
...
- use -d <delim> to change the field delimiter (Tab by default)
...
- by default sorts each line lexically (as strings), low to high
- use -n to sort numerically
- use -V for Version sort (numbers with surrounding text)
- use -r to reverse the sort order
- use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
- e.g. -k1,1 -k2,2nr to sort field 1 lexically and field 2 as a number high-to-low
- by default, fields are delimited by whitespace -- one or more spaces or Tabs
- use -t <delim> to change the field delimiter (e.g. -t $'\t' for Tab only; ignore spaces). Note that -t "\t" does not work, because bash does not convert \t to a Tab inside double quotes
...
- use cut | sort | uniq -c for a quick-and-dirty histogram
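The options above can be tried on a few made-up lines. The sketch below first builds a quick histogram of the first Tab-delimited field, then sorts on two keys; the gene and chromosome names are invented.

```bash
# Four Tab-delimited records (hypothetical data)
data=$'chr1\tgeneA\t+\nchr2\tgeneB\t-\nchr1\tgeneC\t+\nchr1\tgeneD\t-'

# Quick-and-dirty histogram of field 1:
#   cut picks the field, sort groups equal values together,
#   uniq -c counts each group, and the final sort orders by count, high to low
hist=$(printf '%s\n' "$data" | cut -f 1 | sort | uniq -c | sort -k1,1nr)
echo "$hist"

# Two sort keys: field 1 lexically, then field 2 numerically, high to low
sorted=$(printf 'b 2\na 10\na 9\n' | sort -k1,1 -k2,2nr)
echo "$sorted"
```

Note that sort must come before uniq -c, since uniq only collapses adjacent identical lines.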
awk
awk is a powerful scripting language that is easily invoked from the command line. Its field-oriented capabilities make it the go-to tool for manipulating table-like delimited lines of text.
- awk '<script>' - the '<script>' is applied to each line of input (generally piped in)
- always enclose '<script>' in single quotes to inhibit shell evaluation, because awk has its own set of metacharacters that are different from the shell's
Example that prints the average of its input numbers (echo -e converts backslash escape characters like newline \n to the ASCII newline character so that the numbers appear on separate lines)
```
echo -e "1\n2\n3\n4\n5" | awk '
BEGIN{sum=0; ct=0}
{ sum = sum + $1
  ct = ct + 1 }
END{print sum/ct}'
```
General structure of an awk script:
...
- one or more spaces or Tabs
...
- use $1, $2, etc. to pick out specific input fields of each line
- e.g. {sum = sum + $4} adds field 4 of the input to the variable sum
- the built-in variable NF is the number of fields in the current line
- the built-in variable NR is the record (line) number of the current line
...
- e.g. END {print sum,ct} prints the final value of the sum and ct variables, separated by the output field separator.
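A minimal sketch that puts these pieces together, using made-up input with a different number of fields on each line. It sets the output field separator in BEGIN, uses NR, NF and numbered fields in the body, and prints a total in END.

```bash
result=$(printf 'geneA 10 20\ngeneB 5\ngeneC 7 8 9\n' | awk '
BEGIN { OFS = "\t" }          # runs once, before any input is read
{ total = total + $2          # runs for every line: add up field 2
  print NR, NF, $1 }          # line number, field count, first field
END { print "sum_col2", total }   # runs once, after the last line
')
echo "$result"
```

Each input line produces one Tab-separated output line, and the END block adds a final line with the sum of the second field (10 + 5 + 7 = 22).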
Here is an excellent awk tutorial, very detailed and in-depth
cut versus awk
The basic functions of cut and awk are similar – both are field oriented. Here are the main differences:
- Default field separators
- Tab is the default field separator for cut
- whitespace (one or more spaces or Tabs) is the default field separator for awk
- Re-ordering
- cut cannot re-order fields
- awk can re-order fields, based on the order you specify
- awk is a full-featured programming language while cut is just a single-purpose utility.
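The re-ordering difference is easy to see on a single made-up Tab-delimited line. In this sketch both tools are asked for field 3 followed by field 1.

```bash
line=$'one\ttwo\tthree'

# cut ignores the order of the requested fields and keeps input order
cut_out=$(printf '%s\n' "$line" | cut -f 3,1)

# awk prints fields in exactly the order given
awk_out=$(printf '%s\n' "$line" | awk 'BEGIN{OFS="\t"}{print $3,$1}')

echo "cut: $cut_out"
echo "awk: $awk_out"
```

cut outputs one then three, while awk outputs three then one.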
grep and regular expressions
- grep -P '<pattern>' searches for <pattern> in its input, and only outputs lines containing it
- always enclose '<pattern>' in single quotes to inhibit shell evaluation!
- pattern-matching metacharacters in grep are very different from those in the shell
- -P says to use Perl patterns, which are much more powerful (and standard) than default grep patterns
- -v (inverse match) – only print lines with no match
- -n (line number) – prefix output with the line number of the match
- -i (case insensitive) – ignore case when matching
- -l says return only the names of files that do contain the pattern match
- -L says return only the names of files that do not contain the pattern match
- -c says just return a count of line matches
- -A <n> (After) and -B <n> (Before) – output <n> number of lines after or before a match
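A few of these grep options can be tried on a small made-up file. The sketch assumes GNU grep built with Perl pattern support (-P), which is typical on Linux; the file contents are invented.

```bash
# Four short lines to search (hypothetical data)
reads=$(mktemp)
printf 'ACGTAC\nnnnn\nGGGACC\nacgtac\n' > "$reads"

# Lines that start with ACG
match=$(grep -P '^ACG' "$reads")

# Count of lines starting with acg, ignoring case
nocase=$(grep -c -i -P '^acg' "$reads")

# Lines that contain no uppercase A, C, G or T, prefixed with line numbers
inverse=$(grep -v -n -P '[ACGT]' "$reads")

echo "$match"
echo "$nocase"
echo "$inverse"
rm -f "$reads"
```

The first search finds one line, the case-insensitive count is 2, and the inverse match reports lines 2 and 4.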
A regular expression (regex) is a pattern of literal characters to search for and metacharacters that control and modify how matching is done.
A regex <pattern> can contain special match metacharacters and modifiers. The ones below are Perl metacharacters, which are the "gold standard", supported by most languages (e.g. grep -P)
...
- this is called a character class.
- use [^xyz123] to match any single character not listed in the class
...
- note that parentheses ( ) may also be used to capture matched sub-expressions for later use
Regular expression modules are available in nearly every programming language (Perl, Python, Java, PHP, awk, even R)
- each "flavor" is slightly different
- even bash has multiple regex commands: grep, egrep, fgrep.
There are many good online regular expression tutorials, but be sure to pick one tailored to the language you will use.
- here are some good ones:
- a good general one: https://www.regular-expressions.info/
- Ryan's tutorials on Regular Expressions: http://ryanstutorials.net/regular-expressions-tutorial/
- RegexOne: http://regexone.com
- and a perl regex tutorial: http://perldoc.perl.org/perlretut.html
- perl regular expressions are the "gold standard" used in most other languages
perl pattern matching
If grep pattern matching isn't behaving the way I expect, I turn to perl. Perl, like awk, is a fully functional programming language, but you can use just its regex pattern matching from the command line. Here's how:
perl -n -e 'print $_ if $_=~/<pattern>/'
sed pattern substitution
The sed (stream editor) command can be used to edit text using pattern substitution.
sed 's/<search pattern>/<replacement>/'
While sed is very powerful, the regex syntax for its more advanced features is quite different from "standard" grep or perl regular expressions. As a result, I tend to use it only for very simple substitutions, usually as a component of a multi-pipe expression.
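A simple substitution of the kind described might look like the sketch below. The input text is made up; the trailing g flag, which replaces every match on a line rather than just the first, is standard sed syntax.

```bash
# Replace only the first match on each line
first_only=$(echo "chr1 chr2" | sed 's/chr/chromosome_/')

# Add the g flag to replace every match on each line
all_matches=$(echo "chr1 chr2" | sed 's/chr/chromosome_/g')

echo "$first_only"
echo "$all_matches"
```

The first command changes only chr1, while the second changes both names.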
perl pattern substitution
If I have a more complicated pattern, or if sed pattern substitution is not working as I expect (which happens frequently!), I again turn to perl. Here's how to invoke perl pattern substitution from a command line:
perl -p -e 's/<search pattern>/<replacement>/'
Parentheses ( ) around one or more text sections in the <search pattern> will cause matching text to be captured in built-in perl variables $1, $2, etc., following the order of the parenthesized text. The capture variables can then be used in the <replacement>.
Field delimiter summary
Be aware of the default field delimiter for the various bash utilities, and how to change them:
...
whitespace (one or more spaces or Tabs)
Note: some older versions of awk do not treat Tabs as field separators.
...
- In the BEGIN { } block
- FS= (input field separator)
- OFS= (output field separator)
- -F or --field-separator option
...
cat /etc/fstab | grep -v '^#' | awk 'BEGIN{OFS="\t"}{print $2,$1}'
cat /etc/passwd | awk -F ":" '{print $1}'
...
join -t $'\t' -j 2 file1 file2
...
Many 3rd party tools, especially bioinformatics tools, may bundle a number of different functions into one command. For these tools, just typing in the command name then Enter may provide top-level usage information. For example, the bwa tool that aligns sequencing reads to a reference genome:
```
bwa
```
Produces something like this:
```
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.16a-r1181
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries
         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.
```
bwa, like many bioinformatics programs, is written as a set of sub-commands. This top-level help displays the sub-commands available. You then type bwa <command> to see help for the sub-command:
```
bwa index
```
Displays something like this:
```
Usage:   bwa index [options] <in.fasta>

Options: -a STR    BWT construction algorithm: bwtsw or is [auto]
         -p STR    prefix of the index [same as fasta name]
         -b INT    block size for the bwtsw algorithm (effective with -a bwtsw) [10000000]
         -6        index files named as <in.fasta>.64.* instead of <in.fasta>.*

Warning: `-a bwtsw' does not work for short genomes, while `-a is' and
```
Of course Google works on 3rd party tools also (e.g. search for bwa manual)
Literal characters and metacharacters
In the bash shell, and in most tools and programming environments, there are two kinds of input:
- literal characters, that just represent (and print as) themselves
- e.g. alphanumeric characters A-Z, a-z, 0-9
- metacharacters – these are special characters that are associated with an operation in the environment
- e.g. the pound sign ( # ) comment character that tells the shell to ignore everything after the #
There are many metacharacters in bash: # \ $ | ~ [ ] to name a few.
Pay attention to the different metacharacters and their usages – which can depend on the context where they're used.
Getting around in the shell
...