This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.
...
- Macs and Linux have a Terminal program built-in
- Windows options:
- Windows 10+
- Command Prompt and PowerShell programs have ssh and scp (may require latest Windows updates)
- Start menu → Search for Command
- Putty – http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- simple Terminal and file copy programs
- download either the Putty installer or just putty.exe (Terminal) and pscp.exe (secure copy client)
- Windows Subsystem for Linux – Windows 10 Professional includes an Ubuntu-like bash shell
- See https://docs.microsoft.com/en-us/windows/wsl/install-win10
- We recommend the Ubuntu Linux distribution, but any Linux distribution will have an SSH client
Use ssh (secure shell) to log in to a remote computer.
```
# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu
```
...
Sometimes a line of text is longer than the width of your Terminal. In this case the text is wrapped. It can appear that the output is multiple lines, but it is not. For example, FASTQ files often have long lines:
...
```shell
wc -l $CORENGS/misc/small.fq             # Reports the number of lines in the small.fq file
cat $CORENGS/misc/small.fq | wc -l       # Reports the number of lines on its standard input
wc -l $CORENGS/misc/*.fq                 # Reports the number of lines in all matching *.fq files
tail -1 $CORENGS/misc/small.fq | wc -c   # Reports the number of characters of the last small.fq line
```
...
Tip: Most built-in Linux commands that obtain data from a file can also accept the data piped in on their standard input.
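As a small illustration of this tip, the sketch below runs wc -l on the same data twice, once with a file argument and once with the data piped in. The scratch directory and the file name demo.txt are made up for this example and are not part of the course data.

```shell
# Create a small scratch file (hypothetical name, not course data)
tmpdir=$(mktemp -d)
printf 'line one\nline two\nline three\n' > "$tmpdir/demo.txt"

# Reading from a named file: wc reports the count followed by the file name
from_file=$(wc -l "$tmpdir/demo.txt")

# Reading from standard input: wc reports only the count
from_pipe=$(cat "$tmpdir/demo.txt" | wc -l)

echo "file argument:  $from_file"
echo "standard input: $from_pipe"

rm -rf "$tmpdir"
```

The piped form is handy when only the number is wanted, since there is no file name to strip off afterwards.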
And here's a Linux commands cheat sheet you may find useful:
...
- cat outputs all the contents of its input (one or more files and/or standard input)
- cat -n prefixes each line of output with its line number
- CAUTION – only use on small files!
- zcat <file.gz> like cat, but understands the gzip (.gz) format, and decompresses the data before writing it to standard output
- CAUTION – only use on small files!
- Another CAUTION – does not understand .zip or .bz2 compression formats
- more and less pagers
- both display their (possibly very long) input one Terminal "page" at a time
- in more:
- use spacebar to advance a page
- use q or Ctrl-c to exit more
- in less:
- q – quit
- Ctrl-f or space – page forward
- Ctrl-b – page backward
- /<pattern> – search for <pattern> in forward direction
- n – next match
- N – previous match
- ?<pattern> – search for <pattern> in backward direction
- n – previous match going back
- N – next match going forward
- use less -N to display line numbers
- can be used directly on .gz format files
- head and tail
- show you the first or last 10 lines (by default) of their input
- head -n 20 or just head -20 shows the first 20 lines
- tail -n 2 or just tail -2 shows the last 2 lines
- tail -n +100 shows lines starting at line 100
- tail -n +100 | head -20 shows 20 lines starting at line 100
- tail -f shows the last lines of a file, then follows the output as more lines are written (Ctrl-c to quit)
- gunzip -c <file.gz> | more (or less) – like zcat, un-compresses lines of <file.gz> and outputs them to standard output
- <file.gz> is not altered on disk
- always pipe the output to a pager!
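The head, tail and gunzip -c options above can be tried safely on generated data. This is a minimal sketch that assumes a bash shell with seq and gzip available; the file name numbers.txt is hypothetical.

```shell
# Build a 200-line scratch file; seq prints the numbers 1..200, one per line
tmpdir=$(mktemp -d)
seq 1 200 > "$tmpdir/numbers.txt"

# Lines 100-119: start at line 100, then keep the first 20 of those
window=$(tail -n +100 "$tmpdir/numbers.txt" | head -20)
echo "$window" | head -1     # first line of the window
echo "$window" | tail -1     # last line of the window

# Compress a copy, then peek at it without altering the .gz file on disk
gzip -c "$tmpdir/numbers.txt" > "$tmpdir/numbers.txt.gz"
first_three=$(gunzip -c "$tmpdir/numbers.txt.gz" | head -3)
echo "$first_three"

rm -rf "$tmpdir"
```

Piping into head keeps the output short no matter how large the real file is, which is the same reason for always piping decompressed output to a pager.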
File system navigation
- ls - list the contents of the specified directory
- -l says produce a long listing (including file permissions, sizes, owner and group)
- -a says show all files, even normally-hidden dot files whose names start with a period ( . )
- -h says to show file sizes in human readable form (e.g. 12M instead of 12201749)
- -t says to sort files on last modification time
- -r says to reverse the current sort order
- -d says to show directory listing information only, instead of directory contents
- usually combined with -l, e.g.: ls -ld <dirname>
- cd <whereto> - change the current working directory to <whereto>. Some special <wheretos>:
- .. (period, period) means "up one directory level"
- ~ (tilde) means "my Home directory"
- - (dash) means "the last directory I was in"
- find <in_directory> [ operators ] -name <expression> [ tests ]
- looks for files matching <expression> in <in_directory> and its sub-directories
- <expression> can be a double-quoted string including pathname wildcards (e.g. "[a-g]*.txt")
- there are tons of operators and tests:
- -type f (file) and -type d (directory) are useful tests
- -maxdepth N is a useful operator to limit the depth of recursion to N levels.
- file <file> tells you what kind of file <file> is
- df shows the file systems mounted on the system you're working on, along with how much disk space is used and available on each
- -h says to show sizes in human readable form (e.g. 12G instead of 12318201749)
- pwd - display the present working directory
- -P says to display the full absolute path, resolving any symbolic links or relative path syntax
- tree <directory> - shows the file system hierarchy of the specified directory
- tree is not always available on all Linux systems
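To see how the find tests and operators interact, here is a hedged sketch run against a throwaway directory tree. All directory and file names are invented for the example; note that -maxdepth is placed before the tests, which is what GNU find expects.

```shell
# Scratch directory tree (all names are hypothetical)
start_dir=$PWD
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/project/data/raw"
touch "$tmpdir/project/notes.txt" \
      "$tmpdir/project/data/alpha.txt" \
      "$tmpdir/project/data/raw/beta.txt" \
      "$tmpdir/project/data/raw/reads.fq"

cd "$tmpdir/project"

# All regular files ending in .txt, at any depth, sorted for stable output
all_txt=$(find . -type f -name "*.txt" | sort)
echo "$all_txt"

# Same search, but descend at most 2 levels below the starting directory
shallow_txt=$(find . -maxdepth 2 -type f -name "*.txt" | sort)
echo "$shallow_txt"

# Only directories (the starting directory . is included)
dirs=$(find . -type d | sort)
echo "$dirs"

# Long listing of the directory itself rather than its contents
ls -ld data

cd "$start_dir"
rm -rf "$tmpdir"
```

Quoting the -name pattern matters: without the double quotes the shell would expand *.txt itself before find ever sees it.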
...
- touch <file> – create an empty file, or update the modification timestamp on an existing file
- mkdir -p <dirname> – create directory <dirname>.
- -p says to also create any missing parent directories in the path to <dirname>
- mv <file1> <file2> – renames <file1> to <file2>
- mv <file1> <file2> ... <fileN> <to_dir>/ – moves files <file1> <file2> ... <fileN> into directory <to_dir>
- mv -t <to_dir> <file1> <file2> ... <fileN> – same as above, but specifies the target directory as an option (-t <to_dir>)
- ln -s <path> creates a symbolic (-s) link (a.k.a. symlink) to <path> in the current directory
- default link name corresponds to the last name component in <path>
- always change into (cd) the directory where you want the link before executing ln -s
- a symbolic link can be deleted without affecting the linked-to file
- ln -sf -t <target_dir> <file1> <file2> ... <fileN> – creates symbolic links to <file1> <file2> ... <fileN> in target directory <target_dir>
- rm <file> deletes a file. This is permanent - not a "trash can" deletion.
- rm -rf <dirname> deletes an entire directory – be careful!
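The commands above can be rehearsed in a scratch directory before using them on real data. The following is a sketch only; every file and directory name in it is hypothetical, and readlink is used just to show where the symlink points.

```shell
start_dir=$PWD
tmpdir=$(mktemp -d)
cd "$tmpdir"

# -p creates the whole path in one step, including missing intermediate directories
mkdir -p archive/2024/run1

# Create empty files, rename one, then move two files into a directory
touch a.txt b.txt c.txt
mv a.txt first.txt
mv first.txt b.txt archive/2024/run1/

# Change into the directory where the link should live, then create the symlink
mkdir links
cd links
ln -s ../archive/2024/run1/first.txt
link_target=$(readlink first.txt)    # the path the symlink points to
echo "$link_target"
cd ..

# Deleting the symlink leaves the linked-to file untouched
rm links/first.txt
remaining=$(ls archive/2024/run1)
echo "$remaining"

# Clean up: rm -rf removes the whole tree permanently
cd "$start_dir"
rm -rf "$tmpdir"
```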
...
- echo <text> prints the specified text on standard output
- evaluation of metacharacters (special characters) inside the text may be performed first
- -e says to enable interpretation of backslash escapes such as \t (Tab) and \n (newline)
- -n says not to output the trailing newline
- wc -l reports the number of lines (-l) in its input
- wc -c reports the number of characters (-c) in its input
- wc -w reports the number of words (-w) in its input
- history lists your command history to the terminal
- redirect to a file to save a history of the commands executed in a shell session
- pipe to grep to search for a particular command
- which <pgm> searches all $PATH directories to find the program/command <pgm> and reports its full pathname
- du <file_or_directory> [<file_or_directory> ...]
- shows the disk usage (size) of the specified files/directories
- -h says report the size in human-readable form (e.g. 12M instead of 12201749)
- -s says summarize the directory size for directories
- -c says print a grand total when multiple items are specified
- groups - lists the Unix groups you belong to
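A short sketch tying several of these commands together, assuming a bash shell (where the built-in echo supports -e and -n). The text being counted and the scratch file are made up for the example.

```shell
# echo -e interprets backslash escapes; \t is a Tab and \n is a newline
table=$(echo -e "gene\tcount\ngeneA\t12\ngeneB\t7")
echo "$table"

# Count lines, words and characters of the same text
n_lines=$(echo "$table" | wc -l)
n_words=$(echo "$table" | wc -w)
n_chars=$(echo "$table" | wc -c)
echo "lines=$n_lines words=$n_words chars=$n_chars"

# echo -n suppresses the trailing newline, so wc -c counts one character fewer
n_chars_no_newline=$(echo -n "$table" | wc -c)
echo "chars without trailing newline=$n_chars_no_newline"

# which reports the full pathname of a program found on $PATH
which ls

# du -sh summarizes the disk usage of a directory in human-readable form
tmpdir=$(mktemp -d)
echo "some text" > "$tmpdir/small.txt"
du -sh "$tmpdir"
rm -rf "$tmpdir"
```

The one-character difference between the two wc -c counts is the newline that echo normally appends.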
Advanced commands
...
- cut command lets you isolate ranges of data from its input lines
- cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
- use -d <delim> to change the field delimiter (Tab by default)
- cut -c <character_number(s)> extracts one or more characters (-c) from each line of input
- the <numbers> can be
- a comma-separated list of numbers (e.g. 1,4,7)
- a hyphen-separated range (e.g. 2-5)
- a trailing hyphen says "and all items after that" (e.g. 3,7-)
- cut does not re-order fields, so cut -f 5,3,1 acts like -f 1,3,5
- sort sorts its input lines using an efficient algorithm
- by default sorts each line lexically (as strings), low to high
- use -n to sort numerically
- use -V for Version sort (numbers with surrounding text)
- use -r to reverse the sort order
- use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
- e.g. -k1,1 -k2,2nr to sort field 1 lexically and field 2 as a number high-to-low
- by default, fields are delimited by whitespace -- one or more spaces or Tabs
- use -t <delim> to change the field delimiter (e.g. -t $'\t' in bash for Tab only, so spaces are no longer treated as delimiters)
- uniq -c counts groupings of its input (which must be sorted) and reports the text and count for each group
- use cut | sort | uniq -c for a quick-and-dirty histogram
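The cut | sort | uniq -c idiom and the sort key options are easiest to follow on a tiny data set. The sketch below uses invented Tab-delimited records (sample, chromosome, score) and assumes bash plus a sort that supports -V, such as GNU sort.

```shell
# Hypothetical Tab-delimited data: sample, chromosome, score
data=$(printf 'sampleA\tchr2\t10\nsampleB\tchr1\t5\nsampleC\tchr2\t30\nsampleD\tchr10\t7\nsampleE\tchr2\t2\n')

# Quick-and-dirty histogram: how many records per chromosome (field 2)?
histogram=$(echo "$data" | cut -f 2 | sort | uniq -c)
echo "$histogram"

# Sort on field 2 lexically, then on field 3 numerically from high to low.
# $'\t' is bash syntax for a literal Tab character.
sorted=$(echo "$data" | sort -t $'\t' -k2,2 -k3,3nr)
echo "$sorted"

# Version sort orders chr1, chr2, chr10 the way a person would
chroms=$(echo "$data" | cut -f 2 | sort -V | uniq)
echo "$chroms"
```

Note that uniq only merges adjacent duplicate lines, which is why sort always comes before it in the pipeline.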
...
- awk is a powerful scripting language that is easily invoked from the command line
- awk '<script>' - the '<script>' is applied to each line of input (generally piped in)
- always enclose '<script>' in single quotes to inhibit shell evaluation, because awk has its own set of metacharacters that are different from the shell's
Example that prints the average of its input numbers (echo -e converts backslash escape characters like newline \n to the ASCII newline character so that the numbers appear on separate lines)
```shell
echo -e "1\n2\n3\n4\n5" | awk '
BEGIN{sum=0; ct=0}
{ sum = sum + $1
  ct = ct + 1 }
END{print sum/ct}'
```
General structure of an awk script:
- BEGIN {<expressions>} – use to initialize variables before any script body lines are executed
- e.g. BEGIN {FS=":"; OFS="\t"; sum=0; ct=0}
- says use colon ( : ) as the input field separator (FS), and Tab ( \t ) as the output field separator (OFS)
- the default input field separator (FS) is whitespace
- one or more spaces or Tabs
- the default output field separator (OFS) is a single space
- initializes the variables sum and ct to 0
- {<body expressions>} – expressions to apply to each line of input
- use $1, $2, etc. to pick out specific input fields of each line
- e.g. {print $3,$4} outputs fields 3 and 4 of the input, separated by the output field separator
- e.g. {sum = sum + $4} adds field 4 of the input to the variable sum
- the built-in variable NF is the number of fields in the current line
- the built-in variable NR is the record (line) number of the current line
- END {<expressions>} – executed after all input is complete
- e.g. END {print sum,ct} prints the final value of the sum and ct variables, separated by the output field separator.
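Putting the three parts together, here is a minimal sketch of a complete awk script using the FS, OFS, NR and NF variables described above. The colon-delimited records (name:group:category:amount) are invented for the example.

```shell
# Hypothetical colon-delimited records: name:group:category:amount
records=$(printf 'alice:staff:books:12\nbob:staff:travel:30\ncarol:guest:books:8\n')

result=$(echo "$records" | awk '
BEGIN { FS=":"; OFS="\t"; sum=0; ct=0 }   # runs once, before any input is read
{ print NR, $1, NF                         # line number, field 1, number of fields
  sum = sum + $4                           # accumulate field 4
  ct = ct + 1 }
END { print sum, ct }')                    # runs once, after all input is read

echo "$result"
```

Because OFS is set to Tab, the comma-separated items in each print statement come out Tab-delimited, ready for cut or sort.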
Here is an excellent awk tutorial, very detailed and in-depth
...