
One of the steepest parts of the Unix/Linux learning curve is the sheer number of built-in commands, each of which has many options (most of which you'll never use). Plus there are a number of advanced commands that are extremely powerful but also extremely complex.

To help address this, this page introduces a number of built-in Linux utilities along with some of their common options, by category.

...

cat outputs all the contents of its input (one or more files and/or standard input) or the specified file
- cat -n prefixes each line of output with its line number
- CAUTION – only use on small files!
zcat <file.gz> like cat, but understands the gzip (.gz) format, and decompresses the data before writing it to standard output
- CAUTION – only use on small files!
- Another CAUTION – does not understand .zip or .bz2 compression formats
more and less pagers
- both display their (possibly very long) input one Terminal "page" at a time
- in more:
  - use spacebar to advance a page
  - use q or Ctrl-c to exit more
- in less:
  - q – quit
  - Ctrl-f or space – page forward
  - Ctrl-b – page backward
  - /<pattern> – search for <pattern> in forward direction
    - n – next match
    - N – previous match
  - ?<pattern> – search for <pattern> in backward direction
    - n – previous match going back
    - N – next match going forward
- use less -N to display line Numbers
- use less -I to use case Insensitive pattern searches
- less can be used directly on .gz format files
head and tail
- show you the first or last 10 lines (by default) of their input
- head -n 20 or just head -20 shows the first 20 lines
- tail -n 2 or just tail -2 shows the last 2 lines
- tail -n +100 shows lines starting at line 100 (the older tail +100 shorthand is not accepted by all versions of tail)
- tail -n +100 | head -20 shows 20 lines starting at line 100
- tail -f shows the last lines of a file, then follows the output as more lines are written (Ctrl-c to quit)
gunzip -c <file.gz> | more (or less) – like zcat, un-compresses lines of <file.gz> and outputs them to standard output
- <file.gz> is not altered on disk
- always pipe the output to a pager!
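
The head, tail and gunzip -c options above can be tried out with a minimal sketch like the one below. The file names and the 200-line sample data are made up for illustration, and the temporary directory is just a convenient scratch location.

```shell
# Build a small 200-line sample file in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
seq 200 > "$workdir/numbers.txt"

# The first 3 lines, then the last 2 lines.
first3=$(head -n 3 "$workdir/numbers.txt")
last2=$(tail -n 2 "$workdir/numbers.txt")

# 5 lines starting at line 100.
window=$(tail -n +100 "$workdir/numbers.txt" | head -n 5)

# Compress a copy, then read it back; the .gz file on disk is not altered.
gzip -c "$workdir/numbers.txt" > "$workdir/numbers.txt.gz"
gz_head=$(gunzip -c "$workdir/numbers.txt.gz" | head -n 3)

printf '%s\n' "$window"
```

For a real (large) file you would normally end the pipeline with a pager, e.g. gunzip -c <file.gz> | less, rather than capturing the output in a variable.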

...

ls - list the contents of the specified directory
- -l says produce a long listing (including file permissions, sizes, owner and group)
- -a says show all files, even normally-hidden dot files whose names start with a period ( . )
- -h says to show file sizes in human readable form (e.g. 12M instead of 12201749)
- -t says to sort files on last modification time
- -r says to reverse the current sort order
- -d says to show directory listing information only, instead of directory contents
  - usually combined with -l, e.g.: ls -ld <dirname>
cd <whereto> - change the current working directory to <whereto>. Some special <wheretos>:
- .. (period, period) means "up one level"
- ~ (tilde) means "my home directory"
- - (dash) means "the last directory I was in"
find <in_directory> [ operators ] -name <expression> [ tests ]
- looks for files matching <expression> in <in_directory> and its sub-directories
- <expression> can be a double-quoted string including pathname wildcards (e.g. "[a-g]*.txt")
- there are tons of operators and tests:
  - -type f (file) and -type d (directory) are useful tests
  - -maxdepth NN is a useful operator to limit the depth of recursion.
file <file> tells you what kind of file <file> is
df shows the file systems mounted on the system you're working on, along with how much disk space is used and available on each
- -h says to show sizes in human readable form (e.g. 12G instead of 12318201749)
pwd - display the present working directory
- -P says to display the physical path, with any symbolic links resolved
tree <directory> - shows the file system hierarchy of the specified directory
- tree is not available on all Linux systems
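
As a rough illustration of find with the -type and -maxdepth options described above, the sketch below builds a small throwaway directory tree and searches it. All directory and file names here are hypothetical.

```shell
# Hypothetical directory tree, created under a temporary directory.
top=$(mktemp -d)
mkdir -p "$top/docs/old"
touch "$top/a.txt" "$top/docs/b.txt" "$top/docs/old/c.txt" "$top/docs/notes.md"

# All regular files ending in .txt, at any depth (sorted for stable output).
all_txt=$(find "$top" -type f -name "*.txt" | sort)

# Limit recursion: depth 1 means only the immediate contents of $top.
top_only=$(find "$top" -maxdepth 1 -type f -name "*.txt")

# Count directories only ($top itself, docs, and docs/old).
dir_count=$(find "$top" -type d | wc -l)

# Long listing of the directory itself rather than its contents.
ls -ld "$top/docs"
```

Quoting "*.txt" matters here: without the quotes the shell would expand the wildcard itself before find ever sees it.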

...

touch <file> – create an empty file, or update the modification timestamp on an existing file
mkdir -p <dirname> – create directory <dirname>.
- -p says to create any needed parent directories also
mv <file1> <file2> – renames <file1> to <file2>
- mv <file1> <file2> ... <fileN> <to_dir>/ – moves files <file1> <file2> ... <fileN> into directory <to_dir>
- mv -t <to_dir> <file1> <file2> ... <fileN> – same as above, but specifies the target directory via the -t <to_dir> option
ln -s <path> creates a symbolic (-s) link (symlink) to <path> in the current directory
- a symbolic link can be manipulated as if it is the linked-to file or directory
  - and can be deleted without affecting the linked-to file/directory
- the default link file name corresponds to the last name component in <path>
- always specify the -s option to create a symbolic link
  - without the -s option a difficult-to-manage "hard link" is created
- always change into (cd) the directory where you want the link before executing ln -s
- ln -sf -t <target_dir> <file1> <file2> ... <fileN> –
  - creates symbolic links to <file1> <file2> ... <fileN> in target directory <target_dir>
rm <file> deletes a file. This is permanent - not a "trash can" deletion.
- rm -rf <dirname> deletes an entire directory – be careful!
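
The commands above can be strung together as in this small sketch. It works entirely inside a temporary directory, and the names project, archive, raw.txt and input.txt are invented for the example. readlink, which prints where a symbolic link points, is not covered above and is used here only to inspect the result.

```shell
work=$(mktemp -d)
cd "$work"

# -p creates the intermediate directory "project" as well as "project/data".
mkdir -p project/data
touch project/data/raw.txt

# Rename a file, then move it into another directory.
mv project/data/raw.txt project/data/input.txt
mkdir archive
mv project/data/input.txt archive/

# Create a symbolic link in the current directory; the default link name
# is the last component of the path (input.txt).
ln -s archive/input.txt
link_target=$(readlink input.txt)

# Deleting the link leaves the linked-to file untouched.
rm input.txt
ls -l archive
```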

...

cp <source> [<source>...] <destination> copies the file(s) <source> [<source>...] to the directory or file <destination>.
- using . (period) as the destination means "here, with the same name"
- -p option says to preserve file modification timestamps
- cp -r <dirname>/ <destination>/ will recursively copy the directory <dirname>/ and all its contents to the directory <destination>/.
- cp -t <dirname>/ <file> [<file>...] copies one or more specified files to the target directory.
scp <user>@<host>:<remote_source_path> <local_destination_path>
- Works just like cp but copies <remote_source_path> from the remote host machine to the <local_destination_path>
- -p (preserve file times) and -r (recursive) options work the same as cp
- scp <local_source_path>... <user>@<host>:<remote_destination_path> is similar, but copies one or more <local_source_path> to the <remote_destination_path> on the remote host machine.
wget <url> fetches a file from a valid URL (e.g. http, https, ftp).
- -O <file> specifies the name for the local file (defaults to the last component of the URL)
rsync -arvW <source_directory>/ <target_directory>/
rsync -ptlrvP <source_directory>/ <target_directory>/
- Recursively copies <source_directory> contents to <target_directory>, but only if <source_directory> files are newer or don't yet exist in <target_directory>
- Remote path syntax (<user>@<host>:<absolute_or_home-relative_path>) can be used for either source or target (but not both).
- Always include a trailing slash ( / ) after the source and target directory names!
- -a means "archive" mode (equivalent to -ptl and some other options)
- -r means recursively copy directories (implied by -a)
- -v means verbose
- -W means Whole file only
  - Normally the rsync algorithm compares the contents of files that need to be copied and only transfers the different portions. This option disables file content comparisons, which are not appropriate for large and/or binary files.
- -p means preserve file permissions
- -t means preserve file times
- -l means copy symbolic links as links (implied by -a)
  - vs -L which means dereference the link and copy the file it refers to
- -P means show transfer Progress (useful when large files are being transferred)
- see https://manpages.ubuntu.com/manpages/trusty/man1/rsync.1.html
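
A local-only sketch of the copy commands above follows. The directory layout and file names are hypothetical, and the rsync step is guarded because rsync may not be installed on every system. Remote copies with scp or rsync use the same options but need a real <user>@<host>, so they are not shown.

```shell
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/results/run1"
echo "sample" > "$src/results/run1/out.txt"
echo "notes"  > "$src/readme.txt"

# Copy one file into a directory, preserving its modification time (-p).
cp -p "$src/readme.txt" "$dst/"

# Recursively copy a directory and everything under it into $dst.
cp -r "$src/results" "$dst/"

# With rsync, trailing slashes mean "the contents of" each directory.
if command -v rsync >/dev/null 2>&1; then
    mkdir -p "$dst/synced"
    rsync -a "$src/results/" "$dst/synced/"
fi

ls -l "$dst"
```

Running the rsync line a second time should transfer nothing, since the target files already exist and are not older than the source files.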

...

echo <text> prints the specified text on standard output
- evaluation of metacharacters (special characters) inside the text may be performed first
- -e says to enable interpretation of backslash escapes such as \t (tab) and \n (newline)
- -n says not to output the trailing newline
wc -l reports the number of lines (-l) in its input
- wc -c reports the number of bytes (-c) in its input (use wc -m to count characters)
- wc -w reports the number of words (-w) in its input
history lists your command history to the terminal
- redirect to a file to save a history of the commands executed in a shell session
- pipe to grep to search for a particular command
- re-execute a previous command via !<NN> where <NN> is the history line number
env lists all the environment variables currently defined in your login session
seq N produces the numbers 1 through N, each on a separate line
xargs transfers data on its standard input to the command line of the specified command
- e.g. ls ~/*.txt | xargs echo
which <pgm> searches all $PATH directories to find <pgm> and reports its full pathname
- will report all the places it looked if <pgm> was not found
- type <pgm> is more general and works for functions and aliases
du <file_or_directory..file_or_directory>
- shows the disk usage (size) of the specified files/directories
- -h says report the size in human-readable form (e.g. 12M instead of 12201749)
- -s says summarize the directory size for directories
- -c says print a grand total when multiple items are specified
groups - lists the Unix groups you belong to
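
A few of these utilities combine naturally in pipelines. The following is only an illustrative sketch; printf is used to build the test input because the behavior of echo options varies between shells.

```shell
# seq writes one number per line; wc -l counts those lines.
line_count=$(seq 5 | wc -l)

# xargs turns the lines on its standard input into command-line arguments,
# so echo receives 1 2 3 4 5 as five separate arguments.
joined=$(seq 5 | xargs echo)

# wc -w counts whitespace-separated words.
word_count=$(printf 'one two three\n' | wc -w)

# type reports how the shell would resolve a command name.
type ls

echo "$joined"
```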

...

cut command lets you isolate ranges of data from its input lines
- cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
  - use -d <delim> to change the field delimiter (Tab by default)
- cut -c <character_number(s)> extracts one or more characters (-c) from each line of input
- the <numbers> can be
  - a comma-separated list of numbers (e.g. 1,4,7)
  - a hyphen-separated range (e.g. 2-5)
  - a trailing hyphen says "and all items after that" (e.g. 3,7-)
- cut does not re-order fields, so cut -f 5,3,1 acts like -f 1,3,5
sort sorts its input lines using an efficient algorithm
- by default sorts each line lexically (as strings), low to high
  - use -n to sort numerically
  - use -V for Version sort (numbers with consistent surrounding text)
  - use -r to reverse the sort order
- use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
  - e.g. -k1,1 -k2,2nr to sort field 1 lexically and field 2 as a number high-to-low
  - by default, fields are delimited by whitespace -- one or more spaces or Tabs
    - use -t <delim> to change the field delimiter (e.g. -t $'\t' for Tab only; ignore spaces)
uniq -c counts groupings of its input (which should be sorted) and reports the text and count for each group
- use cut | sort | uniq -c for a quick-and-dirty histogram
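
The quick-and-dirty histogram idiom can be sketched as follows, using a small made-up Tab-delimited table of fruit and colors. The tab variable is just a portable way to hand a literal Tab to sort -t.

```shell
# Hypothetical two-column, Tab-delimited data: name, color.
data=$(mktemp)
printf 'apple\tred\nbanana\tyellow\ncherry\tred\nlemon\tyellow\nplum\tpurple\ngrape\tred\n' > "$data"

# Extract field 2, sort so equal values are adjacent, count each group,
# then order the groups by count, high to low.
histogram=$(cut -f 2 "$data" | sort | uniq -c | sort -k1,1nr)
printf '%s\n' "$histogram"

# Sort the whole table on field 2, then field 1, treating only Tab as the delimiter.
tab=$(printf '\t')
first_by_color=$(sort -t "$tab" -k2,2 -k1,1 "$data" | head -n 1)
```

The first sort is what makes uniq -c meaningful: uniq only collapses adjacent identical lines.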

...

awk is a powerful scripting language that is easily invoked from the command line
- awk '<script>' - the '<script>' is applied to each line of input (generally piped in)
  - always enclose '<script>' in single quotes to inhibit shell evaluation
  - awk has its own set of metacharacters that are different from the shell's
General structure of an awk script:
- BEGIN {<expressions>} – use to initialize variables before any script body lines are executed
  - e.g. BEGIN {FS=":"; OFS="\t"; sum=0}
    - says use colon ( : ) as the input field separator (FS), and Tab ( \t ) as the output field separator (OFS)
      - the default input field separator (FS) is whitespace (one or more spaces or Tabs)
      - the default output field separator (OFS) is a single space
    - initializes the variable sum to 0
- {<body expressions>} – expressions to apply to each line of input
  - use $1, $2, etc. to pick out specific input fields
  - e.g. {print $3,$4} outputs fields 3 and 4 of the input, separated by the output field separator.
  - some special expressions:
    - NF - Number of Fields on the line
    - NR - Number of the Record (i.e., line number)
- END {<expressions>} – executed after all input is complete
  - e.g. END {print sum}
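
Putting the BEGIN, body and END pieces together, a minimal sketch might look like this. The colon-delimited records (name, a number, a role) are invented for the example.

```shell
# Hypothetical colon-delimited input: name:number:role
records=$(mktemp)
printf 'alice:30:admin\nbob:45:staff\ncarol:25:staff\n' > "$records"

# BEGIN sets the separators and initializes sum; the body runs once per line,
# adding field 2 to sum and printing the line number, field 1 and the field count;
# END prints the total after all input has been read.
result=$(awk 'BEGIN {FS=":"; OFS="\t"; sum=0}
              {sum += $2; print NR, $1, NF}
              END {print "total", sum}' "$records")

printf '%s\n' "$result"
```

Because the commas in print separate arguments, each output value is joined with OFS (a Tab here) rather than with a space.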

...

The grep command

grep -P '<pattern>' searches for <pattern> in its input, and only outputs lines containing it
- always enclose '<pattern>' in single quotes to inhibit shell evaluation!
  - pattern-matching metacharacters are very different from those in the shell
- -P says to use Perl patterns, which are much more powerful than standard grep patterns
- -v (inverse match) – only print lines with no match
- -n (line number) – prefix output with the line number of the match
- -i (case insensitive) – ignore case when matching
- -l says return only the names of files that do contain the pattern match
- -L says return only the names of files that do not contain the pattern match
- -c says just return a count of line matches
- -A <n> (After) and -B <n> (Before) – output '<n>' number of lines after or before a match
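
Here is a short sketch of several of these options applied to a made-up log file. It assumes a grep that supports -P (GNU grep built with Perl pattern support); on systems without it, these particular patterns should also work with the -P dropped.

```shell
# Hypothetical log file.
log=$(mktemp)
printf 'INFO start\nerror: disk full\nINFO running\nError: timeout\nINFO done\n' > "$log"

# Case-insensitive match at the start of the line, with line numbers.
errors=$(grep -P -i -n '^error' "$log")

# Count the lines that do NOT match.
non_errors=$(grep -P -v -i -c '^error' "$log")

# Report only the name of the file containing a match.
matching_file=$(grep -P -l 'timeout' "$log")

# Show each match plus one line of context after it.
grep -P -A 1 'disk' "$log"
```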

...

A regular expression (regex) is a pattern of characters to search for and metacharacters that control and modify how search matching is done.

<pattern> (a regular expression, or regex) can contain special match metacharacters and modifiers. The ones below are Perl metacharacters, which are supported by most languages and by tools such as grep -P.
- ^ – matches beginning of line
- $ – matches end of line
- . – (period) matches any single character
- * – modifier; place after an expression to match 0 or more occurrences
- + – modifier, place after an expression to match 1 or more occurrences
- ? – modifier, place after an expression to match 0 or 1 occurrences
- \s – matches any whitespace character (\S any non-whitespace)
- \d – matches digits 0-9
- \w – matches any word character: A-Z, a-z, 0-9 and _ (underscore)
- \t – matches Tab; \n matches Linefeed; \r matches Carriage return
- [xyz123] – matches any single character (including special characters) among those listed between the brackets [ ]
  - this is called a character class
  - use [^xyz123] to match any single character not listed in the class
- (Xyz|Abc) – matches either Xyz or Abc or any text or expressions inside parentheses separated by | characters
  - note that parentheses ( ) may also be used to capture matched sub-expressions for later use
In Perl, where a pattern is delimited by //, modifiers appear after the pattern:
- i – perform case-insensitive text matching
- g – perform the specified substitution globally on each input record, not just on the 1st match

...

utility	default delimiter	how to change	example
cut	Tab	-d or --delimiter option	`cut -d ':' -f 1 /etc/passwd`
sort	whitespace (one or more spaces or Tabs)	-t or --field-separator option	`sort -t ':' -k1,1 /etc/passwd`
awk	whitespace (one or more spaces or Tabs). Note: some older versions of awk do not treat Tabs as field separators.	In the BEGIN { } block: FS= (input field separator), OFS= (output field separator); or the -F or --field-separator option	`cat /etc/fstab | grep -v '^#' | awk 'BEGIN{OFS="\t"}{print $2,$1}'` `cat /etc/passwd | awk -F ":" '{print $1}'`
join	one or more spaces	-t option	`join -t $'\t' -j 2 file1 file2`
perl	whitespace (one or more spaces or Tabs) when auto-splitting input with -a	-F'/<pattern>/' option	`cat /etc/fstab | grep -v '^#' | perl -F'/\s+/' -a -n -e 'print "$F[1]\t$F[0]\n";'`
read	whitespace (one or more spaces or Tabs)	IFS= (input field separator) option	Note that a bare IFS= removes any field separator, so whole lines are read each loop iteration.
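
The table has no example for read, so here is a hedged sketch showing read with IFS alongside join -t. The two small Tab-delimited files and their contents are hypothetical, and the tab variable is simply a portable way to pass a literal Tab.

```shell
tab=$(printf '\t')

# Two hypothetical Tab-delimited files, both already sorted on field 1.
people=$(mktemp)
cities=$(mktemp)
printf '1\talice\n2\tbob\n' > "$people"
printf '1\tAustin\n2\tBoston\n' > "$cities"

# join on field 1 of each file (the default), using Tab as the delimiter.
joined=$(join -t "$tab" "$people" "$cities")

# IFS=: applies only to this read, which splits each line on colons;
# the first field goes to name and everything else to rest.
first_fields=$(printf 'root:x:0\ndaemon:x:1\n' |
    while IFS=: read -r name rest; do echo "$name"; done)

# A bare IFS= disables splitting, so leading spaces survive and the
# whole line lands in one variable.
whole=$(printf '  padded line\n' |
    while IFS= read -r line; do echo "[$line]"; done)

printf '%s\n' "$joined"
```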

...
