Part 1: The Bash shell and commands

What's going on in the Terminal

Once you've logged on, you're at a Linux command line in your Home directory!

It looks as if you're running directly on the remote computer, but really there are two programs communicating:

  1. your local Terminal
  2. the remote Shell

There are many shell programs available in Linux, but the default is bash (Bourne-again shell).

Your Terminal is pretty "dumb" – just sending what you type over its secure shell (SSH) connection to the remote computer, then displaying the text sent back by the remote computer's shell. The real work is being done on the remote computer, by executable programs called by the bash shell (also called commands, since you call them on the command line).

The bash REPL and commands

When you type something in at a bash command-line prompt, it Reads the input, Evaluates it, then Prints the results, then does this over and over in a Loop. This behavior is called a REPL – a Read, Eval, Print Loop.
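As a rough illustration of the Read, Eval, Print Loop idea, the sketch below builds a toy REPL from the shell's own read and eval built-ins. The function name toy_repl is made up for this example, and the input comes from a here-document rather than the keyboard so it can run unattended. The real bash REPL does much more (prompting, history, line editing), so treat this only as a picture of the loop.

```shell
# Toy REPL: a hypothetical helper, not how bash is actually implemented.
toy_repl() {
    while IFS= read -r line; do    # Read one line of input
        eval "$line"               # Evaluate it; the command Prints its own output
    done                           # Loop until the input runs out
}

# Feed two commands to the toy REPL and capture what they print.
repl_output=$(toy_repl <<'EOF'
echo hello
echo world
EOF
)
echo "$repl_output"
```

Each line of the here-document is handled exactly as if it had been typed at a prompt followed by Enter.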

Many programming language environments have REPLs – Python and R for example. The input to the bash REPL is a command. Here are some examples:

ls               # example 1 - no options or arguments
ls -l            # example 2 - one "short" (single character) option (-l)
ls --help        # example 3 - one "long" (word) option (--help)
ls .profile      # example 4 - one argument, a file name (.profile)
ls --width=20    # example 5 - a long option with a value (--width is the option, 20 is the value)
ls -w 20         # example 6 - a short option w/a value, as above, where -w is the same as --width
ls -l -a -h      # example 7 - three short options entered separately (-l -a -h)
ls -lah          # example 8 - three short options that can be combined after a dash (-lah)

A command consists of:

  • The command name – here ls (list files)
    • A command can be any of the built-in Linux/Unix commands, or the name of a user-written script or program
  • One or more options, usually noted with a leading dash (-) or double-dash (--).
    • -l in example 2 (long listing)
    • --help in example 3
    • Options are optional – they do not have to be supplied (e.g. example 1 above)
  • One or more command-line arguments, which are often (but not always) file names; these are often optional
    • e.g. .profile in example 4

The shell executes the command line input when it sees a linefeed, which happens when you press Enter after entering the command.
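To see the parts of a command in action without depending on the files in your Home directory, here is a small sketch that works in a throwaway directory. The directory and file names (demo_dir, notes.txt, .hidden) are invented for the example, and the $( ) syntax is used only to capture each command's output so it can be labeled.

```shell
# Work in a temporary directory so the example does not touch your own files.
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.txt" "$demo_dir/.hidden"

# Command name + one argument (the directory to list)
plain_listing=$(ls "$demo_dir")

# Command name + one short option + one argument
all_listing=$(ls -a "$demo_dir")

echo "Output of ls:"
echo "$plain_listing"
echo "Output of ls -a:"
echo "$all_listing"

rm -r "$demo_dir"    # clean up
```

The first listing contains only notes.txt, while the second also includes the dot file and the special entries . and .. for the directory itself and its parent.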

Command options

The notes below apply to nearly all built-in Linux utilities, and to many 3rd party programs as well.

  • Short (1-character) options can be provided separately, prefixed by a single dash (-)
    • or can be combined with the combination prefixed by a single dash (examples 7, 8)
  • Long (multi-character/"word") options are prefixed with a double dash (--) and must be supplied separately.
  • Many utilities have equivalent long and short options (e.g.  --width and -w above)
  • Both long and short options can be assigned a value (examples 5, 6)
    • The short option and its value are usually separated by a space, but can also be run together (e.g. -w20)
    • Long options are a GNU convention rather than part of the POSIX standard (see https://en.wikipedia.org/wiki/POSIX), which only describes short options. By that convention, the long option and its value are separated by an equal sign (=), but many programs also let you use a space as the separator.
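The equivalences described in these notes can be checked directly. The sketch below assumes the GNU version of ls found on most Linux systems (other versions may not support --width) and uses made-up file names in a temporary directory. The -C option, not discussed above, forces column output even when ls is not writing to a Terminal.

```shell
# Set up a temporary directory with a few empty files.
work_dir=$(mktemp -d)
demo_dir="$work_dir/files"
mkdir "$demo_dir"
touch "$demo_dir/alpha" "$demo_dir/beta" "$demo_dir/gamma" "$demo_dir/delta"

# The same option and value written three ways
long_form=$(ls -C --width=20 "$demo_dir")    # long option, value after =
short_spaced=$(ls -C -w 20 "$demo_dir")      # short option, value after a space
short_joined=$(ls -C -w20 "$demo_dir")       # short option, value run together

# Short options given separately or combined after one dash
separate=$(ls -l -a -h "$demo_dir")
combined=$(ls -lah "$demo_dir")

[ "$long_form" = "$short_spaced" ] && echo "--width=20 and -w 20 give the same output"
[ "$short_spaced" = "$short_joined" ] && echo "-w 20 and -w20 give the same output"
[ "$separate" = "$combined" ] && echo "-l -a -h and -lah give the same output"

rm -r "$work_dir"    # clean up
```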

Some handy options for ls:

  • -l shows a long listing, including file permissions, ownership, size and last modified date.
  • -a shows all files, including dot files whose names start with a period ( . ) which are normally not listed.
  • -h says to show file sizes in human readable form (e.g. 12M instead of 12201749).

The arguments to ls are one or more file/directory names. If no arguments are provided, the contents of the current directory are listed. If an argument is a directory name, the contents of that directory are listed.
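Here is a small sketch of what -h changes in a long listing. It creates a file of a known size (2048 bytes, chosen arbitrarily) under a made-up name, then pulls the size column out of the listing with awk, a tool not covered in this section. It assumes the size is the 5th column, which is the usual layout of ls -l.

```shell
demo_dir=$(mktemp -d)
# Create a file of exactly 2048 bytes (2 x 1024) filled with zero bytes.
head -c 2048 /dev/zero > "$demo_dir/sample.dat"

ls -l  "$demo_dir/sample.dat"    # size column shows the byte count
ls -lh "$demo_dir/sample.dat"    # size column shows a scaled value with a unit suffix

# Extract just the size column from each listing.
size_bytes=$(ls -l  "$demo_dir/sample.dat" | awk '{print $5}')
size_human=$(ls -lh "$demo_dir/sample.dat" | awk '{print $5}')
echo "bytes: $size_bytes   human readable: $size_human"

rm -r "$demo_dir"    # clean up
```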

Exercise 1-1

What do you see when you just type ls then Enter?

 Answer...

Names of the files and sub-directories in your current Home directory, other than dot files.

data  docs  haiku.txt  jabberwocky.txt  mobydick.txt

What do you see when you enter ls -l?

 Answer...

A long listing of the files and sub-directories in your current Home directory, other than dot files.

drwxr-x--- 3 student01 CCBB_Workshops_1     7 May  2 15:54 data
drwxr-x--- 2 student01 CCBB_Workshops_1     4 Apr 25 14:34 docs
-rw-r----- 1 student01 CCBB_Workshops_1   218 Sep 22  2015 haiku.txt
-rw-r----- 1 student01 CCBB_Workshops_1   992 Sep 22  2015 jabberwocky.txt
-rw-r----- 1 student01 CCBB_Workshops_1 12319 Apr  9  2014 mobydick.txt

How can you tell which entries are files and which are directories?

 Answer...

Because of the coloring ls applies to its output, directories are a different color (e.g. yellow or blue) than files (white).

Also, in a long listing, the left-most permissions display will start with a "d" for directories (e.g. drwxrwx---)

What is the size of the file mobydick.txt in bytes? In kilobytes (units of 1,024 bytes, denoted K)?

 Hint...

ls -l
ls -lh

 Answer...

ls -l shows the mobydick.txt file size as 12319 bytes
ls -lh shows the mobydick.txt file size as 13K (12319 bytes is just over 12 x 1,024, and ls rounds up)

Getting help

So how do you find out what options and arguments a command uses?

  1. In the Terminal, type in the command name then the --help long option (e.g. ls --help)
    • Works for most Linux commands; 3rd party tools may use -h or -? or even /? instead
    • May produce a lot of output, so you may need to scroll up quite a bit or pipe the output to a pager (e.g. ls --help | more)
  2. Use the built-in manual system (e.g. type man ls)
    • This system uses the less pager that we'll go over later.
    • For now, just know that a space advances the output by one screen/"page", and typing q will exit the display.
  3. Ask the Google, e.g. search for ls man page
    • Can be easier to read

Most Linux commands have many options, most of which you'll never use. The trick is to start with the most commonly used options and build out from there. Then, if you need a command to do something special, check if there's an option already to do that.
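When the help text is long, you don't have to read all of it. The sketch below shows two ways to narrow it down, assuming the GNU version of ls (whose help text is written to standard output). The head and grep commands are not covered in this section; they are used here only to trim and search the text.

```shell
# Show only the first few lines of the help text.
ls --help | head -n 5

# Show only the help lines that mention one particular long option.
# The lone -- tells grep that what follows is a search pattern, not a grep option.
option_help=$(ls --help | grep -- '--human-readable')
echo "$option_help"
```

Searching the help text this way is a quick check for whether an option already does what you need.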

A good place to start learning built-in Linux commands and their options is on our Some Linux commands page.

Many 3rd party tools, especially bioinformatics tools, may bundle a number of different functions into one command. For these tools, just typing in the command name then Enter will provide top-level usage information. For example, the bwa tool that aligns sequencing reads to a reference genome:

Use the program name alone as a command to get help
bwa

Produces something like this:

bwa top-level help information
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.16a-r1181
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.

bwa, like many bioinformatics programs, is written as a set of sub-commands. This top-level help displays the sub-commands available. You then type bwa <command> to see help for the sub-command:

Get help on bwa index
bwa index

Displays something like this:

bwa index help information
Usage:   bwa index [options] <in.fasta>

Options: -a STR    BWT construction algorithm: bwtsw or is [auto]
         -p STR    prefix of the index [same as fasta name]
         -b INT    block size for the bwtsw algorithm (effective with -a bwtsw) [10000000]
         -6        index files named as <in.fasta>.64.* instead of <in.fasta>.*

Warning: `-a bwtsw' does not work for short genomes, while `-a is' and

Of course Google works on 3rd party tools also (e.g. search for bwa manual)

Exercise 1-2

How would you produce a long listing that is sorted by last modification time?

 Hint...

ls --help | more
man ls
Google: ls man page
Some Linux commands: File system navigation, section on ls.

 Answer...

The -t option produces output sorted by modification time.

ls -l -t
ls -lt

This shows the most recent files first. How would you change the sort order to show the oldest files first?

 Answer...

The -r option reverses the sort order.

ls -l -t -r
ls -ltr
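If you'd like to confirm the sort order on files whose ages you control, here is a sketch that sets modification times explicitly with touch -t (not covered above; the timestamp format is YYYYMMDDhhmm). The file names are made up for the example.

```shell
demo_dir=$(mktemp -d)
# Give two files explicit, different modification times.
touch -t 202001010000 "$demo_dir/old.txt"
touch -t 202201010000 "$demo_dir/new.txt"

newest_first=$(ls -t "$demo_dir")     # most recently modified first
oldest_first=$(ls -tr "$demo_dir")    # -r reverses the order

echo "ls -t:"
echo "$newest_first"
echo "ls -tr:"
echo "$oldest_first"

rm -r "$demo_dir"    # clean up
```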

About command line input

You know the command line is ready for input when you see the command line prompt. It can be configured differently on different systems, but on our system it shows your account name, server name, current directory, then a dollar sign ($). Note the tilde character ( ~ ) signifies your Home directory.

student01@gsafcomp01:~$

Like everything in Unix, the command line has similarities to a text file. And in Unix, all text file "lines" are terminated by a linefeed character (\n, also called a newline).

 Line ending differences...

Note: The Unix linefeed (\n) line delimiter is different from Windows, where the default line ending is carriage-return + linefeed (\r\n), and some Mac text editors that just use a carriage return (\r).
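Line endings are invisible in most displays, but you can look at them byte by byte. This sketch uses printf to produce text with each kind of ending and od -c (not covered in this section) to show every character, including \n and \r.

```shell
# od -c prints each byte, showing control characters as escapes such as \n and \r.
printf 'hello\n'   | od -c    # Unix line ending: linefeed only
printf 'hello\r\n' | od -c    # Windows line ending: carriage return + linefeed

# Counting bytes with wc -c shows the extra carriage return.
unix_bytes=$(printf 'hello\n' | wc -c)
windows_bytes=$(printf 'hello\r\n' | wc -c)
echo "Unix: $unix_bytes bytes, Windows: $windows_bytes bytes"
```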

As mentioned above, the shell executes command line input when it sees a linefeed, which happens when you press Enter after entering the command.

But you can enter more than one command on a single line – just separate the commands with a semi-colon ( ; ).

Multiple commands on a line
ls haiku.txt; ls -lh

You can also split a single command across multiple lines by adding a backslash ( \ ) at the end of the line you want to continue, before pressing Enter.

Split a command across multiple lines
student01@gsafcomp01:~$ ls haiku.txt \
> mobydick.txt

Notice that the shell indicates that it is not done with command-line input by displaying a greater than sign ( > ). You just enter more text then Enter when done.
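Both techniques can be tried with echo, which simply prints its arguments. In this sketch the $( ) syntax captures the output so it can be displayed afterwards; the variable names are made up for the example.

```shell
# Two commands on one line, separated by a semicolon; they run one after the other.
two_commands=$(echo first; echo second)
echo "$two_commands"

# One command split across two lines; the backslash is the last character on its line.
one_command=$(echo one \
two)
echo "$one_command"
```

The first echo pair prints two lines (first, then second), while the continued command prints the single line one two, because the shell sees it as one echo with two arguments.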

Use Ctrl-c to exit the current command input

At any time during command input, whether at the first command line prompt or at a > continuation prompt, you can press Ctrl-c (the Control key and the c key at the same time) to abandon the input and get back to the command prompt.

Literal characters and metacharacters

In the bash shell, and in most tools and programming environments, there are two kinds of input:

  • literal characters, that just represent (and print as) themselves
    • e.g. alphanumeric characters A-Z, a-z, 0-9
  • metacharacters - these are special characters that are associated with an operation in the environment
    • e.g. the Enter key that ends the current line

There are many metacharacters in bash:  # \ $ | ~ [ ]  to name a few.

Notice this list includes the backslash ( \ ) character we just used to continue command-line input on multiple Terminal lines. And it also includes the pound sign ( # ) which acts as a comment character – all input from the pound sign ( # ) to the end of the line is ignored.

We'll be pointing out the different metacharacters and their usages – which can depend on the context where they're used – as we learn more about bash.

Backslash for line continuation

When using the backslash ( \ ) to continue a text line, be careful not to enter any text after the backslash – just press Enter after it. Why?

The backslash ( \ ) metacharacter is used to escape the next character, which means to treat it as a literal even if it is a metacharacter.

As a line continuation indicator, the next character must be the linefeed from you pressing Enter. So instead of treating Enter as a metacharacter (ending the command and executing it), the shell treats the linefeed as escaped: bash discards the backslash-linefeed pair and gives you the > prompt on a new line so you can keep typing the same command. If you type anything else after the backslash, even a space, that character is escaped instead and Enter ends the command as usual.
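Escaping is easiest to understand by comparing escaped and unescaped metacharacters. The sketch below uses echo and printf to show a few cases; the variable names are invented for the example, and printf '[%s]' is used only to put brackets around each separate argument it receives.

```shell
echo hello # this part is a comment and is ignored

# Escaped metacharacters are treated as ordinary literal characters.
escaped_pound=$(echo \# is now a literal character)
escaped_tilde=$(echo \~)
escaped_dollar=$(echo \$HOME)
echo "$escaped_pound"
echo "$escaped_tilde and $escaped_dollar"

# An unescaped space separates arguments; an escaped space is part of one argument.
two_args=$(printf '[%s]' one two)
one_arg=$(printf '[%s]' one\ two)
echo "$two_args versus $one_arg"
```

The last line prints [one][two] versus [one two], showing that the backslash turned the space from an argument separator into a literal character.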

Command input errors

You don't always type in commands, options and arguments correctly – you can misspell a command name, forget to type a space, specify an unsupported option or a non-existent file, or make all kinds of other mistakes.

What happens? The shell, or the program you called, detects the problem and reports an error message that describes it as best it can.

Some examples:

# You mistype a command name, or use a command that is not installed on your system
student01@gsafcomp01:~$ catt
catt: command not found

# You try to use an unsupported option
student01@gsafcomp01:~$ ls -z
ls: invalid option -- 'z'
Try 'ls --help' for more information.

# You specify the name of a file that does not exist
student01@gsafcomp01:~$ ls xxx
ls: cannot access 'xxx': No such file or directory

# You try to access a file or directory you don't have permissions for
student01@gsafcomp01:~$ cat /etc/sudoers
cat: /etc/sudoers: Permission denied
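Besides printing a message, a failed command also leaves behind a numeric exit status that the shell stores in the special variable $?, where 0 means success and anything else means failure. Exit statuses are not covered in this section, so take the sketch below as a preview. The command name no_such_command_demo is made up (and assumed not to exist on your system), and 2>/dev/null just hides the error messages so only the statuses are shown.

```shell
# A command name that is assumed not to exist.
typo_status=0
no_such_command_demo 2>/dev/null || typo_status=$?

# A file that does not exist, inside a fresh empty directory.
demo_dir=$(mktemp -d)
missing_status=0
ls "$demo_dir/xxx" 2>/dev/null || missing_status=$?

echo "mistyped command exit status: $typo_status"
echo "missing file exit status:     $missing_status"

rmdir "$demo_dir"    # clean up
```

The shell reports status 127 for a command it cannot find; the status for the missing file is chosen by ls itself and is simply some non-zero value.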

Command line history and editing

Sometimes you want to repeat a command you've entered before, possibly with some changes.

  • The built-in history command lists the commands you've entered, each with a number.
    • You can re-execute any command in the history by typing an exclamation point ( ! ) then the number
    • e.g. !15 re-executes the 15th command in your history.
  • Use Up arrow to retrieve any of the last 50+ commands you've typed, going backwards through your history.
    • You can then edit the retrieved line, and hit Enter (even with the cursor in the middle of the line), and the shell will use that command.
  • The Down arrow "scrolls" forward from where you are in the command history.

The command line cursor (small thick bar on the command line) marks where you are on the command line.

  • Right arrow and Left arrow move the cursor forward or backward on the current command line.
  • Use Ctrl-a (holding down the Control key and a) to jump the cursor to the beginning of the line.
  • Use Ctrl-e to jump the cursor to the end of the line.
  • Arrow keys are also modified by Ctrl- (Windows) or Option- (Mac)
    • Ctrl-right-arrow (Windows) or Option-right-arrow (Mac) will skip by "word" forward
    • Ctrl-left-arrow (Windows) or Option-left-arrow (Mac) will skip by "word" backward

Once the cursor is positioned where you want it:

  • Just type in any additional text you want
  • To delete text after the cursor, use:
    • Delete key on Windows
    • Function-Delete keys on Macintosh
  • To delete text before the cursor, use:
    • Backspace key on Windows
    • Delete key on Macintosh
  • Use Ctrl-k (kill) to delete everything on the line after the cursor

Tab key completion

Hitting Tab when entering command line text invokes shell completion, instructing the shell to try to guess what you're doing and finish the typing for you. It's almost magic!

On most modern Linux shells you use Tab completion by pressing:

  • single Tab – completes file or directory name up to any ambiguous part
    • if nothing shows up, there is no unambiguous match
  • Tab twice – display all possible completions
    • you then decide where to go next

Let's have some fun with our friend the Tab key. Follow along if you can, as we use the Tab key to see the /stor/work/CBRS_unix/fastq path.

ls /st                     # press Tab key - expands to /stor/ which is the only match
ls /stor/w                 # press Tab key again - expands to /stor/work/, again the only match
ls /stor/work/C            # press Tab once - nothing happens because there are multiple matches
ls /stor/work/C            # press Tab a 2nd time - all matching directories listed
ls /stor/work/CB           # press Tab key - expands to /stor/work/CBRS_unix
ls /stor/work/CBRS_unix    # press Tab twice to see all completions
ls /stor/work/CBRS_unix/f  # press Tab once - expands to /stor/work/CBRS_unix/fastq

Tab key completion also works on commands! Type "bowtie" and Tab twice to see all the programs in the bowtie2 and bowtie tool suites.
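Tab completion only works interactively, but bash has a built-in called compgen (not mentioned above) that lists the same kinds of matches from a script. The sketch below assumes it is run by bash and uses made-up directory names in a temporary directory to mimic the ambiguous and unambiguous cases.

```shell
# compgen is a bash built-in; other shells may not have it.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/fastq" "$demo_dir/fasta" "$demo_dir/docs"

# Directory names starting with "fa": two matches, so a single Tab would stop
# at the ambiguous part and a second Tab would list both.
ambiguous=$(compgen -d "$demo_dir/fa" | sort)
echo "$ambiguous"

# Directory names starting with "d": one match, so a single Tab would complete it.
unique=$(compgen -d "$demo_dir/d")
echo "$unique"

# Command names starting with "ls", similar to typing ls then Tab twice.
compgen -c ls | sort -u | head -n 5

rm -r "$demo_dir"    # clean up
```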