What this section covers
- The basic structure of a command-processing script
- Defining and evaluating bash variables
- Grouping and evaluating values using double quotes ( " " ) or single quotes ( ' ' )
- Functions in bash, and their local variables
- Passing arguments to scripts and functions
- Subtleties of quoting in bash
Command-processing scripts
While bash scripts can be written to perform a single function, another style is to have a script perform multiple functions, where the functionality to be performed is specified by the first command argument to the script. This is the script style we'll explore here.
    my_script.sh helloWorld     # invoke the "helloWorld" functionality of my_script.sh
    my_script.sh goodbyeWorld   # invoke the "goodbyeWorld" functionality of my_script.sh
There are many examples of command-processing scripts in bioinformatics: bwa, samtools, bedtools to name but a very few.
The step_01.sh Script
Here's a basic command-processing script with one sub-command. This script can be found in your home directory in ~/workshop/step_01.sh. Make sure it is executable (chmod +x ~/workshop/step_01.sh).
    #!/bin/bash

    # Script version global variable. Edit this whenever changes are made.
    __ADVANCED_BASH_VERSION__="step_01"

    # function that says "Hello World!" and displays user-specified text.
    function helloWorld() {
      local txt1=$1
      local txt2=$2
      shift; shift
      local rest=$@
      echo "Hello World!"
      echo " text 1: '$txt1'"
      echo " text 2: '$txt2'"
      echo " rest: '$rest'"
    }

    # =======================================================================
    # Main script command-line processing
    # =======================================================================

    function usage() {
      echo "advanced_bash.sh, version $__ADVANCED_BASH_VERSION__"
      echo ""
      echo "Usage: advanced_bash.sh <command> [arg1 arg2...]"
      echo ""
      echo "Commands:"
      echo "helloWorld [text to display]"
      echo ""
    }

    CMD=$1  # initially $1 will be the command
    shift   # after "shift", $1 will be the 2nd command-line argument; $2 the 3rd, etc.
            # and $@ will be arguments 2, 3, etc.

    case "$CMD" in
      helloWorld) helloWorld "$@"
        ;;
      *) usage
        ;;
    esac
The Parts
Even with only a few lines of code, there's a lot going on in this script. Let's look at it part by part.
The shebang line
The first line (#!/bin/bash) is called the shebang: the #! characters followed by the full path to the program that should execute the script when it is invoked directly, without naming an interpreter (and provided the file has execute permission, of course).
    # Call a script directly.
    # As long as it is marked as executable (chmod +x), the operating system
    # reads the shebang line and passes the script to that program.
    ~/workshop/step_01.sh

    # Call a script specifying the executing program. The shebang line will be ignored.
    bash ~/workshop/step_01.sh
Global script version variable
Our script defines one global variable visible to all code in the script. Variables in bash (often loosely called environment variables, although strictly only exported variables are passed on to child processes) are set using the variable name, followed by the equals sign "=" and the value, with no spaces in between. Variable names can contain alphanumeric characters and underscores ("_"), but cannot begin with a digit. They cannot contain hyphens ("-"), periods (".") or other special characters.
    # Script version global variable. Edit this whenever changes are made.
    __ADVANCED_BASH_VERSION__="step_01"

    # Later, the value of this environment variable is referenced
    # by prefixing the name with the dollar sign ( $ )
    echo "Current script version is: $__ADVANCED_BASH_VERSION__"

    # or enclosed in ${ }
    echo "Current script version is: ${__ADVANCED_BASH_VERSION__}"
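As a hedged illustration of the assignment rules, here is a minimal sketch using hypothetical variable names (my_var and my_exported_var are not part of step_01.sh). It also shows the difference between a plain shell variable, which is visible only in the current shell, and a variable marked with export, which is passed on to child processes as a true environment variable.

```bash
# Hypothetical variable names, for illustration only.
my_var="visible in this shell only"
export my_exported_var="visible in child processes too"

# A child bash process sees only the exported variable.
child_plain=$(bash -c 'echo "${my_var:-<unset>}"')
child_exported=$(bash -c 'echo "${my_exported_var:-<unset>}"')

echo "child sees my_var:          $child_plain"
echo "child sees my_exported_var: $child_exported"

# Spaces around "=" are not allowed in an assignment. The line below,
# if uncommented, would try to run a command named "my_var" and fail:
# my_var = "oops"
```

For a script-wide setting such as the version string, a plain (non-exported) variable is enough, since only code inside the script needs to see it.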
The variable's value is referenced by prefixing the variable name with the dollar sign $, or by enclosing it in braces prefixed by the dollar sign ${}. The two forms are usually equivalent, but the braces are required:
- when referencing a positional argument variable with more than one digit (e.g. ${10})
- to separate the variable evaluation from text immediately following (e.g. ${prefix}_file.txt). Since underscore characters ( _ ) are allowed in variable names, the braces are needed so that the shell does not think the variable name is prefix_file.
Example:
    myvar="some_text"
    echo $myvar
    echo ${myvar}
    echo $myvar_more_text     # prints an empty line because the variable myvar_more_text is not defined
    echo ${myvar}_more_text
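The multi-digit case can be sketched the same way. The show_tenth function below is hypothetical and exists only to make the difference between $10 and ${10} visible.

```bash
# Hypothetical function, for illustration only.
function show_tenth() {
  echo "without braces: $10"     # parsed as $1 followed by a literal 0
  echo "with braces:    ${10}"   # the 10th positional argument
}

show_tenth a b c d e f g h i j
```

With the ten arguments shown, the first line ends in a0 (the value of $1 plus a zero) and the second ends in j.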
When defining or evaluating environment variables there's also a difference between enclosing the value in double quotes ( "$foo" ) or single quotes ( '$foo' ) – see Intro Unix: Quoting in the shell.
Example:
    myvar="some text"
    echo "$myvar"
    echo '$myvar'
Functions
A bash function looks like this, with or without the function keyword.
    function my_function() {
      # code goes here
    }

    my_function() {
      # code goes here
    }
Function and script arguments
Just like the script as a whole, arguments to a bash function are positional, and are referenced using:
- individual positional variables $1 $2 ... $9 ${10} ${11}...
- or $@ to refer to all of the arguments
- and $0 to refer to the script name itself
Note that while a function can have many arguments, the function definition never contains anything in its ( ) "formal argument list". Weird, eh?
And in bash, arguments passed to both scripts and functions are not enclosed in parentheses, as is the case in many programming languages.
Example:
    function myfn() { echo "arg 1: $1"; echo "arg 2: $2"; echo "all args: $@"; }
    myfn foo bar baz
In our script, the shift command "pops" the first element off the argument list, so that the remaining arguments are renumbered starting from $1.
Since there is no formal argument list, it is good practice to copy function arguments into local variables with names suggesting their role (e.g. local txt1=$1). The local keyword restricts the variable's scope to the function body and to any functions called from it. It is not visible to the caller.
    # Function that says "Hello World!" and displays user-specified text.
    function helloWorld() {
      local txt1=$1
      local txt2=$2
      shift; shift
      local rest=$@
      echo "Hello World!"
      echo " text 1: '$txt1'"
      echo " text 2: '$txt2'"
      echo " rest: '$rest'"
    }
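The scoping behavior of local can be checked with a small sketch. In bash, a local variable is hidden from the caller but, because bash uses dynamic scoping, it is visible to functions called from within the function that declared it. The outer and inner functions and the msg variable are hypothetical names used only for this demonstration.

```bash
# Hypothetical functions, for illustration only.
function inner() {
  # "msg" was declared local in outer(), yet it is visible here
  # because inner() is called from within outer().
  echo "inner sees: '${msg:-}'"
}

function outer() {
  local msg="set in outer"
  inner
}

outer

# Back at the top level, the local variable no longer exists.
echo "caller sees: '${msg:-}'"
```

This is one reason to give local variables role-specific names: a called function can read (and even change) them by accident.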
Function output
Unlike most other programming languages, bash functions and scripts can only return a single integer between 0 and 255. By convention a return value of 0 means success, and any other return value is an error code.
Because function return values are so limited, function calls are often made to display "results" on standard output, as in our helloWorld example function. We'll see later how this output can be captured and used instead of an explicit return value.
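Here is a minimal sketch of the return-value convention, assuming a hypothetical is_even function. It relies on $?, a built-in variable that holds the return value of the most recently executed command, and on the fact that an if statement treats a return value of 0 as "true".

```bash
# Hypothetical function, for illustration only.
function is_even() {
  local n=$1
  if [ $((n % 2)) -eq 0 ]; then
    echo "$n is even"   # "result" text goes to standard output
    return 0            # 0 signals success
  fi
  echo "$n is odd"
  return 1              # a non-zero value signals failure
}

is_even 4
echo "return value: $?"   # $? holds the return value of the last command

if is_even 7; then
  echo "is_even reported success"
else
  echo "is_even reported failure, return value: $?"
fi
```

Note that the text printed by the function and its return value are two separate channels: the first carries the "result", the second only a status code.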
Handling command line arguments
The lines at the end of our script are the only directly executable (top-level) code. Everything else is a definition (of a function or a variable).
This directly executable code runs whenever the script is called. Its purpose is to determine what command the user wants to execute, then invoke that functionality with appropriate arguments.
    CMD=$1  # initially $1 will be the command
    shift   # after "shift", $1 will be the 2nd command-line argument; $2 the 3rd, etc.
            # and $@ will be arguments 2, 3, etc.

    case "$CMD" in
      helloWorld) helloWorld "$@"
        ;;
      *) usage
        ;;
    esac
The 1st (sub-command name) argument is captured in the CMD variable. Calling shift then removes that 1st argument, so that $@ now contains everything after the sub-command name. So script arguments 2, 3, etc., will be positional arguments 1, 2, etc., to whatever function is called.
Dispatching to the appropriate function is done using a case/esac block. The case argument string ("$CMD") is matched against each clause using the clause text before the right parenthesis ")". The double-semicolon ";;" terminates each case clause, including the default case "*".
Here we have only two cases to match: helloWorld and * (anything else). For helloWorld, the helloWorld function is called with all remaining command line arguments ("$@"). Otherwise, we call our usage function which displays some helpful usage information:
    function usage() {
      echo "advanced_bash.sh, version $__ADVANCED_BASH_VERSION__"
      echo ""
      echo "Usage: advanced_bash.sh <command> [arg1 arg2...]"
      echo ""
      echo "Commands:"
      echo "helloWorld [text to display]"
      echo ""
    }
As we extend our command processing script, we'll add clauses to the case/esac block and add a short usage description to the usage function.
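To make that extension step concrete, here is a hedged sketch of what adding a second sub-command might look like. The goodbyeWorld function, the simplified function bodies and the dispatch wrapper are all hypothetical. The wrapper stands in for the script's top-level code so that the sketch can be run without a separate script file.

```bash
# Hypothetical sub-commands with simplified bodies, for illustration only.
function helloWorld() {
  local txt=$1
  echo "Hello World! text: '$txt'"
}

function goodbyeWorld() {
  local txt=$1
  echo "Goodbye World! text: '$txt'"
}

function usage() {
  echo "Commands:"
  echo "helloWorld [text to display]"
  echo "goodbyeWorld [text to display]"   # new usage line
}

# Stand-in for the script's top-level code, wrapped in a function
# so that it can be exercised with different argument lists.
function dispatch() {
  local cmd=${1:-}
  # Drop the command name, but only if one was actually given.
  if [ $# -gt 0 ]; then shift; fi
  case "$cmd" in
    helloWorld) helloWorld "$@"
      ;;
    goodbyeWorld) goodbyeWorld "$@"       # new clause
      ;;
    *) usage
      ;;
  esac
}

dispatch goodbyeWorld "see you later"
dispatch unknownCommand
```

Each new sub-command therefore touches three places: a new function, a new case clause, and a new line in usage.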
Calling a function or a script
Functions and scripts are called without parentheses around their arguments. Instead, arguments are separated from each other by whitespace (one or more spaces or tabs).
    # Call a custom script passing 3 arguments
    my_script.sh arg1 arg2 arg3

    # Inside a script, call a function passing 3 arguments
    my_function arg1 arg2 arg3
Importantly, if an argument to be passed itself contains spaces, the argument must be enclosed in single or double quotes (or, less elegantly, each space can be backslash-escaped, e.g. "\ ").
    # Call a custom script passing 2 arguments
    my_script.sh arg1 "arg2 has embedded spaces"

    # Call a function passing 2 arguments
    my_function arg1 'arg2 has embedded spaces'
Example:
    function myfn() { echo "arg 1: $1"; echo "arg 2: $2"; echo "all args: $@"; }
    myfn foo bar baz wonk
    myfn foo "bar baz" wonk
    myfn "foo bar" baz wonk
As described in Intro Unix: Quoting in the shell, this is the main function of both single and double quotes – to group text containing whitespace into one item.
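One way to see the grouping directly is to count the arguments a function receives. The sketch below assumes a hypothetical count_args helper and uses $#, a built-in variable holding the number of positional arguments, which is not otherwise used in this section.

```bash
# Hypothetical helper, for illustration only.
function count_args() {
  echo "$# argument(s); first is '$1'"
}

count_args foo bar baz       # 3 arguments
count_args "foo bar" baz     # 2 arguments (double quotes group)
count_args 'foo bar baz'     # 1 argument  (single quotes group)
count_args foo\ bar\ baz     # 1 argument  (backslash-escaped spaces)
```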
Running the step_01.sh script
Let's run the step_01.sh script a couple of different ways.
show usage
First, with no arguments (or a non-matching argument), which shows us the usage message.
    # Call the step_01.sh script with no arguments to display usage
    ~/workshop/step_01.sh

    # Calling with a non-matching argument also displays usage
    ~/workshop/step_01.sh xxx
The output should look like this in either case:
    advanced_bash.sh, version step_01

    Usage: advanced_bash.sh <command> [arg1 arg2...]

    Commands:
    helloWorld [text to display]
call helloWorld
Calling the helloWorld command with no further arguments just shows some empty fields because no arguments were provided:
    # Call the step_01 helloWorld command with no other arguments
    ~/workshop/step_01.sh helloWorld
Output:
    Hello World!
     text 1: ''
     text 2: ''
     rest: ''
Now call the helloWorld command with 4 further arguments.
    # Call the step_01 helloWorld command with 4 arguments
    ~/workshop/step_01.sh helloWorld My name is Anna
The output should look like this.
    Hello World!
     text 1: 'My'
     text 2: 'name'
     rest: 'is Anna'
What's happening, step by step
Script arguments will initially be:
- $1 - helloWorld
- $2 - My
- $3 - name
- $4 - is
- $5 - Anna
- $@ - helloWorld My name is Anna
Command line processing captures the 1st script argument then pops it off with shift.
    CMD=$1  # initially $1 will be the command
    shift   # after "shift", $1 will be the 2nd command line argument; $2 the 3rd, etc.
After shift is called, script arguments are:
- $1 - My
- $2 - name
- $3 - is
- $4 - Anna
- $@ - My name is Anna
The CMD variable matches the helloWorld) clause of the case/esac statement, so arguments are passed to the helloWorld function in the "$@" built-in variable.
    case "$CMD" in
      helloWorld) helloWorld "$@"
        ;;
      *) usage
        ;;
    esac
The helloWorld function captures the first two arguments into local variables txt1 and txt2, then shifts them off, storing the remaining function arguments ($@) in the rest variable. It then echoes the variable values, surrounded by single quotes:
    function helloWorld() {
      local txt1=$1
      local txt2=$2
      shift; shift
      local rest=$@
      echo "Hello World!"
      echo " text 1: '$txt1'"
      echo " text 2: '$txt2'"
      echo " rest: '$rest'"
    }
Initially helloWorld arguments are:
- $1 - My → stored in local variable txt1
- $2 - name → stored in local variable txt2
- $3 - is
- $4 - Anna
- $@ - My name is Anna
After shift; shift is called, helloWorld arguments are:
- My → now stored in local variable txt1
- name → now stored in local variable txt2
- $1 - is
- $2 - Anna
- $@ - is Anna → stored in local variable rest
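The before-and-after states listed above can be reproduced with a small tracing sketch. The traceShift function is a hypothetical variant of helloWorld. It uses shift 2, which removes two arguments in one step, and $#, the built-in count of positional arguments.

```bash
# Hypothetical tracing variant of helloWorld, for illustration only.
function traceShift() {
  echo "before shift: count=$#, \$1='$1'"
  local txt1=$1
  local txt2=$2
  shift 2   # same effect as "shift; shift" when at least 2 arguments exist
  echo "after shift:  count=$#, \$1='$1'"
  local rest=$@
  echo "txt1='$txt1' txt2='$txt2' rest='$rest'"
}

traceShift My name is Anna
```

Called this way, the trace should report 4 arguments before the shift and 2 after, with is having moved into $1.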
exercise 1
How would you call the helloWorld command to produce this output?
    Hello World!
     text 1: 'My'
     text 2: 'name is Anna'
     rest: ''
exercise 2
How would you call the helloWorld command to produce this output?
    Hello World!
     text 1: ''
     text 2: 'My name is'
     rest: 'Anna B'
Quoting subtleties
We've already touched on different kinds of quoting in the shell. But there is an additional subtlety when handling script arguments.
Specifically: quoting a positional argument preserves argument quoting/grouping by the caller.
So there's a difference between these two ways of calling the helloWorld function inside our script:
    # Without quotes, all argument grouping by the script caller is lost
    helloWorld $@

    # With quotes, argument quoting by the script caller is preserved
    helloWorld "$@"
This is because "$@" inside double quotes expands to one word per positional argument, exactly as the caller supplied them, while an unquoted $@ is split again on whitespace after it is expanded.
Tip
It is a good idea to double-quote bash positional argument variables in order to preserve the caller's quoting.
Corollary: If some argument isn't coming through as you're expecting, it's probably a quoting issue!
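The difference is easy to observe by counting what arrives on the other side. This sketch assumes three hypothetical helpers (how_many, forward_unquoted and forward_quoted) and uses $#, the built-in count of positional arguments.

```bash
# Hypothetical helpers, for illustration only.
function how_many() {
  echo "$# argument(s)"
}

function forward_unquoted() {
  how_many $@      # each argument is re-split on whitespace
}

function forward_quoted() {
  how_many "$@"    # each argument is passed through unchanged
}

forward_unquoted My "name is" Anna
forward_quoted   My "name is" Anna
```

The caller passes 3 arguments in both cases, but only the quoted form delivers 3. The unquoted form delivers 4, because "name is" is split in two.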
exercise 3
How would you call the helloWorld command to produce output similar to this, where the 2nd line of text is the contents of your PATH environment variable?
    Hello World!
     text 1: 'My PATH is:'
     text 2: '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
     rest: ''
exercise 4
How would you call the helloWorld command to produce this output?
    Hello World!
     text 1: 'to evaluate a variable, use the dollar sign, e.g. $foo'
     text 2: 'use backslash (\) to escape special characters'
     rest: ''