This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.
...
- Macs and Linux have a Terminal program built-in
- Windows options:
- Windows 10+
- Command Prompt and PowerShell programs have ssh and scp (may require latest Windows updates)
- Start menu → Search for Command
- Putty – http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- simple Terminal and file copy programs
- download either the Putty installer or just putty.exe (Terminal) and pscp.exe (secure copy client)
- Windows Subsystem for Linux – Windows 10 Professional includes an Ubuntu-like bash shell
- See https://docs.microsoft.com/en-us/windows/wsl/install-win10
- We recommend the Ubuntu Linux distribution, but any Linux distribution will have an SSH client
Use ssh (secure shell) to log in to a remote computer.
Code Block | ||||
---|---|---|---|---|
| ||||
# General form:
ssh <user_name>@<full_host_name>
# For example
ssh abattenh@ls6.tacc.utexas.edu |
...
Code Block | ||
---|---|---|
| ||
today=$( date ); echo $today               # environment variable "today" is assigned today's date
today="Today is: `date`"; echo $today      # "today" is assigned a string including today's date |
Writing multiple text lines
There are several ways to output multi-line text. You can:
- Start the text with a single quote or a double quote
- press Enter when you want to start a new line
- keep entering text and Enter until you're satisfied
- supply the matching single quote or a double quote then Enter
example:
Code Block |
---|
echo 'My
name is
Anna' |
- Use echo -e
- The -e option tells echo to replace certain backslash escape sequences with the non-printable characters they represent
- So \n will be replaced by a newline (linefeed) character and \t will be replaced by a Tab.
example:
Code Block |
---|
echo -e "My\nname is\nAnna" |
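The two approaches above can be compared with a third option, the printf command, which the text does not mention. It interprets backslash escapes in its format string without needing an option. The sketch below is only illustrative, and the variable names are made up for it.

```shell
# Multi-line text written with literal newlines inside single quotes.
quoted='My
name is
Anna'

# The same text produced by printf, which expands \n in its format string.
formatted=$( printf 'My\nname is\nAnna\n' )

echo "$quoted"
echo "$formatted"
```

Both echo commands should print the same three lines. The double quotes around the variables keep the newlines intact.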
heredoc
Another method for writing multi-line text, useful for composing a large block of text in a script, is the heredoc syntax. A block of text is specified between two user-supplied block delimiters, and that text block is sent to a command as its standard input. The general form of a heredoc is:
Code Block | ||
---|---|---|
| ||
COMMAND << DELIMITER
..text...
..text...
DELIMITER |
Tip |
---|
The 2nd (ending) block delimiter you specify for a heredoc must appear by itself at the start of a new line, with no leading or trailing spaces. |
For example, using the (arbitrary) delimiter EOF and the cat command:
Code Block | ||
---|---|---|
| ||
cat << EOF
This text will be output
And this USER environment variable will be evaluated: $USER
EOF |
Here the block of text provided to cat is just displayed on the Terminal. To write it to a file just use the 1> or > redirection syntax after the block delimiter you name:
Code Block | ||
---|---|---|
| ||
cat << EOF 1> out.txt
This text will be output
And this USER environment variable will be evaluated: $USER
EOF |
The out.txt file will then contain this text:
Code Block | ||
---|---|---|
| ||
This text will be output
And this USER environment variable will be evaluated: student01 |
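If you do not want variables in the heredoc text to be evaluated, bash lets you quote the first delimiter. The sketch below contrasts the two forms. The sample variable is hypothetical and exists only for this illustration.

```shell
# Hypothetical variable used only for this illustration.
sample="student01"

# Unquoted delimiter: $sample is evaluated before the text reaches cat.
expanded=$( cat << EOF
User is: $sample
EOF
)

# Quoted delimiter: the text is passed to cat exactly as written.
literal=$( cat << 'EOF'
User is: $sample
EOF
)

echo "$expanded"   # User is: student01
echo "$literal"    # User is: $sample
```

The quoted form is handy when the text is itself a script or template that should keep its dollar signs.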
Bash control flow
the bash for loop
As in many programming languages, a for loop executes a series of commands once for each item in the for's argument list.
The bash for loop has the general structure:
for <variable_name> in <list of space-separated items>
do <something>
<something else>
done
The <items> should be (or evaluate to) for's argument list: a space-separated list of items (e.g. 1 2 3 4 or `ls -1 *.gz` ).
Code Block | ||||
---|---|---|---|---|
| ||||
for num in `seq 4`
do
echo $num
done
# or, since bash lets you put multiple commands on one line
# if they are each separated by a semicolon ( ; )
for num in `seq 4`; do echo $num; done |
Gory details:
- The `seq 4` expression uses backtick evaluation to generate a set of 4 numbers: 1 2 3 4.
- The do/done block expressions are executed once for each of the items in the list
- Each time through the loop (the do/done block) the variable named num is assigned one of the values in the list
- Then the value can be used by referencing the variable using $num
- The variable name num is arbitrary – it can be any name we choose
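As a small illustration of the points above, the sketch below uses the loop variable to build up a result across iterations. The names total, word and joined are arbitrary choices for this example.

```shell
# Add up the numbers 1 through 4, one loop iteration per number.
total=0
for num in `seq 4`; do
    total=$(( total + num ))
done
echo "total is $total"    # total is 10

# The list of items can also be written out directly.
joined=""
for word in alpha beta gamma; do
    joined="$joined<$word>"
done
echo "$joined"            # <alpha><beta><gamma>
```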
processing multiple files in a for loop
One common use of for loops is to process multiple files, where the set of files to process is obtained by pathname wildcarding. For example, the code below counts the number of reads in a set of compressed FASTQ files:
Code Block | ||||
---|---|---|---|---|
| ||||
for fname in *.gz; do
    echo "$fname has $(( $(zcat "$fname" | wc -l) / 4 )) sequences"
done |
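To try this pattern without real sequencing data, the sketch below builds two tiny made-up FASTQ files in a scratch directory and counts their reads. The file names and contents are invented for illustration. It uses gzip -dc instead of zcat, an assumption made for portability.

```shell
# Work in a scratch directory so no real data is touched.
workdir=$( mktemp -d )

# Build two tiny made-up FASTQ files (4 lines per read) and compress them.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$workdir/sampleA.fastq.gz"
printf '@r1\nGGCC\n+\nIIII\n' | gzip > "$workdir/sampleB.fastq.gz"

total_reads=0
for fname in "$workdir"/*.gz; do
    lines=$( gzip -dc "$fname" | wc -l )    # number of text lines in the file
    reads=$(( lines / 4 ))                   # each FASTQ record is 4 lines
    echo "$( basename "$fname" ) has $reads sequences"
    total_reads=$(( total_reads + reads ))
done
echo "total: $total_reads sequences"         # total: 3 sequences

rm -r "$workdir"
```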
quotes matter
We saw how double quotes allow the shell to evaluate certain metacharacters in the quoted text.
More importantly, when a variable holds multiple lines of text, double-quoting the variable when it is evaluated preserves whitespace characters in its value, such as Tabs and newlines.
Consider this case where a captured string contains newlines, as illustrated below.
Code Block | ||
---|---|---|
| ||
txt=$( echo -e "aa\nbb\ncc" )
echo "$txt" # inside double quotes, newlines preserved
echo $txt # without double quotes, newlines are converted to spaces |
This difference is very important!
- you do want to preserve newlines when processing one line of text at a time
- you do not want to preserve newlines when specifying the list of values a for loop processes (which must all be on one line)
See the difference:
Code Block | ||
---|---|---|
| ||
nums=$( seq 5 )
echo $nums
echo "$nums"
echo $nums| wc -l # newlines converted to spaces, so only one line
echo "$nums" | wc -l # newlines preserved, so reports 5
# This loop prints a line for each of the numbers
for n in $nums; do
echo "the number is: '$n'"
done
# But this loop prints only one line
for n in "$nums"; do
echo "the number is: '$n'"
done |
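One way to confirm what the two loops above are doing is to count their iterations. This is a minimal sketch, and the counter variable names are arbitrary.

```shell
nums=$( seq 5 )

# Unquoted: the shell splits the value into 5 separate items.
unquoted_count=0
for n in $nums; do
    unquoted_count=$(( unquoted_count + 1 ))
done

# Quoted: the whole value, newlines included, is a single item.
quoted_count=0
for n in "$nums"; do
    quoted_count=$(( quoted_count + 1 ))
done

echo "unquoted: $unquoted_count iterations"   # unquoted: 5 iterations
echo "quoted:   $quoted_count iterations"     # quoted:   1 iterations
```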
the if statement
The general form of an if/then/else statement in bash is:
if [ <test expression> ]
then <expression> [ expression... ]
else <expression> [ expression... ]
fi
Where
- The <test expression> is any expression that evaluates to true or false
- Inside [ ], a value on its own is false only when it is empty (an empty string or an unset variable)
- Any non-empty value is true, including the number 0
- There must be at least one space around the <test expression>, separating it from the enclosing brackets [ ].
- Double brackets [[ ]] can also be used to enclose the <test expression>
- When the <test expression> is true the then expressions are evaluated.
- When the <test expression> is false the else expressions are evaluated.
A simple example:
Code Block | ||
---|---|---|
| ||
for val in 5 0 "27" "$emptyvar" abc '0'; do
if [ "$val" ]
then echo "Value '$val' is true"
else echo "Value '$val' is false"
fi
done |
A good reference on the many built-in bash conditionals: https://www.gnu.org/software/bash/manual/html_node/Bash-Conditional-Expressions.html
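The sketch below tries a few of those conditionals: -z (string is empty), -eq (integers are equal) and -gt (integer is greater than). The describe function is a hypothetical helper written for this illustration. It also uses elif, which chains further tests and is not covered above.

```shell
# describe is a hypothetical helper written for this illustration.
describe() {
    local val="$1"
    if [ -z "$val" ]; then
        echo "empty"
    elif [ "$val" -eq 0 ] 2>/dev/null; then    # errors for non-numbers are hidden
        echo "zero"
    elif [ "$val" -gt 0 ] 2>/dev/null; then
        echo "positive"
    else
        echo "other"
    fi
}

for val in 5 0 "" abc; do
    echo "'$val' is $( describe "$val" )"
done
```

Here 0 is reported as zero rather than treated as false, because the test compares it numerically instead of only checking whether the string is empty.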
reading file lines with while
The read command can be used to read input one line at a time, in a bash while loop.
While the full details of the read command are complicated (see https://unix.stackexchange.com/questions/209123/understanding-ifs-read-r-line), this read-a-line-at-a-time idiom works nicely.
Code Block | ||
---|---|---|
| ||
while IFS= read -r line; do
    echo "Line: '$line'"
done < ~/.bashrc
|
- The IFS= clears read's default input field separators, which are normally whitespace (spaces, Tabs and newlines).
- This is needed so that read will set the line variable to exactly the contents of the input line, and not strip leading or trailing whitespace from it.
- The -r option stops read from treating backslashes in the input as escape characters.
- The lines are redirected from ~/.bashrc to the standard input of the while loop by the < ~/.bashrc expression after the done keyword.
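To see what the idiom preserves, the sketch below reads a small scratch file rather than ~/.bashrc. It uses read with IFS= and the -r option, which keeps backslashes literal. The file contents and variable names are made up for illustration.

```shell
# Scratch file with an indented line and a line containing a backslash.
tmpfile=$( mktemp )
printf 'first line\n    indented line\nback\\slash line\n' > "$tmpfile"

count=0
while IFS= read -r line; do
    count=$(( count + 1 ))
    if [ "$count" -eq 2 ]; then second="$line"; fi
    if [ "$count" -eq 3 ]; then third="$line"; fi
done < "$tmpfile"
rm "$tmpfile"

echo "read $count lines"
echo "second: '$second'"    # leading spaces are kept
echo "third: '$third'"      # the backslash is kept
```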
If the input data is well structured, its fields can be read directly into variables. Notice we can pipe all the output to more – or could redirect it to a file.
Code Block | ||
---|---|---|
| ||
tail /etc/passwd | while IFS=':' read -r account x uid gid name home shell
do
echo $account $name
done | more
|
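The same field-splitting idea can be tried on made-up records, as sketched below. The records imitate the colon-separated layout of /etc/passwd, and all account names are invented. The input is redirected from a file instead of piped, because in bash a loop at the end of a pipe normally runs in a subshell, so variables set inside it are lost once the loop ends.

```shell
# Made-up records in the same colon-separated layout as /etc/passwd.
tmpfile=$( mktemp )
printf 'alice:x:1001:1001:Alice Example:/home/alice:/bin/bash\n' >  "$tmpfile"
printf 'bob:x:1002:1002:Bob Example:/home/bob:/bin/sh\n'         >> "$tmpfile"

summary=""
while IFS=':' read -r account x uid gid name home shell; do
    summary="$summary$account=$name ($shell);"
done < "$tmpfile"
rm "$tmpfile"

echo "$summary"   # alice=Alice Example (/bin/bash);bob=Bob Example (/bin/sh);
```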
Copying files between TACC and your laptop
...