This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.

...

  • Macs and Linux have a Terminal program built-in
  • Windows options:

Use ssh (secure shell) to log in to a remote computer.

Code Block
languagebash
titleSSH to a remote computer
# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu

...

Code Block
languagebash
today=$( date );          echo $today  # environment variable "today" is assigned today's date
today="Today is: `date`"; echo $today  # "today" is assigned a string including today's date
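
The two command-substitution forms used above can be compared side by side. This is only a small sketch; the variable names year_new, year_old and n_chars are made up for illustration.

```bash
# Capture the same command's output two ways (variable names are illustrative)
year_new=$( date +%Y )     # $( ... ) form
year_old=`date +%Y`        # backtick form, equivalent for simple commands

echo "Using \$( ): $year_new"
echo "Using backticks: $year_old"

# $( ... ) nests without extra escaping, one reason it is often preferred
n_chars=$( printf '%s' "$( date +%Y )" | wc -c )
echo "The year has $n_chars characters"
```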

Writing multiple text lines

There are several ways to output multi-line text. You can:

  • Start the text with a single quote or a double quote
    • press Enter when you want to start a new line
    • keep entering text and Enter until you're satisfied
    • supply the matching single or double quote, then press Enter
    • example:

Code Block
echo 'My
name is
Anna'
  • Use echo -e
    • The -e option tells echo to replace certain backslash escape sequences with the non-printable characters they represent
      • So \n will be replaced by a newline (linefeed) character and \t will be replaced by a Tab.
    • example:

Code Block
echo -e "My\nname is\nAnna"
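
Because echo -e can behave differently from one shell to another, a possible alternative is printf, which is not otherwise covered on this page. The sketch below assumes nothing beyond standard printf and wc; the variable names multi and n_lines are illustrative.

```bash
# printf interprets \n and \t in its format string.
# Unlike echo, it adds no trailing newline unless you ask for one.
printf 'My\nname is\nAnna\n'

# Capture the text and count its lines to confirm there are three
multi=$( printf 'My\nname is\nAnna\n' )
n_lines=$( echo "$multi" | wc -l )
echo "Number of lines: $n_lines"
```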

heredoc

Another method for writing multi-line text is the heredoc syntax, which is useful for composing a large block of text in a script. A block of text is specified between two user-supplied block delimiters, and that text block is sent to a command. The general form of a heredoc is:

Code Block
languagebash
COMMAND << DELIMITER
..text...
..text...
DELIMITER
Tip

The 2nd (ending) block delimiter you specify for a heredoc must appear by itself at the start of a new line.

For example, using the (arbitrary) delimiter EOF and the cat command:

Code Block
languagebash
cat << EOF
This text will be output
And this USER environment variable will be evaluated: $USER
EOF

Here the block of text provided to cat is just displayed on the Terminal. To write it to a file just use the 1> or > redirection syntax after the block delimiter you name:

Code Block
languagebash
cat << EOF 1> out.txt
This text will be output
And this USER environment variable will be evaluated: $USER
EOF

The out.txt file will then contain this text:

Code Block
languagebash
This text will be output
And this USER environment variable will be evaluated: student01
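
One related detail, offered here as an aside: quoting the delimiter (<< 'EOF') tells the shell not to evaluate variables inside the heredoc. In this sketch the USER_LABEL variable and the temporary file names are hypothetical, chosen only for the example.

```bash
tmpdir=$( mktemp -d )    # scratch directory for this example
USER_LABEL="student01"   # hypothetical value used for illustration

# Unquoted delimiter: $USER_LABEL is evaluated
cat << EOF > "$tmpdir/expanded.txt"
Label: $USER_LABEL
EOF

# Quoted delimiter: the text is written literally
cat << 'EOF' > "$tmpdir/literal.txt"
Label: $USER_LABEL
EOF

expanded=$( cat "$tmpdir/expanded.txt" )
literal=$( cat "$tmpdir/literal.txt" )
echo "$expanded"
echo "$literal"
rm -r "$tmpdir"
```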

Bash control flow

the bash for loop

As in many programming languages, a for loop runs a series of commands once for each item in its argument list.

The bash for loop has the general structure:

for <variable_name> in <list of space-separated items>
do <something>
   <something else>
done

The <list of space-separated items> should be (or evaluate to) for's argument list: a list of items separated by spaces (e.g. 1 2 3 4 or `ls -1 *.gz` ).

Code Block
languagebash
titlefor loop example
for num in `seq 4`
do 
  echo $num
done

# or, since bash lets you put multiple commands on one line 
# if they are each separated by a semicolon ( ; )
for num in `seq 4`; do echo $num; done

Gory details:

  • The `seq 4` expression uses backtick evaluation to generate a set of 4 numbers: 1 2 3 4.
  • The do/done block expressions are executed once for each of the items in the list
  • Each time through the loop (the do/done block) the variable named num is assigned one of the values in the list
    • Then the value can be used by referencing the variable using $num
    • The variable name num is arbitrary – it can be any name we choose
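
To make the per-item assignment concrete, here is a minimal sketch that keeps a running total while looping. The variable name total is made up for the example, and $( ) is used in place of backticks.

```bash
total=0
for num in $( seq 4 ); do
  # num holds the current item; add it to the running total
  total=$(( total + num ))
  echo "added $num, running total is $total"
done
echo "Final total: $total"
```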

processing multiple files in a for loop

One common use of for loops is to process multiple files, where the set of files to process is obtained by pathname wildcarding. For example, the code below counts the number of reads in a set of compressed FASTQ files:

Code Block
languagebash
titleFor loop to count sequences in multiple FASTQs
# Each FASTQ record is 4 lines, so the line count divided by 4 is the number of reads
for fname in *.gz; do
   echo "$fname has $(( $(zcat "$fname" | wc -l) / 4 )) sequences"
done
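
If you have no FASTQ files at hand, the same idea can be tried on a made-up file. This sketch assumes gzip is installed and uses gzip -dc (equivalent to zcat for gzip files); the two reads and the file name sample.fastq.gz are invented for illustration.

```bash
workdir=$( mktemp -d )   # scratch directory for this example

# Two invented reads = 8 lines of FASTQ text, compressed with gzip
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/sample.fastq.gz"

for fname in "$workdir"/*.gz; do
  n_lines=$( gzip -dc "$fname" | wc -l )   # total lines in the uncompressed file
  n_reads=$(( n_lines / 4 ))               # 4 lines per FASTQ record
  echo "$( basename "$fname" ) has $n_reads sequences"
done
rm -r "$workdir"
```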

quotes matter

We saw how double quotes allow the shell to evaluate certain metacharacters in the quoted text.

More importantly, when a variable holds multiple lines of text, enclosing the variable reference in double quotes preserves special characters in its value, such as Tab or newline characters.

Consider this case where a captured string contains newlines, as illustrated below.

Code Block
languagebash
txt=$( echo -e "aa\nbb\ncc" )
echo "$txt"   # inside double quotes, newlines preserved
echo $txt     # without double quotes, newlines are converted to spaces

This difference is very important!

  • you do want to preserve newlines when processing one line of text at a time
  • you do not want to preserve newlines when specifying the list of values a for loop processes (which must all be on one line)

See the difference:

Code Block
languagebash
nums=$( seq 5 )
echo $nums
echo "$nums"

echo $nums | wc -l    # newlines converted to spaces, so only one line
echo "$nums" | wc -l  # newlines preserved, so reports 5

# This loop prints a line for each of the numbers
for n in $nums; do
  echo "the number is: '$n'"
done

# But this loop runs only once, with n set to the entire multi-line value
for n in "$nums"; do
  echo "the number is: '$n'"
done
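
Counting loop iterations makes the effect of the quotes easy to verify. In this sketch the counter names unquoted_count and quoted_count are made up for the example.

```bash
nums=$( seq 5 )

# Unquoted: the shell splits the value into 5 separate items
unquoted_count=0
for n in $nums; do unquoted_count=$(( unquoted_count + 1 )); done

# Quoted: the whole multi-line value is a single item
quoted_count=0
for n in "$nums"; do quoted_count=$(( quoted_count + 1 )); done

echo "unquoted iterations: $unquoted_count"
echo "quoted iterations: $quoted_count"
```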

the if statement

The general form of an if/then/else statement in bash is:

if [ <test expression> ]
then <expression> [ expression... ]
else <expression> [ expression... ]
fi

Where

  • The <test expression> is any expression that evaluates to true or false
    • With a single value, as in [ "$val" ], an empty value is false
    • Any non-empty value is true, including the number 0
    • There must be at least one space around the <test expression> separating it from the enclosing brackets [ ].
    • Double brackets [[  ]] can also be used to enclose the <test expression>
  • When the <test expression> is true the then expressions are evaluated.
  • When the <test expression> is false the else expressions are evaluated.

A simple example:

Code Block
languagebash
for val in 5 0 "27" "$emptyvar" abc '0'; do
  if [ "$val" ]
    then echo "Value '$val' is true"
    else echo "Value '$val' is false"
  fi
done

A good reference on the many built-in bash conditionals: https://www.gnu.org/software/bash/manual/html_node/Bash-Conditional-Expressions.html
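
As a small sketch of a few of those conditionals, the function below uses -z (true for an empty string) and = (string equality) with an elif branch. The function name classify is hypothetical, chosen only for this example.

```bash
classify() {
  # $1 is the value to classify
  if [ -z "$1" ]; then
    echo "empty"
  elif [ "$1" = "0" ]; then
    echo "zero"
  else
    echo "other"
  fi
}

for val in 5 0 "" abc; do
  echo "'$val' is $( classify "$val" )"
done
```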

reading file lines with while

The read command can be used to read input one line at a time in a bash while loop.

While the full details of the read command are complicated (see https://unix.stackexchange.com/questions/209123/understanding-ifs-read-r-line), this read-a-line-at-a-time idiom works nicely.

Code Block
languagebash
while IFS= read -r line; do
  echo "Line: '$line'"
done < ~/.bashrc

  • The IFS= clears read's default input field separators, which are normally whitespace (spaces, Tabs and newlines).
    • This is needed so that read will set the line variable to exactly the contents of the input line, and not strip leading or trailing whitespace from it.
  • The -r option tells read to treat backslashes in the input literally rather than as escape characters.
  • The lines are redirected from ~/.bashrc to the standard input of the while loop by the < ~/.bashrc expression after the done keyword.
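
The effect of IFS= can be checked on a small temporary file rather than ~/.bashrc. This sketch also passes -r to read so that backslashes are kept literally; the file contents and the variable names infile, n_lines and first_line are invented for the example.

```bash
infile=$( mktemp )   # temporary input file for this example
printf '  indented line\nplain line\n' > "$infile"

n_lines=0
first_line=""
while IFS= read -r line; do
  n_lines=$(( n_lines + 1 ))
  # Remember the first line so its leading spaces can be inspected later
  if [ "$n_lines" -eq 1 ]; then first_line="$line"; fi
  echo "Line $n_lines: '$line'"
done < "$infile"
rm "$infile"
```

Because the loop reads from a redirection rather than a pipe, the variables set inside it are still available after done.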

If the input data is well structured, its fields can be read directly into variables. Notice we can pipe all the output to more – or could redirect it to a file.

Code Block
languagebash
# /etc/passwd has 7 colon-separated fields: account:x:uid:gid:name:home:shell
tail /etc/passwd | while IFS=':' read -r account x uid gid name home shell
do
  echo "$account $name"
done | more
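
The same field-splitting idea can be tried on invented data. In this sketch the two records mimic the /etc/passwd layout but are made up, as are the variable names records and names.

```bash
records=$( mktemp )   # temporary file holding two made-up records
printf 'anna:x:1001:1001:Anna B:/home/anna:/bin/bash\nbob:x:1002:1002:Bob C:/home/bob:/bin/sh\n' > "$records"

names=""
# Redirecting into the loop (rather than piping) keeps variables set in the loop
while IFS=':' read -r account x uid gid name home shell; do
  echo "$account ($uid) is $name and uses $shell"
  names="$names$account "
done < "$records"
rm "$records"

echo "Accounts seen: $names"
```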

Copying files between TACC and your laptop

...