Overview
About Anna
Anna is a member of the Center for Biomedical Research Support (CBRS), which is home to a number of core facilities: the Bioinformatics Consulting Group (BCG), Genome Sequencing and Analysis Facility (GSAF), Proteomics/Mass Spec, Microscopy, Mouse Genetic Engineering, and others.
- Anna Battenhouse, Associate Research Scientist, abattenhouse@utexas.edu
- BA English literature, 1978, Carleton College
- Commercial software development 1982 – 2007
- Texas Instruments, Motorola ...
- lots of software development experience but limited Unix/Linux
- Joined Vishwanath Iyer Lab 2007 (functional genomics)
- “retirement career”
- began to appreciate Linux & bash (slowly)
- BS Biochemistry, 2013, UT
- Current affiliations:
- Manager, Biomedical Research Computing Facility (BRCF)
- member, Bioinformatics Consulting Group (BCG)
- member, Marcotte lab (systems biology/proteomics)
The Biomedical Research Computing Facility (BRCF) and the Bioinformatics Consulting Group (BCG) are CBRS core facilities that support local research computing.
Note that Anna is not a Unix guru – there's a world of things she doesn't know! But she knows enough to be considered expert-ish.
About you
Who has had command-line experience before? (E.g. Linux, Unix/Mac Unix, DOS)
What programming language experience do you have? (e.g. Python, R, Java)
About Unix/Linux
Here are two (similar) documents describing the history of Unix/Linux.
Unix has been around a long time (~1969), before computers had screens or hard drives – magnetic tape and paper tape were used instead. It is written in the C programming language, which was considered a high-level language at the time – at least compared to assembly language. These days C is considered a low-level language.
The original, fundamental "Zen" of Unix is that everything is a file – devices, printers, terminals, and of course actual files.
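As a small illustration of the "everything is a file" idea, the sketch below writes to /dev/null, a device that exists on any Linux system, using the same redirection you would use for an ordinary file. The variable name null_kind is made up for this example.

```shell
# /dev/null is a device, yet it is written to just like a file.
# Anything written there is simply discarded.
echo "this text is discarded" > /dev/null

# The -c test is true when the path is a character device.
if [ -c /dev/null ]; then
    null_kind="character device"
else
    null_kind="something else"
fi
echo "/dev/null is a $null_kind"
```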
The GNU project ("GNU's Not Unix"), started by Richard Stallman in 1983, provided many open-source tools and utilities for Unix. The popularity of Unix increased dramatically with this development. At that time, some Unix flavors were proprietary; others were open or partially open.
Linux, a "Unix-like" system, was developed in 1991 by Linus Torvalds using GNU utilities. Linux is entirely open-source, so it has become the "Unix of choice" for scientific computing. There are also commercial distributions of Linux (e.g. RedHat), which have restrictions on what open-source tools can be included/used.
There are many flavors/distributions of Unix, and many flavors/distributions of Linux. Several clusters at TACC (the Texas Advanced Computing Center) use CentOS, an open-source version of RedHat. Our servers run Ubuntu, from the Debian family of distros. Distributions may differ in a number of system management processes, but generally offer a similar set of utilities.
Why use command-line Linux
The other fundamental philosophy of Unix/Linux is to provide many small tools and utilities that can easily interact with each other using their built-in Input and Output streams, which we'll learn more about soon.
- The upside of this is that it affords tremendous flexibility, letting you perform complex data manipulations on the command line without writing a formal program or script.
- The downside is that there are a lot of tools to learn, each with many options/switches.
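To make the "many small tools" idea concrete, here is a minimal sketch of a pipeline in which the output stream of each tool becomes the input stream of the next. The sample words and the variable name word_counts are made up for illustration; printf, sort, uniq and awk are standard utilities.

```shell
# Count how often each word appears, most frequent first.
#   printf   : writes six sample words, one per line
#   sort     : groups identical words on adjacent lines
#   uniq -c  : collapses each group into "count word"
#   sort ... : orders by the count column, numerically, descending
#   awk      : rewrites each line as "word=count"
word_counts=$(printf 'gene\nprotein\ngene\nread\ngene\nread\n' \
    | sort \
    | uniq -c \
    | sort -k1,1nr \
    | awk '{print $2 "=" $1}')
echo "$word_counts"
```

No single tool here knows how to "count words", but chaining four of them with the pipe character (|) does the job without writing a formal program.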
Why not use a program with a GUI (Graphical User Interface) instead?
- Using the command line lets you be very precise and flexible at the same time
- After the initial learning curve (which is non-trivial!!) the command line can be easier – and faster!
- And you can easily test your command line "mini-scripts" as you develop them
- You can easily work on remote computers from your laptop, as we'll do in this class
Challenges involved:
- Lots of tools to learn, each with many options
- Lots of odd syntax to learn – one misplaced character can lead to errors or other problems
- Although there's a ton of information about Linux on the Internet, it's hard to get started with the basics
This class aims to get you started addressing these challenges.
Setup to follow hands-on
This class is designed to be hands-on, to provide you with the enjoyment of working in the Linux command line environment.
However, all steps and scripts are detailed on this Wiki, and you will see me exercising processes on the command line interactively. So you may decide (at any time) to just watch and listen.
Note that you will have access to this Wiki even after the class, and I will email you a link to a video recording of the course.
If you choose to follow hands-on, you'll be using the BRCF "GSAF pod", a set of 3 compute servers attached to a large, shared storage server.
Accounts and servers
We have set up 99 "student" accounts, named student01, student02 ... student99. We'll assign one to you. The password for these accounts is given in the chat.
These credentials are active for the next few weeks, but will be de-activated on Sunday May 12, 2024, in the evening.
With your studentNN account you can ssh into one of the following servers:
- gsafcomp01.ccbb.utexas.edu – odd number studentNN accounts
- gsafcomp02.ccbb.utexas.edu – even number studentNN accounts
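The odd/even rule above can be sketched as a small shell function. This is only an illustration: server_for_account is a hypothetical helper name, not something installed on the servers, and it assumes the account name has the form studentNN.

```shell
# Hypothetical helper: print the server for a given studentNN account.
server_for_account() {
    num="${1#student}"   # strip the "student" prefix, e.g. "07"
    num="${num#0}"       # strip one leading zero so "08" is not read as octal
    if [ $((num % 2)) -eq 1 ]; then
        echo "gsafcomp01.ccbb.utexas.edu"   # odd-numbered accounts
    else
        echo "gsafcomp02.ccbb.utexas.edu"   # even-numbered accounts
    fi
}

server_for_account student07
server_for_account student08
```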
UT VPN
If you are associated with UT Austin but are not on the UT campus network, you'll need to have the UT VPN service active in order to connect to these servers via SSH. See How to Connect to the UT VPN.
Folks who are not associated with UT Austin, or who do not have the UT VPN service installed, can use an alternate web-based method, described below.
Logging in via SSH
You can access the servers using ssh in a Terminal program that runs on your computer. On Macs, this program is called Terminal. On Windows (Windows 10 or later) it is called Command Prompt or PowerShell. Find and open this program now on your computer.
ssh is an executable program that runs on your local computer and allows you to connect securely to a remote computer. We're going to use ssh to access one of the above compute servers on the GSAF pod:
# From the UT campus network, or if you have the UT VPN active:
ssh student02@gsafcomp02.ccbb.utexas.edu
- Answer yes to the SSH security question prompt
- this will only be asked the first time you log in
- Enter the class password at the password prompt, then press Enter.
- for security reasons, the text that you enter will not be displayed
Once you've successfully logged in, you can log out by just typing exit then Enter. You'll then be back in your local computer's Terminal environment.
If your Windows version does not support ssh, you can download PuTTY from: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
If your Terminal has a dark background, the default shell colors can be hard to read. Execute this line to display directory names in yellow.
export LS_COLORS=$LS_COLORS:'di=01;33:'
We'll see later how to set this environment variable in your login script (~/.profile) so that it gets executed every time you log in to this server.
For now, just copy the appropriate line above, paste it into your Terminal window (after logging on), then press Enter.
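If you'd like to confirm that the setting took effect, the sketch below repeats the export line and then inspects the variable. In the value di=01;33, di stands for directories, 01 is the terminal code for bold and 33 is the code for a yellow foreground. The variable name di_matches is made up for this example.

```shell
# Append the directory color setting to whatever LS_COLORS already holds
export LS_COLORS=$LS_COLORS:'di=01;33:'

# Display the full value; the new setting appears at the end
echo "$LS_COLORS"

# grep -c counts the lines that contain the setting (1 if it is present)
di_matches=$(echo "$LS_COLORS" | grep -c 'di=01;33')
echo "lines containing the directory setting: $di_matches"
```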
Logging in via RStudio web
If you're attending remotely or do not have access to the UT VPN, you can use the Terminal functionality in the RStudio web application.
To access the Terminal built into RStudio Server:
- Click on one of these links:
- Select the RStudio link
- Enter your studentNN account name and our password, then click the Sign In button
- In the RStudio web application, select the Terminal tab above the type-in area
You should now see a command line in the RStudio type-in area.
In the RStudio Terminal, if the default color for directories is difficult to see against its white background, execute this line to display directory names in blue.
export LS_COLORS=$LS_COLORS:'di=01;34'
We'll see later how to set this environment variable in your login script (~/.profile) so that it gets executed every time you log in to this server.
For now, just copy the appropriate line above, paste it into your Terminal window (after logging on), then press Enter.