Comment: Migrated to Confluence 4.0

This is a work in progress. If you want something added or explained better, please let us know.

argument

Arguments are provided to commands to tell them what to act on. For example, a file or #directory #path can be given to the ls command to change its default behavior of listing the current working #directory.

cat

cat displays the contents of the files listed as its arguments, in order, even if the output scrolls off the screen. It is more useful with standard output redirection; ie,

Code Block
cat a b c > d

concatenates files a, b, c and stores the result in d.
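As a minimal sketch of the concatenation above, the following creates three small files in a throwaway directory and joins them. The scratch directory and the file contents are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Create three one-line files (contents are illustrative only).
echo "first"  > a
echo "second" > b
echo "third"  > c

# Concatenate them in argument order and store the result in d.
cat a b c > d

# d now holds the three lines in the order the files were listed.
result=$(cat d)
echo "$result"

# Clean up the scratch directory.
cd / && rm -r "$scratch"
```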

cd

cd is the command used to change your current working #directory; eg,

Code Block
cd /share/apps

cp

The copy command, cp, is used to copy files from one #pathname to another. The basic usage is

Code Block
cp src dest

where dest may be a file or a directory; if it is a directory, the file dest/src is created. More than one src can be provided, in which case the destination must be a directory. For example,

Code Block
cp src1 src2 dest

creates the files dest/src1 and dest/src2. cp provides some useful #options:

  • -p => preserve the permissions and timestamps of the original files on the newly created copies
  • -r => copy directories recursively, including their contents
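The usages and options described for cp can be tried safely in a scratch directory. This is only a sketch: the file names, the tree layout, and the name src1.preserved are invented for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical inputs: two files and a small directory tree.
echo "one" > src1
echo "two" > src2
mkdir -p tree/sub
echo "deep" > tree/sub/leaf
mkdir dest

# Several sources: the destination must be a directory.
cp src1 src2 dest

# -p keeps the permissions and timestamps of the source file.
chmod 600 src1
cp -p src1 dest/src1.preserved

# -r copies a directory along with everything below it.
cp -r tree dest

copied=$(cat dest/src1 dest/src2 dest/tree/sub/leaf)
echo "$copied"

cd / && rm -r "$scratch"
```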

diff

directory

A directory in UNIX is a file whose contents are the members of the directory. Some inferior Operating Systems refer to these as folders. Some important directories that you may deal with when using CCBB systems are

  • /share/apps => the location of many 3rd party, and non-standard applications, development toolkits, and so on
  • /share/scripts => the location of some job script templates that we provide
  • /home/username => the home directory of a particular user. When typing on a #shell commandline, or in a shell script this may also be referred to using ~ (for your own), ~username, or using the $HOME #shellvariable.

Every #process (including your #shell command line) has a current working directory. This is the directory in which the process is currently operating. Within your current directory there are two special directory entries that you can always access

  • . => the current directory
  • .. => the parent of the current directory

If you lose track of your current working directory, you can print it with the pwd command. You can view the contents of a directory using #ls.
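A short sketch of how pwd, . and .. relate to each other. The directory names project and data are hypothetical and are created under a temporary directory.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"
cd "$scratch/project/data" || exit 1

here=$(pwd)      # the current working directory
cd ..            # .. is the parent, so this moves up to project
parent=$(pwd)
cd ./data        # . is the current directory, so this returns to data
back=$(pwd)

echo "$here"
echo "$parent"
echo "$back"

cd / && rm -r "$scratch"
```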

Environment

Every program runs in an environment, which is a collection of variables that it inherits from its parent and passes on to its children. Programs use these variables to modify how they run. One important variable is #PATH, which is used to find executable programs, but there are many others. For example, BLAST uses a variable called BLASTDB to find its databases.
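The inheritance described above can be observed directly. In this sketch MYTOOL_DB is a made-up variable name and /tmp/example-db a made-up value; a child shell started with sh -c stands in for any program launched from your shell.

```shell
# Start from a clean slate in case the name is already in use.
unset MYTOOL_DB

# MYTOOL_DB is a hypothetical variable used only for this sketch.
MYTOOL_DB=/tmp/example-db

# Without export the variable stays local to this shell; a child does not see it.
before=$(sh -c 'echo "${MYTOOL_DB:-unset}"')

# After export the variable is part of the environment passed to children.
export MYTOOL_DB
after=$(sh -c 'echo "${MYTOOL_DB:-unset}"')

echo "before export: $before"
echo "after export:  $after"
```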

globs

Globs are shell constructs which are used on the command line to stand for patterns of file names. They are related to Regular Expressions, so care must be taken to understand the differences. There are 3 types of globs

  • * which matches any number of characters or none at all
  • ? which matches any single character
  • [] which matches any single character from the set listed between the brackets

Some example uses:

  • ls *.tar.gz (matches any file ending in .tar.gz)
  • ls c?t (matches cut, cat, and any other 3 letter words starting with c and ending with t)
  • ls c[au]t (matches cut, cat, but not other 3 letter words starting with c and ending with t)

WARNING When using globs with rm you need to be very careful to make sure the glob doesn't match more than intended. Listing the matches first with ls, possibly with the -d option, is suggested, even more so when rm is used with -r. Some people run into problems because * does not match files that begin with a leading . as these are 'hidden' files, and are usually config files of some sort. This leads them to run

Code Block
rm .*

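and disaster ensues if -r is also specified. This is because .* matches the special directory entry .., which stands for the parent directory. Use .??* instead as a glob for hidden files; it cannot match . or .., although it also skips hidden names with only one character after the dot (such as .a).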
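A safe way to see what a glob matches is to hand it to echo (or ls) before using it with rm. The sketch below does this in a scratch directory; the file names are invented to exercise each kind of glob.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Illustrative file names, including two hidden ones.
touch cat cut cot coat .hidden .a

star=$(echo c*t)          # cat coat cot cut
question=$(echo c?t)      # cat cot cut
bracket=$(echo c[au]t)    # cat cut
visible=$(echo *)         # hidden files are not matched by *
hidden=$(echo .??*)       # .hidden only: skips . and .., and also .a

echo "$star"
echo "$question"
echo "$bracket"
echo "$visible"
echo "$hidden"

cd / && rm -r "$scratch"
```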

grep

head/tail

kernel

The kernel is the core of the operating system. It is responsible for managing devices like disk drives, enforcing user permissions and access controls, and scheduling processes to be run on the CPU.

ln

The ln command creates links which are file tree entries. Most common are symbolic links which are just fake names which point to an existing name. You create a symbolic link using

Code Block
ln -s old new

where old and new are paths. Less common, at least for normal use, are hard links. A hard link is a second name for the same blocks of data. To create a hard link you use

Code Block
ln old new

Symbolic and hard links differ in many ways, but the most notable is that a symbolic link is visibly a link to another name. If you remove the old file, the symbolic link dangles until a file of that name is recreated. On the other hand, there is no overt indication that a hard link exists. Either name may be removed without affecting the other, which still exists and can be used to refer to the blocks on disk that store the file. The next major difference is that you cannot hard link a directory, but you can soft link one. Finally, soft links can be seen using ls -l, which shows them as new -> old.
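The difference between the two kinds of link shows up when the original name is removed. A minimal sketch, assuming a filesystem that supports both link types; the names soft and hard are chosen only for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo "payload" > old

ln -s old soft    # symbolic link: a name that points at the name "old"
ln old hard       # hard link: a second name for the same data

rm old            # remove the original name

# The hard link still reaches the data; the symbolic link now dangles.
hard_content=$(cat hard)
if [ -L soft ] && [ ! -e soft ]; then
    soft_state="dangling"
else
    soft_state="ok"
fi

# Recreating old makes the symbolic link usable again.
echo "new payload" > old
soft_content=$(cat soft)

echo "$hard_content / $soft_state / $soft_content"

cd / && rm -r "$scratch"
```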

ls

ls is, next to cd, probably the most used UNIX command. It is used to list out the contents of a directory. For example, you can do

Code Block
ls path

to have the listing of #path. The #path #argument is optional; if you don't provide one, then the current working #directory is listed out. If #path is a directory the contents of the directory are listed out. If #path is a file, then only the file is listed out which at least confirms its existence. You can provide multiple paths, too.

ls has many useful #options

  • -l => show also meta-data such as modification and access times, permissions, sizes, and ownerships
  • -a => show all files (by default, files which start with . are not listed out since they are usually configuration files)
  • -d => list out a directory, and not its contents
  • -t => sort by modification time (puts newly created or modified files at top)
  • -r => reverse the sort order
  • -F => add 1 character at end to indicate file type:
    • * => executable (or at least executable permissions granted)
    • / => directory
    • others (for example, @ => symbolic link)
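Several of these options can be compared side by side on a small, known set of files. The names below (sub, inner, plain, run.sh, .config) are invented for this sketch.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A directory, a hidden file, a plain file, and an executable file.
mkdir sub
touch sub/inner .config plain run.sh
chmod +x run.sh

default=$(ls)           # hidden .config is not shown
all=$(ls -a)            # also shows ., .., and .config
dir_only=$(ls -d sub)   # lists sub itself, not sub/inner
marked=$(ls -F)         # appends * to run.sh and / to sub

echo "$default"
echo "$all"
echo "$dir_only"
echo "$marked"

cd / && rm -r "$scratch"
```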

man

The man command displays the manual page (as in Read The Fine Manual) for a particular command. To view more about the use of man run the command

Code Block
man man

The -k option searches by keyword; eg, man -k list shows you all of the commands with list in their synopsis, which might be a way to search for information about listing files with the ls command. Finding and reading man pages is an art that needs practice.

mkdir

mkdir is the command to create one or more directories, whose pathnames are provided as arguments. It has one useful option, -p, which is used to create any missing intermediate #path elements. For example,

Code Block
mkdir /home/foobar/doesnotexist/newdir

fails if any of /home, /home/foobar, or /home/foobar/doesnotexist does not exist. When -p is specified, though, any needed directories are created.
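The effect of -p can be checked without touching /home by using a scratch directory. The path a/b/newdir is a stand-in for the example path above.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Without -p this fails, because the intermediate directories a and a/b do not exist.
if mkdir a/b/newdir 2>/dev/null; then
    plain="created"
else
    plain="failed"
fi

# With -p the missing intermediate directories are created first.
mkdir -p a/b/newdir
if [ -d a/b/newdir ]; then
    with_p="created"
else
    with_p="failed"
fi

echo "without -p: $plain"
echo "with -p:    $with_p"

cd / && rm -r "$scratch"
```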

more / less

more and less are used to view the contents of one or more files a screenful at a time. They differ because less allows you to move both forwards and backwards in the file. See also head/tail.

mv

The move command, mv, is used when you want to rename a file or move it to a different #directory. The basic usage is

Code Block
mv src dest
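A brief sketch of a rename in a scratch directory: after mv, the old name is gone and the contents are reachable under the new name.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo "data" > src

# Rename src to dest; the contents are unchanged.
mv src dest

if [ -e src ]; then old_name="present"; else old_name="gone"; fi
new_content=$(cat dest)

echo "src is $old_name, dest contains: $new_content"

cd / && rm -r "$scratch"
```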

NFS

options

Options are used to modify the behavior of a command. They are usually written with a leading -, as in the -l of ls -l.

Permissions

pathname or path

UNIX stores and locates files in a tree. The top of the tree is called / (referred to as #root). / contains several directories such as etc, usr, home, and so on. These can be referred to by the pathnames /etc, /usr, /home, etc. These may contain other directories, or files. Each file in the UNIX filesystem is then located by the full pathname that specifies exactly which branches must be followed, starting with / and ending with the file. Thus, /home/foobar/specialfiles/data.fasta is the full pathname for the file called data.fasta, which is in user foobar's home directory in a directory called specialfiles.

Since each #process has a current working directory, you can use the special . and .. directories to specify a relative path to an object in the tree. For example, if /home/foobar/specialdir is your current working directory, ../anotherspecialdir/specialfile specifies the file /home/foobar/anotherspecialdir/specialfile. On the other hand, ./specialfile refers to /home/foobar/specialdir/specialfile. Thus,

Code Block
ls ../anotherspecialdir

and

Code Block
ls ./specialfile

work as expected. Note that in the second case the ./ may be left off, since by default ls looks in its current directory if you don't specify either a full or relative path to an object. This is not true when running a command. In order for a command to be executed, it must either be specified with an explicit path, or be found in a directory listed in your #PATH variable. Thus, if there is a program called foo in your current working directory

Code Block
foo 

it will not be executed unless your current working directory is listed in your PATH, but

Code Block
./foo

will (and overrides searching the PATH as does typing the full path name to foo). Information about a path can be found using #ls.
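The foo versus ./foo behavior can be reproduced with a tiny script. This sketch assumes that /usr/bin and /bin contain no program called foo and that the temporary directory allows execution; the script body is invented for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A tiny stand-in program called foo.
printf '#!/bin/sh\necho "foo ran"\n' > foo
chmod +x foo

# Use a PATH that does not include the current directory.
old_path=$PATH
PATH=/usr/bin:/bin

# A bare name is looked up in PATH only, so foo is not found here.
if foo >/dev/null 2>&1; then bare="ran"; else bare="not found"; fi

# An explicit relative path bypasses the PATH search.
explicit=$(./foo)

PATH=$old_path
echo "foo: $bare"
echo "./foo: $explicit"

cd / && rm -r "$scratch"
```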

PATH

PATH is an environment variable which specifies a list of directories to be searched for programs. The PATH variable is passed to child processes by their invoking process, and is typically set by your #shell initialization to a value such as /usr/bin:/bin, which are the default locations for OS provided programs. You can add to it by typing

Code Block
PATH=DIR:$PATH
export PATH

or

Code Block
PATH=$PATH:DIR
export PATH

depending on where you want the entry to be added (since the paths are searched in order, and the first match is used). Typically you would do this to avoid having to type out a full path such as /share/apps/python-2.6.5/bin/python to launch a program, or when there is some ambiguity and you want to clearly state what you want. For example, there is a /usr/bin/python, and a /share/apps/python-2.6.5/bin/python which we provide. If you do nothing, then /usr/bin/python is found first, and it is run. CCBB provided apps are installed in /share/apps (either /share/apps/bin, or a more specific application directory). Since there are so many of them, some in several versions, we do not add them to the default PATH. Instead we expect you will use #ls and #cd to find the ones you need, or look them up elsewhere on this wiki.
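To see the search order at work, the sketch below puts a directory at the front of PATH and asks the shell which program it would run. The program name mytool and its location are hypothetical; command -v is the standard shell builtin that reports how a name resolves.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/bin"

# A stand-in program named mytool (a hypothetical name) in our own directory.
printf '#!/bin/sh\necho "mytool from scratch"\n' > "$scratch/bin/mytool"
chmod +x "$scratch/bin/mytool"

# Prepend the directory so it is searched before everything else.
old_path=$PATH
PATH=$scratch/bin:$PATH
export PATH

found=$(command -v mytool)   # the full path the shell resolved the name to
output=$(mytool)             # runs without typing the full path

echo "$found"
echo "$output"

# Restore the original PATH and clean up.
PATH=$old_path
rm -r "$scratch"
```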

pipe

A pipe is a connection from the standard output of one program to the standard input of another, and is specified with |. For example,

Code Block
a | b

runs the program a, and tells b to read the output produced by a. Thus,

Code Block
cat foo | sort | more

sorts file foo, and then displays it one screen at a time. The pipe is the cornerstone of UNIX philosophy: write simple programs that do one thing well, and use pipes to glue them together into more complex statements.
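A runnable variant of the pipeline above, using head in place of more so that it does not wait for a keypress. The file contents are made up for the example.

```shell
scratch=$(mktemp -d)

# An unsorted three-line file standing in for foo.
printf 'pear\napple\nfig\n' > "$scratch/foo"

# cat writes the file to its standard output, which sort reads as standard input.
sorted=$(cat "$scratch/foo" | sort)

# A third stage keeps only the first line of the sorted output.
first=$(cat "$scratch/foo" | sort | head -n 1)

echo "$sorted"
echo "first: $first"

rm -r "$scratch"
```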

process

A process is a program that has been loaded into memory, and which is being executed. A process which spends most of its time waiting on disk or network reads and writes is said to be I/O bound. At the opposite extreme are CPU bound jobs, which spend most of their time computing. While keeping the CPU busy is ideal, too many CPU bound jobs can make the system sluggish for interactive users.

Regular Expression

rm

The rm command is used to remove paths which are files. By default it doesn't ask for confirmation, and once executed the file is gone forever (unless there is a copy, or a backup to restore from). rm has several options:

  • -f => force removal even if permissions say it should be protected
  • -r => used to remove a directory AND its contents AND the contents of any subdirectories AND their contents (ie, USE WITH CAUTION).
  • -i => ask before removing

rmdir

The rmdir command is used to remove one or more directories which are specified as its arguments. To be removed, directories need to be empty. To get around this limitation without removing each item in the directory individually, you can run rm -r. This is convenient if you are removing a directory tree which has several levels of sub-directories, but it is also potentially dangerous.
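The contrast between rmdir and rm -r can be tried in a scratch directory, where nothing of value can be lost. The directory names full and empty are invented for this sketch.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir -p full/sub empty
touch full/sub/file

# An empty directory can be removed with rmdir.
rmdir empty
if [ -e empty ]; then empty_state="still there"; else empty_state="removed"; fi

# A directory that still has contents is refused.
if rmdir full 2>/dev/null; then full_state="removed"; else full_state="refused"; fi

# rm -r removes the directory and everything below it, so look before removing.
ls -R full
rm -r full
remaining=$(ls)

echo "empty: $empty_state, full: $full_state"

cd / && rm -r "$scratch"
```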

root

Either the top of the filesystem tree, the #user account which can perform any task on the computer, or the not-to-be-trifled-with person who has access to the root account. Due to the 2nd usage, a UNIX system which has been broken into is said to be "rooted". This is not considered a good thing.

shell

Shells are used by #users to interact with the system. Besides the ability to launch and manage jobs, shells provide a powerful scripting language. This is the power of UNIX. By writing scripts it's easy to customize routine, repeatable actions that you might take to process your data. This also gives you a way to keep a record of how the process was done, and what might have resulted.

Accounts in CCBB are created using the Bourne Again Shell (BASH), which is a descendant of the original Bourne Shell. This has been chosen as the standard because (1) it's the default for Linux systems, and (2) it has all of the nicer interactive features of other shells, while maintaining the scripting features of the Bourne and Korn shells.

Learning to use the shell effectively is a major step towards harnessing the power of UNIX. Please review our New Users Guide for some good references.

standard input/output/error and redirection

Just as every program has an environment, every program has 3 input / output sources. These are called standard input, standard output, and standard error. By default, standard input is the keyboard, and standard output and standard error are both the screen. The two outputs differ because standard output is buffered to avoid overwhelming the windowing environment with lots of little changes, while standard error is written immediately so that a program can notify its user of an error right away. This can be confusing, since error messages may appear out of order relative to the buffered regular output.

Using shell redirection you can connect these to files. For example, suppose you have a program that expects you to type some input before it actually processes your files. If you wanted to run this program over and over again with the same inputs you could use

Code Block
foo < input

where foo is the name of the program, and input is a file that contains the input that you would have otherwise had to type. You can also redirect standard output using >. For example,

Code Block
foo > output

which runs foo, and saves the standard output in a file called output. Likewise, standard error can be redirected with 2> as in

Code Block
foo 2> error

You can redirect both with

Code Block
foo > out 2>&1

Here order matters. The 2>&1 indicates that standard error (2>) is tied (or redirected) to the same location that standard output (&1) points to at that moment. If you ran

Code Block
foo 2>&1 >out

standard error would be tied to the current standard output (ie, the screen), and only then would standard output be redirected to the file out, so errors would still appear on the screen. You can also do all 3

Code Block
foo < in > out 2>error

Standard error, and output are useful because they let you save a record of the output of your data processing. You can then later refer to it to remind yourself of how a data processing run went. This is so important that the clustering batch processing system automatically creates output and error files for you.
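The effect of redirection order can be checked with a small stand-in for foo. In this sketch noisy is a hypothetical shell function that writes one line to each output stream; the file names are also invented.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# noisy stands in for foo: one line to standard output, one to standard error.
noisy() {
    echo "regular output"
    echo "error message" >&2
}

# Separate files for the two streams.
noisy > out 2> error
out_content=$(cat out)
err_content=$(cat error)

# Both streams in one file: redirect standard output first, then tie standard error to it.
noisy > both 2>&1
both_content=$(cat both)

# Reversed order: standard error is tied to the original standard output
# (captured here by the command substitution) before standard output moves to the file.
leaked=$(noisy 2>&1 > only_out)
file_content=$(cat only_out)

echo "leaked past the file: $leaked"

cd / && rm -r "$scratch"
```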

See also: #pipe, #script, and #screen

swap

Swap is disk-based virtual memory. When swap space is being used, the system is said to be swapping. This can hurt performance, but when memory is limited it may be the only way to get programs to run.

touch

user

A login to the system, though some user accounts are system accounts that do nothing more than own programs, or run services.

variable

zombie

A zombie is a #process which has terminated, but which is being kept by the #kernel in the process list until its parent either queries the state of the #process or terminates itself. When either occurs, the zombie is reaped from the process list.