newcluster

The new version of the CCBB cluster deployed on May 22, 2011 has some new features that are being documented here.

Queues

There are now several different types of hardware, so custom queues have been added to the system. This is by design: it makes it easier for people to use our centralized resources without having to purchase a full cluster themselves; instead, they only need to buy nodes. The current queues are phylo, kirkp, and bigmem. The phylo queue consists of the nodes in phylocluster. The bigmem and kirkp queues contain new Dell R410s; each system has 64 GB of memory, 16 processors, and 600 GB of disk space, so they should support the processing of large data sets.

The bigmem queue is hardware bought specifically for phylocluster users, but you should still refrain from launching jobs on it that will run for long periods of time. The nodes in kirkp were purchased by the Kirkpatrick Lab, who have graciously agreed to let anyone use them. Again, please refrain from filling kirkp with jobs that run for long periods of time.
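To see which queues are defined and how busy they are, something along these lines should work (this assumes a Torque/PBS-style qstat; the exact options and output may differ on our scheduler):

qstat -q          # summary of all queues and their limits
qstat -q bigmem   # information on just the bigmem queue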

To actually use one of the queues you will now be required to use

-q queue_name

with qsub.
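For example, a submission to the bigmem queue would look something like this (myjob.sh is just a placeholder for your own job script):

qsub -q bigmem myjob.sh

The same form works for the phylo and kirkp queues; only the queue name changes.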

We are actively researching the ability to checkpoint jobs, both for fault tolerance and for moving jobs between queues.

File server

All data is now stored on the CCBB file server, files.ccbb.utexas.edu. This should make it easier for us to manage user accounts and data, and it will be easier to keep that data backed up. From the user's perspective, you can continue to just use phylocluster to access your files. However, the file server also provides "drag and drop" access to your files. For more information, see the file server web page. If you did not already have access to files.ccbb.utexas.edu, you will need to come to PAT 141 for an account. People who used phylofiles.ccbb.utexas.edu for this "drag and drop" access to the cluster will likewise need to come get a new account.
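If you prefer the command line to the "drag and drop" interface, copying files to the cluster with scp should still work as before. The example below is only a sketch; it assumes the head node is reachable as phylocluster.ccbb.utexas.edu and uses placeholder file and user names:

scp results.tar.gz username@phylocluster.ccbb.utexas.edu:~/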

This is supported by the purchase of a new Gigabit Ethernet switch, so you should not see any problems other than perhaps some delay when a lot of people are trying to access their files at the same time. This would also happen if the files were stored on local disk; it is just the nature of using a shared system. There is tuning we can do if things get seriously overloaded, but this requires that you let us know when it occurs, so that we can collect evidence about what needs to be adjusted.

User Accounts

Not all user accounts have been migrated. Instead, only the accounts of people known to be using the cluster currently have been moved. The remaining accounts are being archived, but they can easily be migrated if needed; just let us know. Going forward, people needing access to the cluster for classes will be given a generic student account which can be disabled when the class is over. All other accounts will be periodically reviewed to ensure that they are still needed. Keeping unused accounts on the system is a significant security risk, so we want to avoid it where possible.

If you need your account put back on the cluster, please let us know.

Applications

The applications in /share/apps have not been upgraded from the previous cluster version. This may mean that some things will crash or refuse to run, depending on what has changed in the cluster software. If you notice a problem, please let us know. To avoid problems like this in the future, 4 of the old phylo nodes have been set aside as a test cluster that can be used to test new cluster releases before deployment.

Also, more and more applications are being ported to the modules infrastructure. Modules make the use of applications more transparent and allow us to keep several versions of a given package active. For more information about using software modules, please refer to the module home page.
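As a quick illustration, the basic module commands look like the following. The package name raxml is only an example; run module avail to see what is actually installed on the cluster:

module avail          # list available packages and versions
module load raxml     # add a package to your environment
module list           # show which modules are currently loaded
module unload raxml   # remove the package from your environment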