...

  • wget – retrieves the contents of an Internet URL
  • cp – copies directories or files located on any local file system
  • scp – copies directories or files to/from a remote system
  • rsync – copies directories or files on either local or remote systems

(Read more about Copying files and directories)
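As a rough illustration of the local cases, the sketch below copies a small directory with cp and then with rsync, all inside a throwaway temporary directory. The directory and file names are invented for the example. The remote forms are shown only as comments, because any URL, user name or host would be a placeholder.

```shell
#!/bin/sh
# Sketch only: every path here is hypothetical and lives in a temp directory.
demo=$(mktemp -d)
mkdir -p "$demo/src_dir"
echo "ACGT" > "$demo/src_dir/reads.txt"

# cp -r copies a directory and everything under it (local file systems only).
cp -r "$demo/src_dir" "$demo/cp_copy"

# rsync -a also copies recursively, preserving times and permissions.
# The trailing slash on the source means "the contents of src_dir".
if command -v rsync >/dev/null 2>&1; then
    rsync -a "$demo/src_dir/" "$demo/rsync_copy/"
fi

# Remote forms (placeholders, not real hosts or URLs):
#   wget <URL>
#   scp -r <local_dir> <user>@<host>:<remote_dir>
#   rsync -a <local_dir>/ <user>@<host>:<remote_dir>/

ls -R "$demo"
```

Running rsync a second time on the same pair of directories transfers nothing new, which is one reason it is often preferred for large or repeated copies.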

TACC storage areas and Linux commands to access data
(all commands to be executed at TACC except
laptop-to-TACC copies, which must be executed on your laptop)

...

There are 3 local file systems available on any TACC compute cluster (stampede3, lonestar6, etc.), and your account has a directory in each of the three.

...


|                      | Home      | Work                      | Scratch                                                    |
| quota                | 10 GB     | 1024 GB = 1 TB            | none                                                       |
| policy               | backed up | not backed up, not purged | not backed up, purged if not accessed recently (~10 days) |
| access command       | cd        | cdw                       | cds                                                        |
| environment variable | $HOME     | $WORK, $STOCKYARD         | $SCRATCH                                                   |
| root file system     | /home     | /work                     | /scratch                                                   |

$WORK points to a different sub-directory for each cluster; $STOCKYARD is the root of the shared Work file system.

Use each area for:

  • Home – small files, such as scripts, that you don't want to lose.
  • Work – medium-sized files you don't want to copy over all the time. For example, custom programs you install (these can get large), or annotation files used for analysis.
  • Scratch – large files accessed from batch jobs. Your starting files will be copied here from somewhere else, and your final results files will be copied elsewhere (e.g. stockyard, corral, your BRCF POD, or your organization's storage area).
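To see where these areas live for your own account, a minimal sketch such as the following prints each environment variable from the table. The helper name show_area is made up for this example, and $WORK, $STOCKYARD and $SCRATCH are only expected to be defined on TACC systems, so elsewhere they are reported as unset.

```shell
#!/bin/sh
# Print where each storage-area variable points, or note that it is unset.
# show_area is a hypothetical helper name used only in this sketch.
show_area() {
    name=$1
    value=$(printenv "$name") || value="(not set on this system)"
    printf '%-10s %s\n' "$name" "$value"
}

for name in HOME WORK STOCKYARD SCRATCH; do
    show_area "$name"
done
```

On a system where the variables are set, cd "$WORK" and cd "$SCRATCH" should land in the same directories that the cdw and cds access commands in the table are listed for.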

...

Code Block
--------------------- Project balances for user abattenh ----------------------
| Name           Avail SUs     Expires | Name           Avail SUs     Expires |
| OTH21095             905  2025-01-31 | DNAdenovo           1496  2024-09-30 |
| OTH21164             215  2025-03-31 | OTH21180             996  2025-03-31 |
------------------------ Disk quotas for user abattenh ------------------------ 
| Disk         Usage (GB)     Limit    %Used   File Usage       Limit   %Used |
| /scratch            0.0       0.0     0.00            0           0    0.00 |
| /home1              0.0      11.7     0.02          316           0    0.00 |
| /work             169.0    1024.0    16.50        79361     3000000    2.65 |
------------------------------------------------------------------------------- 
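
The %Used columns are simply usage divided by limit. As a small worked check, the sketch below recomputes the two /work percentages from the numbers in the listing. The function name pct_used is invented for this example, and a limit of 0 is treated here as "no limit" by assumption.

```shell
#!/bin/sh
# Percent used = usage / limit * 100.
# pct_used is a hypothetical helper; a limit of 0 is reported as n/a.
pct_used() {
    awk -v used="$1" -v limit="$2" 'BEGIN {
        if (limit == 0) { print "n/a"; exit }
        printf "%.2f\n", used / limit * 100
    }'
}

pct_used 169.0 1024.0     # /work disk space in GB   -> 16.50
pct_used 79361 3000000    # /work number of files    -> 2.65
```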

...

The rightmost Mounted on column gives the top-level access path. Find /home1, /work, and /scratch and note their Size numbers!

Code Block
Filesystem                                           Size  Used Avail Use% Mounted on
devtmpfs                                             126G     0  126G   0% /dev
tmpfs                                                126G   43M  126G   1% /dev/shm
tmpfs                                                126G  4.1G  122G   4% /run
tmpfs                                                126G     0  126G   0% /sys/fs/cgroup
/dev/md127                                           150G   92G   59G  61% /
/dev/sda2                                           1014M  207M  808M  21% /boot
/dev/md126                                           284G   21G  264G   8% /tmp
/dev/md125                                           8.0G  4.3G  3.8G  54% /var
129.114.40.1:/admin                                  3.5T  714G  2.6T  22% /admin
129.114.40.7:/home1                                  7.0T  6.5T  504G  93% /home1
172.29.200.10@o2ib1172:172.29.200.11@o2ib1172:/work  6.8P  2.5P  4.3P  37% /work
129.114.52.169:/corral/main                           38P   21P   18P  55% /corral
tmpfs                                                 26G     0   26G   0% /run/user/891443
tmpfs                                                 26G     0   26G   0% /run/user/881379
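
Rather than scanning the whole listing, you can ask for just the file system that holds a particular directory. The following is a sketch under a few assumptions: the helper name fs_of is made up, the df options used are -h (human-readable sizes) and -P (one line per entry), and mount points are assumed to contain no spaces.

```shell
#!/bin/sh
# Report the mount point and size of the file system holding a directory.
# fs_of is a hypothetical helper name; mount points with spaces are not handled.
fs_of() {
    df -hP "$1" | awk 'NR == 2 { print $NF, $2 }'
}

# $WORK and $SCRATCH may be unset outside TACC, so skip anything missing.
for dir in "$HOME" "${WORK:-}" "${SCRATCH:-}"; do
    if [ -n "$dir" ] && [ -d "$dir" ]; then
        printf '%s -> %s\n' "$dir" "$(fs_of "$dir")"
    fi
done
```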

What do we mean by "hierarchy"? The file system hierarchy is like a tree, with the root file system (denoted by the leading / ) as the trunk, sub-directories as branches, sub-sub-directories as branches from branches (and so forth), with files as leaves off any branch.
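
One way to see the tree structure is to start at a leaf and walk back toward the trunk. The sketch below does this with dirname. The function name ancestors and the example path are invented for illustration, and the path does not need to exist.

```shell
#!/bin/sh
# Print a path and each of its parent directories, ending at the root.
# ancestors is a hypothetical helper; the loop also stops at "." so that
# a relative path cannot loop forever.
ancestors() {
    p=$1
    while [ "$p" != "/" ] && [ "$p" != "." ]; do
        echo "$p"
        p=$(dirname "$p")
    done
    echo "$p"
}

# Made-up example path: a file (leaf) three directories (branches) deep.
ancestors /work/projects/refs/genome.fa
```

Each line of output is one step closer to the root: the file, then its directory, then that directory's parent, and finally / itself.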

...

  • original – for original sequencing data (compressed FASTQ files)
    • sub-directories named, for example, by year_month.<sequencing run/job or project name>
  • aligned – for alignment data (BAM files, etc.)
    • sub-directories named, e.g., by year_month.<project_name>
  • analysis – further downstream analysis
    • reasonably named sub-directories, often by project
  • refs – reference genomes and other annotation files used in alignment and analysis
    • sub-directories for different reference genomes and aligners
    • e.g. ucsc/hg38/star, ucsc/sacCer3/bwa, mirbase/v20/bowtie2
  • code – for scripts and programs you and others in your organization write
    • ideally maintained in a version control system such as git, subversion or cvs.
    • can have separate sub-directories for people, or various shared repositories.
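
The layout above can be sketched as a short script that creates the top-level directories and a few of the example reference sub-directories. The function name make_layout and the default root ./my_project_area are assumptions for this example; point it at whatever storage area your organization actually uses.

```shell
#!/bin/sh
# Create the top-level layout described above under a chosen root directory.
# make_layout and the default root are hypothetical names for this sketch.
make_layout() {
    root=$1
    for d in original aligned analysis refs code; do
        mkdir -p "$root/$d"
    done
    # Example reference sub-directories taken from the list above.
    mkdir -p "$root/refs/ucsc/hg38/star" \
             "$root/refs/ucsc/sacCer3/bwa" \
             "$root/refs/mirbase/v20/bowtie2"
}

make_layout "${1:-./my_project_area}"
```

Because mkdir -p leaves existing directories alone, the script can be re-run safely when new top-level areas are added.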

...