...
- wget – retrieves the contents of an Internet URL
- cp – copies directories or files located on any local file system
- scp – copies directories or files to/from a remote system
- rsync – copies directories or files on either local or remote systems
(Read more about Copying files and directories)
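The copy commands listed above can be tried out with a minimal sketch like the one below. It only exercises the local forms (cp, and rsync if installed) in a throwaway directory. The directory name `copy_demo`, the URL, the user name, and the remote host are all placeholders chosen for illustration, not values from this page.

```shell
# Throwaway working area for the demo (hypothetical location).
demo="${TMPDIR:-/tmp}/copy_demo"
mkdir -p "$demo/src/sub"
echo "hello" > "$demo/src/sub/a.txt"

# cp: -r recurses into directories; without it cp copies single files only.
# Remove any earlier copy first so cp creates copy_cp instead of nesting src inside it.
rm -rf "$demo/copy_cp"
cp -r "$demo/src" "$demo/copy_cp"

# rsync: -a recurses and preserves times and permissions. The trailing slash on
# the source means "the contents of src" rather than the src directory itself.
if command -v rsync >/dev/null 2>&1; then
    rsync -a "$demo/src/" "$demo/copy_rsync/"
fi

# The remote forms need a network connection and an account, so they are left
# commented out. URL, user name, and host are placeholders.
# wget https://example.org/data/sample.fastq.gz
# scp -r my_dir username@remote.example.org:/path/on/remote/
# rsync -a my_dir/ username@remote.example.org:/path/on/remote/my_dir/
```

Re-running the rsync line after changing one file transfers only what changed, which is one reason it is often preferred over cp or scp for large directories.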
TACC storage areas and Linux commands to access data (all commands to be executed at TACC except laptop-to-TACC copies, which must be executed on your laptop) |
...
There are three local file systems available on any TACC compute cluster (stampede3, lonestar6, etc.), and your account has a directory in each of the three.
...
 | Home | Work | Scratch |
---|---|---|---|
quota | 10 GB | 1024 GB = 1 TB | none |
policy | backed up | not backed up, not purged | not backed up, purged if not accessed recently (~10 days) |
access command | cd | cdw | cds |
environment variable | $HOME | $WORK (different sub-directory for each cluster) $STOCKYARD (root of the shared Work file system) | $SCRATCH |
root file system | /home | /work | /scratch |
use for | Small files such as scripts that you don't want to lose. | Medium-sized files you don't want to copy over all the time. For example, custom programs you install (these can get large), or annotation files used for analysis. | Large files accessed from batch jobs. Your starting files will be copied here from somewhere else, and your final results files will be copied elsewhere (e.g. stockyard, corral, your BRCF POD, or your organization's storage area). |
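To see where the three areas in the table live for your own account, a small sketch such as the following can help. The helper name `show_area` is made up for this example. The assumption that `cdw` and `cds` behave like `cd "$WORK"` and `cd "$SCRATCH"` is inferred from the table, and the Work, Scratch, and Stockyard variables are only expected to be defined on a TACC system.

```shell
# Print one storage area, or a note when its variable is empty or undefined.
show_area() {
    # $1 = variable name, $2 = its value (may be empty)
    if [ -n "$2" ]; then
        echo "$1 -> $2"
    else
        echo "$1 is not set on this system"
    fi
}

show_area HOME "${HOME:-}"
show_area WORK "${WORK:-}"
show_area SCRATCH "${SCRATCH:-}"
show_area STOCKYARD "${STOCKYARD:-}"

# On a TACC login node, changing into an area would then look like this
# (assumed to be what the cdw and cds shortcuts do):
# cd "$WORK"
# cd "$SCRATCH"
```

Using the variables rather than hard-coded paths keeps scripts portable between clusters, since the table notes that `$WORK` points at a different sub-directory on each one.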
...
Code Block |
---|
--------------------- Project balances for user abattenh ----------------------
| Name           Avail SUs     Expires | Name           Avail SUs     Expires |
| OTH21095             905  2025-01-31 | DNAdenovo           1496  2024-09-30 |
| OTH21164             215  2025-03-31 | OTH21180             996  2025-03-31 |
------------------------ Disk quotas for user abattenh ------------------------
| Disk         Usage (GB)     Limit    %Used   File Usage       Limit   %Used |
| /scratch            0.0       0.0     0.00            0           0    0.00 |
| /home1              0.0      11.7     0.02          316           0    0.00 |
| /work             169.0    1024.0    16.50        79361     3000000    2.65 |
------------------------------------------------------------------------------- |
...
The rightmost Mounted on column gives the top-level access path. Find /home1, /work, and /scratch and note their Size numbers!
Code Block |
---|
Filesystem Size Used Avail Use% Mounted on
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 43M 126G 1% /dev/shm
tmpfs 126G 4.1G 122G 4% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/md127 150G 92G 59G 61% /
/dev/sda2 1014M 207M 808M 21% /boot
/dev/md126 284G 21G 264G 8% /tmp
/dev/md125 8.0G 4.3G 3.8G 54% /var
129.114.40.1:/admin 3.5T 714G 2.6T 22% /admin
129.114.40.7:/home1 7.0T 6.5T 504G 93% /home1
172.29.200.10@o2ib1172:172.29.200.11@o2ib1172:/work 6.8P 2.5P 4.3P 37% /work
129.114.52.169:/corral/main 38P 21P 18P 55% /corral
tmpfs 26G 0 26G 0% /run/user/891443
tmpfs 26G 0 26G 0% /run/user/881379 |
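Assuming the listing above came from `df -h` (the command itself is not shown here), the sketch below pulls out just the Size and Mounted on columns for the file systems of interest. The function name `mount_size` is made up for this example, and on a non-TACC machine the three paths will most likely be reported as absent.

```shell
# Report the size of the file system that holds a given path.
# -h prints human-readable sizes; -P keeps each file system on a single line,
# so column 2 is Size and column 6 is Mounted on.
mount_size() {
    df -hP "$1" | awk 'NR == 2 { print $6 ": size " $2 }'
}

for fs in /home1 /work /scratch; do
    if [ -d "$fs" ]; then
        # If the path is an ordinary directory rather than a mount point,
        # df reports the file system that contains it.
        mount_size "$fs"
    else
        echo "$fs: not present on this system"
    fi
done
```

Naming a path on the `df` command line restricts the report to that one file system, which is easier to read than scanning the full listing.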
What do we mean by "hierarchy"? The file system hierarchy is like a tree, with the root file system (denoted by the leading / ) as the trunk, sub-directories as branches, sub-sub-directories as branches from branches (and so forth), with files as leaves off any branch.
...
- original – for original sequencing data (compressed FASTQ files)
- sub-directories named, for example, by year_month.<sequencing run/job or project name>
- aligned – for alignment data (BAM files, etc)
- sub-directories named, e.g., by year_month.<project_name>
- analysis – further downstream analysis
- reasonably named sub-directories, often by project
- refs – reference genomes and other annotation files used in alignment and analysis
- sub-directories for different reference genomes and aligners
- e.g. ucsc/hg38/star, ucsc/sacCer3/bwa, mirbase/v20/bowtie2
- code – for scripts and programs you and others in your organization write
- ideally maintained in a version control system such as git, subversion or cvs.
- can have separate sub-directories for people, or various shared repositories.
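The layout described above can be created in one step. This is only a sketch: the root directory `org_layout_demo` and the run and project names are invented for illustration, while the `refs` sub-paths are the examples given in the list.

```shell
# Hypothetical root for the layout; a real one would live in your
# organization's storage area rather than a temporary directory.
base="${TMPDIR:-/tmp}/org_layout_demo"

# -p creates any missing parent directories and does nothing for existing ones.
mkdir -p "$base/original/2024_01.example_run" \
         "$base/aligned/2024_01.example_project" \
         "$base/analysis/example_project" \
         "$base/refs/ucsc/hg38/star" \
         "$base/refs/ucsc/sacCer3/bwa" \
         "$base/refs/mirbase/v20/bowtie2" \
         "$base/code"

# List the resulting directory tree, one path per line.
find "$base" -type d | sort
```

Because `mkdir -p` is safe to repeat, the same command can be re-run whenever a new run, project, or reference genome directory is added to the list.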
...