...
- Most sequencing facilities will give you compressed sequencing data files
- gzip format (.gz extension) for individual files
- tar or zip format for directories of files
- Even with compression it's easy to run out of storage space!
You may be tempted to decompress your sequencing files to manipulate them more directly
- resist the temptation to gunzip!
- nearly all modern bioinformatics tools are able to work on .gz files
- there are techniques for working with compressed files without ever decompressing them
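As one illustration of working on a .gz file without decompressing it on disk, here is a minimal Python sketch using the standard-library gzip module. The function names and the demo file are hypothetical, and the read count assumes the common 4-lines-per-record FASTQ layout.

```python
import gzip
import tempfile
from itertools import islice
from pathlib import Path


def count_fastq_reads(fastq_gz):
    """Count reads by streaming the compressed file; nothing is written to disk."""
    n_lines = 0
    # "rt" opens the .gz in text mode and decompresses on the fly, line by line
    with gzip.open(fastq_gz, "rt") as handle:
        for _ in handle:
            n_lines += 1
    # assumption: every FASTQ record occupies exactly 4 lines
    return n_lines // 4


def first_lines(fastq_gz, n=4):
    """Peek at the first n lines without reading the rest of the file."""
    with gzip.open(fastq_gz, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


if __name__ == "__main__":
    # build a tiny, made-up gzipped FASTQ so the sketch runs on its own
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.fastq.gz"
        with gzip.open(demo, "wt") as out:
            out.write("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")
        print(count_fastq_reads(demo), "reads")
        print(first_lines(demo))
```

Because the file is read as a stream, memory use stays small even for very large FASTQ files, and the compressed original is never modified.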
arrange adequate storage space
- At TACC
- Obtain an allocation on TACC's corral disk array (initial 5 TB are no-cost)
- Stage your active projects on corral or $WORK2
- copy data to $SCRATCH for analysis
- copy important analysis products back to corral or $WORK2
- Periodically back up corral or $WORK2 directories to ranch tape archive
- On a UT Biomedical Research Computing Facility (BRCF) "POD"
- See https://wikis.utexas.edu/display/RCTFusers
- Home and Work areas on POD servers are automatically backed up weekly
- and archived to ranch every 4-6 months
- GSAF customers can obtain a no-cost 2 TB allocation on the shared GSAF POD
- See https://wikis.utexas.edu/display/RCTFusers
back up analysis artifacts regularly
- All TACC users automatically have a 2 TB allocation on TACC's ranch tape archive system
- larger allocations can be requested by project owners in the TACC User Portal
- free! and under-utilized
- Periodically back up your corral or $WORK2 directories to ranch tape archive
- large directories should be combined first using the tar program
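Combining a directory into a single file before archiving can be sketched as follows. This uses Python's standard-library tarfile module as a stand-in for the tar program; the function names and paths are hypothetical, and the sketch only builds the archive, it does not transfer anything to ranch.

```python
import tarfile
import tempfile
from pathlib import Path


def tar_directory(src_dir, dest_tar):
    """Bundle src_dir into one tar file, with the directory name as the top-level entry."""
    src_dir = Path(src_dir)
    # plain "w" adds no further compression: sequencing files are usually already gzipped
    with tarfile.open(dest_tar, "w") as tar:
        tar.add(src_dir, arcname=src_dir.name)
    return Path(dest_tar)


def list_tar(tar_path):
    """Return the sorted member names, as a quick check of what was bundled."""
    with tarfile.open(tar_path, "r") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # made-up project directory so the sketch runs on its own
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my_project"
        project.mkdir()
        (project / "sample1.fastq.gz").write_bytes(b"placeholder")
        archive = tar_directory(project, Path(tmp) / "my_project.tar")
        print(list_tar(archive))
```

Listing the archive members before removing anything from disk is a cheap way to confirm the bundle contains what you expect.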
distinguish between types of data
...
- Original sequence data (FASTQ files)
- must be backed up!
- Alignments
- usually larger than original FASTQs
- can be backed up once stable
- Downstream analysis artifacts
- Reporting artifacts (plots, plotting code)
While a project is active you will want to keep more intermediate artifacts for reference. Many of these can be removed after publication.
track your analysis steps
...