Data wrangling best practices
NGS is smack dab in the middle of the Big Data revolution. Initial NGS fastq files are big (100s of MB to GB) – and they're just the start.
Organization and good practices are critical! Your data can get out of hand very quickly!
Keep fastq files compressed
...
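Most day-to-day questions about a fastq file can be answered without ever writing an uncompressed copy to disk. The sketch below is a minimal illustration, not a prescribed workflow: it streams a gzip-compressed fastq with Python's standard `gzip` module and counts the reads. The function name `count_fastq_reads` is made up for this example, and the code assumes the common layout of exactly four lines per read.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzip-compressed fastq file by streaming it.

    Assumption: each read occupies exactly 4 lines
    (header, sequence, '+' separator, qualities).
    """
    # "rt" opens the compressed file as text; it is decompressed on the
    # fly, so no uncompressed copy is ever written to disk.
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

Calling `count_fastq_reads("sample.fastq.gz")` (a hypothetical file name) returns the number of reads while the file stays compressed on disk.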
Artifacts from different stages of the analysis will have different archival requirements.
- Original sequence data (fastq files)
  - must be backed up!
- Alignments
  - usually larger than original fastqs
  - should be backed up once stable
- Peak calling artifacts
- Downstream analysis artifacts
...
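A backup is only useful if the copy is intact. One common way to check is to compare checksums of the original and the backup. The following is a rough sketch under stated assumptions: the helper names `file_sha256` and `backup_matches` are invented for this example, and SHA-256 is just one reasonable choice of hash, not a requirement of any particular archive system.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so multi-GB files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """True if the backup copy has the same checksum as the original."""
    return file_sha256(original) == file_sha256(backup)
```

Running `backup_matches` on an original fastq and its archived copy (both paths supplied by you) gives a quick yes/no answer before the original is ever deleted.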