Data wrangling best practices

NGS is smack dab in the middle of the Big Data revolution. Initial NGS fastq files are big (from 100s of MB up to several GB each) – and they're just the start.

Organization and good practices are critical! Your data can get out of hand very quickly!

Keep fastq files compressed
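
As a minimal sketch of working with compressed fastq files, the Python below gzips a file and then counts its reads straight from the .gz copy, without ever unpacking it on disk. The function names compress_fastq and count_fastq_reads are hypothetical, and the read count assumes the common four-lines-per-record fastq layout.

```python
import gzip
import shutil


def compress_fastq(path):
    """Write a gzip-compressed copy of `path` next to it and return its name.

    The original file is left in place; remove it yourself once the
    compressed copy has been checked.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Stream the data in chunks so large files never sit fully in memory
        shutil.copyfileobj(src, dst)
    return gz_path


def count_fastq_reads(gz_path):
    """Count reads in a gzip-compressed fastq file.

    Assumes four lines per record (no wrapped sequence lines).
    """
    line_count = 0
    # Text mode ("rt") decompresses on the fly as lines are read
    with gzip.open(gz_path, "rt") as handle:
        for _ in handle:
            line_count += 1
    return line_count // 4
```

Most tools in a typical NGS workflow can read gzipped input in a similar streaming fashion, so there is rarely a reason to keep an uncompressed copy around.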

...

Artifacts from different stages of the analysis will have different archival requirements.

  • Original sequence data (fastq files)
    • must be backed up!
  • Alignments
    • usually larger than original fastqs
    • should be backed up once stable
  • Peak calling artifacts
  • Downstream analysis artifacts
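
One way to make "must be backed up" checkable is to record a checksum for each original fastq file, so a backup copy can later be compared against the manifest. The sketch below is only an illustration: the function names, the SHA256SUMS.txt manifest name, and the *.fastq.gz pattern are all assumptions, not part of any particular backup system.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_name="SHA256SUMS.txt", pattern="*.fastq.gz"):
    """Write one '<digest>  <filename>' line per matching file in data_dir.

    The manifest name and file pattern are illustrative defaults.
    """
    data_dir = Path(data_dir)
    lines = [f"{sha256sum(p)}  {p.name}" for p in sorted(data_dir.glob(pattern))]
    manifest = data_dir / manifest_name
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

Re-running the same function on the backup location and comparing the two manifests is a quick check that the copies match.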

...