Environment setup
Directories and symlinks
Directories and links needed in your home directory.
cd
ln -s -f $SCRATCH scratch
ln -s -f $WORK work
ln -s -f /work/projects/BioITeam
mkdir -p ~/local/bin
cd ~/local/bin
ln -s -f /work/projects/BioITeam/common/bin/launcher_maker.py
ln -s -f /work/projects/BioITeam/ls5/opt/cutadapt-1.10/bin/cutadapt
ln -s -f /work/projects/BioITeam/ls5/opt/multiqc-1.0/multiqc
ln -s -f /work/projects/BioITeam/ls5/opt/samstat-1.09/samstat
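If a target path changes or is mistyped, ln -s -f still creates the link, which then points nowhere. A minimal check, sketched as a hypothetical check_links helper (not part of the course files), lists broken links in a directory (default ~/local/bin):

```shell
#!/usr/bin/env bash
# Hypothetical helper: report symlinks in a directory whose targets do not exist.
# Returns 0 if every link resolves, 1 otherwise.
check_links() {
  local dir="${1:-$HOME/local/bin}" link broken=0
  for link in "$dir"/*; do
    # -L: the path is a symlink; ! -e: following it fails
    if [ -L "$link" ] && [ ! -e "$link" ]; then
      echo "broken link: $link -> $(readlink "$link")"
      broken=1
    fi
  done
  return "$broken"
}
```

Run check_links with no argument to check ~/local/bin, or check_links ~ for the scratch, work and BioITeam links.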
.bashrc setup
If you already have a .bashrc set up, make a backup copy first. You can restore your original login script after class is over.
cd
cp .bashrc .bashrc.beforeNGS
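Running the backup command a second time, after the class profile is installed, would overwrite your original backup with the class version. A minimal guard, sketched as a hypothetical backup_bashrc function (not part of the course files), copies only when no backup exists yet:

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up .bashrc once, never overwriting an existing backup.
# Takes an optional home directory argument (defaults to $HOME).
backup_bashrc() {
  local home="${1:-$HOME}"
  if [ ! -f "$home/.bashrc" ]; then
    echo "no .bashrc found; nothing to back up"
  elif [ -e "$home/.bashrc.beforeNGS" ]; then
    echo "backup already exists; leaving it untouched"
  else
    # -p preserves the original permissions and timestamps
    cp -p "$home/.bashrc" "$home/.bashrc.beforeNGS"
  fi
}
```

After class, cp ~/.bashrc.beforeNGS ~/.bashrc restores the original.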
Copy the login profile for this class and make it readable and writable only by you:
cd
cp /work/projects/BioITeam/projects/courses/Core_NGS_Tools/tacc/bashrc.corengs.ls5 .bashrc
chmod 600 .bashrc
Source it to make it active (if this doesn't work, log off then log back in):
source ~/.bashrc
Environment variables
General
export ALLOCATION=UT-2015-05-18
export BI=/corral-repl/utexas/BioITeam
export BIWORK=/work/projects/BioITeam
export CORENGS=$BIWORK/projects/courses/Core_NGS_Tools
export PATH=.:$HOME/local/bin:$PATH
# For cutadapt support:
export PYTHONPATH=$BIWORK/ls5/lib/python2.7/site-packages:$PYTHONPATH
# For MultiQC support:
export PYTHONPATH=$BIWORK/ls5/lib/python2.7/annab-packages:$PYTHONPATH
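After sourcing the profile, one quick way to confirm these variables took effect is a small check like the hypothetical check_env function below, which reports any named variable that is unset or empty:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named variable that is unset or empty.
# Returns 0 only if all of them have values.
check_env() {
  local var missing=0
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable whose name is in $var
    if [ -z "${!var:-}" ]; then
      echo "not set: $var"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_env ALLOCATION BI BIWORK CORENGS prints nothing when the profile is active.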
Turn on coloring by file type in the shell:
export LS_OPTIONS='-N --color=auto -T 0'
# Bold yellow directory names (di=1;33), suited to a dark background terminal:
export LS_COLORS=$LS_COLORS:'di=1;33:'
# For better colors using a white background terminal, un-comment this line (bold blue directory names, di=1;34):
#export LS_COLORS=$LS_COLORS:'di=1;34:'
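Setting LS_OPTIONS has no effect on its own, because GNU ls does not read that variable. A common convention, and an assumption here about how the class profile uses it, is to apply it through an alias:

```shell
#!/usr/bin/env bash
# Assumption: LS_OPTIONS is applied through an alias; GNU ls does not read it directly.
# -N (--literal): print names without quoting; -T 0 (--tabsize=0): align with spaces, not tabs.
export LS_OPTIONS='-N --color=auto -T 0'
alias ls='ls $LS_OPTIONS'
```

If ls output is still uncolored after sourcing your profile, type alias ls to see whether such an alias is defined.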
TACC intro
Commands files
Simple commands
mkdir -p $SCRATCH/core_ngs/slurm/simple
cd $SCRATCH/core_ngs/slurm/simple
cp $CORENGS/tacc/simple.cmds .
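A commands file is typically run with one task per line. Assuming blank lines and lines starting with # are not tasks, a hypothetical count_tasks helper reports how many tasks a file such as simple.cmds holds:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count task lines in a commands file.
# Assumption: blank lines and lines whose first non-space character is "#" are not tasks.
count_tasks() {
  # -v: keep lines NOT matching; -c: print the count instead of the lines
  grep -cvE '^[[:space:]]*(#|$)' "$1"
}
```

For example, count_tasks simple.cmds.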
Wayness commands
mkdir -p $SCRATCH/core_ngs/slurm/wayness
cd $SCRATCH/core_ngs/slurm/wayness
cp $CORENGS/tacc/wayness.cmds .
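Wayness is the number of tasks run on each node. Assuming that definition, the nodes a commands file needs equal the task count divided by the wayness, rounded up; integer ceiling division does this in shell arithmetic (nodes_needed is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: nodes required for TASKS tasks at WAYNESS tasks per node.
# Integer ceiling division: (a + b - 1) / b rounds a/b up.
nodes_needed() {
  local tasks=$1 wayness=$2
  echo $(( (tasks + wayness - 1) / wayness ))
}
```

For example, nodes_needed 24 4 prints 6, and nodes_needed 25 4 prints 7.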
Start an idev session
To start a 3-hour idev (interactive development) session:
idev -p normal -m 180 -N 1 -n 24 -A UT-2015-05-18 --reservation=CCBB
You can tell you're in an idev session because the hostname command returns a compute node name (e.g. nid00438) instead of a login node name (e.g. login5).
The idev session terminates when the requested time expires or when you run the exit command.
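The hostname check can be scripted. This sketch assumes, from the example names above, that login node names start with "login" and anything else is a compute node; in_idev is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the host looks like a compute node.
# Assumes login node names begin with "login" (e.g. login5); pass a name to test it,
# or omit the argument to check the current host.
in_idev() {
  local host="${1:-$(hostname)}"
  case "$host" in
    login*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, in_idev && echo "on a compute node" || echo "still on a login node".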
Working with FASTQ
Yeast data
Working with some yeast ChIP-seq FASTQ data:
# Area for "original" sequencing data
mkdir -p $WORK/archive/original/2018_05.core_ngs
cd $WORK/archive/original/2018_05.core_ngs
wget http://web.corral.tacc.utexas.edu/BioITeam/yeast_stuff/Sample_Yeast_L005_R1.cat.fastq.gz
wget http://web.corral.tacc.utexas.edu/BioITeam/yeast_stuff/Sample_Yeast_L005_R2.cat.fastq.gz
# Create a $SCRATCH area for FASTQ prep and link the yeast data there
mkdir -p $SCRATCH/core_ngs/fastq_prep
cd $SCRATCH/core_ngs/fastq_prep
ln -s -f $WORK/archive/original/2018_05.core_ngs/Sample_Yeast_L005_R1.cat.fastq.gz
ln -s -f $WORK/archive/original/2018_05.core_ngs/Sample_Yeast_L005_R2.cat.fastq.gz
# Copy over a small FASTQ file
cd $SCRATCH/core_ngs/fastq_prep
cp $CORENGS/misc/small.fq .
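Each FASTQ record spans four lines (header, sequence, +, qualities), so the read count is the line count divided by four. A hypothetical count_fastq_reads helper handles both gzipped and plain files:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a FASTQ file (plain or .gz).
# Assumes standard 4-line records with no wrapped sequence lines.
count_fastq_reads() {
  local f=$1 lines
  if [[ "$f" == *.gz ]]; then
    lines=$(gunzip -c "$f" | wc -l)
  else
    lines=$(wc -l < "$f")
  fi
  echo $(( lines / 4 ))
}
```

For example, count_fastq_reads Sample_Yeast_L005_R1.cat.fastq.gz or count_fastq_reads small.fq.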
ATACseq data for MultiQC
Get some FastQC reports for MultiQC:
mkdir -p $SCRATCH/core_ngs/multiqc/fqc.atacseq
cd $SCRATCH/core_ngs/multiqc/fqc.atacseq
cp $CORENGS/multiqc/fqc.atacseq/*.html .
FASTQ files for cutadapt
For command-line cutadapt exploration:
cd $SCRATCH/core_ngs/fastq_prep
cp $CORENGS/human_stuff/Sample_H54_miRNA_L004_R1.cat.fastq.gz .
cp $CORENGS/human_stuff/Sample_H54_miRNA_L005_R1.cat.fastq.gz .
zcat Sample_H54_miRNA_L004_R1.cat.fastq.gz | head -2000 > miRNA_test.fq
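Because each FASTQ record is four lines, head -2000 keeps the first 500 reads. The hypothetical take_reads helper below makes that arithmetic explicit:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the first N reads of a gzipped FASTQ to stdout.
# Each record is 4 lines, so N reads = 4*N lines (500 reads -> 2000 lines).
take_reads() {
  local f=$1 n=$2
  gunzip -c "$f" | head -n $(( n * 4 ))
}
```

take_reads Sample_H54_miRNA_L004_R1.cat.fastq.gz 500 > miRNA_test.fq produces the same file as the zcat command above.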
For batch cutadapt processing:
mkdir -p $SCRATCH/core_ngs/cutadapt
cd $SCRATCH/core_ngs/cutadapt
cp $CORENGS/human_stuff/Sample_H54_miRNA_L004_R1.cat.fastq.gz .
cp $CORENGS/human_stuff/Sample_H54_miRNA_L005_R1.cat.fastq.gz .
cp $CORENGS/yeast_stuff/Yeast_RNAseq_L002_R1.fastq.gz .
cp $CORENGS/yeast_stuff/Yeast_RNAseq_L002_R2.fastq.gz .
cp $CORENGS/tacc/cuta.cmds .
Alignment workflow
Alignment workflow setup
Starting files:
# FASTA (for building references)
mkdir -p $SCRATCH/core_ngs/references/fasta
cp $CORENGS/references/*.* $SCRATCH/core_ngs/references/fasta/
# FASTQ (to align)
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/alignment/*fastq.gz $SCRATCH/core_ngs/alignment/fastq/
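Every sequence in a FASTA file starts with a header line beginning with >, so counting those lines is a quick sanity check on the copied reference files. count_fasta_seqs is a hypothetical helper; which files you run it on is up to you:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count sequences (header lines starting with ">") in a FASTA file.
# Handles plain and gzipped input.
count_fasta_seqs() {
  local f=$1
  if [[ "$f" == *.gz ]]; then
    gunzip -c "$f" | grep -c '^>'
  else
    grep -c '^>' "$f"
  fi
}
```

For example, run count_fasta_seqs on each file in $SCRATCH/core_ngs/references/fasta/.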
References
Get a copy of all references we build in the exercises (including FASTA):
mkdir -p $SCRATCH/core_ngs/references
rsync -ptlvrP $CORENGS/references/ $SCRATCH/core_ngs/references/
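The trailing slash on the rsync source matters: $CORENGS/references/ copies the directory's contents into the destination, while $CORENGS/references (no slash) would create a nested $SCRATCH/core_ngs/references/references. In -ptlvrP, -p keeps permissions, -t keeps modification times, -l copies symlinks as symlinks, -v lists each file, -r recurses, and -P keeps partial transfers and shows progress. A hypothetical wrapper that always adds the slashes:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: mirror the CONTENTS of SRC into DST.
# The trailing "/" on SRC is what prevents a nested DST/$(basename SRC) directory.
mirror_dir() {
  local src=$1 dst=$2
  mkdir -p "$dst"
  # Strip any trailing slash, then add exactly one
  rsync -ptlvrP "${src%/}/" "${dst%/}/"
}
```

For example, mirror_dir $CORENGS/references $SCRATCH/core_ngs/references is equivalent to the commands above.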
BWA PE alignment of yeast data
To jump into aligning paired-end (PE) yeast data with BWA:
# Pre-built references
mkdir -p $SCRATCH/core_ngs/references
rsync -avrP $CORENGS/references/ $SCRATCH/core_ngs/references/
# FASTQ (to align)
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/alignment/*fastq.gz $SCRATCH/core_ngs/alignment/fastq/
# Alignment directory
mkdir -p $SCRATCH/core_ngs/alignment/yeast_bwa
cd $SCRATCH/core_ngs/alignment/yeast_bwa
ln -s -f ../fastq
ln -s -f ../../references/bwa/sacCer3
module load bwa
module load samtools
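bwa index writes five files that share a prefix (.amb, .ann, .bwt, .pac, .sa), and aligning needs all of them. A hypothetical check_bwa_index helper confirms they are reachable through the sacCer3 link; the prefix sacCer3/sacCer3 in the usage line is an assumption, so check the real file names with ls sacCer3/ first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm all five BWA index files exist for a prefix.
# Prints each missing file; returns 0 only if none are missing.
check_bwa_index() {
  local prefix=$1 ext missing=0
  for ext in amb ann bwt pac sa; do
    if [ ! -e "$prefix.$ext" ]; then
      echo "missing: $prefix.$ext"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_bwa_index sacCer3/sacCer3 (assumed prefix).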
samtools manipulation of aligned yeast data
To jump into post-alignment manipulation of the yeast_pairedend.bam with samtools:
mkdir -p $SCRATCH/core_ngs/alignment/yeast_bwa
cd $SCRATCH/core_ngs/alignment/yeast_bwa
cp $CORENGS/catchup/yeast_bwa/yeast_pairedend.bam .
module load samtools
# If the sorted, indexed BAM is needed:
cp $CORENGS/catchup/yeast_bwa/yeast_pairedend.sort* .
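By default samtools index writes its index next to the BAM as name.bam.bai, and an index older than its BAM (for example after re-sorting) should be rebuilt. This sketch checks for that; index_is_current is a hypothetical helper, and yeast_pairedend.sort.bam in the usage line is an assumed name for the copied sorted file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the BAM's .bai index exists and is not older than the BAM.
index_is_current() {
  local bam=$1
  local bai="$bam.bai"
  # -ot: true if the first file is older than the second (or the first is missing)
  [ -e "$bai" ] && [ ! "$bai" -ot "$bam" ]
}
```

For example, index_is_current yeast_pairedend.sort.bam || samtools index yeast_pairedend.sort.bam re-indexes only when needed.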
SAMtools and BEDTools
Setup for samtools
mkdir -p $SCRATCH/core_ngs/samtools
cd $SCRATCH/core_ngs/samtools
cp $CORENGS/catchup/for_samtools/* .
module load samtools