
Create a 'launcher' directory in your $WORK folder and cd into it. Next, copy into it a utility script that splits NextGen sequence files into smaller chunks (courtesy of PerM, https://code.google.com/p/perm/).

Code Block
cp /corral-repl/utexas/BioITeam/tacc_ngs$BASE/splitReads.sh .
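This tutorial does not show the internals of splitReads.sh. As a rough illustration of what a FASTQ splitter has to do, here is a minimal sketch that uses GNU coreutils `split` instead. It assumes plain 4-line FASTQ records. `demo.fastq` and `READS_PER_CHUNK` are hypothetical names for this demo only, and the output naming (`tmp/query_00`, `tmp/query_01`, ...) is chosen to resemble the `$TEMPDIR/$PREFIX` pattern that the launcher script uses.

```shell
#!/bin/bash
# Hypothetical stand-in for a FASTQ splitter such as splitReads.sh.
# Assumes every FASTQ record is exactly 4 lines (header, sequence, '+', quality).
set -euo pipefail

READS_PER_CHUNK=2      # hypothetical; the real workflow uses ~1M reads per chunk
PREFIX="tmp/query"     # mirrors $TEMPDIR/$PREFIX in bowtie-launcher.sh
mkdir -p tmp

# Build a tiny 5-read FASTQ so the sketch runs anywhere
: > demo.fastq
for i in 1 2 3 4 5; do
  printf '@read%d\nACGTACGT\n+\nIIIIIIII\n' "$i" >> demo.fastq
done

# Splitting on a multiple of 4 lines never cuts a record in half.
# -d gives numeric suffixes (00, 01, ...) in GNU split.
split -l $((4 * READS_PER_CHUNK)) -d demo.fastq "${PREFIX}_"

ls "${PREFIX}"_*
```

With 5 reads and 2 reads per chunk, this yields three files, and the last one holds the single leftover read. Splitting on record boundaries is the essential point: each chunk must be a valid FASTQ file on its own so that an independent bowtie run can process it.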

Then copy our example launcher script into the same directory; it's too long to type in by hand.

Code Block
cp /corral-repl/utexas/BioITeam/tacc_ngs$BASE/bowtie-launcher.sh .

Now follow along as your instructor dissects the annotated bowtie-launcher script; we'll submit it at the very end. Then you can say you've run a parallel NextGen job at TACC!
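The core idea behind the script is a "paramlist": a text file with one independent shell command per line, which the TACC Launcher runs in parallel across the resources the job requested. The sketch below imitates that idea on a single machine with GNU `xargs -P`. It is only an illustration and is NOT how `paramrun` works internally. `MAX_JOBS`, `demo.paramlist`, and `demo_out` are hypothetical names, and each task just writes a small file in place of a bowtie alignment.

```shell
#!/bin/bash
# Local, single-machine imitation of a launcher paramlist (illustration only).
set -euo pipefail

MAX_JOBS=4                    # hypothetical; on TACC the job's core request governs this
PARAMLIST="demo.paramlist"    # hypothetical file name

mkdir -p demo_out
: > "$PARAMLIST"              # start empty so a rerun doesn't duplicate tasks

# One task per line, just as bowtie-launcher.sh writes one command per read chunk
for n in 1 2 3 4 5 6; do
  echo "echo chunk $n done > demo_out/chunk_$n.txt" >> "$PARAMLIST"
done

# -d '\n' makes each whole line a single argument (GNU xargs);
# bash -c then runs that line as a command, up to MAX_JOBS at a time.
xargs -d '\n' -P "$MAX_JOBS" -n 1 bash -c < "$PARAMLIST"
```

Because the tasks share nothing, they can finish in any order. This independence is what makes splitting the reads and aligning each chunk separately a good fit for the Launcher.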

Code Block
#!/bin/bash

#$ -V
#$ -cwd
#$ -pe 1way 48
#$ -q normal
#$ -l h_rt=01:00:00
#$ -A 20121008-NGS-ACES
#$ -m be
#$ -M vaughn@tacc.utexas.edu
#$ -N bowtie-launcher

# Load the launcher, bowtie, AND samtools modules
module load launcher
module load bowtie/0.12.8
module load samtools

# Simple variables to save typing
# BASE is the course directory; paths below append human_variation/ to it
BASE="/corral-repl/utexas/BioITeam/tacc_ngs"
PREFIX="query"

# Create a working directory to hold a lot of intermediate files
TEMPDIR="tmp"
mkdir -p $TEMPDIR

# Now, run the handy splitReads utility
# Usage: splitReads.sh sourceFile outputPrefix
#
./splitReads.sh $BASE/human_variation/bigseqs_R1.fastq $TEMPDIR/$PREFIX
# The temp directory now holds 46 files of ~1M reads each

# Iterate over the 46 subfiles in the temp directory
# Craft a bowtie alignment command and write it out to the paramlist file
#
# Start from an empty paramlist so a rerun doesn't append duplicate tasks
: > bowtie-launcher.paramlist
for C in ${TEMPDIR}/${PREFIX}_*
do
	echo "time bowtie --threads 12 -x -t -S $BASE/human_variation/ref/hs37d5.fa ${C} ${C}.sam && samtools view -S -b ${C}.sam > ${C}.bam" >> bowtie-launcher.paramlist
done

# Submit to the TACC Launcher
#
EXECUTABLE=$TACC_LAUNCHER_DIR/init_launcher
time $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE bowtie-launcher.paramlist

# Optional: consolidate the per-chunk BAM files into a single BAM.
# You could do this in a separate, dependent job script instead, which
# frees up the other nodes reserved for this job during the merge.
# We show it here so you can see that the final result of this
# workflow can be a single file.
#
BAMS=${TEMPDIR}/*.bam
samtools merge bigseqs_R1.bam ${BAMS}
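The script merges whatever BAMs exist once `paramrun` returns, so a failed chunk would quietly leave reads out of `bigseqs_R1.bam`. Below is a minimal sketch of a pre-merge check that is not part of the original script. It assumes the script's naming (read chunks `tmp/query_*`, outputs `tmp/query_*.bam`) and fakes a run in which one chunk failed. The check tests for a non-empty BAM because the `> ${C}.bam` redirect can create an empty file even when samtools fails. If you move the merge into its own dependent job as the comment above suggests, SGE's `qsub -hold_jid bowtie-launcher merge.sh` holds that job until this one finishes; `merge.sh` is a hypothetical script name.

```shell
#!/bin/bash
# Hypothetical pre-merge check; NOT part of the original bowtie-launcher.sh.
set -eo pipefail
shopt -s nullglob      # an unmatched glob expands to nothing instead of itself

TEMPDIR="tmp"
PREFIX="query"

# Demo setup: three read chunks, but only two produced a non-empty BAM
mkdir -p "$TEMPDIR"
touch "$TEMPDIR/${PREFIX}_00" "$TEMPDIR/${PREFIX}_01" "$TEMPDIR/${PREFIX}_02"
printf 'x' > "$TEMPDIR/${PREFIX}_00.bam"
printf 'x' > "$TEMPDIR/${PREFIX}_01.bam"

missing=()
for C in "$TEMPDIR/${PREFIX}"_*; do
  case "$C" in
    *.sam|*.bam) continue ;;   # skip alignment outputs; only read chunks remain
  esac
  # A missing or empty BAM most likely means this chunk's task failed
  if [ ! -s "$C.bam" ]; then
    missing+=("$C")
  fi
done

if [ "${#missing[@]}" -gt 0 ]; then
  # In a real job script you would stop here (e.g. exit 1) instead of merging
  echo "Not merging; no usable BAM for: ${missing[*]}" >&2
else
  echo "Every chunk has a BAM; safe to run samtools merge"
fi
```

Here the check reports `tmp/query_02` as missing, which is the cue to rerun that chunk rather than ship an incomplete merged BAM.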
