...
Create a 'launcher' directory in your $WORK folder and cd into it. Next, copy over a utility script for splitting NextGen sequence files to this directory (courtesy of PerM https://code.google.com/p/perm/).
Code Block |
---|
cp /corral-repl/utexas/BioITeam/tacc_ngs$BASE/splitReads.sh . |
Next, copy our example launcher script into the same working directory. It's too complicated to type in!
Code Block |
---|
cp /corral-repl/utexas/BioITeam/tacc_ngs$BASE/bowtie-launcher.sh . |
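
Before moving on, it can help to confirm that the working directory exists and that both scripts actually arrived. The following is a minimal sketch, not part of the course materials: the `LAUNCH_DIR` and `MISSING` variable names are made up for illustration, and the fallback to a temporary directory when `$WORK` is unset is an assumption added so the snippet can also be tried outside TACC. It marks `splitReads.sh` as executable because the launcher script calls it as `./splitReads.sh`.

```shell
#!/bin/bash
# Sketch: create the 'launcher' directory under $WORK and check the copied scripts.
# Assumption: if $WORK is not set (e.g. off TACC), use a temporary directory instead.
WORK="${WORK:-$(mktemp -d)}"
LAUNCH_DIR="${WORK}/launcher"      # hypothetical variable name, for illustration only

mkdir -p "${LAUNCH_DIR}"
cd "${LAUNCH_DIR}" || exit 1

# Count how many of the two expected scripts are absent from this directory
MISSING=0
for f in splitReads.sh bowtie-launcher.sh; do
    if [ ! -f "$f" ]; then
        echo "missing: $f"
        MISSING=$((MISSING + 1))
    else
        echo "found: $f"
    fi
done

# The launcher script invokes ./splitReads.sh directly, so it must be executable
if [ -f splitReads.sh ] && [ ! -x splitReads.sh ]; then
    chmod +x splitReads.sh
    echo "made executable: splitReads.sh"
fi

echo "missing files: ${MISSING}"
```

If the last line reports anything other than `missing files: 0`, repeat the `cp` commands above from inside the launcher directory.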
Now follow along as your instructor dissects the annotated bowtie-launcher script, and we'll submit at the very end. Then, you can say you've run a parallel NextGen job at TACC!
Code Block |
---|
#!/bin/bash |
#$ -V |
#$ -cwd |
#$ -pe 1way 48 |
#$ -q normal |
#$ -l h_rt=01:00:00 |
#$ -A 20121008-NGS-ACES |
#$ -m be |
#$ -M vaughn@tacc.utexas.edu |
#$ -N bowtie-launcher |
# Load the bowtie AND launcher modules |
module load launcher |
module load bowtie/0.12.8 |
module load samtools |
# Simple variable to save typing |
BASE="/corral-repl/utexas/BioITeam/tacc_ngs/human_variation/" |
PREFIX="query" |
# Create a working directory to hold a lot of intermediate files |
TEMPDIR="tmp" |
mkdir -p $TEMPDIR |
# Now, run the handy splitReads utility |
# Usage: splitReads.sh sourceFile outputPrefix |
# |
./splitReads.sh $BASE/human_variation/bigseqs_R1.fastq $TEMPDIR/$PREFIX |
# The temp directory will now contain 46 files containing ~1M reads each |
# Iterate over the 46 subfiles in the temp directory |
# Craft a bowtie alignment command and write it out to the paramlist file |
# |
touch bowtie-launcher.paramlist |
for C in ${TEMPDIR}/${PREFIX}_* |
do |
echo "time bowtie --threads 12 -x -t -S $BASE/human_variation/ref/hs37d5.fa ${C} ${C}.sam && samtools view -S -b ${C}.sam > ${C}.bam" >> bowtie-launcher.paramlist |
done |
# Submit to the TACC Launcher |
# |
EXECUTABLE=$TACC_LAUNCHER_DIR/init_launcher |
time $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE bowtie-launcher.paramlist |
# Optional: Consolidate the BAM files into a single BAM |
# You can do this in a separate, dependent script so that |
# you free up all the other nodes associated with this task |
# but here we show it so you can see that the final result of this |
# workflow can be a single file |
# |
BAMS=${TEMPDIR}/*.bam |
samtools merge bigseqs_R1.bam ${BAMS} |
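
The heart of the script above is the loop that writes one command line per split file into the paramlist. If you want to see what that file looks like before spending queue time, the loop can be rehearsed on its own. This is a rough sketch under stated assumptions: the placeholder file names (`query_1` and so on), the `INDEX` path, and the `PARAMLIST` and `NUM_TASKS` variables are all hypothetical, and the bowtie command is simplified compared with the one in the launcher script. Nothing here runs bowtie, samtools, or the launcher; it only writes text.

```shell
#!/bin/bash
# Dry run: build a paramlist from empty placeholder files.
# No alignment is performed; this only previews the generated command lines.
set -euo pipefail

TEMPDIR="$(mktemp -d)"            # scratch directory standing in for tmp/
PREFIX="query"                    # same prefix as in the launcher script
INDEX="/path/to/index"            # hypothetical placeholder for the bowtie index
PARAMLIST="${TEMPDIR}/preview.paramlist"

# Assumed naming: three empty files standing in for the real split read files
for i in 1 2 3; do
    touch "${TEMPDIR}/${PREFIX}_${i}"
done

# Start from an empty paramlist so reruns do not append duplicate commands
: > "${PARAMLIST}"

# Write one command line per split file, mirroring the loop in the launcher script
for C in "${TEMPDIR}/${PREFIX}"_*; do
    echo "bowtie -S ${INDEX} ${C} ${C}.sam && samtools view -S -b ${C}.sam > ${C}.bam" >> "${PARAMLIST}"
done

# Each line of the paramlist is one independent task for the launcher
NUM_TASKS=$(grep -c 'bowtie' "${PARAMLIST}")
echo "tasks: ${NUM_TASKS}"
cat "${PARAMLIST}"
```

Note the `: > "${PARAMLIST}"` line. The launcher script uses `touch` followed by `>>`, so running it twice in the same directory would append a second copy of every command; truncating the file first is one way to avoid that.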
...