...
Code Block |
---|
|
align_bwa_illumina.sh 2022_05_05
Align Illumina SE or PE data with bwa. Produces a sorted, indexed,
duplicate-marked BAM file and various statistics files. Usage:
align_bwa_illumina.sh <aln_mode> <in_file> <out_pfx> <assembly> [ paired trim_sz trim_sz2 seq_fmt qual_fmt ]
Required arguments:
aln_mode Alignment mode, either global (bwa aln) or local (bwa mem).
in_file For single-end alignments, path to input sequence file.
For paired-end alignments using fastq, path to the R1
fastq file which must contain the string 'R1' in its name.
The corresponding 'R2' must have the same path except for 'R1'.
out_pfx Desired prefix of output files in the current directory.
assembly One of hg38, hg19, mm10, mm9, sacCer3, sacCer1, ce11, ce10,
danRer7, hs_mirbase, mm_mirbase, or reference index prefix.
Optional arguments:
paired 0 = single end alignment (default); 1 = paired end.
trim_sz Size to trim reads to. Default 0 (no trimming)
trim_sz2 Size to trim R2 reads to for paired end alignments.
Defaults to trim_sz
seq_fmt Format of sequence file (fastq, bam or scarf). Default is
fastq if the input file has a '.fastq' extension; scarf
if it has a '.sequence.txt' extension.
qual_fmt Type of read quality scores (sanger, illumina or solexa).
Default is sanger for fastq, illumina for scarf.
Environment variables:
show_only 1 = only show what would be done (default not set)
aln_args other bwa options (e.g. '-T 20' for mem, '-l 20' for aln)
no_markdup 1 = don't mark duplicates (default 0, mark duplicates)
run_fastqc 1 = run fastqc (default 0, don't run). Note that output
will be in the directory containing the fastq files.
keep 1 = keep unsorted BAM (default 0, don't keep)
bwa_bin BWA binary to use. Default bwa 0.7.x. Note that bwa 0.6.2
or earlier should be used for scarf and other short reads.
also: NUM_THREADS, BAM_SORT_MEM, SORT_THREADS, JAVA_MEM_ARG
Examples:
align_bwa_illumina.sh local ABC_L001_R1.fastq.gz my_abc hg38 1
align_bwa_illumina.sh global ABC_L001_R1.fastq.gz my_abc hg38 1 50
align_bwa_illumina.sh global sequence.txt old sacCer3 0 '' '' scarf solexa |
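The help text says that for paired-end fastq input you pass only the R1 file, and the R2 file must have the same path except for 'R1'. A minimal sketch of that naming rule, assuming a simple substitution on the file name (the script's actual implementation is not shown here, and the variable names are illustrative only):

```shell
#!/bin/bash
# Sketch only: derive the expected R2 path from an R1 path by replacing
# the first 'R1' in the file name with 'R2'. This mirrors the naming rule
# in the help text; it is not taken from the script itself.
r1_path="./fastq/Sample_Yeast_L005_R1.cat.fastq.gz"

r1_dir=$(dirname "$r1_path")     # directory part, e.g. ./fastq
r1_name=$(basename "$r1_path")   # file name part containing 'R1'
r2_path="$r1_dir/${r1_name/R1/R2}"

echo "R1: $r1_path"
echo "R2: $r2_path"

# Report whether the matching R2 file is actually present
if [ -e "$r2_path" ]; then
    echo "R2 file found"
else
    echo "R2 file not found: $r2_path"
fi
```

Checking for the R2 file this way before submitting a job can catch a misnamed mate file early.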
...
We're going to run this script, along with a similar Bowtie2 alignment script, on the yeast data using the TACC batch system. In a new directory, copy over the commands file and submit the batch job. We ask for 2 hours (-t 02:00:00) with 4 tasks per node (-w 4); since we have 4 commands, the job will run on 1 compute node.
Code Block |
---|
language | bash |
---|
title | Run multiple alignments using the TACC batch system |
---|
|
# Make sure you're not in an idev session by looking at the hostname
hostname
# If the hostname looks like "c455-004.ls6.tacc.utexas.edu", exit the idev session
# Copy over the Yeast data if needed
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/alignment/Sample_Yeast*.gz $SCRATCH/core_ngs/alignment/fastq/
# Make a new alignment directory for running these scripts
mkdir -p $SCRATCH/core_ngs/alignment/bwa_script
cd $SCRATCH/core_ngs/alignment/bwa_script
ln -s -f ../fastq
# Copy the alignment commands file and submit the batch job
cp $CORENGS/tacc/aln_script.cmds .
launcher_creator.py -j aln_script.cmds -n aln_script -t 02:00:00 -w 4 -a OTH21164 -q normal
sbatch --reservation=CoreNGSday4 aln_script.slurm
showq -u |
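The statement that 4 commands at 4 tasks per node fit on 1 compute node is a ceiling division. As a rough sketch (the function name `nodes_needed` is hypothetical, and this only illustrates the arithmetic, not how `launcher_creator.py` works internally):

```shell
#!/bin/bash
# Sketch only: estimate how many nodes a commands file needs for a given
# tasks-per-node setting (the -w value). Illustrative arithmetic, not part
# of launcher_creator.py.
nodes_needed() {
    local num_cmds=$1
    local tasks_per_node=$2
    # Integer ceiling of num_cmds / tasks_per_node
    echo $(( (num_cmds + tasks_per_node - 1) / tasks_per_node ))
}

cmds_file="aln_script.cmds"
if [ -f "$cmds_file" ]; then
    # Count non-blank lines, one command per line
    num_cmds=$(grep -c -v '^[[:space:]]*$' "$cmds_file")
else
    # Fall back to the 4 commands used in this exercise
    num_cmds=4
fi

echo "$num_cmds commands at 4 tasks/node: $(nodes_needed "$num_cmds" 4) node(s)"
```

With a fifth command and the same `-w 4`, the same arithmetic gives 2 nodes.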
...
Code Block |
---|
language | bash |
---|
title | Commands to run multiple alignment scripts |
---|
|
/work/projects/BioITeam/common/script/align_bwa_illumina.sh global ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bwa_global sacCer3 1 50
/work/projects/BioITeam/common/script/align_bwa_illumina.sh local ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bwa_local sacCer3 1
/work/projects/BioITeam/common/script/align_bowtie2_illumina.sh global ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bt2_global sacCer3 1 50
/work/projects/BioITeam/common/script/align_bowtie2_illumina.sh local ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bt2_local sacCer3 1 |
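The four commands follow a regular pattern: two aligners, two modes, with reads trimmed to 50 bases only for global alignments. A commands file like this could also be generated with a loop. The sketch below assumes the script directory is `/work/projects/BioITeam/common/script` and writes to a hypothetical file name, `aln_script.generated.cmds`, so it does not overwrite the copied file:

```shell
#!/bin/bash
# Sketch only: regenerate the four alignment commands with a loop.
# The script directory and the output file name are assumptions.
script_dir="/work/projects/BioITeam/common/script"
fastq="./fastq/Sample_Yeast_L005_R1.cat.fastq.gz"
out_file="aln_script.generated.cmds"

: > "$out_file"   # start with an empty file
for aligner in bwa bowtie2; do
    # Output prefixes used in this exercise: bwa_* and bt2_*
    case "$aligner" in
        bwa)     pfx="bwa" ;;
        bowtie2) pfx="bt2" ;;
    esac
    for mode in global local; do
        # Global alignments trim reads to 50 bases; local ones do not trim
        if [ "$mode" = "global" ]; then
            trim=" 50"
        else
            trim=""
        fi
        echo "$script_dir/align_${aligner}_illumina.sh $mode $fastq ${pfx}_${mode} sacCer3 1$trim" >> "$out_file"
    done
done

cat "$out_file"
```

Each generated line is one task for the launcher, so adding a sample or an aligner to the loops adds whole lines to the commands file.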
...