A sampling of the resources available, chosen specifically for this course. This is not a comprehensive catalog.
...
- Linux fundamentals on this wiki
- Wikis for the 3 CBRS Unix/Linux workshops:
Online tutorials:
- Ryan's Linux Tutorial: http://ryanstutorials.net/linuxtutorial/
- Unix bootcamp for biologists: http://korflab.ucdavis.edu/bootcamp.html
- Unix primer (longer version) for biologists:
...
- UCSC Genome Browser - visualize and download NGS data (see more below)
- Broad Institute Integrative Genomics Viewer (IGV)
- especially good for visualizing BAM file details
- Introduction to Sequence analysis in the Amazon EC2 cloud
- where you can "rent" Linux machines (useful if you don't have access to TACC)
- Galaxy website for online sequencing data analysis
- SEQanswers forum - many NGS questions answered here
- A funny SEQanswers post about biologists starting to analyze NGS data: http://seqanswers.com/forums/showthread.php?t=4589
...
- Overviews
Technology intros
- Illumina (Solexa) – most common "short" (< 300 bp) read sequencing
- Newer single molecule sequencing
- Single cell sequencing
- Older technologies (less common now)
Life Technologies SOLiD (short reads in "colorspace")
Roche/454 – long (multi-Kb) reads often used in assemblies
...
- File formats
- input: FASTQ format
- output: the SAM (Sequence Alignment Map) format specification
- SAM1.pdf – header fields, body fields, flag definitions
- https://github.com/samtools/hts-specs/blob/master/SAMtags.pdf – tag fields
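To make the aligner input format concrete, here is a minimal sketch of reading FASTQ records in Python. It assumes the common 4-line record layout (header, sequence, `+` separator, qualities) and Phred+33 quality encoding; the function names `read_fastq` and `phred33_scores` are made up for illustration and are not part of any course script.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a blank line) stops the sketch
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@' from the read name


def phred33_scores(qual):
    """Convert a quality string to integer scores, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual]


# Usage with an in-memory example record (hypothetical data)
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
records = list(read_fastq(example))
```

For real data, a dedicated parser is the safer choice, since it also handles compressed input and malformed records.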
- Aligners
- bwa (Burrows-Wheeler Aligner) by Heng Li – http://bio-bwa.sourceforge.net/
- fast, sensitive and easy to use
- bowtie2 – http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
- fast, sensitive and extremely configurable
- Comparison of different aligners
- by Heng Li, developer of bwa, samtools, and many other bioinformatics tools
- The BioITeam has some TACC-aware alignment scripts you might find useful:
- bwa alignment
/work/projects/BioITeam/common/script/align_bwa_illumina.sh
- bowtie2 alignment
/work/projects/BioITeam/common/script/align_bowtie2_illumina.sh
- merging sorted BAM files (read-group aware)
/work/projects/BioITeam/common/script/merge_sorted_bams.sh
- kallisto pseudo-alignment to annotated transcripts
/work/projects/BioITeam/common/script/run_kallisto.sh
- also available on many BRCF pods under /mnt/bioi/script.
- many pre-built references also available in /mnt/bioi/ref_genome
- email or come talk to Anna if you have questions or problems
...
- SAM (Sequence Alignment Map) format specification (SAM1.pdf)
- Translate SAM file flags web calculator: http://broadinstitute.github.io/picard/explain-flags.html
- type in a decimal number to see which flags are set
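The flag calculator does a bitwise decode of the decimal FLAG value, which can be sketched in a few lines of Python. The bit values follow the SAM specification; the description strings and the function name `explain_flag` are illustrative choices, not the calculator's exact wording.

```python
# Bit values as defined in the SAM specification; descriptions are paraphrased.
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the descriptions of all bits set in a decimal SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


# Usage: 99 = 1 + 2 + 32 + 64
for description in explain_flag(99):
    print(description)
```

A flag of 99 is typical for the first read of a properly paired alignment whose mate maps to the reverse strand.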
- samtools – by Heng Li
- SAM/BAM conversion, flag filtering, sorting, indexing, duplicate filtering
- older 0.1.xx versions: http://samtools.sourceforge.net/
- newer 1.3+ versions: http://www.htslib.org/
- Picard toolkit – http://broadinstitute.github.io/picard/
- SAM/BAM utilities that are read-group aware
- especially MarkDuplicates for flagging duplicate alignments
- bedtools – http://bedtools.readthedocs.org/en/latest/
- All sub-commands: http://bedtools.readthedocs.io/en/latest/content/bedtools-suite.html
- Swiss army knife for all manner of common BED, BAM, VCF, GFF/GTF file manipulation.
- See BEDTools Overview for some common use cases.
- Available in the TACC module system
- RNA-seq QC, metrics & plotting tools:
- RSeQC – http://rseqc.sourceforge.net/
- RNA-SeQC (Broad Institute) –
- RNA-QC-Chain – http://bioinfo.single-cell.cn/rna-qc-chain.html
File formats and conversion
- SAM format specification – http://samtools.github.io/hts-specs/SAMv1.pdf
- crucial for performing format conversions, of which ChIP-seq analysis can have many
- HTS format specifications – http://samtools.github.io/hts-specs/
- clearinghouse page for a number of NGS formats (SAM, CRAM, VCF, BCF, etc.)
- Genome browser file formats – http://genome.ucsc.edu/FAQ/FAQformat.html
- BED, bedGraph, narrowPeak and many more
- SRA (Sequence Read Archive) from NCBI
- BioITeam script for converting GTF/GFF3 files to BED format
/work/projects/BioITeam/common/script/gtf_to_bed.pl
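The main pitfall in any GTF/GFF-to-BED conversion is the coordinate system: GTF/GFF coordinates are 1-based and end-inclusive, while BED coordinates are 0-based and end-exclusive. The sketch below illustrates only that shift. It is not the logic of `gtf_to_bed.pl`, and the function name `gtf_line_to_bed` and the choice of output columns are assumptions made for illustration.

```python
def gtf_line_to_bed(line):
    """Convert one GTF/GFF line to a 6-column BED line, or None for comments/short lines."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9:
        return None
    chrom, _source, feature, start, end, score, strand = fields[:7]
    bed_start = int(start) - 1  # 1-based inclusive start -> 0-based start
    bed_end = int(end)          # inclusive end equals the exclusive end numerically
    bed_score = "0" if score == "." else score  # simplification: BED has no "." score
    # Assumption: the feature type is used as the BED name column
    return "\t".join([chrom, str(bed_start), str(bed_end), feature, bed_score, strand])


# Usage with a hypothetical exon record
gtf = 'chr1\ttest\texon\t101\t200\t.\t+\t.\tgene_id "g1";'
bed = gtf_line_to_bed(gtf)
```

Note that the feature length is preserved: 200 - 101 + 1 = 100 bases in GTF terms, and 200 - 100 = 100 in BED terms.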
- UCSC file format conversion scripts - useful for converting WIG and BED files to and from their corresponding binary formats (bigWig and bigBed)
- Make sure you download the correct scripts for your operating system!
- Also available as a BioContainers module
...