Genome Variant Analysis Course 2015
We will meet in Room 4.128 of Mezes Hall (MEZ). We strongly encourage you to use the computers provided in the classroom for these tutorials, but you may also bring your personal laptops.
Course Overview
The course consists of two ~90-minute sections per day for 4 days. Each section typically includes a brief presentation followed by a hands-on guided tutorial. Additional "bonus tutorials" cover important (yet not critical) aspects of NGS data analysis; these can be completed during a section, time permitting, or on your own. By the end of this course, we hope to achieve the following goals:
- Teach you the different ways next-generation sequencing libraries are constructed, and the advantages and disadvantages of each type.
- Familiarize you with how the Texas Advanced Computing Center (TACC) can be used to simplify and speed up your data analysis.
- Teach you the basics of read mapping in both individuals and populations, and identifying variants within individuals and rare variants within populations.
- Provide reference materials broad enough to give you a starting point for your own data analysis, and enough experience that you can begin that analysis on your own.
Your Instructors
Name | Initials | Affiliation | Expertise |
---|---|---|---|
Daniel Deatherage | DD | Barrick Lab | Unix, Python, NGS Library Prep, Capture, Rare Variant Identification |
Sean Leonard | SL | Barrick Lab | Unix, R |
A nod to the past
This class has been taught multiple times in the last few years. We wish to acknowledge a great deal of help with creating these web pages and materials from previous instructors of the Intro to NGS Bioinformatics course taught in May 2013 and the Genome Variant Analysis Course 2014 taught in May 2014.
Two individuals warrant special mention: Scott Hunicke-Smith, director of the GSAF, and Jeffrey Barrick. They have been the driving force behind this class for a number of years, and the majority of the tutorials presented here were developed by them or adapted from their work.
Course Schedule
Tuesday, May 26th. Day 1 – "The Basics"
Presentation: Next Generation Sequencing Library Preparation and Experimental Design (and general introduction)
Tutorial: Introduction to Linux and Lonestar
Bonus Tutorial: Evaluating raw sequencing data
Presentation: Single-nucleotide variant (SNV) calling
Presentation: Structural variant (SV) calling
Tutorial: Bacterial genome variants the easy way – breseq
Wednesday, May 27th. Day 2 – "The Principles of Variant Calling"
Presentation: Read Mapping
Tutorial: Mapping with bowtie2
Tutorial: SNV calling with SAMtools (a post-class fix is now available here)
Tutorial: SV calling with SVDetect
Tutorial: Integrative Genome Viewer (IGV)
Bonus Tutorial: Evaluating mapped read data
Thursday, May 28th. Day 3 – "Human Variant Calling"
Pre-presentation task: Day 3 Start (includes tutorials)
Presentation: What changes with humans?
Tutorial: Human Trios Analysis
Bonus Tutorial: Human variants with GATK
Bonus Tutorial: Tumor/normal Analysis with Virmid
Bonus Tutorial: Linux one-liners (how to use grep and awk to get the most out of your work)
Bonus Tutorial: samtools mpileup in more detail on human data (makes use of Linux one-liners)
Friday, May 29th. Day 4 – "(Rare) Variant Detection in Populations"
Tutorial: Annotating variants with ANNOVAR
Bonus Tutorial: Filtering and screening variants
Presentation: Where do errors come from, and what can we do about them?
Presentation: Alternative library prep methods
Tutorial: Exome capture and metrics
Tutorial: Sequencing error correction (SSCS reads)
Bonus Tutorial: Rare variant detection in bacteria using breseq
Additional Resources
Here is a jumbled mess of things that have been presented in years past that should be ordered to be more useful.
- YouTube video explaining Illumina sequencing
- NGS Course Resources Tool List
- GSAF adaptor and barcode sequence resource
- Working on TACC from your Mac or PC
- Scott's list of Linux one-liners
- Installing a virtual machine and Linux on Windows
- Example BWA alignment script
- Variant calling with GATK (SPHS)
- Visualize mapped data at UCSC genome browser (AB)
- Genome variation in mixed samples (FreeBayes, deepSNV) (JB)
- SRA toolkit and Exercises (AB)
- Shell Scripting (SPHS/AB)
- Installing Linux tools (JB)
- Custom Genome Databases
- Evaluating & Visualizing assemblies (bacterial, SPHS)
- Genome Assembly Examples (SPHS)
- Tutorial: Genome Assembly (velvet) (SPHS)
- ddRAD (Stacks tutorial: http://evomics.org/wp-content/uploads/2013/03/cesky_2014_RAD_tutorial_updated.pdf), Tn-Seq?