Genome Variant Analysis Course 2015

Owned by Deatherage, Daniel E

Last updated: May 29, 2015

We will meet in Room 4.128 of Mezes Hall (MEZ). We strongly encourage you to use the computers provided in the classroom for these tutorials, but you may also bring your personal laptops.

Course Overview

The course is built around two ~90-minute sections per day for four days. Each section typically consists of a brief presentation followed by a hands-on guided tutorial. Additional "bonus tutorials" cover important (yet not critical) aspects of NGS data analysis and can be completed during a section, time permitting, or on your own. By the end of this course, we hope to achieve the following goals:

Teach you the different ways next-generation sequencing libraries are constructed, and the advantages and disadvantages of each type.
Familiarize you with how the Texas Advanced Computing Center (TACC) can be used to simplify and speed up your data analysis.
Teach you the basics of read mapping in both individuals and populations, and identifying variants within individuals and rare variants within populations.
Provide reference materials broad enough to give you a starting point for your own data analysis, and enough experience that you can begin that analysis on your own.

Your Instructors

Name	Initials	Affiliation	Expertise
Daniel Deatherage	DD	Barrick Lab	Unix, Python, NGS Library Prep, Capture, Rare Variant Identification
Sean Leonard	SL	Barrick Lab	Unix, R

A nod to the past

This class has been taught multiple times in the last few years. We wish to acknowledge a great deal of help with creating these web pages and materials from previous instructors of the Intro to NGS Bioinformatics course taught in May 2013 and the Genome Variant Analysis Course 2014 taught in May 2014.

Two individuals warrant special mention: Scott Hunicke-Smith, the director of the GSAF, and Jeffrey Barrick. They have been the driving force behind this class for a number of years, and the majority of the tutorials presented here were developed by them or adapted from their work.

Course Schedule

Tuesday, May 26th. Day 1 – "The Basics"

Presentation: Next Generation Sequencing Library Preparation and Experimental Design (and general introduction)

Tutorial: Introduction to Linux and Lonestar

Bonus Tutorial: Evaluating raw sequencing data

Presentation: Single-nucleotide variant (SNV) calling

Presentation: Structural variant (SV) calling

Tutorial: Bacterial genome variants the easy way – breseq

Wednesday, May 27th. Day 2 – "The Principles of Variant Calling"

Presentation: Read Mapping

Tutorial: Mapping with bowtie2

Tutorial: SNV calling with SAMtools (a post-class fix is now available)

Tutorial: SV calling with SVDetect

Tutorial: Integrative Genomics Viewer (IGV)

Bonus Tutorial: Evaluating mapped read data

Thursday, May 28th. Day 3 – "Human Variant Calling"

Pre-presentation task: Day 3 Start (includes tutorials)

Presentation: What changes with humans?

Tutorial: Human Trios Analysis

Bonus Tutorial: Human variants with GATK

Bonus Tutorial: Tumor/normal Analysis with Virmid

Bonus Tutorial: Linux one-liners (how to use grep and awk to get the most out of your work)

Bonus Tutorial: samtools mpileup in more detail on human data (makes use of Linux one-liners)

Friday, May 29th. Day 4 – "(Rare) Variant Detection in Populations"

Tutorial: Annotating variants with ANNOVAR

Bonus Tutorial: Filtering and screening variants

Presentation: Where do errors come from, and what can we do about them?

Presentation: Alternative library prep methods

Tutorial: Exome capture and metrics

Tutorial: Sequencing error correction (SSCS reads)

Bonus Tutorial: Rare variant detection in bacteria using breseq

Additional Resources

Here is an unsorted collection of materials presented in years past; it has not yet been ordered to be more useful.

YouTube video explaining Illumina sequencing
NGS Course Resources Tool List
GSAF adaptor and barcode sequence resource
Working on TACC from your Mac or PC
- Editing files, more detail
Scott's list of Linux one-liners
Installing Virtual machine & Linux on Windows
Example BWA alignment script
Variant calling with GATK (SPHS)
Visualize mapped data at UCSC genome browser (AB)
Genome variation in mixed samples (FreeBayes, deepSNV) (JB)
SRA toolkit and Exercises (AB)
Shell Scripting (SPHS/AB)
Installing Linux tools (JB)
Custom Genome Databases
Evaluating & Visualizing assemblies (bacterial, SPHS)
Genome Assembly Examples (SPHS)
Tutorial: Genome Assembly (velvet) (SPHS)
ddRAD (Stacks tutorial: http://evomics.org/wp-content/uploads/2013/03/cesky_2014_RAD_tutorial_updated.pdf), Tn-Seq?
