Genome Variant Analysis Course 2023

Course Overview

For the past several years this course was presented online because of COVID-19. Due in part to positive responses from participants, this year the course is being offered in a hybrid format. You should have received Zoom contact information by email; the same information applies to each day of the course. You are welcome to attend in person or via Zoom on whichever days work best for your schedule; I will be there in person each day. As this is yet another new way of presenting this course, I would greatly appreciate any feedback you have at the end of it.

The course is designed to have two sections of roughly 90 minutes each per day for 5 days, with the goal of teaching you how to perform standard next-generation sequencing (NGS) analysis to identify genomic variants. This will be accomplished through presentations covering information essential to all types of analysis, guided tutorials to reinforce the essential concepts, and optional self-guided tutorials to help you learn the skills that are most specific to your own analysis. By the end of this course, we hope to achieve the following goals:

  1. Teach you the different ways next-generation sequencing libraries are constructed, and the advantages and disadvantages associated with each type.
  2. Familiarize you with how the Texas Advanced Computing Center (TACC) can be used to simplify and speed up your data analysis.
  3. Introduce you to common ways of installing programs useful for NGS analysis.
  4. Teach you the basics of read mapping in both individuals and populations, and identifying variants within individuals and rare variants within populations.
  5. Provide reference materials covering a breadth of material sufficient to give you a starting point for your own data analysis, and enough experience that you can begin that analysis on your own.

Below you will see a tentative schedule and list of tutorials. As the week goes on, links will be added to each of the headings.  

Your Instructor

Name: Daniel Deatherage

Initials: DD

Affiliation: Barrick Lab

Expertise: Unix, Python, NGS Library Prep, Capture, Rare Variant Identification

A nod to the past

I think it is important to acknowledge the great deal of help in creating these web pages and materials that came from previous instructors of the Intro to NGS Bioinformatics course taught in 2013 and the Genome Variant Analysis Course taught in 2014-2016. Two individuals warrant special mention: Scott Hunicke-Smith, the former director of the GSAF, and Jeffrey Barrick were the driving force behind this class for a number of years, and many of the tutorials presented here were originally developed by them or adapted from their work.


Verifying setup/access week of June 12th:

In order to ensure as smooth an experience as possible, the week prior to the course, each participant needs to:

  1. Log into TACC.
  2. Provide their TACC ID to Dan.

As mentioned in the introduction email, it is especially important to take care of this as early as possible, because TACC help desk availability on the first day of class is unknown and problems may come up that need their help.

Depending on your operating system, you should complete either the Windows or the Mac tutorial below. If you are having difficulties, be sure to email Dan so a Zoom session can be scheduled. This avoids using class time on administrative issues that may not be resolvable without additional help from people at TACC, who will not be present in the class.

Windows 10

macOS

Course Schedule

Monday, June 19th. Day 1 – "The Basics"

Presentation: General Course Introduction

Tutorial: Introduction to Linux and Stampede2

Presentation: Experimental Design & Library Prep

Tutorial: Evaluating Raw Sequencing Data

Tuesday, June 20th. Day 2 – "Principles of Variant Calling"

Presentation: Read Mapping

Day 1 catchup

idev session reminder

Tutorial: Mapping Reads with bowtie2

Presentation: Single Nucleotide Variant Calling

Presentation: Structural Variant Calling

Tutorial: Using samtools to identify SNVs

Tutorial: Using SVDetect to identify SVs

Bonus Presentation: Read Mapping Details and File Formats

Wednesday, June 21st. Day 3 – Visualization and Long Reads

Presentation: Errors: Where do they come from and how do we identify them as noise rather than signal?

Bonus Presentation: Error Reduction Methods - for when errors really do matter.

Tutorial: Visualization: Integrative Genomics Viewer (IGV) Tutorial

Tutorial: Visualization: Bacterial genome variants the easiest way – breseq


At this point in the course, you have the basic tools that will help you regardless of what type of research you are involved in. The remainder of the course is full of topics that are more specific to different research areas. They are divided into broad categories to help you decide which ones you want to complete during the remaining time. If you are unsure, just ask and I'll help identify the ones that may be most applicable to your work.

Thursday, June 22nd. Day 4 – User specific tutorials

Bacterial Centric Tutorials

Tutorial: Advanced breseq

Tutorial: breseq with multiple references

Tutorial: Evaluating Error Correction Using breseq


Human and Higher Eukaryote Centric Tutorials

Tutorial: Human Trios Analysis

Tutorial: Comparing Multiple Samples

Tutorial: GATK

Tutorial: Exome Capture Metrics – with GATK

Tutorial: Annotating with Annovar

Method-based Tutorials that may be of help regardless of sample type

Tutorial: MultiQC - FastQC summary tool for multiple samples

Tutorial: Read processing with fastp

Tutorial: Genome Assembly (Short Reads) 

Tutorial: Novel DNA identification

Tutorial: Advanced mapping

Tutorial: Error Correction (Molecular Indexing)

Tutorial: Annotating plasmids with pLannotate

Long Read Tutorials (available Friday)

Tutorial: Introduction to long reads

Tutorial: Genome Assembly (Long Reads) 

Friday, June 23rd. Day 5 – Long reads and TACC the "normal" way

The first half of today's class will go through long reads. After the break, we will go over a brief review to put things back in perspective and give you a tutorial on how to do things the 'normal way' on TACC, which means using the job submission system and commands files. Any remaining time is yours to go through tutorials and ask any remaining questions.

Presentation: Working with long reads

Presentation: Genome Variant Analysis Review

Tutorial: End of class review and data collection

Tutorial: Post-class changes to your environment