2020 Introduction
Your Instructors
Anna Battenhouse, Associate Research Scientist, abattenhouse@utexas.edu
BA English literature, 1978
Commercial software development 1982 – 2007
Joined Iyer Lab 2007 (“retirement career”)
BS Biochemistry, UT Austin, 2013
- Joined the Biomedical Research Computing Facility (BRCF) and the Marcotte Lab in summer 2017
- Also affiliated with
Vy Dang, vyqtdang@utexas.edu
Second-year graduate student in the Marcotte Lab
Research Interests: Comparative multi-omics, Systems Biology, and Evolution
Riddhiman Garge, riddhimankg@utexas.edu
Biochemistry graduate student in the Marcotte Lab
- Research Interests: Systems Biology, Synthetic Biology, Evolution, Multi-omics
About the Iyer Lab (where Anna learned NGS)
Dr. Vishy Iyer, PI
Main focus is functional genomics
Research methods include
Communication
Asking questions
Feel free to ask questions any time during the instructors' lectures and demonstrations.
You can also post your question to the Zoom chat.
Breakout rooms
We'll sometimes use breakout rooms when working on short assignments and when troubleshooting problems you run into. As you log in to Zoom, you'll be assigned to one of two breakout rooms, each associated with one of our TAs (Vy or Riddhiman).
Getting help
Since most folks are new to the Linux command line, we expect you to run into problems! Please let us know in the Zoom chat if you're having difficulties (FRI students may want to send a private chat message to their REs first).
Making mistakes and running into problems is key to learning the Linux command line! It is not only expected – it is encouraged. So once you tell us you're having an issue and get our attention, we encourage you to share your screen so everyone can benefit from shared troubleshooting.
If you'd prefer not to share your screen with the class, a TA may ask you to join a breakout room to help you, depending on the issue.
Conventions
If you see a block of text like this:
ls -h
it means: type the command ls -h into a terminal window, hit Enter, and see what happens.
We intend this course to offer as much self-learning as possible. Consequently, you'll find many sections like this (click on the triangle to expand them):
and some sections like this:
Course goals
- Hands-on, tutorial style – learn by doing
  - common bioinformatics tools & file formats
- Introduce NGS vocabulary
  - both high-level view and practice with specific tools
- Cover the NGS basics
  - the first few things you'll do after receiving raw sequences
    - raw sequence preparation
    - alignment to reference
    - basic alignment analysis
- Understand and practice required skills
  - Get you comfortable with Linux and TACC – your best "frenemies"
  - Make you self-sufficient enough in 5 days that you can become experts over time
  - Show some "best practices" for working with NGS data
NGS Challenges
Diverse skill set requirements
Large and growing datasets
NGS methods produce staggering amounts of data!
Typical datasets these days:
- yeast: 5 – 20 million reads
- human: 20 – 250 million reads
- single- or paired-end reads, 75 – 250 bases long
The initial FASTQ files are big (hundreds of MB to several GB) – and they're just the start.
- Organization and naming conventions are critical.
- Your data can get out of hand very quickly!
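To see why even a single raw FASTQ file reaches hundreds of MB or more, here is a rough back-of-the-envelope estimate written as a minimal Python sketch. It assumes the standard four-line FASTQ record (an "@" header line, the sequence, a "+" separator line, and a quality string the same length as the sequence) and a hypothetical 40-character header; real header lengths vary by sequencer. The function names fastq_bytes and human_size are illustrative only, not part of any tool used in this course.

```python
# Rough, uncompressed FASTQ size estimate (a sketch, not a measurement).
# Assumption: each record has 4 lines: "@" header, sequence, "+" separator,
# and a quality line the same length as the sequence.
# The default 40-character header length is a hypothetical placeholder.


def fastq_bytes(num_reads: int, read_length: int, header_length: int = 40) -> int:
    """Approximate uncompressed size in bytes of a single-end FASTQ file."""
    per_record = (
        (header_length + 1)  # header line plus newline
        + (read_length + 1)  # sequence line plus newline
        + 2                  # "+" separator line plus newline
        + (read_length + 1)  # quality line plus newline
    )
    return num_reads * per_record


def human_size(num_bytes: int) -> str:
    """Format a byte count in decimal units (MB = 10**6 bytes, GB = 10**9 bytes)."""
    if num_bytes >= 10**9:
        return f"{num_bytes / 10**9:.1f} GB"
    return f"{num_bytes / 10**6:.0f} MB"


if __name__ == "__main__":
    # Hypothetical examples drawn from the low ends of the typical ranges.
    examples = [
        ("yeast, 5 million x 75 bases", 5_000_000, 75),
        ("human, 20 million x 100 bases", 20_000_000, 100),
    ]
    for label, reads, length in examples:
        print(f"{label}: {human_size(fastq_bytes(reads, length))}")
```

Under these assumptions, 5 million 75-base reads come to roughly 975 MB uncompressed and 20 million 100-base reads to about 4.9 GB. Paired-end data roughly doubles these figures, since each mate of a pair is stored as its own record. FASTQ files are often distributed gzip-compressed, which shrinks them on disk, but every downstream step (trimming, alignment, sorting) produces more files of its own.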
Progression of Iyer Lab datasets over time:
- 2008 – Yeast heat shock remodeling of chromatin
  - 2 yeast datasets
  - less than 2 million sequences
- 2010 – Allelic bias in CTCF binding
  - 13 CTCF datasets from 3 GM cell lines
  - ~200 million sequences
- 2012 – Transcription factor data analysis (ENCODE2)
  - 32 ChIP-seq datasets gathered over 3 years (3 TFs across 11 cell lines)
  - ~1 billion sequences
- 2013 – miRNA overexpression effects
  - 42 RNAseq datasets (7 conditions)
  - ~2.6 billion sequences
- 2014 – eQTL analysis of CTCF binding
  - 52 very deeply sequenced CTCF datasets
  - ~8 billion sequences
- 2018 – Functional analysis of glioblastoma tumors and cell lines
  - nearly 500 datasets in total (ChIP-seq, RNAseq, miRNAseq, 4C, exome/genome sequencing)
  - >22 billion sequences