Practical advice - short read re-sequencing data
- Inconsistent alignment at indels
- Example 1: ybaL mutation at 475,292 in REL8593A sample.
- Example 2:
- Misalignment across structural variants
- Example 1: gltB mutation at 3,289,962 in REL8593A sample?
- Example 2: rbs mutation at 3,289,962 in REL8593A sample?
- Mismapping of reads not present in reference genome
- Dark matter: repetitive genomic regions
- Reference-related issues:
- Chromosome names MUST MATCH EXACTLY across all input data files: reference genome, gene annotations, SNP databases, etc. Don't assume these all follow one convention. Some files simply number chromosomes 1, 2, 3, etc. Others name them "chr1", "chr2", "chr3", etc.
- File formats disagree on coordinate numbering. Some, such as BED files, call the first base of a chromosome "base 0". Others, such as SAM/BAM files, call the first base "base 1". The UCSC genome browser maintains a nice list of these details.
- For the human genome, this web site has a nice cheat sheet in case you get in trouble.
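The two reference-related checks above can be sketched in a few lines of Python. This is only a minimal illustration, not part of any particular pipeline: the function names and the example chromosome lists are hypothetical, and the interval conversion assumes the standard BED convention of 0-based, half-open intervals converted to 1-based, inclusive coordinates.

```python
def naming_mismatches(reference_names, other_names):
    """Return the names in other_names that do not appear in reference_names.

    Comparison is exact and case-sensitive, so "chr1" and "1" do not match.
    """
    reference = set(reference_names)
    return sorted(set(other_names) - reference)


def bed_to_one_based(start, end):
    """Convert a BED-style interval (0-based, half-open) to 1-based, inclusive."""
    return start + 1, end


def one_based_to_bed(start, end):
    """Convert a 1-based, inclusive interval to BED-style (0-based, half-open)."""
    return start - 1, end


# Hypothetical chromosome names taken from a reference and an annotation file.
reference = ["chr1", "chr2", "chr3"]
annotation = ["1", "2", "3"]

# Every annotation name is reported, because none matches the reference exactly.
print(naming_mismatches(reference, annotation))

# The first 100 bases of a chromosome are 0-100 in BED and 1-100 in 1-based terms.
print(bed_to_one_based(0, 100))
```

Running a check like `naming_mismatches` on every input file before alignment or variant calling surfaces naming problems early, rather than as silently empty results later.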