...

Discussion of the output

Here is my interpretation of the data:

1) This method effectively looks at a very narrow genomic region, probably within a homologous recombination block.

2) The most telling data: where the two parents are homozygous for different alleles (0/0 and 1/1), the child must be heterozygous (0/1).

3) So all this data is consistent with column 1 (NA12878) being the child:

     12 0/0 0/1 0/0
      5 0/0 0/1 0/1
      4 0/1 0/0 0/1
      8 0/1 0/0 1/1
     43 0/1 0/1 0/0
     24 0/1 1/1 0/0

"Outlier" data, which cannot be explained by Mendelian inheritance if column 1 is the child, are:

      3 0/1 0/0 0/0
      1 1/1 0/1 0/0
 

This is, in fact, the correct assessment - NA12878 is the child.
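The reasoning above can also be checked mechanically. Here is a minimal sketch, assuming the input is the "count genotype genotype genotype" summary shown above, with unphased or phased diploid genotypes and no missing calls. The function name mendel_check is made up for this illustration and is not part of samtools or bcftools. It treats one column as the child and asks, for each row, whether one child allele could have come from each of the other two samples.

```shell
# mendel_check CHILD_COL
# Reads lines of the form "count gt1 gt2 gt3" on stdin and reports how many
# sites are consistent with Mendelian inheritance when genotype column
# CHILD_COL (1, 2 or 3) is treated as the child of the other two samples.
# Hypothetical helper for illustration; missing genotypes are not handled.
mendel_check() {
  awk -v c="$1" '
    # has(gt, a): 1 if allele a occurs in genotype gt (for example "0/1")
    function has(gt, a,    parts, n, i) {
      n = split(gt, parts, "[/|]")
      for (i = 1; i <= n; i++) if (parts[i] == a) return 1
      return 0
    }
    NF >= 4 {
      child = $(c + 1)
      np = 0
      for (i = 2; i <= 4; i++) if (i != c + 1) par[++np] = $i
      split(child, al, "[/|]")
      # one allele must be available from each parent, in either order
      ok = (has(par[1], al[1]) && has(par[2], al[2])) || (has(par[1], al[2]) && has(par[2], al[1]))
      if (ok) good += $1; else bad += $1
    }
    END { printf "consistent=%d inconsistent=%d\n", good, bad }
  '
}
```

Feeding it the eight rows above with mendel_check 1 reports 96 consistent and 4 inconsistent sites. Treating column 2 or column 3 as the child instead gives 44 and 33 inconsistent sites respectively, which is why column 1 is the best candidate for the child.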

  

This same type of analysis can be done on much larger cohorts, from hundreds to thousands of individuals with known disease state, to look for associations between allelic state and disease state.


Warning: This seems broken and confused

STOP HERE UNTIL FIXED

You can examine some of the variant calls with:

    more trios_tutorial.raw.vcf

Note the extensive header information, followed by an ordered list of variants in the same format you saw at the end of the samtools tutorial yesterday.
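Since the header is long, it can help to look at the header and the variant records separately. The following is a small sketch that relies only on the fact that VCF header lines start with "#". The function names vcf_header_count and vcf_head_records are made up for this example.

```shell
# Count the header lines (those starting with "#") in a VCF file.
vcf_header_count() {
  grep -c '^#' "$1"
}

# Print the first N variant records (non-header lines); N defaults to 5.
vcf_head_records() {
  grep -v '^#' "$1" | head -n "${2:-5}"
}
```

For example, vcf_header_count trios_tutorial.raw.vcf tells you how many lines to skip, and vcf_head_records trios_tutorial.raw.vcf 10 shows the first ten variant calls without the header.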

 

The bcftools auxiliary Perl script vcfutils.pl can provide some useful stats. It is not part of a standard TACC module, but it is in our common $BI/bin directory.

...

Undoubtedly, this looks like nothing but gibberish to you, and understandably so: tab-delimited output without headers is rarely, if ever, useful. It is unclear exactly what the column headings are.

 

Here is an honest write-up about the Ts/Tv metric, referencing this 2002 paper by Ebersberger. Bottom line: between about 2.1 and 2.8 is OK for Ts/Tv (also called Ti/Tv). You can also get some quick stats with some Linux one-liners on this page; there are more thorough analysis programs built to work with VCF files.
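As a rough illustration of what the Ts/Tv number means, here is a sketch that counts transitions (A<->G and C<->T) and transversions (every other single-base change) directly from the REF and ALT columns of a VCF file. It is an assumption-laden approximation, not a replacement for a real stats tool: it only counts records with a single-base REF and a single-base ALT, skips indels and multi-allelic sites, and does no quality filtering. The function name tstv is made up for this example.

```shell
# tstv FILE.vcf
# Count transitions and transversions among simple biallelic SNPs.
# Hypothetical helper for illustration only.
tstv() {
  awk -F '\t' '
    /^#/ { next }                                  # skip header lines
    $4 ~ /^[ACGTacgt]$/ && $5 ~ /^[ACGTacgt]$/ {   # single-base REF and ALT only
      pair = toupper($4 $5)
      if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC") ts++
      else tv++
    }
    END {
      printf "Ts=%d Tv=%d", ts, tv
      if (tv > 0) printf " Ts/Tv=%.2f", ts / tv
      printf "\n"
    }
  ' "$1"
}
```

Run it as tstv trios_tutorial.raw.vcf and compare the ratio to the 2.1 to 2.8 range mentioned above.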

...