...

samtools multi-sample variants: separate bam files
mkdir $SCRATCH/GVA_Human_trios/multi_sample
cd $SCRATCH/GVA_Human_trios
samtools mpileup -uf $SCRATCH/GVA_Human_trios/raw_files/ref/hs37d5.fa \
  $SCRATCH/GVA_Human_trios/raw_files/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam \
  $SCRATCH/GVA_Human_trios/raw_files/NA12891.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam \
  $SCRATCH/GVA_Human_trios/raw_files/NA12892.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam \
    | bcftools call -v -c - > multi_sample/trios_tutorial.all.samtools.vcf

Again, this command prints very little output and takes ~10 minutes to complete.
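Because the later discussion refers to samples by column position, it can help to confirm the order of the sample columns in the resulting VCF. The following is a minimal sketch, not part of the original tutorial: vcf_samples is a hypothetical helper name, and the names it prints depend on how the samples are labelled in your bam files.

```shell
# Sketch: list the sample columns of a VCF in order.
# vcf_samples is a hypothetical helper, not a samtools or bcftools command.
vcf_samples() {
  # In a VCF, the #CHROM header line holds 9 fixed columns followed by
  # one column per sample; print each sample name on its own line.
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Example call on the file produced above (run from $SCRATCH/GVA_Human_trios):
#   vcf_samples multi_sample/trios_tutorial.all.samtools.vcf
```

The first name printed corresponds to the first genotype column, the second to the next, and so on.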

Identify the lineage

If genetics works, you should be able to identify the child based strictly on the genotypes.  Can you do it?

...

Given this information can you make any determination about family structure for these 3 individuals?

...

Discussion of the output

...

The first column is the number of occurrences, generated by the uniq -c command in our long one-liner; the following 3 columns are the genotypes of the individual samples. Some lines may not show a viable Mendelian inheritance pattern, so weight each scenario by how many times it occurred, as our filtering was fairly limited.


Overall, this data is consistent with column 1 (NA12878) being the child. Lines marked with an * are inconsistent with that assignment:

     34 0/1	0/1	0/0
     20 0/1	0/0	0/1
     20 0/0	0/1	0/0
     14 0/1	1/1	0/0	# middle can't be child
      6 0/0	0/1	0/1
      4 1/1	1/1	0/0	* No Mendelian combination exists
      1 1/1	0/1	0/0	*
      1 0/1	0/0	0/0	*
 

This is, in fact, the correct assessment - NA12878 is the child.
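The reasoning above can be checked mechanically. The sketch below is an illustration only, not the tutorial's method: it assumes unphased, biallelic diploid genotypes written as 0/0, 0/1 or 1/1, and mendel_ok is a hypothetical helper name. A child genotype is accepted if one allele can come from each parent.

```shell
# Sketch: can a child genotype arise from two parent genotypes?
# mendel_ok is a hypothetical helper, not part of samtools or bcftools.
# usage: mendel_ok CHILD PARENT_A PARENT_B   (prints "ok" or "violation")
mendel_ok() {
  awk -v c="$1" -v a="$2" -v b="$3" 'BEGIN {
    split(c, ch, "/"); split(a, pa, "/"); split(b, pb, "/")
    # Try every pairing of one allele from each parent.
    for (i = 1; i <= 2; i++) {
      for (j = 1; j <= 2; j++) {
        if ((pa[i] == ch[1] && pb[j] == ch[2]) ||
            (pa[i] == ch[2] && pb[j] == ch[1])) { print "ok"; exit }
      }
    }
    print "violation"
  }'
}

# Apply the check to the rows of the table, treating column 1 as the child.
while read -r count g1 g2 g3; do
  printf '%s\t%s\t%s\t%s\t%s\n' "$count" "$g1" "$g2" "$g3" "$(mendel_ok "$g1" "$g2" "$g3")"
done <<'EOF'
34 0/1 0/1 0/0
20 0/1 0/0 0/1
20 0/0 0/1 0/0
14 0/1 1/1 0/0
6 0/0 0/1 0/1
4 1/1 1/1 0/0
1 1/1 0/1 0/0
1 0/1 0/0 0/0
EOF
```

Swapping the argument order (for example mendel_ok "$g2" "$g1" "$g3") tests a different individual as the child, which is how the row with 14 occurrences rules out the middle column.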

Going further

Can you modify the long one-liner to be more strict, or to include more examples, so that you eliminate the non-Mendelian inheritance situations and consider a larger number of loci?
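One possible direction, sketched under assumptions rather than given as the answer: discard low-confidence variant calls before tallying genotypes. The sketch keeps the VCF header lines and only those records whose QUAL value (column 6) meets a minimum. The helper name filter_qual, the threshold of 50 and the output file name are illustrative assumptions, not values from this tutorial.

```shell
# Sketch: keep header lines plus records with QUAL >= a chosen minimum.
# filter_qual is a hypothetical helper; the threshold is yours to choose.
# usage: filter_qual MIN_QUAL < in.vcf > out.vcf
filter_qual() {
  awk -F'\t' -v min="$1" '
    /^#/ { print; next }            # pass header lines through unchanged
    $6 != "." && $6 + 0 >= min      # keep records with a usable QUAL value
  '
}

# Example call (run from $SCRATCH/GVA_Human_trios; output name is hypothetical):
#   filter_qual 50 < multi_sample/trios_tutorial.all.samtools.vcf \
#     > multi_sample/trios_tutorial.all.samtools.q50.vcf
```

The filtered file could then be fed to the same genotype tally to see whether the rows marked with * shrink or disappear.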

This same type of analysis can be done on much larger cohorts, say hundreds or even thousands of individuals with known disease state, to attempt to identify associations between allelic state and disease state.


Return to GVA2020 page.