Genotype Data Merge Software

There is a growing need to merge data within and between genotyping centers and to merge current data with legacy data sets. The resulting larger sample sizes confer greater statistical power, and accurate merging is particularly important for successful association studies. However, manually merging data is time-consuming and error-prone. The increasing frequency of collaboration between genotyping centers calls for automation to save time and improve accuracy. Factors that complicate merging include differences in genotyping hardware, binning methods, molecular weight standards, and curve-fitting algorithms. As a result, the size difference between genotypes from different sources is not constant: it varies both between markers and within a marker. It is therefore insufficient to align genotypes by adding a constant number of base pairs to the alleles of one data set, even when each marker is considered individually. Accurate merging is also made difficult when the data sets have few samples in common, or when samples are drawn from different ethnic groups, in which allele frequencies may vary.
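The point about non-constant offsets can be illustrated with a small sketch. The allele sizes below are invented for illustration and are not drawn from any real data set. The helper `offsets_by_allele` is hypothetical, and pairing alleles by size rank within each shared sample is a simplifying assumption. This is not the merging method described on this page; it only shows how a single marker can carry more than one offset.

```python
from collections import defaultdict

# Hypothetical allele sizes (base pairs) for one marker, called on the same
# four samples at two centers. Each sample maps to its two alleles.
center_a = {
    "s1": (140, 144),
    "s2": (144, 152),
    "s3": (140, 152),
    "s4": (148, 152),
}
center_b = {
    "s1": (141, 145),
    "s2": (145, 154),
    "s3": (141, 154),
    "s4": (149, 154),
}


def offsets_by_allele(a, b):
    """Collect the B-minus-A size differences seen for each allele of A.

    Only samples present in both data sets are used. Alleles are paired by
    size rank within a sample, which assumes both centers order them alike.
    """
    diffs = defaultdict(set)
    for sample in a.keys() & b.keys():
        for allele_a, allele_b in zip(sorted(a[sample]), sorted(b[sample])):
            diffs[allele_a].add(allele_b - allele_a)
    return {allele: sorted(d) for allele, d in sorted(diffs.items())}


offsets = offsets_by_allele(center_a, center_b)

# A single constant shift would suffice only if every allele had the same offset.
distinct_offsets = {d for ds in offsets.values() for d in ds}
constant_shift_ok = len(distinct_offsets) == 1

print(offsets)
print("constant shift sufficient:", constant_shift_ok)
```

In this toy marker the three smaller alleles differ by one base pair between centers while the largest differs by two, so no single shift aligns all four alleles.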

To address these issues, we have developed a Bayesian model and a Markov chain Monte Carlo (MCMC) algorithm for sampling the posterior distribution under the model. The model and algorithm are implemented in software that computes the allelic alignments with the greatest posterior probabilities under several merging options. The software evaluates the characteristics of the input data sets, allows user-specified merging options, and includes an error analysis to assign a merge-quality score. If there is too little information to confidently merge data sets, the software will recommend that they be analyzed separately.

Presson A, Sobel E, Lange K, Papp JC. (2006) Merging Microsatellite Data. Journal of Computational Biology 13:1131

Angela Presson with her Data Merging Poster at the American Society of Human Genetics Meeting