Posts Tagged ‘Blossoc’

QBlossoc

Wednesday, February 25th, 2009

We have just published a paper on our Blossoc method in the latest issue of Genetics:

Local Phylogeny Mapping of Quantitative Traits: Higher Accuracy and Better Ranking Than Single-Marker Association in Genomewide Scans

Besenbacher, Mailund and Schierup, Genetics, Vol. 181, 747-753, February 2009, Copyright © 2009 doi:10.1534/genetics.108.092643

Abstract

We present a new method, termed QBlossoc, for linkage disequilibrium (LD) mapping of genetic variants underlying a quantitative trait. The method uses principles similar to a previously published method, Blossoc, for LD mapping of case/control studies. The method builds local genealogies along the genome and looks for a significant clustering of quantitative trait values in these trees. We analyze its efficiency in terms of localization and ranking of true positives among a large number of negatives and compare the results with single-marker approaches. Simulation results of markers at densities comparable to contemporary genotype chips show that QBlossoc is more accurate in localization of true positives as expected since it uses the additional information of LD between markers simultaneously. More importantly, however, for genomewide surveys, QBlossoc places regions with true positives higher on a ranked list than single-marker approaches, again suggesting that a true signal displays itself more strongly in a set of adjacent markers than a spurious (false) signal. The method is both memory and central processing unit (CPU) efficient. It has been tested on a real data set of height data for 5000 individuals measured at ~317,000 markers and completed analysis within 5 CPU days.

The method works very similarly to the one in our first paper on Blossoc.  Running along the genome, we infer local genealogies and then score each region according to how well the genealogy explains the phenotype under consideration.

What is new in this paper is the way we score trees when the phenotype is quantitative rather than qualitative (case/control status).
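To make the idea of "scoring a tree against a quantitative phenotype" concrete, here is a minimal Python sketch. It is an illustration only, not the scoring used in QBlossoc: the nested-tuple tree representation, the helper names, and the choice of a Welch t-statistic over every clade are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance


def collect_clades(tree, out):
    """Return the leaf set below `tree`; append every internal node's leaf set to `out`.

    A tree is a nested tuple whose leaves are individual indices (hypothetical format).
    """
    if not isinstance(tree, tuple):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def welch_t(a, b):
    """Welch t-statistic for two samples; returns 0.0 if it is undefined."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:
        return 0.0
    return (mean(a) - mean(b)) / sqrt(se2)


def best_split_score(tree, phenotype):
    """Largest |t| over all clade-versus-rest splits of the tree (illustrative score)."""
    clades = []
    all_leaves = collect_clades(tree, clades)
    best = 0.0
    for clade in clades:
        inside = [phenotype[i] for i in clade]
        outside = [phenotype[i] for i in all_leaves - clade]
        # A variance needs at least two values on each side of the split.
        if len(inside) < 2 or len(outside) < 2:
            continue
        best = max(best, abs(welch_t(inside, outside)))
    return best


# Toy data: individuals 0-3 have low trait values, 4-7 have high ones.
phenotype = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
clustered_tree = (((0, 1), (2, 3)), ((4, 5), (6, 7)))
scrambled_tree = (((0, 4), (1, 5)), ((2, 6), (3, 7)))
clustered_score = best_split_score(clustered_tree, phenotype)
scrambled_score = best_split_score(scrambled_tree, phenotype)
```

A genealogy that groups individuals with similar trait values gets a much higher score than one that mixes them, which is the kind of signal a tree-based method looks for as it moves along the genome.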

Also just published this month are the results from the QTLMAS XII workshop held last year.  As part of this workshop, a dataset with genome-wide genetic data and a quantitative phenotype was simulated, and groups could then compete in mapping the quantitative traits (Crooks et al. 2009).

One group used Blossoc.  No, it wasn’t us; it was Ledur et al. (2009).  I am pretty proud to learn that Blossoc was considered the best performing method on this data.

Crooks et al.’s conclusion:

In this dataset, the best methods for detecting QTL were Blossoc [11] followed by a Bayesian linkage analysis [7], both of which used information from multiple markers to infer QTL genotypes. The two studies that aimed to increase the efficiency of QTL detection by reducing the amount of analysis had lowest power and were not effective in identifying the QTL with the largest effects. Estimates of QTL location were generally very good. There were bigger differences in how well the methods estimated the QTL effects. Here, two of the models that were most accurate used single markers in place of QTL genotype and simultaneously fit a polygenic effect. Although in this case estimates from a single locus model were as accurate as from a multilocus model, fitting multiple loci should allow closely linked QTL to be distinguished. A valuable approach might be to first locate QTL by a multimarker/haplotype method and then fit the closest markers in a multilocus model, to estimate QTL effects. For future such projects, we recommend that participants provide a list of their top-ranked effects, and report confidence intervals for QTL location and effect size estimates. Areas that we suggest for further work include significance thresholds, closely linked QTL and epistatic effects.

Actually, we do not try to estimate QTL effects in our method.  I simply hadn’t thought about that until I read the conclusions from QTLMAS XII.  Well, that leaves a topic for future work…

  1. S. Besenbacher, T. Mailund, M. H. Schierup (2008). Local Phylogeny Mapping of Quantitative Traits: Higher Accuracy and Better Ranking Than Single-Marker Association in Genomewide Scans. Genetics, 181 (2), 747-753. DOI: 10.1534/genetics.108.092643
  2. Lucy Crooks, Goutam Sahana, Dirk-Jan de Koning, Mogens Sandø Lund, Örjan Carlborg (2009). Comparison of analyses of the QTLMAS XII common dataset. II: genome-wide association and fine mapping. BMC Proceedings
  3. Mônica Corrêa Ledur, Nicolas Navarro, Miguel Pérez-Enciso (2009). Data modeling as a main source of discrepancies in single and multiple marker association methods. BMC Proceedings


Replicating haplotype findings

Tuesday, August 26th, 2008

I have a small problem.

We have analysed some cancer data from DeCODE as part of the association mapping project PolyGene. We used Blossoc for this and we found some candidate regions worth examining further.

We have access to samples from Spain and the Netherlands, and we want to try to replicate the findings there. Now the problem is how to choose a strategy for replication.

Blossoc is a haplotype method that tries to infer the local genealogy in a region and then examines the clustering of phenotypes on this genealogy. The problem with such an approach is that you really need the entire region typed in the replication population to pull the same trick there. This means typing a lot of markers in the replication sample (expensive) and potentially correcting for a lot of tests (reducing power). It is not really the way to go.

We extended Blossoc to output the SNPs it considers most important for the genealogy inference in each interesting region. These should be the most informative SNPs in the regions for the replication, and this gave us 2-6 SNPs per candidate region (only 43 SNPs in total across three diseases, so no small reduction).

We have typed these SNPs in the replication population, but now we need to figure out how to try to replicate the findings with only that.

It goes without saying that we need to decide exactly what to test for based on the original data. If we start searching for significant signals in the new data we are no longer replicating but data trawling and the risk of false positives drastically increases.

I have a program for listing all haplotype patterns in a data set and testing them for association, and I can run that on the old data to pick the patterns to test for in the new data.  There is a tradeoff, though, between association scores and the complexity of the pattern.  There is bound to be some overfitting in the old data, and we want to avoid that in the patterns to replicate.
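As a rough illustration of that tradeoff, here is a small Python sketch that enumerates haplotype patterns, scores each with a 2x2 chi-square statistic, and records how many SNPs the pattern constrains. This is not the program mentioned above; the pattern alphabet (`0`, `1`, and `*` as a wildcard), the function names, and the ranking rule are assumptions made for the example.

```python
from itertools import product


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def matches(pattern, haplotype):
    """True if the haplotype agrees with the pattern at every non-wildcard SNP."""
    return all(p == "*" or p == h for p, h in zip(pattern, haplotype))


def score_patterns(haplotypes, is_case):
    """Return (pattern, complexity, chi-square) tuples, best score first.

    Ties in score are broken in favour of the simpler pattern.
    """
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    results = []
    for pattern in product("01*", repeat=len(haplotypes[0])):
        complexity = sum(p != "*" for p in pattern)  # number of constrained SNPs
        if complexity == 0:
            continue  # the all-wildcard pattern matches everything
        case_hit = sum(1 for h, c in zip(haplotypes, is_case) if c and matches(pattern, h))
        ctrl_hit = sum(1 for h, c in zip(haplotypes, is_case) if not c and matches(pattern, h))
        stat = chi_square_2x2(case_hit, n_cases - case_hit,
                              ctrl_hit, n_controls - ctrl_hit)
        results.append(("".join(pattern), complexity, stat))
    results.sort(key=lambda r: (-r[2], r[1]))
    return results


# Toy discovery data: every case carries alleles 1,1 at the first two SNPs.
haplotypes = ["110", "111", "110", "111", "000", "001", "010", "101"]
is_case = [True, True, True, True, False, False, False, False]
ranked = score_patterns(haplotypes, is_case)
top_pattern, top_complexity, top_stat = ranked[0]
```

With only three SNPs there are already 26 candidate patterns, so picking the best-scoring one in the discovery data is itself a multiple-testing exercise. That is why the patterns to replicate have to be fixed before looking at the new samples.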

It is a tricky problem…

QBlossoc

Monday, June 16th, 2008

I’ve just made a release of our association mapping tool, Blossoc.  The new release, nicknamed QBlossoc, adds full support for quantitative phenotypes.  We’ve had support for quantitative phenotypes for a while now, but this version is the “official” release for it, with tuned default options and such.

It is mainly the work of Søren Besenbacher, who’s been running simulation experiments for the last several months to figure out which scoring methods, and with which parameters, work best.

Although the quantitative traits method hasn’t been published yet — we only submitted the paper last week — it has already been used by Monica Ledur et al. to analyse the data set from the XII QTLMAS workshop in Uppsala.  There should be a paper coming out on that as well.  I don’t know its status, but Monica was kind enough to send the manuscript to me.