Posts Tagged ‘association mapping’

Last week in the blogs…

Tuesday, April 21st, 2009

Oh man, these regular summaries are slipping … it used to be Sunday, then it was Monday, now it is Tuesday.  I hope I can catch up with it next week, but the last week was just crazy with work, and it isn’t exactly slowing down this week…

Anyway, here goes.

Association mapping

Computing

Evolution

Genetics

Programming

Research Life


New association mapping paper out

Sunday, March 15th, 2009

We just got another paper out — well, out in “online access” — about association mapping for interacting genes.

Using biological networks to search for interacting loci in genome-wide association studies

M. Emily et al, European Journal of Human Genetics

Genome-wide association studies have identified a large number of single-nucleotide polymorphisms (SNPs) that individually predispose to diseases. However, many genetic risk factors remain unaccounted for. Proteins coded by genes interact in the cell, and it is most likely that certain variants mainly affect the phenotype in combination with other variants, termed epistasis. An exhaustive search for epistatic effects is computationally demanding, as several billions of SNP pairs exist for typical genotyping chips. In this study, the experimental knowledge on biological networks is used to narrow the search for two-locus epistasis. We provide evidence that this approach is computationally feasible and statistically powerful. By applying this method to the Wellcome Trust Case–Control Consortium data sets, we report four significant cases of epistasis between unlinked loci, in susceptibility to Crohn’s disease, bipolar disorder, hypertension and rheumatoid arthritis.

This was work we did in the PolyGene project, where one of the problems we considered was detecting disease association when the association is caused by interacting genes.  Testing all combinations of markers is computationally infeasible, to say nothing of multiple testing correction, so we consider only pairwise interactions.

Even when only considering pairs, handling all of them can be a problem.  If you have half a million markers, you have about 125 billion pairs.  Reducing the number of pairs to consider thus might be worth doing.
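As a quick check of that number: the count of unordered pairs among n markers is n(n-1)/2. A minimal sketch, assuming exactly 500,000 markers (the "half a million" above; real chips vary):

```python
from math import comb

# Assumed marker count: "half a million" taken literally.
n_markers = 500_000

# Number of unordered marker pairs, n * (n - 1) / 2.
n_pairs = comb(n_markers, 2)

print(f"{n_pairs:,}")  # 124,999,750,000, i.e. about 125 billion
```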

One option is to check only pairs where one or both of the individual markers show some association signal.  Another option is to use knowledge about interactions and only consider pairs of markers that a priori are known to interact.

The latter is what we do in this paper.  We use an interaction network to decide which markers are candidates for interaction, and then we test only those.
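To give a feel for the idea, here is a rough sketch of how an interaction network can cut down the set of pairs. It is not the code from the paper. The function name, the gene-to-marker mapping, and the toy gene and marker IDs are all made up for illustration, and it assumes the network is given as a list of gene-gene edges.

```python
from itertools import product

def candidate_snp_pairs(gene_edges, snps_by_gene):
    """Return the marker pairs whose genes are connected in the network.

    gene_edges   -- iterable of (gene_a, gene_b) interaction edges (hypothetical input format)
    snps_by_gene -- dict mapping a gene name to the markers assigned to it
    """
    pairs = set()
    for gene_a, gene_b in gene_edges:
        snps_a = snps_by_gene.get(gene_a, ())
        snps_b = snps_by_gene.get(gene_b, ())
        for snp_a, snp_b in product(snps_a, snps_b):
            if snp_a != snp_b:
                # Store each pair in sorted order so (a, b) and (b, a) count once.
                pairs.add(tuple(sorted((snp_a, snp_b))))
    return pairs

# Toy example with made-up gene and marker names.
edges = [("G1", "G2"), ("G2", "G3")]
snps = {"G1": ["rs1", "rs2"], "G2": ["rs3"], "G3": []}
pairs = candidate_snp_pairs(edges, snps)
print(sorted(pairs))  # [('rs1', 'rs3'), ('rs2', 'rs3')]
```

Only the pairs returned here would then go on to an interaction test, so the multiple testing burden scales with the size of the network rather than with the square of the number of markers.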


Emily, M., Mailund, T., Hein, J., Schauser, L., & Schierup, M. (2009). Using biological networks to search for interacting loci in genome-wide association studies European Journal of Human Genetics DOI: 10.1038/ejhg.2009.15


New paper out

Wednesday, March 4th, 2009

We just got a new paper out yesterday in BMC Medical Genetics:

Haplotype frequencies in a sub-region of chromosome 19q13.3, related to risk and prognosis of cancer, differ dramatically between ethnic groups

Schierup et al.

BMC Medical Genetics 2009, 10:20 doi:10.1186/1471-2350-10-20

Abstract

Background

A small region of about 70 kb on human chromosome 19q13.3 encompasses 4 genes of which 3, ERCC1, ERCC2, and PPP1R13L (aka RAI) are related to DNA repair and cell survival, and one, CD3EAP, aka ASE1, may be related to cell proliferation. The whole region seems related to the cellular response to external damaging agents and markers in it are associated with risk of several cancers.

Methods

We downloaded the genotypes of all markers typed in the 19q13.3 region in the HapMap populations of European, Asian and African descent and inferred haplotypes. We combined the European HapMap individuals with a Danish breast cancer case-control data set and inferred the association between HapMap haplotypes and disease risk.

Results

We found that the susceptibility haplotype in our European sample had increased from 2 to 50 percent very recently in the European population, and to almost the same extent in the Asian population. The cause of this increase is unknown. The maximal proportion of overall genetic variation due to differences between groups for Europeans versus Africans and Europeans versus Asians (the Fst value) closely matched the putative location of the susceptibility variant as judged from haplotype-based association mapping.

Conclusions

The combined observation that a common haplotype causing an increased risk of cancer in Europeans and a high differentiation between human populations is highly unusual and suggests a causal relationship with a recent increase in Europeans caused either by genetic drift overruling selection against the susceptibility variant or a positive selection for the same haplotype. The data does not allow us to distinguish between these two scenarios. The analysis suggests that the region is not involved in cancer risk in Africans and that the susceptibility variants may be more finely mapped in Asian populations.

Mikkel and I got involved in the project to try to use our haplotype-based association mapping methods to analyse data where a single-marker analysis had already shown an association with several kinds of cancer.

We didn’t really discover anything new when running our tools on the data, so to try something else we combined the case/control data with HapMap data to try to increase the number of markers through imputation.

That is when we discovered that a haplotype in the region that is found in about 50% of Europeans (CEU and our case/control data) is found in only ~1% of Africans (YRI).  Furthermore, this haplotype was the at-risk haplotype in our case/control data and looks to be the derived haplotype when compared with the chimp genome.

Reference

Mikkel H Schierup, Thomas Mailund, Heng Li, Jun Wang, Anne Tjonneland, Ulla Vogel, Lars Bolund, Bjorn A Nexo (2009). Haplotype frequencies in a sub-region of chromosome 19q13.3, related to risk and prognosis of cancer, differ dramatically between ethnic groups BMC Medical Genetics, 10 (1) DOI: 10.1186/1471-2350-10-20


QBlossoc

Wednesday, February 25th, 2009

We have just published a paper on our Blossoc method in the latest issue of Genetics:

Local Phylogeny Mapping of Quantitative Traits: Higher Accuracy and Better Ranking Than Single-Marker Association in Genomewide Scans

Besenbacher, Mailund and Schierup, Genetics, Vol. 181, 747-753, February 2009, Copyright © 2009 doi:10.1534/genetics.108.092643

Abstract

We present a new method, termed QBlossoc, for linkage disequilibrium (LD) mapping of genetic variants underlying a quantitative trait. The method uses principles similar to a previously published method, Blossoc, for LD mapping of case/control studies. The method builds local genealogies along the genome and looks for a significant clustering of quantitative trait values in these trees. We analyze its efficiency in terms of localization and ranking of true positives among a large number of negatives and compare the results with single-marker approaches. Simulation results of markers at densities comparable to contemporary genotype chips show that QBlossoc is more accurate in localization of true positives as expected since it uses the additional information of LD between markers simultaneously. More importantly, however, for genomewide surveys, QBlossoc places regions with true positives higher on a ranked list than single-marker approaches, again suggesting that a true signal displays itself more strongly in a set of adjacent markers than a spurious (false) signal. The method is both memory and central processing unit (CPU) efficient. It has been tested on a real data set of height data for 5000 individuals measured at ~317,000 markers and completed analysis within 5 CPU days.

The method works very similarly to the one in our first paper on Blossoc.  Running along the genome, we infer local genealogies and then score the region according to how well the genealogy explains the phenotype under consideration.

What is new in this paper is the way we score trees when the phenotype is quantitative rather than qualitative (case/control status).

Also just published this month are the results from the QTLMAS XII workshop held last year.  As part of this workshop, a dataset with genome-wide genetic data and a quantitative phenotype was simulated, and groups could then compete in mapping the quantitative traits (Crooks et al. 2009).

One group used Blossoc.  No, it wasn’t us, but Ledur et al (2009).  I am pretty proud to learn that Blossoc was considered the best performing method on this data.

Crooks et al’s conclusion:

In this dataset, the best methods for detecting QTL were Blossoc [11] followed by a Bayesian linkage analysis [7], both of which used information from multiple markers to infer QTL genotypes. The two studies that aimed to increase the efficiency of QTL detection by reducing the amount of analysis had lowest power and were not effective in identifying the QTL with the largest effects. Estimates of QTL location were generally very good. There were bigger differences in how well the methods estimated the QTL effects. Here, two of the models that were most accurate used single markers in place of QTL genotype and simultaneously fit a polygenic effect. Although in this case estimates from a single locus model were as accurate as from a multilocus model, fitting multiple loci should allow closely linked QTL to be distinguished. A valuable approach might be to first locate QTL by a multimarker/haplotype method and then fit the closest markers in a multilocus model, to estimate QTL effects. For future such projects, we recommend that participants provide a list of their top-ranked effects, and report confidence intervals for QTL location and effect size estimates. Areas that we suggest for further work include significance thresholds, closely linked QTL and epistatic effects.

Actually, we do not try to estimate QTL effects in our method.  I simply hadn’t thought about that until I read the conclusions from QTLMAS XII.  Well, that leaves a topic for future work…

  1. S. Besenbacher, T. Mailund, M. H. Schierup (2008). Local Phylogeny Mapping of Quantitative Traits: Higher Accuracy and Better Ranking Than Single-Marker Association in Genomewide Scans Genetics, 181 (2), 747-753 DOI: 10.1534/genetics.108.092643
  2. Lucy Crooks, Goutam Sahana, Dirk-Jan de Koning, Mogens Sandø Lund, Örjan Carlborg (2009). Comparison of analyses of the QTLMAS XII common dataset. II: genome-wide association and fine mapping BMC Proceedings
  3. Mônica Corrêa Ledur, Nicolas Navarro, Miguel Pérez-Enciso (2009). Data modeling as a main source of discrepancies in single and multiple marker association methods BMC Proceedings


Day one of APBC

Tuesday, January 13th, 2009

We are now half-way through day one of APBC.

So far it has been fun.

Association mapping tutorial

The day started with tutorials. Four tutorials in two tracks.  For the first tutorial, we attended Matthew Stephens’ tutorial on Bayesian and imputation methods for association mapping.

It was a very nice tutorial.  He started out with presenting the general setup in a genome wide association study — so nothing new here for us — and then talked about Bayesian approaches and the benefit of using those.

The main motivation is that a p-value does not tell you anything about the probability of there being no association, which is what you are probably interested in knowing, and does not capture uncertainty in the data caused by sample size or minor allele frequency (MAF).  Well, in some way it does, since the power depends on sample size and MAF, but under the null model all p-values are equally likely, so the p-value in itself does not tell you about the probability of the null model being true.

In the Bayesian setup, you can calculate the probability of the null being true, the false discovery rate, fdr = P(null-model | data), from the posterior odds, PO = P(alternative-model | data) / P(null-model | data), since fdr = 1/(1+PO).

The posterior odds, of course, depend on the Bayes factor and the prior odds, but they make it easy to explicitly quantify the belief that the null or the alternative model is the true model.
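To make the arithmetic concrete, here is a small sketch of how I understand the calculation. It assumes the prior odds are written the same way round as PO, that is P(alternative-model) / P(null-model), and that the Bayes factor is P(data | alternative-model) / P(data | null-model). The function names and the numbers are made up for illustration and are not from the tutorial.

```python
def posterior_odds(bayes_factor, prior_odds):
    """PO = Bayes factor * prior odds (both expressed as alternative vs. null)."""
    return bayes_factor * prior_odds

def fdr_from_posterior_odds(po):
    """fdr = P(null-model | data) = 1 / (1 + PO)."""
    return 1.0 / (1.0 + po)

# Illustrative numbers only: a strong-looking Bayes factor of 1000,
# but prior odds of 1 in 10,000 that any given marker is associated.
bf = 1000.0
prior = 1e-4

po = posterior_odds(bf, prior)
fdr = fdr_from_posterior_odds(po)
print(round(fdr, 3))  # 0.909
```

With these made-up numbers the null model is still the more probable one despite the large Bayes factor, which is exactly the kind of statement a p-value alone cannot give you.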

He spent a lot of time on ways of testing single markers for association in this framework, so very little time was left for multi-locus methods — which he skipped — and imputation methods — covered very quickly.

While I would have preferred to hear more about the last two topics, I still very much enjoyed the tutorial.

Off for coffee

In the break between tutorials, Besenbacher and I talked imputation with Matthew, so we missed the beginning of the next tutorial session.  We decided to skip that tutorial, and instead the three of us went for coffee at a nearby Starbucks and continued our association mapping discussion.

Afternoon

The afternoon program started with a keynote talk by David Lipman on the molecular evolution of Influenza.

I’m personally very fascinated by influenza, especially because of the global pandemics it has caused, so this was a very interesting talk for me.

The last part of the afternoon is the first session with paper presentations, but the jet lag is kicking in and that — combined with the high temperature in the lecture hall — makes it hard to stay awake.

We’ve decided to skip this session and rest a bit, so we are ready for the reception in the evening.
