Are synthetic associations a man-made phenomenon?

Early last year David Goldstein and colleagues published a provocative paper claiming that many GWAS associations are driven not by common variants of modest effect (the canonical common disease – common variant hypothesis underpinning GWAS) but instead by a local cluster of lower-frequency variants that have much bigger effects on disease risk. They dubbed this hypothesized phenomenon “synthetic association” and the term quickly became a genetics buzzword. The paper was widely discussed in both the specialist and mainstream media, and caused quite a stir among academic statistical geneticists.
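To make the idea concrete, here is a minimal sketch of how a rare, large-effect variant could show up as a modest signal at a common SNP. All the numbers (haplotype frequencies, the relative risk of 5) and the function name `allelic_odds_ratio` are made up for illustration and are not taken from any of the papers; the calculation also assumes a rare disease and a simple multiplicative risk model.

```python
# Illustrative numbers only (assumptions, not taken from any study).
# Three haplotypes at a locus: the common tag allele "A" with or without a
# rare causal variant "R", and the other tag allele "a".
haplotypes = {
    # name: (population frequency, relative risk per haplotype copy)
    "A+R": (0.01, 5.0),  # rare causal variant, always on the A background
    "A": (0.29, 1.0),
    "a": (0.70, 1.0),
}


def allelic_odds_ratio(carriers, haplotypes):
    """Case/control allelic odds ratio for the haplotypes named in carriers.

    Assumes a rare disease and multiplicative risk, so control frequencies
    approximate population frequencies and case frequencies are proportional
    to frequency times relative risk.
    """
    total_weight = sum(freq * rr for freq, rr in haplotypes.values())
    case = sum(haplotypes[h][0] * haplotypes[h][1] for h in carriers) / total_weight
    control = sum(haplotypes[h][0] for h in carriers)
    return (case / (1 - case)) / (control / (1 - control))


# The common SNP "sees" both A haplotypes; the causal variant is only on one.
tag_or = allelic_odds_ratio(["A+R", "A"], haplotypes)
causal_or = allelic_odds_ratio(["A+R"], haplotypes)
print(f"OR at common tag SNP: {tag_or:.2f}")
print(f"OR at rare causal variant: {causal_or:.2f}")
```

With these invented numbers the common SNP carries an odds ratio of about 1.13, the sort of modest effect typical of a GWAS hit, even though the only causal variant in the toy model has an odds ratio of 5.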

That debate has been re-opened today by a set of Perspectives in PLoS Biology: a rebuttal by us (Carl & Jeff) and our colleagues at Sanger, a rebuttal by Naomi Wray, Shaun Purcell and Peter Visscher, a rebuttal to the rebuttals by David Goldstein and an editorial by Robert Shields to tie it all together.

What are the messages from all this? Well, we argue in our piece that several lines of evidence suggest that synthetic associations, while plausible, aren’t very common. First, family-based linkage studies (like the ones used to identify BRCA1, CFTR and other culprits behind single-gene disorders), which were remarkably unsuccessful in studying complex disease, are well powered to pick up the kind of genetic model underlying synthetic association. Second, attempts to find these rarer ‘smoking gun’ mutations (e.g. by completely sequencing many patients in GWAS regions) haven’t turned up much yet. Finally, the synthetic association hypothesis would predict that GWAS hits are ancestry-specific (e.g. genes found in Europeans wouldn’t turn up in a study of Japanese), whereas nearly all GWAS results studied in sufficient depth have replicated across many populations.

Interestingly, there is a well-documented example of a synthetic association that we’ve worked extensively on: NOD2 and Crohn’s disease is a GWAS hit driven by three nearby low-frequency, large-effect variants. It conveniently also illustrates all of our points above: it was originally discovered by linkage, the three coding variants were discovered by resequencing, and it is not associated in East Asia. For these reasons and more, NOD2 is an outlier from the GWAS experience, underlining the likelihood that such occurrences are rare.

The Wray et al. paper provides an even more technical critique, showing that neither the allele frequency distribution nor the number of independent associations predicted by the synthetic association model is consistent with the bulk of GWAS observations. In reply, Goldstein presents a series of logical arguments which he asserts contradict some of the data presented in the other two papers. He usefully presents a number of points where all parties agree: GWAS were useful, valuable information about disease genetics has been learned, and synthetic associations are theoretically possible. He maintains, however, that the question of how widespread they are is still unresolved. We obviously disagree with this notion, but are glad that PLoS Biology has put together a nice series of articles arguing both sides of the question.

[Added in edit: Razib Khan has additional background on synthetic associations, and a dissection of the paper by Wray et al., over at Discover.]

2 Responses to “Are synthetic associations a man-made phenomenon?”


  • Shane McKee

    Jeff, I meant to say hi at the DDD research forum today; loving this blog. I confess to feeling heavily out of my depth when discussing these problems, but isn’t the main problem not that the majority of GWAS have been successfully replicated in different populations, but only those that *have* been successfully replicated have been shown to be replicable in different populations? Many results remain unreplicated and may be type 1 errors (I don’t have the figures, and I may be totally over-egging the pudding here!).

    That said, supposing synthetic association is common (and probabilistically this strikes me as unlikely), isn’t it still the case that the data erroneously lead you to the right gene, rather than the wrong gene? In which case, does it really matter? And when you sequence the gene in multiple cases (SNP typing being *so* last decade), all will be revealed?

    OK, I get the point that we’re still chasing the missing heritability, but the only way we’re likely to find it is by using the *non*missing heritability to unpick the biology, and then design better candidate gene studies (likely by in silico analyses of exome/genome data)?

    Sorry if these questions are a tad random and ignorant, but I have a hard time bending the old brain around these complex statistical issues, yet I think they’re important. Thanks for flagging these up – that’s my night-time reading sorted…

    Cheers,
    -Shane
