The current issue of Cell has some important correspondence in response to an essay published by Jon McClellan and Mary-Claire King in April. Daniel covered the original piece and hosted a guest post from Kai Wang which detailed some of the more obvious flaws in their argument. Now, Wang and his colleagues from Philadelphia have published an official response in Cell, in parallel with a similar letter from Robert Klein and colleagues from New York. Accompanying these is a further reply from McClellan and King. Read on for an overview of three contentious statements made in the original piece, and the rebuttals to each.
[In response to a comment, I’ve added the most representative single sentence quotation I could find from the original McClellan and King essay next to each of their claims as I’ve expressed them. –JB]
- Claim: Disease predisposing alleles cannot circulate at high frequency in human populations because negative selection is constantly trying to purge them. [“In order to be maintained at polymorphic frequencies worldwide, common variants with even modest influence on disease must withstand selective pressure in every generation.”]
Reply: Selection always acts within an environmental context, and the environment in which alleles identified by GWAS are deleterious (i.e. Western countries in the last hundred years) is starkly different from the environment in which nearly all of human evolution (indeed evolution of any species) has taken place. Furthermore, balancing (rather than strictly negative) selection might play an important role in many GWAS hits (there are numerous examples documented where one allele at a locus simultaneously increases risk of one condition and protects from another). Finally, GWAS hits have a very weak effect on disease risk, and many of the diseases in question are relatively late-onset. These factors combine to mean that the net selective disadvantage is weak, and sweeping the alleles out of the population will take a long time even in a relatively stable environment (let alone a rapidly changing one).
- Claim: Most GWAS hits are intronic or intergenic, and thus can’t possibly be pointing at something functional. [“A major limitation of genome-wide association studies is the lack of any functional link between the vast majority of risk variants and the disorders they putatively influence.”]
Reply: This has to be the most bizarre of McClellan and King’s claims. First, King herself posited in a famous 1975 Science paper that changes in regulation, rather than protein-coding changes, likely explain many phenotypic differences. Second, the fundamental design of GWAS relies on the fact that the SNPs actually studied aren’t necessarily causative, but are correlated with unknown causal alleles. We certainly haven’t pinned down all the biology underlying GWAS hits, but that to me is the most exciting part of ongoing analysis of these studies: it would be boring indeed if they had all easily mapped to nonsynonymous coding SNPs in candidate genes.
- Claim: Many (most? all?) GWAS hits are due to cryptic population structure. [“We further suggest that many GWAS findings stem from factors other than a true association with disease risk.”]
Reply: Wang et al. point out that GWAS practitioners generally bend over backwards to address possible population stratification, and that well-known, widely used methods exist for doing so. The particular example that McClellan and King harped on was, in fact, studied largely in family-based samples, which are immune to population stratification. Furthermore, the “evidence” that the SNP in question varies widely in frequency across Europe was based on absurdly small sample sizes (McClellan and King neglected to put any confidence intervals on their published estimates of allele frequency in different parts of Europe). Examination of an enlarged sample set from one such population reveals that the allele frequency in Tuscans, which McClellan and King estimated at 71%, is actually 41%, remarkably similar to the 39% estimate elsewhere in Europe.
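To see why the missing confidence intervals matter, here is a minimal sketch that puts a Wilson score interval around an allele frequency estimate. The counts are hypothetical, chosen only to give a point estimate near 71%; the actual sample sizes are not given here, and `wilson_interval` is an illustrative helper, not anything used in the letters.

```python
import math


def wilson_interval(allele_count, n_chromosomes, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 gives an approximate 95% interval.
    """
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    p_hat = allele_count / n_chromosomes
    denom = 1 + z**2 / n_chromosomes
    centre = (p_hat + z**2 / (2 * n_chromosomes)) / denom
    half_width = (
        z
        * math.sqrt(
            p_hat * (1 - p_hat) / n_chromosomes
            + z**2 / (4 * n_chromosomes**2)
        )
        / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical counts (assumptions, not data from the essay or the letters):
# the same ~71% point estimate from a tiny sample and from a large one.
for count, total in [(10, 14), (1000, 1400)]:
    lower, upper = wilson_interval(count, total)
    print(f"{count}/{total}: estimate {count / total:.2f}, "
          f"95% CI {lower:.2f} to {upper:.2f}")
```

With only a handful of chromosomes the interval spans tens of percentage points, so an apparent frequency difference between two populations can easily be sampling noise.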
Seriously? I think the entire community would be jumping up and down in delight. “it would be boring indeed if they had all easily mapped to nonsynonymous coding SNPs in candidate genes.”
@Leonid, well that would certainly make it easy! I meant it would be boring from the perspective of having interesting scientific challenges to think about.
It was good to see those responses in Cell today, and thanks for elaborating. I agree that point 2 was rather bizarre; it was actually an amazing thing for scientists with such a reputation to assert, especially without any evidence to back it up. There are at least 2 reasons why intronic or intergenic variants can be important: a) through LD, which has been established, and b) through long-range effects based on structure. DNA is highly coiled – regions that are megabases apart can be in close proximity and could influence each other. This, I believe, is less well established, although I think there is some evidence but don’t have time to find it right now! I just think that you can come up with infinite hypotheses to discount LD etc., but until any are backed up with better evidence they should just be ignored, and I certainly don’t think they should appear in Cell.
@Jeff, I don’t think we’ll run out of those any time soon. But the M/K response in Cell has a definite “Through the Looking Glass” quality.
Regarding point #1, I think it’s worth emphasizing more (you allude to this) that negative selection isn’t all-powerful. Imagine that there are 10,000 places in the genome which, if mutated, alter disease risk by some small amount. Perhaps these mutations are slightly deleterious, but at equilibrium, their allele frequencies will have a *distribution*–some will be polymorphic, and of those that are polymorphic, some will be common and most rare. The precise distribution depends on the selection coefficient (which depends on the effect size of the mutation and how deleterious the disease is), but for variants of small effect, I doubt any of the selection coefficients are large enough to shift the distribution so that we expect no common variants at all.
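The point that negative selection isn’t all-powerful can be made concrete with a rough sketch. It assumes the simplest deterministic model of genic selection (carriers of the allele have fitness 1 - s), and ignores drift, mutation, dominance and changing environments, so it is an illustration rather than a model of any real GWAS locus. Under that model the allele’s odds p/(1-p) shrink by a factor of (1 - s) each generation. The selection coefficients and frequencies below are hypothetical.

```python
import math


def generations_to_decline(p_start, p_end, s):
    """Generations for a deleterious allele to fall from p_start to p_end.

    Deterministic genic selection: p' = p(1 - s) / (1 - s*p), which means
    the odds p/(1-p) are multiplied by (1 - s) every generation.
    Drift and new mutation are ignored (a simplifying assumption).
    """
    if not (0 < p_end < p_start < 1):
        raise ValueError("need 0 < p_end < p_start < 1")
    if not (0 < s < 1):
        raise ValueError("need 0 < s < 1")
    odds_start = p_start / (1 - p_start)
    odds_end = p_end / (1 - p_end)
    return math.log(odds_end / odds_start) / math.log(1 - s)


# Hypothetical values: a common allele at 30% frequency being pushed
# down to 1% under strong, modest and very weak selection.
for s in (0.1, 0.01, 0.001):
    t = generations_to_decline(0.30, 0.01, s)
    print(f"s = {s}: about {t:.0f} generations")
```

Even in this idealised, constant environment, a selective disadvantage of a fraction of a percent needs thousands of generations to take a common allele down to 1%, which is far longer than the period in which the relevant environments have existed.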
A common but not necessarily honorable strategy in debate is to first simplify and state over-strongly your opponent’s position. I think some of that is going on in the re-stated claims summarized above (“cannot,” “can’t possibly” in 1 and 2; the “?”s in 3 which show some deliberation about how strongly to re-state). Something about the sweeping nature of GWAS itself, perhaps, leads people to make mildly overly-sweeping claims, which just gives the other side an easy toe-hold to engage in this strategy. One way to maintain caution is always to quote, rather than re-state, the other’s position. Another is to concentrate on specific studies and their local, not global, interpretation.
@Stephen, that’s a fair criticism. I’ve amended the post to include the most similar single sentence quotation to my statement of their claims.
These excerpts only hint at the tone of their essay, however, which makes a number of statements about the genetic architecture of complex disease which are stunning in their sweeping nature and lack of evidence presented to support them, such as:
“It is now clear that common risk variants fail to explain the vast majority of genetic heritability for any human disease, either individually or collectively”
“The general failure to confirm common risk variants is not due to a failure to carry out GWAS properly.”
“If common alleles influenced common diseases, many would have been found by now.”
Regarding Claim 1 – Jeff is right and M&K are not, by a long way. In a population of 20, SNPs can come and go rapidly; in a population of millions they cannot, not rapidly, if at all – not these sorts of low-impact SNPs. Even if a couple produced 10 children with new SNPs (actually they would be mutations) that gave a RR for T2DM of 0.5, how many generations would be necessary for them to become proper SNPs (present in >1% of the population)? Would they ever? Natural selection does seem to have operated on some common SNPs like lactose persistence, but that was approx 8,000 yrs ago. Also, the T allele of MTHFR C677T is much more prevalent in Southern Europe than in Northern Europe; there is a gradient from about 40% down to approx 5%. Again though, this happened several thousand years ago during the migrations of small founder populations from South to North (and is considered to be due to declining folate content in the diet moving north).
The other point is that SNPs associated with disease can also have beneficial effects – both negative and positive effects usually depend on the environment.
Claim 2 – the lack of a functional link is a major limitation but not for the M&K reasons. It’s a limitation because if all the SNPs found were actually functional then we would learn a lot more about the biology. Of course it is also possible that a lot of them ARE functional, we just don’t know how yet.
Functionality is the key issue: active RNA of all types. This is just more data to say that the whole RNA-ome is required. No more cell-based work. Full whole-phenotype work must be considered the only valid work. No deep sequencing = no valid data.
Why don’t we hear more about epigenetics?