Size matters, and other lessons from medical genetics

Size really matters: prior to the era of large genome-wide association studies, the large effect sizes reported in small initial genetic studies often dwindled towards zero (that is, an odds ratio of one) as more samples were studied. Adapted from Ioannidis et al., Nat Genet 29:306-309.

[Last week, Ed Yong at Not Exactly Rocket Science covered a paper positing an association between a genetic variant and an aspect of social behavior called prosociality. On Twitter, Daniel and Joe dismissed this study out of hand due to its small sample size (n = 23), leading Ed to update his post. Daniel and Joe were then contacted by Alex Kogan, the first author of the study in question. He kindly shared his data with us, and agreed to an exchange here on Genomes Unzipped. In this post, we expand on our point about the importance of sample size; Alex’s reply is here.

Edit 01/12/11 (DM): The original version of this post included language that could have been interpreted as an overly broad attack on more serious, well-powered studies in psychiatric disease genetics. I’ve edited the post to reduce the possibility of collateral damage. To be clear: we’re against over-interpretation of results from small studies, not behavioral genetics as a whole, and I apologise for any unintended conflation of the two.]

In October 1992, genetics researchers published a potentially groundbreaking finding in Nature: a variant in the gene encoding angiotensin-converting enzyme (ACE) appeared to modify an individual’s risk of having a heart attack. The finding was notable at the time both for the size of the study, which involved over 500 individuals from four cohorts, and for the effect size of the identified variant: in a population initially identified as low-risk for heart attack, the variant had an odds ratio of over 3 (with a corresponding p-value below 0.0001).

Readers familiar with the history of medical association studies will be unsurprised by what happened over the next few years: initial excitement (this same polymorphism was associated with diabetes! And longevity!) was followed by inconclusive replication studies and, ultimately, disappointment. In 2000, 8 years after the initial report, a large study involving over 5,000 cases and controls found absolutely no detectable effect of the ACE polymorphism on heart attack risk. In the meantime, the same polymorphism had turned up in dozens of other association studies for a wide range of traits ranging from obstetric cholestasis to meningococcal disease in children, virtually none of which have ever been convincingly replicated.

The ACE story is not unique; time and time again, initial reports of associations between candidate genes and complex diseases failed to replicate in subsequent studies. With the benefit of hindsight, the problem is clear: in general, common genetic polymorphisms have very small effects on disease risk. Detecting these subtle effects requires studying not dozens or hundreds, but thousands or tens of thousands of individuals. Smaller studies, which had no power to detect these small effects, were essentially random p-value generators. Sometimes the p-values were “significant” and sometimes not, without any correlation to whether a variant was truly associated. Additionally, since investigators typically looked at only a few variants (often just one!) in a single gene that they strongly believed to be involved in the disease, they could subset the data (splitting males and females, for example) until they found “significant” results in some subgroup. This, combined with a tendency to publish positive results and leave negative results in a desk drawer, produced a conflicting and confusing body of literature which actively hindered progress in medical genetics.
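The “random p-value generator” point can be made concrete with a small simulation. The sketch below is illustrative only and rests on assumptions of ours, not on any study discussed here: a quantitative trait, a single variant with allele frequency 0.5 and a true effect of 0.1 trait standard deviations per allele, and a normal approximation to the regression test. It runs many small (n = 100) and large (n = 5,000) studies and reports how often each reaches p < 0.05, and how large the estimated effect is in the studies that do.

```python
import math
import random

# Illustrative assumptions (not from any real study): one variant with
# allele frequency 0.5, a true additive effect of 0.1 trait standard
# deviations per allele, and residual noise ~ N(0, 1).
TRUE_BETA = 0.1


def simulate_study(n, rng, beta=TRUE_BETA):
    """Simulate one study; return (estimated effect, two-sided p-value)."""
    g = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n)]
    y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
    mg, my = sum(g) / n, sum(y) / n
    sxx = sum((gi - mg) ** 2 for gi in g)
    b = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y)) / sxx
    rss = sum((yi - my - b * (gi - mg)) ** 2 for gi, yi in zip(g, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(b / se) / math.sqrt(2))  # normal approximation to the t test
    return b, p


def summarise(n, n_studies, alpha=0.05, seed=1):
    """Fraction of studies reaching p < alpha, and mean |estimate| among those hits."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n_studies):
        b, p = simulate_study(n, rng)
        if p < alpha:
            hits.append(abs(b))
    power = len(hits) / n_studies
    mean_hit = sum(hits) / len(hits) if hits else float("nan")
    return power, mean_hit


small_power, small_hit = summarise(n=100, n_studies=2000)
large_power, large_hit = summarise(n=5000, n_studies=50)
print(f"n=100:  power {small_power:.2f}, mean |effect| among hits {small_hit:.2f} (true {TRUE_BETA})")
print(f"n=5000: power {large_power:.2f}, mean |effect| among hits {large_hit:.2f} (true {TRUE_BETA})")
```

In the small studies, the occasional “significant” result comes paired with an inflated effect estimate (the so-called winner’s curse), which is one way to produce the pattern in the figure at the top of this post: large initial effects that shrink as more samples are studied.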

The problems that plagued underpowered candidate genetic association studies, also endemic to other fields of science, are eminently soluble – in a sane world they should now be well behind us. And indeed, in the last four years the genetics community has identified thousands of associations between genetic variants and disease that consistently and robustly replicate, thanks to the crucial innovation of genome-wide association studies done on thousands of individuals. These studies were both well powered to find tiny effects, and weren’t constrained to a particular starting hypothesis about which bits of the genome are involved in any particular disease.

Given this progress, we find it frustrating to see researchers making two-decade-old mistakes today. Consider the paper in question by Alex Kogan and colleagues. The authors took a highly studied candidate gene (the oxytocin receptor) and tested for association between a genetic variant in this gene and a trait called prosociality in a sample of 23 individuals [1]. In light of what we know about complex trait genetics, this study design is hopelessly underpowered. If the effect sizes of genetic variants on relatively well-defined traits like diabetes and heart attack are small, the effect sizes of genetic variants on less well-defined traits like prosociality must be even smaller. This observation has been reinforced many times by the correlation between how easy it is to clinically define a disease or trait and how successful the GWAS approach has been. This study has produced a random p-value, perhaps “significant” (in this case, standard linear regression gives a p-value of 0.03), but ultimately meaningless.
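A rough power calculation shows what “hopelessly underpowered” means in numbers. The sketch below uses a standard normal approximation for testing a single variant in a regression. The fractions of trait variance explained are hypothetical values chosen for illustration (individual common variants found by GWAS usually explain well under 1% of the variance of a complex trait), not estimates for the oxytocin receptor variant.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power(n, q, alpha=0.05):
    """Approximate power of a two-sided test of one variant that explains a
    fraction q of trait variance in n unrelated people (normal approximation
    with non-centrality sqrt(n * q / (1 - q)))."""
    ncp = math.sqrt(n * q / (1 - q))
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)


def required_n(q, target_power=0.8, alpha=0.05):
    """Smallest n giving roughly target_power under the same approximation."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(target_power)
    return math.ceil((z_alpha + z_beta) ** 2 * (1 - q) / q)


# Hypothetical effect sizes, from generous (5% of variance) to GWAS-like (0.1%).
for q in (0.05, 0.01, 0.001):
    print(f"variance explained {q:.3f}: power at n=23 is {power(23, q):.2f}; "
          f"n for 80% power is about {required_n(q)}")
```

Under these assumptions, even a variant explaining 5% of the variance would be detected by a study of 23 people only about one time in five, and reliably detecting a 0.1% effect takes thousands of individuals.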

The Kogan et al. study is just a symptom, in our opinion, of a habit among some research groups of ignoring the recent history of genetics. These groups continue to publish small studies on “sexy” genes like the serotonin transporter (known sometimes as “the depression gene”; this moniker is indeed depressing) and MAOA (unfortunately also known as the “warrior gene”). By historical analogy, most if not all of this literature is wrong, and will soon be forgotten. Signs of this are already starting to appear. For example, consider the “depression gene”. A recent large meta-analysis involving thousands of individuals found no detectable effect of the gene on the disease. And a recent large (~5,000 individuals) genome-wide association study of various personality traits found no significant effects of genetic variants anywhere in the genome on personality.

Our genes affect nearly every aspect of our lives, including our personality, so real genetic associations to these traits unquestionably exist. Indeed, genuine risk variants for serious psychiatric diseases like schizophrenia have been found, but only in very large, carefully performed studies involving tens of thousands of people. Unravelling the genetic basis of variation in more subtle human behavioural traits will be a fascinating process, but everything we know about both the genetics of complex traits and the complexity of human behavior indicates that this will not be easy, and that it will also require genome-wide approaches with sample sizes in the thousands, not the low dozens. In addition, it will require adhering to rigorous procedures for study design and statistical analysis, as followed by most large-scale disease genome-wide association studies but all too often ignored by behavioral geneticists.
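One example of such a procedure is the stringent genome-wide significance threshold. The conventional cut-off of p < 5 × 10⁻⁸ corresponds to a Bonferroni correction of α = 0.05 for about one million independent tests. The sketch below simply assumes exactly one million independent tests to show why an uncorrected p < 0.05 means little when the whole genome is scanned.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def family_wise_error(per_test_alpha, n_tests):
    """P(at least one false positive) among n_tests independent null tests."""
    return -math.expm1(n_tests * math.log1p(-per_test_alpha))


# Assumption: roughly one million effectively independent common variants.
N_TESTS = 1_000_000
threshold = bonferroni_threshold(0.05, N_TESTS)

print(f"corrected per-test threshold: {threshold:.0e}")
print(f"expected false positives at uncorrected p < 0.05: {0.05 * N_TESTS:,.0f}")
print(f"chance of any false positive, uncorrected: {family_wise_error(0.05, N_TESTS):.3f}")
print(f"chance of any false positive, corrected:   {family_wise_error(threshold, N_TESTS):.3f}")
```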

Finally, we extend a plea to science writers: before writing about any article claiming a genetic association, it’s worth doing some simple sanity checks. Is the sample large enough to capture the typically tiny effect sizes we expect to see for complex human traits? (Unless there is some reason to believe that a trait has a comparatively simple genetic basis, that means a sample size in the thousands.) Have the authors performed an independent replication study in a separate cohort, using the same genetic model and statistical approach? And does the study show any of the telltale signs of “significance-hunting”, such as reporting of results from some subsets of their cohort but not others, or the use of an unnecessarily complex statistical model? If the answers are no, no or yes to these questions, it is very likely that the study’s results are the outcome of artefact or chance rather than a genuine association, and you should report it with the appropriate caveats [2] – or better yet, don’t report it at all until the crucial replication studies have been performed.
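The cost of “significance-hunting” through subgroups is easy to see in simulation. In this sketch, which is hypothetical throughout, the trait is pure noise and the variant has no effect. An analyst who tests the full cohort plus males, females and both sides of a second arbitrary split, and then reports whichever result looks best, finds a “significant” association substantially more often than the nominal 5% of the time.

```python
import math
import random


def slope_p(x, y):
    """Two-sided p-value for the least-squares slope of y on x (normal approximation)."""
    n = len(x)
    if n < 3:
        return 1.0
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 1.0
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    rss = sum((c - my - b * (a - mx)) ** 2 for a, c in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return math.erfc(abs(b / se) / math.sqrt(2))


def subgroup_hunt(n=200, n_studies=1000, alpha=0.05, seed=7):
    """Return (false-positive rate of the pre-specified full-cohort test,
    false-positive rate when the best of five analyses is reported)."""
    rng = random.Random(seed)
    full_hits = best_hits = 0
    for _ in range(n_studies):
        people = [
            (
                (rng.random() < 0.3) + (rng.random() < 0.3),  # genotype, no real effect
                rng.gauss(0.0, 1.0),                          # trait: pure noise
                rng.random() < 0.5,                           # hypothetical "male" flag
                rng.random() < 0.5,                           # hypothetical "older" flag
            )
            for _ in range(n)
        ]
        analyses = [
            people,
            [p for p in people if p[2]],      # males only
            [p for p in people if not p[2]],  # females only
            [p for p in people if p[3]],      # "older" group
            [p for p in people if not p[3]],  # "younger" group
        ]
        pvals = [slope_p([p[0] for p in s], [p[1] for p in s]) for s in analyses]
        full_hits += pvals[0] < alpha
        best_hits += min(pvals) < alpha
    return full_hits / n_studies, best_hits / n_studies


full_rate, best_rate = subgroup_hunt()
print(f"false-positive rate, pre-specified full-cohort test: {full_rate:.3f}")
print(f"false-positive rate, best of five analyses:         {best_rate:.3f}")
```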

[1] The author has been quoted as saying that “the number of observers and video clips observed actually makes for a larger sample size, providing greater statistical power”. This is incorrect. If I were interested in height and had 100 friends measure me, the number of friends measuring my height would obviously not change the sample size. I would still have a sample size of one, albeit with a very precise measurement.

[2] It’s worth noting once again the phenomenal example set by Ed Yong, who registered critical comments about the article on Twitter and edited his post to accommodate them within a matter of minutes.
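The point of footnote [1] can be checked with a small simulation. All numbers are hypothetical: 23 people, 50 ratings of each person, no true genetic effect, and rating noise smaller than the real differences between people. Counting every rating as a separate observation makes the null variant look “significant” far more often than 5% of the time, while averaging the ratings to one value per person brings the false-positive rate back to roughly the nominal level (slightly above it, because the sketch uses a normal rather than a t approximation for n = 23).

```python
import math
import random


def slope_p(x, y):
    """Two-sided p-value for the least-squares slope of y on x (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    rss = sum((c - my - b * (a - mx)) ** 2 for a, c in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return math.erfc(abs(b / se) / math.sqrt(2))


def draw_genotypes(n, rng):
    """Genotypes (0/1/2) at allele frequency 0.5, redrawn if all identical."""
    while True:
        g = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n)]
        if len(set(g)) > 1:
            return g


def pseudo_replication(n_people=23, n_raters=50, n_studies=300, alpha=0.05, seed=3):
    """False-positive rates when ratings are (wrongly) treated as independent
    observations versus averaged to one value per person."""
    rng = random.Random(seed)
    naive_hits = correct_hits = 0
    for _ in range(n_studies):
        g = draw_genotypes(n_people, rng)
        trait = [rng.gauss(0.0, 1.0) for _ in range(n_people)]  # unrelated to g
        ratings = [[t + rng.gauss(0.0, 0.5) for _ in range(n_raters)] for t in trait]
        # Wrong: every rating counted as a separate observation (n = 23 * 50).
        x_all = [gi for gi, rs in zip(g, ratings) for _ in rs]
        y_all = [r for rs in ratings for r in rs]
        naive_hits += slope_p(x_all, y_all) < alpha
        # Right: one averaged measurement per person (n = 23).
        means = [sum(rs) / n_raters for rs in ratings]
        correct_hits += slope_p(g, means) < alpha
    return naive_hits / n_studies, correct_hits / n_studies


naive_rate, correct_rate = pseudo_replication()
print(f"false-positive rate, ratings treated as independent: {naive_rate:.2f}")
print(f"false-positive rate, one value per person:           {correct_rate:.2f}")
```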


18 Responses to “Size matters, and other lessons from medical genetics”

  • Thanks so much for this. It explains the problem in a detailed and objective manner. I followed the debate with great interest because I’m a statistician myself and often face the arduous task of being given “imperfect” data and still having to make the best out of it because my experimentalist colleagues spent money and effort in extracting whatever data they were able to extract.

    I have a more general question that I want to toss out there as food for thought: Do you think it would’ve been possible to find the money to genotype 5,000 individuals in 2000 if in the literature there wasn’t already a study with 500 individuals that pointed to the fact that the allele was interesting and worth looking at?

  • Daniel MacArthur

    Hi EEGiorgi,

    Back in 2000 it would have been very difficult to collect genotype data on that scale without prior supporting data. But part of the point we try to make in the post is that it’s no longer necessary to approach these sorts of studies with defined ideas about which sections of the genome will be associated: genome-wide association studies allow you to look at all common genetic variants simultaneously. It’s telling that this kind of hypothesis-free* approach has been so much more successful than the previous decades of hypothesis-driven experiments.

    *GWAS aren’t strictly speaking hypothesis-free – the hypothesis is that somewhere in the genome there are common variants associated with the trait in question that are in at least partial linkage disequilibrium with the markers on your genotyping chip.

  • What is needed is not to increase sample sizes for bigger association studies but to move beyond looking for associations with common variants altogether. For the most part, these are neutral. The real payoff will be in the highly unique profiles of rare variants that each of us carries – these are certainly more important for disorders and there is mounting evidence that they are also more important for the variation of traits in the “normal range”. Time to sequence! Though again – large sample sizes will be required!

  • I agree with Kevin and disagree with the main authors. Why assume that all effect sizes will be “tiny”? And, if they are, who cares about them anyway? Chip-based GWAS are absolutely hopeless for picking up rare variation and there’s no reason to think that rare variants with major effects won’t contribute to “common” and/or “quantitative” traits. In fact, it’s a fair bet that that’s where a lot of the “missing variance” may be hiding. That said, I fully agree that the study of sociability is almost certainly nonsense. And if it’s not, the authors will just have been lucky, not correct.

  • re: sequencing versus genotyping in association studies. Obviously this is where the field is headed, it’s just a matter of time. Right now, if I had the choice between sequencing 100 individuals and genotyping 1,000 individuals, I’d go with the larger sample size. Other reasonable people will disagree about this, but sequencing is not a panacea; my guess is that people who are hoping for lots of rare variants of large effect are likely to be at least somewhat disappointed.

    This Visscher paper is, IMO, the best statement of the issue (worth flipping through for people who aren’t familiar with the latest pseudo-controversy in human genetics).

  • Is this a critique of behavioural genetic studies or association studies in general? The Visscher paper cited by Joe above does detail many problems with the former. Also the false discovery rate has been high in hypothesis driven gene association studies – but as a field it has been far from a failure, and why does it have to be one (GWAS) or the other? It doesn’t and it isn’t, indeed many GWAS SNPs become the subject of single gene hypothesis driven studies (FTO etc).

    The ACE study is a good example of trying to find too much – a single gene strongly associated with an extremely complex “disease” – heart attack is the end point reached by many different pathways in different individuals, it’s not really a single disease anyway.

    The successes have come from being less ambitious and looking at single gene effects on relatively simple outcomes. Most (all?) of the current genetic variants used in pharmacogenetics are the result of single gene studies on the “simple” process of the metabolism of a single type of molecule.

    There are many other very well established genes which have been found through initially relatively small studies. ACTN3 (thank you Daniel!) and its effect on muscle fibres (note, NOT its effect on overall performance, and NOT a prediction of future athletic ability in children). The crucial features of successful non-GWAS association studies have been to look at simple events and include both genetic and environmental measures. The big failures were those, like ACE, that looked just at gene and disease (same for SOD2 and cancer –

  • indeed many GWAS SNPs become the subject of single gene hypothesis driven studies (FTO etc).

    Once an association has been reliably and conclusively established, it’s worth following up on. This is uncontroversial. The problem is when people start following up on polymorphisms that have, in reality, no connection to the phenotype they’re interested in (e.g., the behavioral genetics literature on 5-HTT, COMT, DRD, etc.).

    I’d say the critique is of small candidate gene studies in behavioral genetics. The handful (maybe literally less than 5?) of real associations discovered over a decade of small candidate gene studies in medical genetics is not the result I’d be hoping for as a field…

  • Why assume that all effect sizes will be “tiny”? And, if they are, who cares about them anyway?

    This is a commonly held but incorrect perception. There is no relationship between the effect size of an associated genetic polymorphism discovered by GWAS (or any other method) and its clinical relevance. For example, GWAS of cholesterol level discovered a common SNP in the HMGCR gene which lowers LDL by ~3 mg/dl per allele. It’s a tiny effect, and not even close to the strongest LDL GWAS hit. HMGCR, of course, has been known for some time as the target of statins, which lower LDL by 70 mg/dl, and are among the most widely prescribed drugs in the world. Minuscule genetic associations can identify biological pathways or functions of profound clinical importance.

  • The problem with “highly unique profiles of rare variants that each of us carries” is that we have no idea what almost any of these do. Most of the time, nothing much, it seems. So, the reason there have been very few papers on rare variants in common diseases, is not that people haven’t thought it worth looking, it is that the results are largely uninterpretable. So, as Kevin says, with sequencing, you still need big numbers so that the rare variants mount up, and an even bigger replication panel to check your results.

  • Basically, Neil is 100% correct. So I have nothing to add here.

  • Okay, I have another question. Whenever you look at these kinds of associations, you assume that genomes have evolved independently, right? In other words, you assume that when you do see something it’s a true association and not the underlying relation between the genomes. Now, that is relatively true for relatively small sample sizes, but the more you increase the sample size the more likely you are to pick up shared ancestry across genomes because after all we’re not that old as a species. So, there’s a drawback in increasing the sample size, too, right? Even with a sample size of 5,000 you might pick up a false positive. You might get a really strong signal at locus X when the true underlying variant is somewhere else, and all locus X is telling you is that your subgroup is related by a common ancestor. Or have I missed something? I know these studies always stratify by race, but even different racial groups coalesce at some point and, again, unless I’m totally missing the point (and if so, please correct me as I’m really trying to understand this better), ideally one should build a phylogenetic tree of the whole sample and use that as a correction for associations.

  • @EEGiorgi

    You are right that population stratification is a potential confounder for large studies, and a lot of the work we do in association studies involves controlling for it. Traditionally people use either principal component analysis, or a program like STRUCTURE or ADMIXTURE, to separate true effects from stratification-driven signals. Increasingly, we are moving towards using full mixed models, which directly integrate across the covariance matrices to capture essentially all global relatedness. There are also people who do what you suggest, and use ancestral recombination graphs (phylogenetic trees that take recombination into account), but that is less common.

    We sometimes perform replication via family studies to reassure ourselves that the effects aren’t due to stratification. These studies show us that there haven’t been any examples of major GWAS findings that were driven by stratification, so we are pretty convinced that our methods are controlling stratification so far!

  • EEGiorgi,

    In addition to Luke’s comment, you might be interested in a recent review of the issue you raise:

  • That’s very helpful, thank you both!

  • Sirs,

    Good show all around gentlemen! Would that we could frame and elevate this exchange in this era of coarse discourse and low manners.


    C. V. Snicker

  • PNAS just published another OXTR study, the third one in the past few months. This time the sample size is 194, not too bad, but I’m still puzzled because the SNP in question (rs53576) is silent… So the part I don’t understand is: how can all these studies speculate over the effects of this single SNP and how they are linked to the release of oxytocin if the SNP is silent? I’m probably missing something, but when we look at antibodies in order to explain effects, we map mutations to the crystalline structure, so in this case I would think if the effect is explainable by the release of oxytocin, then the mutation should affect the crystal structure of the receptor, which is not the case because the SNP is silent… So then isn’t it a more interesting question to investigate what other mechanisms are involved here? If there really is an effect, isn’t it time to look at the broader picture in which people investigate what other SNPs are in linkage disequilibrium with this particular SNP and what other mechanisms could be involved in these associations? To me, besides the sample size issue you advocate here, that’s another reason to do whole genome studies.

  • “MAOA (unfortunately also known as the ‘warrior gene’). By historical analogy, most if not all of this literature is wrong, and will soon be forgotten. Signs of this are already starting to appear.”

    Bullshit. Morons.

  • ANoteOfCaution

    GWAS: The holy grail of some horny for data geneticists.
    Why think about what you are doing when you can use a shotgun approach? Just shoot, any hit counts. For GWAS you need large numbers because you get so many chance associations (and many of those don’t even come close to being biologically relevant) that you need very rigorous multiple testing corrections to weed out most of the chance associations. And then they go back to single SNP testing to prove that the association is valid. The old techniques were good then and everybody believed in those techniques. Yes, GWAS has its merits, but it also has its drawbacks. Quit promoting GWAS as the best thing in the world. It is not.
    Don’t get me wrong, I have been doing genetic research for years and I have used every technique. The most important step of all is to be a researcher and think(!) about your data and your research techniques.
    GWAS is a hype now. It will be replaced by a new one, and a new one, and a new one, and ….
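The population-stratification question raised in the comments, and the principal-component correction described in reply, can be illustrated with a standard-library-only Python sketch. Everything in it is hypothetical: two equal-sized populations, 50 ancestry-informative markers, a non-causal test SNP whose allele frequency differs between the populations, and a trait whose mean differs between them for non-genetic reasons. The first principal component is found by plain power iteration rather than with dedicated genetics software, and is then used as a single regression covariate; mixed models, also mentioned in the comments, are not shown.

```python
import math
import random


def draw(p, rng):
    """Genotype: number of copies (0, 1 or 2) of an allele with frequency p."""
    return (rng.random() < p) + (rng.random() < p)


def simulate(n_per_pop=200, n_markers=50, shift=2.0, seed=11):
    """Two hypothetical populations; the test SNP has no effect on the trait."""
    rng = random.Random(seed)
    markers, test_snp, trait = [], [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            # Ancestry-informative markers: frequency 0.2 in pop 0, 0.8 in pop 1.
            markers.append([draw(0.2 if pop == 0 else 0.8, rng) for _ in range(n_markers)])
            # Non-causal test SNP whose frequency also differs by population.
            test_snp.append(draw(0.2 if pop == 0 else 0.6, rng))
            # Trait mean differs by population for non-genetic reasons.
            trait.append(shift * pop + rng.gauss(0.0, 1.0))
    return markers, test_snp, trait


def centre(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def top_pc(markers, iters=50, seed=0):
    """Unit-length first principal-component scores of the column-centred
    marker matrix X, by power iteration on X X^T."""
    n, m = len(markers), len(markers[0])
    columns = [centre([row[j] for row in markers]) for j in range(m)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(ci * vi for ci, vi in zip(c, v)) for c in columns]        # X^T v
        v = [sum(wj * c[i] for c, wj in zip(columns, w)) for i in range(n)]  # X w
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    return v


def residualise(values, pc):
    """Residuals after regressing on an intercept and the (mean-zero) PC scores."""
    c = centre(values)
    b = sum(a * z for a, z in zip(c, pc)) / sum(z * z for z in pc)
    return [a - b * z for a, z in zip(c, pc)]


def slope_z(x, y, n_covariates):
    """z statistic for the slope of y on x, where x and y are already centred or
    residualised on the intercept plus n_covariates other covariates."""
    n = len(x)
    sxx = sum(a * a for a in x)
    b = sum(a * c for a, c in zip(x, y)) / sxx
    rss = sum((c - b * a) ** 2 for a, c in zip(x, y))
    se = math.sqrt(rss / (n - 2 - n_covariates) / sxx)
    return b / se


markers, snp, trait = simulate()
pc1 = top_pc(markers)
z_naive = slope_z(centre(snp), centre(trait), n_covariates=0)
z_adjusted = slope_z(residualise(snp, pc1), residualise(trait, pc1), n_covariates=1)
print(f"test SNP z without correction: {z_naive:.1f}")
print(f"test SNP z adjusting for PC1:  {z_adjusted:.1f}")
```

Without the covariate, the non-causal SNP appears strongly associated with the trait; with it, the signal largely disappears. Real analyses compute principal components from many thousands of markers and usually include several of them.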

