Tag Archive for 'GWAS'


Are synthetic associations a man-made phenomenon?

Early last year David Goldstein and colleagues published a provocative paper claiming that many GWAS associations are driven not by common variants of modest effect (the canonical common disease – common variant hypothesis underpinning GWAS) but instead by a local cluster of lower-frequency variants that have much bigger effects on disease risk. They dubbed this hypothesized phenomenon “synthetic association” and the term quickly became a genetics buzzword. The paper was widely discussed in both the specialist and mainstream media, and caused quite a stir among academic statistical geneticists.

That debate has been re-opened today by a set of Perspectives in PLoS Biology: a rebuttal by us (Carl & Jeff) and our colleagues at Sanger, a rebuttal by Naomi Wray, Shaun Purcell and Peter Visscher, a rebuttal to the rebuttals by David Goldstein and an editorial by Robert Shields to tie it all together.


From GWAS to pathways, the consequences of DTC genetics and screening by sequencing

A paper out in PLoS Genetics this week takes a step towards using genome-wide association data to reconstruct functional pathways. Using protein-protein interaction data and tissue-specific expression data, the authors reconstruct biochemical pathways that underlie various diseases, by looking for variants that interact with genes in GWAS regions. These networks can then tell us about what systems are disrupted by GWAS variants as a whole, as well as identifying potential drug targets. The figure to the right shows the network constructed for Crohn’s disease; large colored circles are genes in GWAS loci, small grey circles are other genes in the network they constructed. As an interesting side note, the GWAS variants were taken from a 2008 study; since then, we have published a new meta-analysis, which implicated a lot of new regions. 10 genes in these regions, marked as small red circles on the figure, were also in the disease network. [LJ]

23andMe customers will be interested in a neat little Firefox plug-in that allows them to view their own genotypes for any 23andMe SNP mentioned on a web page. You can download the plug-in here (you’ll need to have an up-to-date version of Firefox), and I have a brief review of the tool here. [DM]

Estimating heritability using twins

Last week, a post went up on the Bioscience Resource Project blog entitled The Great DNA Data Deficit. This is another in a long string of “Death of GWAS” posts that have appeared over the last year. The authors claim that because GWAS has failed to identify many “major disease genes”, i.e. high frequency variants with large effect on disease, it was therefore not worthwhile; this is all old stuff that I have discussed elsewhere (see also my “Standard GWAS Disclaimer” below). In this case, the authors argue that the genetic contribution to complex disease has been massively overestimated, and in fact genetics does not play as large a part in disease as we believe.

The one particularly new thing about this article is that they actually look at the foundation for beliefs about missing heritability; the twin studies of identical and non-identical twins from which we get our estimates of the heritability of disease. I approve of this: I think all those who are interested in the genetics of disease should be fluent in the methodology of twin studies. However, in this case, the authors come to the rather odd conclusion that heritability measures are largely useless, based on a small statistical misunderstanding of how such studies are done.

I thought I would use this opportunity to explain, in relative detail, where we get our estimates of heritability from, why they are generally well-measured and robust, and what real issues need to be considered when interpreting twin study results. This post is going to contain a little bit of maths, but don’t worry if it scares you a little; you only really need to get the gist.
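To give a flavour of the maths involved, here is a minimal sketch of one classic twin-study estimator, Falconer's formula. The derivation in the full post is not shown in this excerpt, so treat this as an illustration of the general idea rather than the method used there. The function name and the twin correlations are made up, and for a disease the correlations would be on an assumed underlying liability scale.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Falconer's estimates from twin correlations.

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between non-identical (dizygotic) twins
    Assumes additive genetic effects and equally shared environments
    for both kinds of twin pair.
    """
    h2 = 2.0 * (r_mz - r_dz)   # heritability (additive genetic share)
    c2 = 2.0 * r_dz - r_mz     # shared (family) environment share
    e2 = 1.0 - r_mz            # unique environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, chosen only for illustration.
estimates = falconer_estimates(r_mz=0.8, r_dz=0.5)
for name, value in estimates.items():
    print(f"{name} = {value:.2f}")
```

The logic is that identical twins share all of their genetic variation and non-identical twins share half on average, so doubling the difference between the two correlations isolates the genetic component.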

Friday Links

A quick note about the Reader Survey; we are going to stop taking responses at the end of Saturday (Pacific Time). If you haven’t already done so, please fill out the survey now.

A couple of interesting articles this week on the Personal Genome Project and public genomics in general. Mark Henderson at the Times has an opinion piece (behind a paywall, I’m afraid) about Misha Angrist’s book Here Is A Human Being (see also this review from The Intersection), and in the Duke Magazine Mary Carmichael has an in-depth feature on the work of George Church, with some interesting history of the early days of the PGP.

One aspect that comes out of these articles is how those who take part in public genomics projects are starting to own the unknown unknowns. They accept that they cannot anticipate all the risks of making their data public, but are willing to take the risk of exposing themselves to these unknown risks, and in doing so turn them into knowns. Another aspect is the sheer number of individuals who want to sign up to have their data published online: 15,000 people have expressed interest in being part of the PGP, despite initial NIH concerns that no-one would want to take part at all. This also chimes with research presented at ASHG this year, showing that members of the public are more concerned with contributing to scientific knowledge, and, crucially, getting access to their own genetic data than they are about the potential risks that such data could expose them to. [LJ]


Friday Links

At the risk of turning Friday Links into a self-trumpet-blowing occasion, we are happy to report that a number of GNZ contributors (Jeff, Carl and Luke) are authors on a new Crohn’s disease GWAS meta-analysis of 6000 patients that came out in Nature Genetics this week. The study brings the number of Crohn’s associations up to 71, with 30 novel, bringing the proportion of heritability explained up to about 24%. It is also worth noting that all of the associations from the previous meta-analysis were replicated in this one, showing how the cross-platform independent replication experiments that are now standard have largely obliterated false positives in GWAS. There were also 5 loci that showed evidence of a second, independent signal, which I think is a promising sign of things to come.


Friday Links

The largest genome-wide association study ever undertaken was published in Nature this week. The appropriately named Genetic Investigation of ANthropometric Traits (GIANT) consortium combined data from 183,727 individuals and identified around 180 loci influencing human height. The loci were enriched with genes underlying skeletal growth and other relevant biological pathways. Interestingly, these 180 loci are estimated to only account for 10% of the phenotypic variation in height (or around 12.5% of the heritability). [CAA]
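The relationship between the two percentages quoted for height is a one-line calculation, sketched here. The heritability of 0.8 is an assumption on my part (it is the value implied by the two figures, not one stated in the paragraph), and the function name is just for illustration.

```python
def fraction_of_heritability(variance_explained: float, heritability: float) -> float:
    """Share of the heritable variation accounted for by the known loci."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return variance_explained / heritability


variance_explained = 0.10   # 10% of phenotypic variation, as quoted
assumed_h2 = 0.80           # assumed heritability of height (not from the post)

fraction = fraction_of_heritability(variance_explained, assumed_h2)
print(f"{fraction:.1%} of the heritability")  # prints "12.5% of the heritability"
```

In other words, the loci explain a slightly larger share of the heritable variation than of the total variation, because some of the variation in height is not heritable in the first place.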

Christophe Lambert from Golden Helix has an excellent, thorough post looking at the importance of careful experimental design in large-scale genetic association studies. In particular, Lambert focuses on the need for randomising samples across experimental batches: if you have some batches containing entirely cases and others entirely controls, then the all-too-pervasive spectre of batch effects can easily create false positive associations. In many cases batch effects can be recognised and corrected for post hoc (Lambert cites a good example from the original WTCCC study), but in other cases a failure to perform the right quality controls can have devastating consequences (Lambert cites the recent longevity GWAS paper in Science). I’d be interested to hear from my more GWAS-savvy colleagues (Carl, Jeff) whether randomisation is standard procedure in most large GWAS now. [DM]
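As a rough sketch of what randomising across batches means in practice, the snippet below shuffles cases and controls separately and then deals them out across batches, so that no batch ends up all cases or all controls. This is a hypothetical illustration, not the procedure from Lambert's post or from any particular study; the function name `assign_batches`, the sample IDs and the batch count are all invented.

```python
import random
from collections import Counter


def assign_batches(samples, n_batches, seed=0):
    """Assign samples to batches, balancing case/control status.

    samples: list of (sample_id, status) tuples, e.g. ("S001", "case")
    Returns a dict mapping sample_id -> batch index (0-based).
    """
    rng = random.Random(seed)

    # Group sample IDs by status so each group can be shuffled on its own.
    by_status = {}
    for sample_id, status in samples:
        by_status.setdefault(status, []).append(sample_id)

    assignment = {}
    slot = 0
    for status in sorted(by_status):
        ids = by_status[status]
        rng.shuffle(ids)  # random order within each status group
        for sample_id in ids:
            # Deal round-robin; continuing the counter across groups
            # keeps the overall batch sizes even as well.
            assignment[sample_id] = slot % n_batches
            slot += 1
    return assignment


# Hypothetical study: 48 cases and 48 controls spread over 4 batches.
samples = [(f"case_{i}", "case") for i in range(48)]
samples += [(f"ctrl_{i}", "control") for i in range(48)]
status_of = dict(samples)

assignment = assign_batches(samples, n_batches=4)
per_batch = Counter((batch, status_of[sid]) for sid, batch in assignment.items())
for batch in range(4):
    print(batch, per_batch[(batch, "case")], per_batch[(batch, "control")])
```

With the status mix roughly equal in every batch, a batch effect adds noise to both cases and controls rather than masquerading as an association.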

We managed to miss this out last week, but the current issue of Nature Genetics has a strange and wonderful paper on breast cancer genetics. The study looked at 2838 individuals with BRCA1 mutations that strongly predispose to breast cancer, and looked for non-BRCA1 variants associated with breast cancer in this group. They found an associated variant on chromosome 19, and replicated it in another 5986 BRCA1 carriers (where do they find this many BRCA1 carriers?). To top it all off, they looked at this variant in another 6800 breast cancer patients without BRCA1 mutations, and found no association. However, when they stratified their samples into ER+ and ER- cancers, they found associations in both, but going in opposite directions! The variant predisposes people to ER- cancer, but is protective against ER+, and taken together they pretty much perfectly balance out. [LJ]

Getting even with the odds ratio

In the recent report from the US Government Accountability Office on direct-to-consumer genetic tests, much was made of the fact that risk predictions from DTC genetic tests may not be applicable to individuals from all ethnic groups. This observation was not new to the report – it has been commented on by numerous critics ever since the inception of the personal genomics industry.

So, why does risk prediction accuracy vary between individuals and what can be done to combat this? Are the DTC companies really to blame?

To explore these questions it is first necessary to understand what is meant by the odds ratio (OR). In genetic case-control association studies the OR typically represents the ratio of the odds of disease if allele A is carried compared to if allele B is carried. If all else is equal, genetic loci with a higher OR are more informative for disease prediction – so getting an accurate estimate is extremely important if prediction underpins your business model. However, getting an accurate estimate of OR is far from easy because many, often unmeasured, factors can cause OR estimates to vary. In this post I will try to break down the concept of a single, fixed odds ratio for a disease association, and highlight a number of factors that can cause odds ratios to vary using examples from the scientific literature.
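To make the definition concrete, here is a minimal sketch that computes an allelic odds ratio from a table of allele counts. The counts are invented purely for illustration and the function name is my own.

```python
def odds_ratio(case_a: int, control_a: int, case_b: int, control_b: int) -> float:
    """Odds of disease for carriers of allele A relative to allele B.

    Each argument is a count of alleles observed in cases or controls.
    """
    odds_given_a = case_a / control_a   # odds of being a case, given allele A
    odds_given_b = case_b / control_b   # odds of being a case, given allele B
    return odds_given_a / odds_given_b


# Hypothetical allele counts from a case-control study.
or_estimate = odds_ratio(case_a=600, control_a=500, case_b=1400, control_b=1500)
print(f"OR = {or_estimate:.2f}")
```

An OR of 1 means the allele carries no information about disease status; the further the value is from 1 in either direction, the more informative the locus is for prediction. The estimate depends entirely on who ends up in the table, which is why it can shift between studies and populations.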


Friday Links

Over at Your Genetic Genealogist, CeCe Moore talks about investigating evidence of low-level Ashkenazi Jewish descent in her 23andMe data. What I like about this story is how much digging CeCe did; after one tool threw up a “14% Ashkenazi” result, she looked for similar evidence in 23andMe’s tool. She then did the same analysis on her mother’s DNA, finding no apparent Ashkenazi heritage, and to top it all off got her paternal uncle genotyped, which showed even greater Ashkenazi similarity. [LJ]

A paper out in PLoS Medicine looks at the interaction between genetics and physical activity in obesity. The take-home message is pretty well summarized in the figure to the left; genetic predispositions are less important in determining BMI for those who do frequent physical exercise than for those who remain inactive. This illustrates the importance of including non-genetic risk factors in disease prediction; not only because they are very important in their own right (the paper demonstrates that physical activity is about as predictive of BMI as known genetic factors), but also because information on environmental influences allows better calibration of genetic risk. [LJ]

Trends in Genetics have published an opinion piece in their most recent issue outlining the types of genetic variants we might expect to see for common human diseases (defined by allele frequency and risk), and how exome and whole-genome sequencing could be used to find them. They give a brief, relatively jargon-free overview of gene-mapping techniques that have been previously used, and discuss how sequencing can take this research further, particularly for the previously less tractable category of low-frequency variants that confer a moderate level of disease risk. [KIM]

More Sanger shout outs this week; Sanger Institute postdoc Liz Murchison, along with the rest of the Cancer Genome Project, has announced the sequencing of the Tasmanian Devil genome. The CGP is interested in the Tasmanian Devil due to a rare, odd and nasty facial cancer, which is passed from Devil to Devil by biting. In fact, all the tumours are descended from the tumour of one individual; 20 years or so on, and 80% of the Devil population has been wiped out by the disease. As well as a healthy genome, the team also sequenced two tumour genomes, in the hope of learning more about what mutations made the cells become tumours, and what makes the cancer so unique.

I have to say, this isn’t going to be an easy job; assembling a high-quality reference genome of an under-studied organism is a lot of work, especially using Illumina’s short read technology, and identifying and making sense of tumour mutations is equally difficult. Add to this the fact that the tumour genome is from a different individual to the healthy individual, and it all adds up to a project of unprecedented scope. On the other hand, the key to saving a species from extinction could rest on this sticky bioinformatics problem, and if anyone is in the position to deal with it, it’s the Cancer Genome Project. [LJ]

Tasmanian Devil image from Wikimedia Commons.

Friday Links

A lot of the Genomes Unzipped crew seem to be away on holiday at the moment, so today’s Links post may lack the authorial diversity that you’re accustomed to.

I just got around to reading the August edition of PLoS Genetics, and found a valuable study from the Keck School of Medicine in California. The authors looked at the effect of known common variants in five American ethnic groups (European, African, Hawaiian, Latino and Japanese Americans), to assess how similar or different the effect sizes were across the groups.

The authors calculated odds ratios for each variant in each ethnic group, and looked for evidence of heterogeneity in odds ratios. They find that, in general, the odds ratios tend to show surprisingly little variation between ethnic groups; the direction of risk was the same in almost all cases, and the mean odds ratio was roughly equal across populations (the authors note that this pretty effectively shoots down David Goldstein’s “synthetic association” theory of common variation). One interesting exception was that the effect size of the known T2D variants was significantly larger in Japanese Americans, who had a mean odds ratio of 1.20, compared to 1.08-1.13 for other ethnic groups. The graph to the left shows the distribution of odds ratios in European and Japanese Americans.

These sorts of datasets will be very useful for personal genomics in the future, as a decade of European-centered genetics research has left non-Europeans somewhat in the lurch with regards to disease risk predictions. However, the problem with the approach in this paper is that even in a study this large (6k cases, 7k controls) the error bounds on the odds ratios within each group are still pretty large. [LJ]
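For readers wondering what "looking for heterogeneity" and "error bounds" might look like in code, here is a rough sketch using two standard techniques: Woolf's approximation for the confidence interval of an odds ratio, and Cochran's Q statistic for heterogeneity across groups. I am not claiming these are the methods the paper used. The group labels and allele counts are entirely made up, and the function names are my own.

```python
import math


def log_or_and_se(case_a, case_b, control_a, control_b):
    """Log odds ratio and its standard error (Woolf's approximation)."""
    log_or = math.log((case_a * control_b) / (case_b * control_a))
    se = math.sqrt(1 / case_a + 1 / case_b + 1 / control_a + 1 / control_b)
    return log_or, se


def confidence_interval(log_or, se, z=1.96):
    """Approximate 95% confidence interval on the odds ratio scale."""
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def cochran_q(estimates):
    """Cochran's Q, its degrees of freedom and the I^2 statistic.

    estimates: list of (log_or, se) pairs, one per group.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Hypothetical allele counts: (case_a, case_b, control_a, control_b).
tables = {
    "group_1": (1200, 800, 1100, 900),
    "group_2": (300, 200, 280, 220),
}

results = {name: log_or_and_se(*counts) for name, counts in tables.items()}
for name, (log_or, se) in results.items():
    low, high = confidence_interval(log_or, se)
    print(f"{name}: OR = {math.exp(log_or):.2f} (95% CI {low:.2f} to {high:.2f})")

q, df, i2 = cochran_q(list(results.values()))
print(f"Q = {q:.3f} on {df} degree(s) of freedom, I^2 = {i2:.2f}")
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-squared distribution with the given degrees of freedom. The smaller group gets a much wider interval here, which is the same problem noted above: split a study by ethnic group and each group's odds ratio becomes harder to pin down.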

Over at the Guardian Science Blog, Dorothy Bishop explains the difference between learning that a trait is heritable (e.g. from twin studies), and mapping a specific gene “for” a trait (e.g. via GWAS). Her conclusion is worth repeating:

The main message is that we need to be aware of the small effect of most individual genes on human traits. The idea that we can test for a single gene that causes musical talent, optimism or intelligence is just plain wrong. Even where reliable associations are found, they don’t correspond to the kind of major influences that we learned about in school biology. And we need to realise that twin studies, which consider the total effect of a person’s genetic makeup on a trait, often give very different results from molecular studies of individual genes.

There are also interesting questions to be asked about why there is such a gap between heritabilities estimated by twin studies, and the heritability that can be explained by GWAS results. That, however, is a question for another day. [LJ]

Another article just released in PLoS Genetics provides a powerful illustration of just how routine whole-genome sequencing is now becoming for researchers: the authors report on complete, high-coverage genome sequence data for twenty individuals. The samples included 10 haemophilia patients and 10 controls, taken as part of a larger study looking at the genetic factors underlying resistance to HIV infection. While this is still a small sample size by the standards of modern genomics, there are a few interesting insights that can be gleaned from the data: for instance, the researchers argue from their data that each individual has complete inactivation of 165 protein-coding genes due to genetic variants predicted to disrupt gene function. I’ll be following up on this claim in a future post. [DM]

Finally, a quick shout-out to our fellow Sanger researchers, including Verneri Anttila and Aarno Palotie, along with everyone else in the International Headache Genetics Consortium, for finding the first robust genetic association to migraine. They looked at 3,279 cases and >10k controls (and another 3,202 cases to check their results), and found that the variant rs1835740 was significantly associated with the disease.

To tie in with the above story, in the region of 40-65% of variation in migraine is heritable, but only about 2% of this was explained by the rs1835740 variant. However, explaining heritability isn’t the main point of GWAS studies: a little follow-up found that rs1835740 was correlated with expression of the gene MTDH, which in turn suggests a defect in glutamate transport; hopefully this new discovery will help shed some light on the etiology of the disease. [LJ]

Friday links

Welcome to the inaugural Friday links post. We’ll be using these posts to share interesting articles stumbled across by Unzipped members during the week.

We’re still tweaking the format, but the basic idea will be a brief paragraph of commentary followed by the initials of the person who wrote it.

Dan Koboldt reviews a recent paper reporting the use of whole-genome sequencing to find the mutation responsible for a severe genetic disease. Interestingly, in this case the disease was undiagnosed, and the causal variant was used to produce a diagnosis of sitosterolemia; more interestingly, this diagnosis had already been ruled out by another test, that was shown to be a false negative. [DM]

ScienceNews reports that researchers from the University of Copenhagen have got permission to sequence the genome of Sitting Bull, the native American war chief who led the battle of Little Bighorn. I don’t know exactly what they intend to learn from the genome scientifically, but it seems like this might serve primarily as a monument to a major figure in native American resistance. So the question I have is this: how can we go from a genome sequence (which is generally just a text file on a computer) to a public remembrance, something akin to the 1989 postage stamp shown to the left? [LJ]

Two papers in the current issue of Nature Genetics highlight recent inroads made in understanding the genetics of infectious disease susceptibility. The first found an association between risk of meningococcal disease and CFH, a gene previously implicated in age related macular degeneration. The second identified a susceptibility locus for tuberculosis in African samples. Paul de Bakker and Amalio Telenti have a nice News and Views piece about them as well, remarking on this welcome advance not only in understanding infection, but also in using GWAS to gain insight about disease risk in non-Europeans. [JCB]

Update: Dan Frost from the GoldenHelix blog has drawn our attention to a thought-provoking post on the future of GWAS studies. The post suggests that much of the missing heritability in complex disease is hiding in the set of variants that are badly tagged by existing chips, and proposes that GWAS studies in the future may include a sequencing phase to discover new variants in cases, followed by genotyping using custom genotype chips to capture this variation. The question, from my point of view, is how many common SNPs are there that aren’t well tagged by existing chips, and thus how much heritability could be hidden there? This is exactly the sort of question that the 1000 Genomes dataset was designed to answer. [LJ]

