Archive for the 'Journal Club' Category

Size matters, and other lessons from medical genetics

Size really matters: prior to the era of large genome-wide association studies, the large effect sizes reported in small initial genetic studies often dwindled towards zero (that is, an odds ratio of one) as more samples were studied. Adapted from Ioannidis et al., Nat Genet 29:306-309.

[Last week, Ed Yong at Not Exactly Rocket Science covered a paper positing an association between a genetic variant and an aspect of social behavior called prosociality. On Twitter, Daniel and Joe dismissed this study out of hand due to its small sample size (n = 23), leading Ed to update his post. Daniel and Joe were then contacted by Alex Kogan, the first author of the study in question. He kindly shared his data with us, and agreed to an exchange here on Genomes Unzipped. In this post, we expand on our point about the importance of sample size; Alex’s reply is here.

Edit 01/12/11 (DM): The original version of this post included language that could have been interpreted as an overly broad attack on more serious, well-powered studies in psychiatric disease genetics. I’ve edited the post to reduce the possibility of collateral damage. To be clear: we’re against over-interpretation of results from small studies, not behavioral genetics as a whole, and I apologise for any unintended conflation of the two.]

In October of 1992, genetics researchers published a potentially groundbreaking finding in Nature: a genetic variant in the angiotensin-converting enzyme ACE appeared to modify an individual’s risk of having a heart attack. This finding was notable at the time for the size of the study, which involved a total of over 500 individuals from four cohorts, and the effect size of the identified variant–in a population initially identified as low-risk for heart attack, the variant had an odds ratio of over 3 (with a corresponding p-value less than 0.0001).

Readers familiar with the history of medical association studies will be unsurprised by what happened over the next few years: initial excitement (this same polymorphism was associated with diabetes! And longevity!) was followed by inconclusive replication studies and, ultimately, disappointment. In 2000, 8 years after the initial report, a large study involving over 5,000 cases and controls found absolutely no detectable effect of the ACE polymorphism on heart attack risk. In the meantime, the same polymorphism had turned up in dozens of other association studies for a wide range of traits ranging from obstetric cholestasis to meningococcal disease in children, virtually none of which have ever been convincingly replicated.
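
To make the point about sample size concrete, here is a minimal sketch in Python of how an odds ratio and an approximate 95% confidence interval (Woolf's logit method) can be computed from a 2x2 table of carrier counts. The counts below are invented for illustration and are not taken from the ACE studies; the function name `odds_ratio_ci` is also just a label chosen for this example.

```python
import math

def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) using Woolf's logit interval."""
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    # Standard error of log(OR): each cell of the table contributes 1/count.
    se_log_or = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                          + 1 / control_carriers + 1 / control_noncarriers)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))

# Hypothetical counts: identical carrier proportions at two study sizes.
small_study = odds_ratio_ci(15, 10, 10, 20)
large_study = odds_ratio_ci(1500, 1000, 1000, 2000)

for label, (or_, lower, upper) in [("small", small_study), ("large", large_study)]:
    print(f"{label}: OR = {or_:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Both toy studies give the same point estimate, but only the larger one has an interval that excludes an odds ratio of one. With a few dozen samples, an apparently large effect is compatible with no effect at all.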

Going green: lessons from plant genomics for human sequencing studies

This is a guest post by Jeffrey Rosenfeld. Jeff is a next-generation sequencing advisor in the High Performance and Research Computing group at the University of Medicine and Dentistry of New Jersey, working on a variety of human and microbial genetics projects. He is also a Visiting Scientist at the American Museum of Natural History where he focuses on whole-genome phylogenetics. He was trained at the University of Pennsylvania, New York University and Cold Spring Harbor Laboratory.

As human geneticists, it is all too easy to ignore papers published about non-human organisms – especially when those organisms are plants. After all, how much can the analysis of (say) Arabidopsis genome diversity possibly assist in my quest to better understand the human genome and determine which genes cause disease? Quite a bit, as it happens: a fascinating recent paper in Nature demonstrates a number of lessons that we can learn from our distant green relatives.

By exploiting the small genome size of Arabidopsis (~120 million bases, compared to the relatively gargantuan 3 billion bases of Homo sapiens), researchers were able to perform complete genome sequencing and transcriptome profiling in 18 different ecotypes of the plant (similar to what we would call strains of an animal).

In a normal genome re-sequencing experiment, the procedure is to obtain DNA from an individual, sequence the DNA, align it to a reference sequence and then call variants (i.e. differences from the reference). This approach is used by the 1000 Genomes Project and basically all of the hundreds of disease-focused human sequencing projects currently underway around the world, and it allows researchers to identify single-base substitution (SNP) and small insertion/deletion (indel) differences between genomes relatively easily. However, the amount of variability that can be identified is restricted by the use of a reference: regions where there is extreme divergence between the reference and sample genomes are often badly called, and more complex variants (e.g. large, recurrent rearrangements of DNA) can be missed. Additionally, and crucially, sequences that are not present in the reference genome will be completely missed by this approach.
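
The align-then-call procedure, and its blind spot, can be illustrated with a deliberately tiny toy in Python. This is a sketch under strong assumptions, not how real pipelines work: reads are placed by brute-force Hamming distance with no indels, variants are called by simple majority vote, and the sequences and function names (`best_alignment`, `call_variants`) are made up for the example. Real projects use dedicated aligners and variant callers.

```python
from collections import Counter

def best_alignment(read, reference, max_mismatches=1):
    """Return the offset where the read fits with the fewest mismatches, or None."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mm:
            best_pos, best_mm = pos, mismatches
    return best_pos

def call_variants(reference, reads, max_mismatches=1):
    """Pile up aligned reads; report positions whose majority base differs."""
    pileup = {}      # reference position -> Counter of observed bases
    unaligned = []   # reads with no acceptable placement on the reference
    for read in reads:
        pos = best_alignment(read, reference, max_mismatches)
        if pos is None:
            unaligned.append(read)
            continue
        for offset, base in enumerate(read):
            pileup.setdefault(pos + offset, Counter())[base] += 1
    variants = []
    for pos in sorted(pileup):
        base, _ = pileup[pos].most_common(1)[0]
        if base != reference[pos]:
            variants.append((pos, reference[pos], base))
    return variants, unaligned

# Hypothetical data: the sample carries a G->A change at position 8 (0-based)
# plus a stretch of sequence that has no counterpart in the reference.
reference = "GATTACAGGCTTCAGTCCAT"
reads = ["TTACAGAC", "CAGACTTC", "ACTTCAGT", "TGTGTGTG"]

variants, unaligned = call_variants(reference, reads)
print("variants:", variants)
print("reads that could not be placed:", unaligned)
```

The single-base difference is recovered because the reads carrying it still align to the reference. The read from the novel sequence has nowhere to go, so a reference-based caller simply never reports it.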

Report on clinical genome sequencing

The PHG Foundation, an independent genomics think-tank, has launched a new report on next generation sequencing and its impact on health and health systems. The Report, Next steps in the sequence: the implications of whole genome sequencing for health in the UK, can be freely downloaded and aims to provide a comprehensive overview of the many and varied issues relating to clinical genome sequencing.

When planning the work, we were motivated by the astonishingly rapid development of fast, affordable whole genome sequencing (WGS) technologies, which are set to change many aspects of health care. The sheer quantity and complexity of the information generated by genome sequencing, along with our ever-changing understanding of the function of genomes in health and disease, present new challenges for health systems.

The Report reviews the technologies, informatics pipeline and key clinical applications of WGS, as well as the economic, ethical, legal and social implications and organisational challenges of offering WGS within the UK NHS. The final two policy chapters outline different scenarios for testing, storing and returning results, and contain 10 key recommendations reached with the help of several expert stakeholder workshops.

Revisiting RNA-DNA sequence differences

A few months ago, I discussed a paper by Li and colleagues reporting a large number of sequence differences between mRNA and DNA from the same individual [1]. While some such differences are expected due to known mechanisms of RNA editing (e.g. A->I editing, see [2]), Li et al. reported an astonishingly high number of them, including thousands of events inconsistent with any known regulatory mechanism. These results implied at least one, and probably many, new mechanisms of gene regulation, and called into question some basic assumptions in molecular biology.

An alternative explanation for the observations of Li et al. is less exciting: imagine two genes with similar (but not identical) sequences, which produce similar (but not identical) mRNAs. If you accidentally attributed both mRNA sequences to the same gene, you could erroneously conclude that one of the two sequences arose via RNA editing of the other. According to a new paper in PLoS One by Schrider and colleagues [3], this banal artifact accounts for the majority of the reported RNA-DNA sequence differences in Li et al.

Schrider et al. show that RNA-DNA mismatches are enriched in genes with close paralogs or copy number variants, both of which are consistent with the technical artifact mentioned above. However, their most striking result is that, at many of the putative RNA editing sites, the “edited” base from the mRNA is actually present in genomic DNA. To show this, Schrider et al. took advantage of the fact that low-coverage DNA sequencing data is available for the individuals used in the Li et al. study. They searched through these data to find genomic sequences matching the “edited” mRNA form. If these sites were truly due to RNA editing, they shouldn’t find any. Instead, at ~75% of the tested sites, they could find a genomic match to the “edit” in at least one individual. There are some potential complications with the interpretation of this number (as they note, the genomic data could include sequencing errors that happen to be the same base as the “edit”), but this observation strongly suggests that a majority of the sites identified by Li et al. are false positives due to this single technical issue.
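
The logic of that check can be sketched in a few lines of Python. Everything here is hypothetical: the site names, the observed bases and the function `genomic_support` are invented for illustration and are not the data or code of Schrider et al. The `min_reads` parameter is an added assumption, included to show how requiring more than one supporting read guards against the sequencing-error caveat mentioned above.

```python
def genomic_support(edit_sites, dna_reads, min_reads=1):
    """Return (fraction of sites supported, list of supported sites).

    edit_sites: site name -> the "edited" base seen in mRNA
    dna_reads:  individual -> {site name -> string of bases seen in DNA reads}
    A site counts as supported if at least one individual has `min_reads`
    or more genomic reads carrying the "edited" base.
    """
    supported = []
    for site, edited_base in edit_sites.items():
        for individual_reads in dna_reads.values():
            if individual_reads.get(site, "").count(edited_base) >= min_reads:
                supported.append(site)
                break
    return len(supported) / len(edit_sites), supported

# Invented toy data: five putative editing sites, three individuals.
edit_sites = {"site1": "G", "site2": "T", "site3": "C", "site4": "A", "site5": "G"}
dna_reads = {
    "ind1": {"site1": "AAGA", "site2": "CCC", "site3": "TTT", "site4": "GG", "site5": "AA"},
    "ind2": {"site1": "AAAA", "site2": "CTTC", "site3": "TT", "site5": "AAA"},
    "ind3": {"site1": "AA", "site3": "TTTT", "site4": "GAG", "site5": "AAAA"},
}

lenient_fraction, lenient_sites = genomic_support(edit_sites, dna_reads)
strict_fraction, strict_sites = genomic_support(edit_sites, dna_reads, min_reads=2)
print(lenient_fraction, lenient_sites)
print(strict_fraction, strict_sites)
```

If the sites were genuine RNA edits, the supported fraction should be close to zero however the threshold is set. A single matching read could be a sequencing error, which is why the stricter threshold gives a more conservative count.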

[1] Li et al. (2011) Widespread RNA and DNA Sequence Differences in the Human Transcriptome. Science. doi: 10.1126/science.1207018

[2] Levanon et al. (2004) Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nature Biotechnology. doi:10.1038/nbt996

[3] Schrider et al. (2011) Very Few RNA and DNA Sequence Differences in the Human Transcriptome. PLoS One. doi:10.1371/journal.pone.0025842

Genetic risk prediction in complex disease

I thought I’d point out a review article in Human Molecular Genetics that just came out in (open access) preprint form by Luke and myself on genetic risk prediction in complex disease. In it we discuss some of the strengths and weaknesses of genetic risk prediction compared to classical epidemiological predictors, different statistical modelling considerations, and the effect of GWAS on prediction. Readers of this space might find the conclusion of some interest, where we consider some of the societal aspects of trying to bring the interpretation of genomes into mainstream medical practice.

Notes on the evidence for extensive RNA editing in humans

UPDATE 3/17/12: A more extensive analysis of the paper discussed in this post is here. Several groups have concluded that at least 90% of the sites identified are technical artifacts.

The “central dogma” of molecular biology holds that the information present in DNA is transferred to RNA and then to protein. In a paper published online at Science yesterday, Li and colleagues report a potentially extraordinary observation: they show evidence that, within any given individual, there are tens of thousands of places where transcribed RNA does not match the template DNA from which it is derived [1]. This phenomenon, called RNA editing, is generally thought to be limited (in humans) to conversions of the base adenosine to the base inosine (which is read as guanine by DNA sequencers), and occasionally from cytosine to uracil. In contrast, these authors report that any type of base can be converted to any other type of base.

If these observations are correct, they represent a fundamental change in how we view the process of gene regulation. However, in this post I am going to point out a couple of technical issues that, if not properly taken into account, have the potential to cause a large number of false positives in this type of data. The main point can be summarized like this: RNA editing involves the production of two different RNA and/or protein sequences from a single DNA sequence. To infer RNA editing from the presence of two different RNA and/or protein sequences, then, one must be very sure that they derive from the same DNA sequence, rather than from two different copies of the DNA (due to, for example, paralogs or copy number variants). This issue has the potential to be a large source of false positives in a study like this, and I will also discuss an additional technical problem that could result in false positives.

How do variants outside genes influence disease risk?

Over the last several years, the number of genetic variants unambiguously associated with disease risk has grown dramatically. However, interpreting these signals has been extremely difficult—most of the identified variants do not disrupt genes, and indeed many don’t fall anywhere near genes (this observation has even led some to discount these signals entirely). To an investigator interested in following up on these signals, this is somewhat depressing: how can we hope to explore how polymorphisms affect disease risk if they don’t seem to fall in any sort of genome annotation that we understand?

In this context, I thought I’d point to an important paper that, among many other things, gives the first systematic evidence that variants which influence disease are not just randomly scattered across the genome, but instead tend to fall in particular regions—in particular, enhancer elements (regions where DNA-binding proteins interact with DNA to influence gene expression).

The authors rely on the fact that, in the cell, DNA is wrapped around proteins called histones, which control how accessible the DNA is to things like transcription factors (see above figure). These proteins can be chemically modified, and it is now clear that particular patterns of modifications are predictive of the function of the DNA in the region—some modifications indicate transcribed genes, others regions of enhancer activity, others repressed regions, etc.

What the authors did in this study was generate genome-wide maps of several histone modifications in nine different cell types, and use this data to predict the function of each 200 base pair segment of the human genome in each cell type. There are a number of interesting analyses of these “maps” of genome function in the paper, but for our purposes here there’s one of particular interest: the authors took sets of SNPs associated with various diseases and simply asked, are these variants enriched in regions with any particular functional prediction? And indeed, for several phenotypes, there is a striking enrichment of association signals in enhancer elements in a relevant cell type. For example, SNPs which influence lipid levels are enriched in enhancers in a liver cancer cell line, and SNPs which influence the autoimmune disease lupus are enriched in enhancers in a lymphoblastoid cell line.
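
The enrichment question can be sketched as follows in Python. This is only a toy under stated assumptions: the annotation, SNP positions and function name `enhancer_enrichment` are invented, and the one-sided binomial test is a simple stand-in rather than the statistic used by the authors. It also ignores complications such as linkage disequilibrium between SNPs.

```python
from math import comb

SEGMENT_SIZE = 200  # base pairs per annotated segment, as in the study

def enhancer_enrichment(snp_positions, enhancer_segments, n_segments):
    """Return (hits, fold enrichment, one-sided binomial p-value).

    snp_positions:     genomic coordinates of disease-associated SNPs
    enhancer_segments: set of segment indices predicted to be enhancers
    n_segments:        total number of annotated segments (the background)
    """
    hits = sum((pos // SEGMENT_SIZE) in enhancer_segments for pos in snp_positions)
    n = len(snp_positions)
    background = len(enhancer_segments) / n_segments
    fold = (hits / n) / background
    # Probability of seeing at least `hits` enhancer SNPs if each SNP landed
    # in an enhancer segment with the background probability.
    p_value = sum(comb(n, k) * background**k * (1 - background)**(n - k)
                  for k in range(hits, n + 1))
    return hits, fold, p_value

# Invented toy genome: 1,000 segments, the first 50 labelled as enhancers.
enhancer_segments = set(range(50))
n_segments = 1000
snp_positions = [100, 450, 1200, 3300, 5000, 9999]        # fall in enhancers
snp_positions += [10000 + 7000 * i for i in range(14)]    # fall elsewhere

hits, fold, p_value = enhancer_enrichment(snp_positions, enhancer_segments, n_segments)
print(f"{hits} of {len(snp_positions)} SNPs in enhancers, "
      f"fold enrichment {fold:.1f}, p = {p_value:.2g}")
```

Because the annotation differs between cell types, the same set of SNPs would be tested against each cell type's enhancer map separately. That is how a signal can show up in a relevant cell line and not in the others.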

As these types of functional maps are generated in more cell types, I imagine there will be more stories like this. The problem with interpreting disease association studies, it seems likely, is largely due to our lack of understanding of genome function.

Citation: Ernst et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. doi:10.1038/nature09906

Are synthetic associations a man-made phenomenon?

Early last year David Goldstein and colleagues published a provocative paper claiming that many GWAS associations are driven not by common variants of modest effect (the canonical common disease – common variant hypothesis underpinning GWAS) but instead by a local cluster of lower frequency variants that have much bigger effects on disease risk. They dubbed this hypothesized phenomenon “synthetic association” and the term quickly became a genetics buzzword. The paper was widely discussed in both the specialist and mainstream media, and caused quite a stir among academic statistical geneticists.

That debate has been re-opened today by a set of Perspectives in PLoS Biology: a rebuttal by us (Carl & Jeff) and our colleagues at Sanger, a rebuttal by Naomi Wray, Shaun Purcell and Peter Visscher, a rebuttal to the rebuttals by David Goldstein and an editorial by Robert Shields to tie it all together.

Solving Medical Mysteries Using Sequencing

There is a real “wow” paper out in pre-print at the journal Genetics in Medicine. It is a wonderful example of the application of cutting edge sequencing technology to solve a medical mystery. Even better, the authors also include an auxiliary discussion about the medical and ethical issues surrounding the diagnosis, which raises some interesting issues about the transition from research to clinical sequencing.

The Case

A child manifested severe inflammation of the bowel at 15 months; antibiotics failed to clear it up, and he started to lose weight. Standard treatments seemed to have only sporadic effects, and only severe treatment with immunosuppressants, surgery and full bowel clearing could slow down the disease, none of which is a long-term solution. No cause could be found; the patient’s active immune system seemed to be acting abnormally, but all tests for the known congenital immune deficiencies came back negative. The doctors could try a full bone-marrow transplant, but without knowing what was causing the disease, and where it was localised, they had no way of knowing if such an extreme intervention would be successful.

Such a severe and early onset disease is likely to be genetic, but testing immune genes at random to find the mutation could take years before it turned anything up. Meanwhile, the child was seriously malnourished, and at times required daily wound care under general anaesthetic. A few years ago this might have been the end of the story.

Our favourite papers of 2010

To celebrate the end of the blogging year here at Genomes Unzipped, we wanted to spend a bit of time reminiscing about the papers we enjoyed the most in 2010. Feel free to add your own suggestions in the comments!

Joe: Mice, men, and PRDM9. A key goal in evolutionary biology is to identify the mechanisms leading to speciation. One way to get at that goal is to identify genes that cause sterility or reduced fitness in hybrids between species or diverged populations. In mammals, exactly one such gene has been identified to date: the DNA-binding protein PRDM9. This year, three groups working on a seemingly different problem–deciphering the molecular mechanisms by which recombination shuffles genetic variation between generations–stumbled across an important gene in this process: PRDM9. Variation in this gene influences recombination patterns in both mice and humans, and is responsible for the dramatic differences in recombination patterns between humans and chimpanzees. Is it a simple coincidence that a gene which influences recombination also appears to have a role in speciation? Time will tell.

Parvanov et al. (2010) Prdm9 Controls Activation of Mammalian Recombination Hotspots. Science. DOI: 10.1126/science.1181495.

Baudat et al. (2010). PRDM9 Is a Major Determinant of Meiotic Recombination Hotspots in Humans and Mice. Science. DOI: 10.1126/science.1183439.

Myers et al. (2010). Drive Against Hotspot Motifs in Primates Implicates the PRDM9 Gene in Meiotic Recombination. Science. DOI: 10.1126/science.1182363.

Daniel: Whole-genome sequencing to develop personalised cancer assays. The area of medicine where the transforming power of new DNA sequencing technologies is moving the fastest is in cancer diagnostics and therapy. There were many studies relevant to this field in 2010 (with a fair proportion featuring on the excellent MassGenomics blog), but this paper was a simple, elegant example: the authors performed low-coverage whole-genome sequencing of four tumour samples, identified large genomic rearrangements present in the tumour cells but not in the patient’s healthy tissue, and then designed personalised, quantitative assays measuring the proportion of cells carrying these rearrangements in the patients’ blood. These assays allowed them to track, almost in real time, how the patients’ cancers responded to various therapies.

Leary et al. (2010) Development of personalized tumor biomarkers using massively parallel sequencing. Science Translational Medicine. DOI: 10.1126/scitranslmed.3000702.