Author Archive for Daniel MacArthur

Guidelines for finding genetic variants underlying human disease

Authors: Daniel MacArthur and Chris Gunter.

New DNA sequencing technologies are rapidly transforming the diagnosis of rare genetic diseases, but they also carry a risk: by allowing us to see all of the hundreds of “interesting-looking” variants in a patient’s genome, they make it potentially easy for researchers to spin a causal narrative around genetic changes that have nothing to do with disease status. Such false positive reports can have serious consequences: incorrect diagnoses, unnecessary or ineffective treatment, and reproductive decisions (such as embryo termination) based on spurious test results. To minimize such outcomes, the field needs to agree on clear statistical guidelines for deciding whether or not a variant is truly causally linked with disease.

In a paper in Nature this week we report the consensus statement from a workshop sponsored by the National Human Genome Research Institute, on establishing guidelines for assessing the evidence for variant causality. We argue for a careful two-stage approach to assessing evidence, taking into account the overall support for a causal role of the affected gene in the disease phenotype, followed by an assessment of the probability that the variant(s) carried by the patient do indeed play a causal role in that patient’s disease state. We argue for the primacy of statistical genetic evidence for new disease genes, which can be supplemented (but not replaced) by additional informatic and experimental support; and we emphasize the need for all forms of evidence to be placed within a statistical framework that considers the probability of any of the reported lines of evidence arising by chance.

The paper itself is open access, so you can read the whole thing – we won’t rehash a complete summary here. However, we did want to discuss the back story and expand on a few issues raised in the paper.
Continue reading ‘Guidelines for finding genetic variants underlying human disease’

Ten guidelines for tweeting at conferences

Many of the Genomes Unzipped team are spending the week at the American Society of Human Genetics meeting in San Francisco. This year the coverage of the meeting on Twitter is more intense than ever before, and social media is becoming an increasingly mainstream component of the conference. Chris Gunter, Jonathan Gitlin, Jeannine Mjoseth, Shirley Wu and I will be presenting a workshop on social media use for scientists this evening, and we prepared these guidelines for those interested in live coverage of meetings.

  1. Check the conference social media guidelines first.
    If there aren’t any, ask an organizer what the rules are. If there is no formal policy, you may want to take the initiative and ask speakers if they’re OK with their talks being tweeted.
  2. Use the right #hashtag when you tweet.
    This ensures that everything written about a meeting is aggregated in a single channel. When you search a hashtag it filters those posts for you.
  3. Remember that people are listening.
    Twitter is a public conversation. Don’t say anything you wouldn’t be prepared to tell the speaker to their face. Also, bear in mind that your boss and potential employers may be following.
  4. Remember that people are listening who aren’t at the meeting.
    In general, leave off the conference hashtag for in-jokes and social chatter unless it’s likely to be genuinely entertaining to outsiders.
  5. Be careful tweeting new findings.
    If a speaker is presenting unpublished data, don’t write about it unless you’re sure they’re happy to share.
  6. Do your best to ensure that your tweets don’t misrepresent presented material.
    Add as much context as you can, and actively correct misunderstandings that arise about something you tweet.
  7. Add value by contributing your specific area(s) of expertise to provide insight into presented material.
    Don’t just be the fifth person to tweet the easy soundbite from the plenary; instead, explain the unappreciated but profound scientific significance of their fourteenth slide.
  8. At the same time, don’t tweet everything a speaker says.
    One to three key take-home messages per talk is usually enough, unless a presentation is particularly fascinating.
  9. Don’t swamp the hashtag by quote-tweeting everyone else.
    Use the official retweet function, or “break the hashtag” (for instance, delete the # character) in your quote-tweets.
  10. If you’re organizing a conference, be proactive with a social media policy.
    Make sure both the presenters and the audience at the meeting are aware in advance what this policy is.

The ENCODE project: lessons for scientific publication

The ENCODE Project has this week released the results of its massive foray into exploring the function of the non-protein-coding regions of the human genome. This is a tremendous scientific achievement, and is receiving plenty of well-deserved press coverage; for particularly thorough summaries see Ed Yong’s excellent post at Discover and Brendan Maher at Nature.

I’m not going to spend time here recounting the project’s scientific merit – suffice it to say that the project’s analyses have already improved the way researchers are approaching the analysis of potential disease-causing genetic variants in non-coding regions, and will have an even greater impact over time. Instead, I want to highlight what a tremendous feat of scientific publication the project has achieved.
Continue reading ‘The ENCODE project: lessons for scientific publication’

Genome interpretation costs will not spiral out of control

Mo' genomes, mo' money?

An article in Genetic Engineering & Biotechnology News argues that as the cost of genome sequencing decreases, the cost of analysing the resulting data will balloon to extraordinary levels. Here is the crux of the argument:

We predict that in the future a large sum of money will be invested in recruiting highly trained and skilled personnel for data handling and downstream analysis. Various physicians, bioinformaticians, biologists, statisticians, geneticists, and scientific researchers will be required for genomic interpretation due to the ever increasing data.

Hence, for cost estimation, it is assumed that at least one bioinformatician (at $75,000), physician (at $110,000), biologist ($72,000), statistician ($70,000), geneticist ($90,000), and a technician ($30,000) will be required for interpretation of one genome. The number of technicians required in the future will decrease as processes are predicted to be automated. Also the bioinformatics software costs will plummet due to the decrease in computing costs as per Moore’s law.

Thus, the cost in 2011 for data handling and downstream processing is $285,000 per genome as compared to $517,000 per genome in 2017. These costs are calculated by tallying salaries of each person involved as well as the software costs.

These numbers would be seriously bad news for the future of genomic medicine, if they were even remotely connected with reality. Fortunately this is not the case. In fact this article (like other alarmist pieces on the “$1000 genome, $1M interpretation” theme) wildly overstates the economic challenges of genomic interpretation.

Since this meme appears to be growing in popularity, it’s worth pointing out why genome analysis costs will go down rather than up over time:
Continue reading ‘Genome interpretation costs will not spiral out of control’

All genomes are dysfunctional: broken genes in healthy individuals

Breakdown of the number of loss-of-function variants in a "typical" genome

I don’t normally blog here about my own research, but I’m making an exception for this paper. There are a few reasons to single this paper out: firstly, it’s in Science (!); and secondly, no fewer than five Genomes Unzipped members (me, Luke, Joe, Don and Jeff) are co-authors. For me it also represents the culmination of a fantastic postdoc position at the Wellcome Trust Sanger Institute (for those who haven’t heard on Twitter, I’ll be starting up a new research group at Massachusetts General Hospital in Boston next month).

Readers who don’t have a Science subscription can access a pre-formatted version of the manuscript here. In this post I wanted to give a brief overview of the study and then highlight what I see as some of the interesting messages that emerged from it.

First, some background

This is a project some three years in the making – the idea behind it was first conceived by my Sanger colleague Bryndis Yngvadottir and me back in 2009, and it subsequently expanded into a very productive collaboration with several groups, most notably Mark Gerstein’s group at Yale University, and the HAVANA gene annotation team at the Sanger Institute.

The idea is very simple. We’re interested in loss-of-function (LoF) variants – genetic changes that are predicted to be seriously disruptive to the function of protein-coding genes. These come in many forms, ranging from a single base change that creates a premature stop codon in the middle of a gene, all the way up to massive deletions that remove one or more genes completely. These types of DNA changes have long been of interest to geneticists, because they’re known to play a major role in really serious diseases like cystic fibrosis and muscular dystrophy.

But there’s also another reason that they’re interesting, which is more surprising: every complete human genome sequenced to date, including celebrities like James Watson and Craig Venter, has appeared to carry hundreds of these LoF variants. If those variants were all real, that would indicate a surprising degree of redundancy in the human genome. But the problem is we don’t actually know how many of these variants are real – no-one has ever taken a really careful look at them on a genome-wide scale.
Continue reading ‘All genomes are dysfunctional: broken genes in healthy individuals’

Review of the Lumigenix “Comprehensive” personal genome service

This is the first of a new format on Genomes Unzipped: as we acquire tests from more companies, or get data from others who have been tested, we’ll post reviews of those tests here. The aim of this series is to help potential genetic testing customers to make an informed decision about the products on the market. We’re still tweaking the format, so if you have any suggestions regarding additional analyses or areas that should be covered in more detail, let us know in the comments.


Lumigenix is a relative newcomer to the personal genomics scene: the Australian-based company launched back in March this year, offering a SNP chip-based genotyping service similar in concept to those provided by 23andMe, deCODEme and Navigenics.

The company kindly provided Genomes Unzipped with 12 free “Comprehensive” kits, which provide genotypes at over 700,000 positions in the genome, to enable us to review their product. We note that the company offers several other services, including a lower-priced “Introductory” test that covers fewer SNPs, and whole-genome sequencing for the more ambitious personal genomics enthusiast. This review should be regarded as entirely specific to the Comprehensive test.
Continue reading ‘Review of the Lumigenix “Comprehensive” personal genome service’

On bad genetics reporting

This short article on the Independent’s website may not be the worst piece of genetics reporting ever, but given its brevity it may well set a new record for the density of errors and misconceptions. (To save you the trouble of hunting down the article it’s actually referring to, which of course is not linked, it’s this online article in Molecular Psychiatry.)

Let’s start with the headline:

Sleeping is all in the genes

No. Data from twin studies suggest that the length of time people sleep for is around 44% heritable – that is, around 44% of the variation in this trait is due to inherited (and presumably mostly genetic) factors. The article being discussed in the piece provides no new information about the heritability of this trait.

Scientists have found the reason why some people need more sleep than others lies in their genes.

Scientists have found that one of the reasons people sleep longer than others is possibly a variant in a non-coding region of the gene ABCC9. Even if this association is real (and the evidence in the article is less than compelling), it explains just 5% of the variation in sleep length between people.

A survey of more than 10,000 people …

A survey of 4,251 people found the association between sleep length and the ABCC9 variant. This association was not replicated in a separate set of 5,949 individuals. The authors offer a potential explanation for this lack of replication (based on the season in which the sleep length measurements were collected), and then performed a post hoc re-analysis of their combined sample, accounting for season, that produced positive results.

showed those carrying the gene ABCC9, present in one in five of us,

The gene ABCC9 is present in all of us (hell, it’s even present in fruitflies). However, there is a genetic variation in one region of the ABCC9 gene, and one version of this variation is present in 17.3% of Europeans.
Continue reading ‘On bad genetics reporting’

Size matters, and other lessons from medical genetics

Size really matters: prior to the era of large genome-wide association studies, the large effect sizes reported in small initial genetic studies often dwindled towards zero (that is, an odds ratio of one) as more samples were studied. Adapted from Ioannidis et al., Nat Genet 29:306-309.

[Last week, Ed Yong at Not Exactly Rocket Science covered a paper positing an association between a genetic variant and an aspect of social behavior called prosociality. On Twitter, Daniel and Joe dismissed this study out of hand due to its small sample size (n = 23), leading Ed to update his post. Daniel and Joe were then contacted by Alex Kogan, the first author of the study in question. He kindly shared his data with us, and agreed to an exchange here on Genomes Unzipped. In this post, we expand on our point about the importance of sample size; Alex’s reply is here.

Edit 01/12/11 (DM): The original version of this post included language that could have been interpreted as an overly broad attack on more serious, well-powered studies in psychiatric disease genetics. I’ve edited the post to reduce the possibility of collateral damage. To be clear: we’re against over-interpretation of results from small studies, not behavioral genetics as a whole, and I apologise for any unintended conflation of the two.]

In October of 1992, genetics researchers published a potentially groundbreaking finding in Nature: a genetic variant in the gene encoding angiotensin-converting enzyme (ACE) appeared to modify an individual’s risk of having a heart attack. This finding was notable at the time for the size of the study, which involved a total of over 500 individuals from four cohorts, and the effect size of the identified variant – in a population initially identified as low-risk for heart attack, the variant had an odds ratio of over 3 (with a corresponding p-value less than 0.0001).

Readers familiar with the history of medical association studies will be unsurprised by what happened over the next few years: initial excitement (this same polymorphism was associated with diabetes! And longevity!) was followed by inconclusive replication studies and, ultimately, disappointment. In 2000, 8 years after the initial report, a large study involving over 5,000 cases and controls found absolutely no detectable effect of the ACE polymorphism on heart attack risk. In the meantime, the same polymorphism had turned up in dozens of other association studies for a wide range of traits ranging from obstetric cholestasis to meningococcal disease in children, virtually none of which have ever been convincingly replicated.
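
To make the sample-size point concrete, here is a minimal sketch (not taken from any of the studies discussed) that computes an odds ratio and an approximate 95% confidence interval from a 2x2 table of carriers and non-carriers, using the standard log-odds (Woolf) approximation. The counts are entirely hypothetical and chosen only so that the small and large studies share the same point estimate.

```python
import math

def odds_ratio_ci(case_carriers, case_noncarriers,
                  ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 table.

    Uses the Woolf approximation: the log odds ratio is treated as
    roughly normal with standard error sqrt(1/a + 1/b + 1/c + 1/d).
    """
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper

# Hypothetical counts: same observed odds ratio, 100-fold difference in size.
small_study = odds_ratio_ci(12, 8, 8, 12)          # 40 people in total
large_study = odds_ratio_ci(1200, 800, 800, 1200)  # 4,000 people in total

for label, (estimate, lower, upper) in [("small", small_study), ("large", large_study)]:
    print(f"{label}: OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

In the small hypothetical study the interval spans an odds ratio of one, so the data cannot distinguish a sizeable effect from no effect at all; with the same proportions in a study 100 times larger, the interval is narrow and excludes one.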
Continue reading ‘Size matters, and other lessons from medical genetics’

Going green: lessons from plant genomics for human sequencing studies

This is a guest post by Jeffrey Rosenfeld. Jeff is a next-generation sequencing advisor in the High Performance and Research Computing group at the University of Medicine and Dentistry of New Jersey, working on a variety of human and microbial genetics projects. He is also a Visiting Scientist at the American Museum of Natural History where he focuses on whole-genome phylogenetics. He was trained at the University of Pennsylvania, New York University and Cold Spring Harbor Laboratory.

As human geneticists, it is all too easy to ignore papers published about non-human organisms – especially when those organisms are plants. After all, how much can the analysis of (say) Arabidopsis genome diversity possibly assist in my quest to better understand the human genome and determine which genes cause disease? Quite a bit, as it happens: a fascinating recent paper in Nature demonstrates a number of lessons that we can learn from our distant green relatives.

By exploiting the small genome size of Arabidopsis (~120 million bases, compared to the relatively gargantuan 3 billion bases of Homo sapiens), researchers were able to perform complete genome sequencing and transcriptome profiling in 18 different ecotypes of the plant (similar to what we would call strains of an animal).

In a normal genome re-sequencing experiment, the procedure is to obtain DNA from an individual, sequence the DNA, align it to a reference sequence and then call variants (i.e. differences from the reference). This approach is used by the 1000 Genomes Project and basically all of the hundreds of disease-focused human sequencing projects currently underway around the world. This approach allows researchers to relatively easily identify single-base substitution (SNP) and small insertion/deletion (indel) differences between genomes. However, the amount of variability that can be identified is restricted by the use of a reference: regions where there is extreme divergence between the reference and sample genomes are often badly called, and more complex variants (e.g. large, recurrent rearrangements of DNA) can be missed. Additionally, and crucially, sequences that are not present in the reference genome will be completely missed by this approach.
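
The limitation described above can be illustrated with a deliberately simplified toy, which is not how real aligners or variant callers work: reads are placed against a short made-up reference by brute force, mismatches are reported as variants, and any read that cannot be placed is simply set aside. The sequences, the function names `place_read` and `call_variants`, and the mismatch threshold are all assumptions made for illustration.

```python
def place_read(reference, read, max_mismatches=1):
    """Return the 0-based offset where the read best matches, or None."""
    best = None  # (offset, mismatches)
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for ref_base, base in zip(window, read) if ref_base != base)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return None if best is None else best[0]

def call_variants(reference, reads, max_mismatches=1):
    """Return (sorted variants, unplaced reads).

    Each variant is (1-based position, reference base, sample base).
    Reads with no acceptable placement never contribute a variant.
    """
    variants = set()
    unplaced = []
    for read in reads:
        offset = place_read(reference, read, max_mismatches)
        if offset is None:
            unplaced.append(read)
            continue
        for i, base in enumerate(read):
            ref_base = reference[offset + i]
            if ref_base != base:
                variants.add((offset + i + 1, ref_base, base))
    return sorted(variants), unplaced

# Made-up 20-base "reference" and three 8-base "reads".
reference = "ACGTACGGTCAAGCTTAGGC"
reads = [
    "ACGTACGG",  # matches the reference exactly
    "GGTCTAGC",  # one substitution relative to the reference
    "TTTTTTTT",  # sequence absent from the reference
]

variants, unplaced = call_variants(reference, reads)
print(variants)   # the single-base difference that was detected
print(unplaced)   # the novel sequence, which yields no variant call at all
```

The substitution is found because the read carrying it still resembles the reference, whereas the novel read is silently dropped. Discovering that kind of sequence requires an approach that does not depend on the reference, such as assembling the reads directly.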
Continue reading ‘Going green: lessons from plant genomics for human sequencing studies’

Results of Nature poll on personal genetic analysis by scientists

Last month we pointed to a poll over at Nature looking primarily at the use of personal genetic tests among scientists (Nature’s Brendan Maher was kind enough to consult us when designing the poll, so we were able to pass on some of the lessons we learned when doing our own reader survey last year). The results are now in, and Brendan has a brief article taking a look at the results.

Firstly, there was a fantastic response to the survey – nearly 1,600 participants. Of those, 289 (18%) had taken some kind of genetic test; interestingly, a further 54% said they’d be interested in doing so if given the opportunity. The vast majority of genetic tests done were genome scans (50% 23andMe). The motivations of those who had tests done were very similar to those from our readers – intellectual curiosity ranked at the top, with interest in health, genealogy and ancestry ranking lower.

Brendan’s piece has some nice vignettes from survey respondents. He was also kind enough to pass on the raw (anonymised) data to us for further analysis, and we’ll be poking around in there over the next week or so. Some immediately interesting results emerge from the comparison of the results from participants who fell into the “biology” vs “medicine” discipline: of those who had taken genetic tests, biologists were far more likely to have been tested by 23andMe (73.3% vs 47.2%) and were more likely to have cited “intellectual curiosity” as a major factor in their decision (71.1% vs 42.6%), whereas “medicine” respondents were more likely to cite a specific health risk as a major factor (34.4% vs 18.7%), were more likely to have consulted a clinician beforehand (23.0% vs 7.8%) and were more likely to report negative outcomes to testing (8.2% vs 4.8%). Of those who hadn’t yet had a genetic test done, biologists were more likely to be interested in doing so if given the opportunity (74.4% vs 67.6%). Nothing terribly shocking, but some useful insight into the basis of the “culture war” between basic and medical researchers over the issues surrounding personal genomics.

Anyway, kudos to Brendan and the Nature team – and to their readers, of course – for generating such an interesting data-set. No doubt you’ll be hearing more about the results of this survey soon.
