Archive for the 'Journal Club' Category


The cell is a messy place: understanding alternative splicing with RNA sequencing

Though this site is largely dedicated to discussions of personal genomics, I’d like to use this post to discuss some of my recent work (done with Athma Pai, Yoav Gilad, and Jonathan Pritchard) on mRNA splicing. Our paper, in which we argue that splicing is a relatively error-prone and noisy process, has just been published in PLoS Genetics [1].

Continue reading ‘The cell is a messy place: understanding alternative splicing with RNA sequencing’

Friday Links

At the risk of turning Friday Links into a self-trumpet-blowing occasion, we are happy to report that a number of GNZ contributors (Jeff, Carl and Luke) are authors on a new Crohn’s disease GWAS meta-analysis of 6,000 patients that came out in Nature Genetics this week. The study brings the number of Crohn’s associations up to 71, 30 of them novel, and the proportion of heritability explained up to about 24%. It is also worth noting that all of the associations from the previous meta-analysis were replicated in this one, showing how the cross-platform independent replication experiments that are now standard have largely eliminated false positives from GWAS. There were also 5 loci that showed evidence of a second, independent signal, which I think is a promising sign of things to come.

Continue reading ‘Friday Links’

Friday Links

At long last, the 1000 Genomes Project pilot paper has been published this week in Nature. The paper describes the whole-genome sequencing of 179 individuals from 4 populations, and two mother-father-child trios, looking at the whole range of genomic variation, including SNPs, small indels and larger structural variants. A total of 15 million variants were called, about 8 million of which had never been seen before (shown in the Venn diagram to the right), and all the data generated (including sequence, site locations and genotypes) has been released online for anyone to use.

GNZ authors feature pretty heavily in the paper’s author list. Daniel looked for loss-of-function mutations (variants that entirely break a gene), and found about 2000. Don looked at calling de-novo mutations (mutations that occur between parent and child) from the trios, and found around 100 total, which gives a mutation rate of about 10^-8 per base per generation, or around 60 new mutations for every baby born. Luke called 2,780 variants on the Y chromosome, and put together a new Y haplogroup tree (with branch lengths!), and Jeff was involved in the validation effort.
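The mutation-rate figures above can be checked with a quick back-of-envelope calculation. A minimal sketch, assuming ~100 de novo mutations called across the two trios and a callable diploid genome of roughly 6 billion bases (the exact callable fraction used in the paper will differ, which is why the paper's estimate comes out nearer 60 mutations per child):

```python
# Back-of-envelope check of the per-generation mutation rate quoted above.
# Illustrative inputs, not the paper's exact numbers.
de_novo_total = 100   # de novo mutations found across the two trios
trios = 2             # each trio gives one parent-to-child generation
diploid_bases = 6e9   # ~3 Gb haploid genome, doubled

# Rate per base per generation: mutations per child / bases per child
rate = de_novo_total / trios / diploid_bases

# Expected new mutations in any newborn at that rate
new_mutations_per_child = rate * diploid_bases

print(rate)                     # ~8.3e-9, i.e. on the order of 10^-8
print(new_mutations_per_child)  # ~50 under these rounded assumptions
```

The result lands on the order of 10^-8 per base per generation, consistent with the ~60 new mutations per baby quoted in the paper once the true callable genome fraction is accounted for.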

This paper only describes the pilot phase of the 1000 Genomes Project. There is a lot more to come yet, including extending the sample size and introducing new variant calling methods. The project is going to cross the 1000th genome sequenced any day now, and eventually thousands of individuals from dozens of populations will be included.[LJ]

Friday Links

The largest genome-wide association study ever undertaken was published in Nature this week. The appropriately named Genetic Investigation of ANthropometric Traits (GIANT) consortium combined data from 183,727 individuals and identified around 180 loci influencing human height. The loci were enriched for genes involved in skeletal growth and other relevant biological pathways. Interestingly, these 180 loci are estimated to account for only 10% of the phenotypic variation in height (or around 12.5% of the heritability). [CAA]
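The 10% and 12.5% figures above are consistent with each other if one plugs in a narrow-sense heritability of height of about 0.8, a commonly cited estimate (an assumption in this sketch, not a number stated in the post):

```python
# How 10% of phenotypic variance becomes 12.5% of heritability.
variance_explained = 0.10  # fraction of phenotypic variance from the ~180 loci
heritability = 0.80        # assumed heritability of height (illustrative)

# Fraction of the heritable variance accounted for by the known loci
fraction_of_heritability = variance_explained / heritability
print(fraction_of_heritability)  # 0.125, i.e. 12.5%
```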

Christophe Lambert from Golden Helix has an excellent, thorough post looking at the importance of careful experimental design in large-scale genetic association studies. In particular, Lambert focuses on the need for randomising samples across experimental batches: if you have some batches containing entirely cases and others entirely controls, then the all-too-pervasive spectre of batch effects can easily create false positive associations. In many cases batch effects can be recognised and corrected for post hoc (Lambert cites a good example from the original WTCCC study), but in other cases a failure to perform the right quality controls can have devastating consequences (Lambert cites the recent longevity GWAS paper in Science). I’d be interested to hear from my more GWAS-savvy colleagues (Carl, Jeff) whether randomisation is standard procedure in most large GWAS now. [DM]
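The randomisation Lambert advocates is simple to implement in practice. A minimal sketch (sample IDs and plate size are invented for the example): shuffle the combined case and control samples before assigning them to genotyping batches, so that disease status is not confounded with batch.

```python
import random

# Shuffle cases and controls together before assigning them to batches,
# so no batch is enriched for either group. IDs here are made up.
samples = [("case", f"C{i}") for i in range(96)] + \
          [("control", f"K{i}") for i in range(96)]

random.seed(42)   # fixed seed just to make the example reproducible
random.shuffle(samples)

batch_size = 96   # e.g. one 96-well genotyping plate per batch
batches = [samples[i:i + batch_size]
           for i in range(0, len(samples), batch_size)]

# Each batch should now contain a roughly even case/control mix
for n, batch in enumerate(batches):
    cases = sum(1 for status, _ in batch if status == "case")
    print(f"batch {n}: {cases} cases, {len(batch) - cases} controls")
```

If instead the first plate held only cases and the second only controls, any plate-level technical artefact would masquerade as a genome-wide association signal.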

We managed to miss this last week, but the current issue of Nature Genetics has a strange and wonderful paper on breast cancer genetics. The study looked at 2,838 individuals with BRCA1 mutations that strongly predispose to breast cancer, and searched for variants outside BRCA1 associated with breast cancer in this group. They found an associated variant on chromosome 19, and replicated it in another 5,986 BRCA1 carriers (where do they find this many BRCA1 carriers?). To top it all off, they looked at this variant in another 6,800 breast cancer patients without BRCA1 mutations, and found no association. However, when they stratified these samples into ER+ and ER- cases, they found associations in both, but going in opposite directions! The variant predisposes people to ER- cancer, but is protective against ER+, and taken together the two effects pretty much perfectly balance out. [LJ]

Friday Links

Over at Your Genetic Genealogist, CeCe Moore talks about investigating evidence of low-level Ashkenazi Jewish descent in her 23andMe data. What I like about this story is how much digging CeCe did; after one tool threw up a “14% Ashkenazi” result, she looked for similar evidence in 23andMe’s tool. She then did the same analysis on her mother’s DNA, finding no apparent Ashkenazi heritage, and to top it all off got her paternal uncle genotyped, which showed even greater Ashkenazi similarity. [LJ]

A paper out in PLoS Medicine looks at the interaction between genetics and physical activity in obesity. The take-home message is pretty well summarized in the figure to the left: genetic predispositions are less important in determining BMI for those who take frequent physical exercise than for those who remain inactive. This illustrates the importance of including non-genetic risk factors in disease prediction; not only because they are very important in their own right (the paper demonstrates that physical activity is about as predictive of BMI as known genetic factors), but also because information on environmental influences allows better calibration of genetic risk. [LJ]

Trends in Genetics has published an opinion piece in its most recent issue outlining the types of genetic variants we might expect to see for common human diseases (defined by allele frequency and risk), and how exome and whole-genome sequencing could be used to find them. It gives a brief, relatively jargon-free overview of gene-mapping techniques that have been used previously, and discusses how sequencing can take this research further, particularly for the previously less tractable category of low-frequency variants that confer a moderate level of disease risk. [KIM]

More Sanger shout-outs this week: Sanger Institute postdoc Liz Murchison, along with the rest of the Cancer Genome Project, has announced the sequencing of the Tasmanian Devil genome. The CGP is interested in the Tasmanian Devil because of a rare, odd and nasty facial cancer, which is passed from Devil to Devil by biting. In fact, all the tumours are descended from the tumour of one individual; 20 years or so on, 80% of the Devil population has been wiped out by the disease. As well as a healthy genome, the team also sequenced two tumour genomes, in the hope of learning more about what mutations made the cells turn cancerous, and what makes the cancer so unusual.

I have to say, this isn’t going to be an easy job; assembling a high-quality reference genome for an under-studied organism is a lot of work, especially using Illumina’s short-read technology, and identifying and making sense of tumour mutations is equally difficult. Add in the fact that the tumour genomes come from different individuals to the healthy one, and this all adds up to a formidably difficult project. On the other hand, the key to saving a species from extinction could rest on this sticky bioinformatics problem, and if anyone is in a position to deal with it, it’s the Cancer Genome Project. [LJ]

Tasmanian Devil image from Wikimedia Commons.

Estimating the size of the DTC genomics market

Over the last few years, I’ve found that the same question keeps cropping up again and again at meetings whenever we talk about direct-to-consumer genetic tests:  “How many people are actually buying these tests?”. And because the companies (for whatever reason) have thus far been rather reticent about telling us how many kits they’ve sold, until recently the answer has simply been “I don’t know”. Yet if we’re going to talk about their sociological impact, their knock-on effect on health systems, and re-writing our regulatory laws around them, surely this is something we ought to have a handle on.

So… how many people have actually bought these tests then?

The problem is how to go about estimating the size of a market when there is precious little data and the companies are all privately owned. First, we teamed up with some enthusiastic MBA students, who came up with the simple but elegant idea of using website hits as a proxy for market share. Using Compete.com, we found that the ‘big three’ – 23andMe, deCODEme and Navigenics – together had just over 662,000 unique hits during 2009, of which 23andMe received the lion’s share at nearly 80%. (They received fairly constant internet traffic throughout the year, with averages of around 43,000, 4,000 and 8,000 unique visitors per month respectively.) Pathway had only just launched when we did the analysis, resulting initially in a large transient spike in internet traffic, so we left it out.

Second, fortunately for us, in October 2009 23andMe stated publicly that their database contained “30,000 active genomes”, which were either sold or given away at a substantially reduced rate. (This rose to 50,000 in June 2010, but that doesn’t really alter the calculations.) So, assuming a steady rate of uptake, this equates to perhaps 15,000 genome scans sold during 2009. Combining this with the internet traffic data, we estimate in this month’s Genetics in Medicine [Wright CF, Gregory-Jones S. Genet Med. (2010) 12: 594] that around 20-30,000 genome scans were sold in 2009, at a cost of between $300 and $1,000, which probably equates to a commercial value of around $10-20 million.
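The arithmetic behind that estimate can be reconstructed roughly as follows. A minimal sketch, assuming (as we did) that 23andMe's ~80% share of web traffic is a fair proxy for its share of sales; the $600 average price is an invented midpoint within the quoted $300-1,000 range, not a figure from the paper:

```python
# Scale 23andMe's estimated sales up by its traffic share to get a
# market-wide total, then value it at an assumed average price.
scans_23andme_2009 = 15_000  # half of the 30,000 "active genomes"
traffic_share = 0.80         # 23andMe's share of the big three's visits

total_scans = scans_23andme_2009 / traffic_share  # market-wide estimate
print(total_scans)           # 18750.0, i.e. in the 20-30k ballpark

avg_price = 600              # assumed average within the $300-1,000 range
market_value = 25_000 * avg_price  # using the mid-range scan count
print(market_value)          # 15000000, i.e. ~$15M, within $10-20M
```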

Is that really true?

Obviously there are substantial margins of error in any estimate made from such limited data, and caveats include the fact that we only considered tests sold during 2009 and we ignored (as much as possible) non-medical tests like paternity and ancestry testing. Nonetheless, this seems like a realistic ballpark figure, and importantly it is neither millions of people, nor hundreds of millions of dollars. We don’t know (yet) how big the market for whole genome sequences will be, or what impact preconception carrier testing might have, but at the moment it is clear that the market for DTC genetic testing is much smaller than expected or than one might surmise from all the media attention. Which means that the alleged harms to consumers, and the reputed knock-on effects on health systems, must necessarily be limited.

Nonetheless, I would dearly love to hear from any DTC genomics companies out there willing to share some more concrete data…

Friday Links

A lot of the Genomes Unzipped crew seem to be away on holiday at the moment, so today’s Links post may lack the authorial diversity that you’re accustomed to.

I just got around to reading the August edition of PLoS Genetics, and found a valuable study from the Keck School of Medicine in California. The authors looked at the effects of known common disease variants in five American ethnic groups (European, African, Hawaiian, Latino and Japanese Americans), to assess how similar or different the effect sizes were across the groups.

The authors calculated odds ratios for each variant in each ethnic group, and looked for evidence of heterogeneity in odds ratios. They find that, in general, the odds ratios tend to show surprisingly little variation between ethnic groups; the direction of risk was the same in almost all cases, and the mean odds ratio was roughly equal across populations (the authors note that this pretty effectively shoots down David Goldstein’s “synthetic association” theory of common variation). One interesting exception was that the effect size of the known T2D variants was significantly larger in Japanese Americans, who had a mean odds ratio of 1.20, compared to 1.08-1.13 for other ethnic groups. The graph to the left shows the distribution of odds ratios in European and Japanese Americans.
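A standard way to test for this kind of heterogeneity is Cochran's Q on the log odds ratios. A minimal sketch with invented numbers (the ORs and standard errors below are illustrative, not the study's data):

```python
import math

# Cochran's Q test for heterogeneity of odds ratios across groups.
# One OR per ethnic group, with the standard error of its log-OR;
# all values here are made up for illustration.
odds_ratios = [1.20, 1.10, 1.08, 1.12, 1.13]
se_log_or   = [0.04, 0.03, 0.05, 0.04, 0.03]

log_ors = [math.log(o) for o in odds_ratios]
weights = [1 / s**2 for s in se_log_or]   # inverse-variance weights

# Fixed-effect pooled estimate of the log-OR
pooled = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)

# Q = weighted squared deviations from the pooled estimate; under the
# null of no heterogeneity it is chi-square with (groups - 1) df.
q = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_ors))
print(round(q, 2))  # well below 9.49, the df=4 critical value at p=0.05
```

With these illustrative numbers Q falls well short of significance, i.e. no detectable heterogeneity, which is the pattern the study reports for most variants.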

These sorts of datasets will be very useful for personal genomics in the future, as a decade of European-centered genetics research has left non-Europeans somewhat in the lurch with regards to disease risk predictions. However, the problem with the approach in this paper is that even in a study as large as this one (6,000 cases, 7,000 controls), the error bounds on the odds ratios within each group are still pretty large. [LJ]

Over at the Guardian Science Blog, Dorothy Bishop explains the difference between learning that a trait is heritable (e.g. from twin studies), and mapping a specific gene “for” a trait (e.g. via GWAS). Her conclusion is worth repeating:

The main message is that we need to be aware of the small effect of most individual genes on human traits. The idea that we can test for a single gene that causes musical talent, optimism or intelligence is just plain wrong. Even where reliable associations are found, they don’t correspond to the kind of major influences that we learned about in school biology. And we need to realise that twin studies, which consider the total effect of a person’s genetic makeup on a trait, often give very different results from molecular studies of individual genes.

There are also interesting questions to be asked about why there is such a gap between heritabilities estimated by twin studies and the heritability that can be explained by GWAS results. That, however, is a question for another day. [LJ]

Another article just released in PLoS Genetics provides a powerful illustration of just how routine whole-genome sequencing is now becoming for researchers: the authors report on complete, high-coverage genome sequence data for twenty individuals. The samples included 10 haemophilia patients and 10 controls, taken as part of a larger study looking at the genetic factors underlying resistance to HIV infection. While this is still a small sample size by the standards of modern genomics, there are a few interesting insights that can be gleaned from the data: for instance, the researchers argue from their data that each individual has complete inactivation of 165 protein-coding genes due to genetic variants predicted to disrupt gene function. I’ll be following up on this claim in a future post. [DM]

Finally, a quick shout-out to our fellow Sanger researchers, including Verneri Anttila and Aarno Palotie, along with everyone else in the International Headache Genetics Consortium, for finding the first robust genetic association to migraine. They looked at 3,279 cases and more than 10,000 controls (and another 3,202 cases to check their results), and found that the variant rs1835740 was significantly associated with the disease.

To tie in with the above story, somewhere in the region of 40-65% of variation in migraine is heritable, but only about 2% of this was explained by the rs1835740 variant. However, explaining heritability isn’t the main point of GWAS: a little follow-up found that rs1835740 was correlated with expression of the gene MTDH, which in turn suggests a defect in glutamate transport; hopefully this new discovery will help shed some light on the etiology of the disease. [LJ]

Setting the record straight

The current issue of Cell has some important correspondence in response to an essay published by Jon McClellan and Mary Claire King in April. Daniel covered the original piece and hosted a guest post from Kai Wang which detailed some of the more obvious flaws in their argument. Now, Wang and his colleagues from Philadelphia have published an official response in Cell, in parallel with a similar letter from Robert Klein and colleagues from New York. Accompanying these is a further reply from McClellan and King. Read on for an overview of three contentious statements made in the original piece, and the rebuttals to each.

Continue reading ‘Setting the record straight’

