Over the last several years, the number of genetic variants unambiguously associated with disease risk has grown dramatically. However, interpreting these signals has been extremely difficult—most of the identified variants do not disrupt genes, and indeed many don’t fall anywhere near genes (this observation has even led some to discount these signals entirely). To an investigator interested in following up on these signals, this is somewhat depressing: how can we hope to explore how polymorphisms affect disease risk if they don’t seem to fall in any sort of genome annotation that we understand?
In this context, I thought I’d point to an important paper that, among many other things, gives the first systematic evidence that variants which influence disease are not just randomly scattered across the genome, but instead tend to fall in specific types of regions, notably enhancer elements (regions where DNA-binding proteins interact with DNA to influence gene expression).
The authors rely on the fact that, in the cell, DNA is wrapped around proteins called histones, which control how accessible the DNA is to things like transcription factors (see above figure). These proteins can be chemically modified, and it is now clear that particular patterns of modifications are predictive of the function of the DNA in the region—some modifications indicate transcribed genes, others regions of enhancer activity, others repressed regions, etc.
What the authors did in this study was generate genome-wide maps of several histone modifications in nine different cell types, and use these data to predict the function of each 200 base pair segment of the human genome in each cell type. There are a number of interesting analyses of these “maps” of genome function in the paper, but for our purposes here there’s one of particular interest: the authors took sets of SNPs associated with various diseases and simply asked, are these variants enriched in regions with any particular functional prediction? And indeed, for several phenotypes, there is a striking enrichment of association signals in enhancer elements in a relevant cell type. For example, SNPs which influence lipid levels are enriched in enhancers in a liver cancer cell line, and SNPs which influence the autoimmune disease lupus are enriched in enhancers in a lymphoblastoid cell line.
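To make the enrichment question concrete, here is a rough sketch of how one might ask it in code. This is not the authors' statistical procedure: it is a plain hypergeometric test on toy data, and the function names, the positions, and the set of "enhancer" segments are all made up for illustration. A real analysis would also need a sensible background set of SNPs and some handling of linkage disequilibrium, neither of which is attempted here.

```python
from math import comb

SEGMENT_SIZE = 200  # base pairs per segment, matching the segment size described above


def segment_index(position: int) -> int:
    """Map a base-pair position to the index of its 200 bp segment."""
    return position // SEGMENT_SIZE


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n segments are drawn without replacement from N,
    of which K carry the annotation of interest."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def enrichment(snp_positions, enhancer_segments, n_segments):
    """Return (overlap count, fold enrichment, one-sided p-value)."""
    snp_segments = {segment_index(p) for p in snp_positions}
    n = len(snp_segments)                      # segments containing a SNP
    k = len(snp_segments & enhancer_segments)  # of those, how many are enhancers
    K = len(enhancer_segments)
    expected = n * K / n_segments              # overlap expected by chance
    fold = k / expected if expected else float("nan")
    return k, fold, hypergeom_upper_tail(k, K, n, n_segments)


# Toy data (hypothetical): a 100-segment "genome" where segments 0-9 are
# predicted enhancers, and five disease-associated SNP positions.
toy_enhancers = set(range(10))
toy_snps = [50, 450, 1900, 5000, 12000]

k, fold, p_value = enrichment(toy_snps, toy_enhancers, n_segments=100)
print(f"overlap={k}, fold enrichment={fold:.1f}, p={p_value:.4f}")
```

The point of the sketch is only the shape of the calculation: count how many association signals land in a given annotation, compare that with what chance alone would give, and repeat for each annotation and cell type.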
As these types of functional maps are generated in more cell types, I imagine there will be more stories like this. The difficulty of interpreting disease association studies, it seems likely, is largely due to our lack of understanding of genome function.
---
Citation: Ernst et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. doi:10.1038/nature09906
No doubt there are many ways in which variants in non-coding regions of the genome can affect disease risk, and variants located in enhancer elements are just one of them.
As the cost of sequencing continues to drop and an increasing number of people have their genomes sequenced, we can expect to uncover many more such statistical correlations in disease association studies.
Re Sajid above – who will maintain these large libraries of cheaply sequenced genomes, and how will people get access to this information? Do we trust the FDA to be involved in this part of the story? I wonder what 23andMe has to say on this point.
Is much known about whether tertiary structure affects gene control? A variant can be spatially very close to a gene but very far away in the linear sequence – is it feasible that this could influence expression?
@Keith,
I think the current model for how distant enhancers affect gene expression is that they loop around and interact with the promoter of the relevant gene. So SNPs in those regions could definitely influence expression. See, for example, the case of 8q24 and cancer:
http://www.nature.com/ng/journal/v41/n8/abs/ng.403.html