All Genomes Are Dysfunctional: Broken Genes in Healthy Individuals

Loss-of-function (LoF) variants are DNA changes that can disrupt protein-coding genes. They range from slight alterations in the genetic code to large deletions that remove one or more genes entirely.

DNA changes like these play a major role in serious diseases such as muscular dystrophy and cystic fibrosis, but their presence in healthy people also suggests more redundancy in the human genome than originally expected.

With the aim of predicting whether a novel mutation causes disease, the HAVANA gene annotation team at the Sanger Institute and I carried out testing to identify ‘real’ LoF variants and study their effect on the genes they disrupt.

Figure: Breakdown of a “typical” genome and the number of LoF variants.

Dysfunctional Genomes

After applying multiple filters, we arrived at a core set of ‘real’ LoF variants and focused on the 253 most common.

Many of these turned out to be associated with traits such as blood type, drug metabolism, and muscle performance, but almost none showed a link with complex diseases like type 2 diabetes.

The genes affected by common LoF variants were often less evolutionarily conserved, had more closely related genes elsewhere in the genome, and had fewer protein-protein interactions (as far as we could tell).

We concluded that these genes are less functionally important, which would explain why a broken copy can be carried by a large portion of the population without obvious harm.

Our research showed that genuine LoF variants are mostly rare, with the majority found in under 2% of the population.

This indicates that natural selection keeps mildly or seriously deleterious LoF variants at low frequency, and that the rarer variants have more of an effect on disease risk.

Error Rates

These days, even Ph.D. students are sequencing genomes, and more sequencing brings more results.

It’s not uncommon for enthusiastic researchers to assume that, in a genome sequenced with, say, 99.5% overall accuracy, any given variant is unlikely to be false.

In fact, the greater the predicted functional impact of a sequence variant, the more likely it is to be a false positive.

This is mostly due to natural selection: the more effect a variant has, the more likely it is to be harmful, so genuine high-impact variants are scarce. Sequencing and annotation errors are not weeded out by selection in the same way, so they make up a larger share of the apparent high-impact variants.
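The reasoning above can be made concrete with a little arithmetic. The Python sketch below is purely illustrative and is not the method used in the study: the function name, the per-site error rate, and the two "true variant" fractions are hypothetical values chosen only to show the effect. It also makes the simplifying assumption that every genuine variant is detected.

```python
def false_discovery_rate(n_sites, true_fraction, error_rate):
    """Return the share of called variants that are errors.

    n_sites       -- number of sites examined in one functional class
    true_fraction -- fraction of those sites that genuinely vary (hypothetical)
    error_rate    -- chance of a spurious call at a non-variant site (hypothetical)

    Simplification: every genuine variant is assumed to be called.
    """
    true_calls = n_sites * true_fraction
    false_calls = n_sites * (1 - true_fraction) * error_rate
    return false_calls / (true_calls + false_calls)


# Same error rate in both classes; only the abundance of real variants differs.
ERROR_RATE = 1e-4  # hypothetical per-site error rate

# Low-impact class: selection is weak, so real variants are comparatively common.
low_impact = false_discovery_rate(1_000_000, true_fraction=1e-3, error_rate=ERROR_RATE)

# High-impact (LoF-like) class: selection keeps real variants scarce.
high_impact = false_discovery_rate(1_000_000, true_fraction=1e-5, error_rate=ERROR_RATE)

print(f"Low-impact calls that are errors:  {low_impact:.1%}")
print(f"High-impact calls that are errors: {high_impact:.1%}")
```

With these made-up inputs the error rate is identical in both classes, yet errors account for a small minority of the low-impact calls and the large majority of the high-impact calls. The only thing that changed is how rare the genuine variants are.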

For the study, this meant discarding a lot of variants at first, which is typical of any first pass over sequencing data. The most recent data and algorithms are more accurate as a result of clearing these first hurdles.

What Next?

The aim is to keep growing the 1000 Genomes Project and to improve the filters so that they can be applied to human sequence data on a larger scale, which will help build a catalogue of confirmed LoF variants.

From there, we can determine their effect on human variation and the risk of disease. 

A few cases where rare LoF variants may actually help protect against disease have already been identified, including PCSK9 and heart disease, IFIH1 and type 1 diabetes, and CARD9 and Crohn’s disease, but there is still a long way to go.