For many diseases we have very little ability to determine who is at high or low risk; the risk factors are unreplicated, complicated, or understudied. However, for other diseases we can do much better. Alzheimer’s disease is a form of senile dementia that is characterised by abnormal clustering of proteins in the brain (right). We know a number of important risk factors for Alzheimer’s, and knowing your own risk factors may seriously change your estimate of the chance of developing the disease. But how can you calculate this risk?
This is going to be something of an information deluge, as I go through everything to think about when you predict a complex disease: how to calculate genetic and environmental risks, and how important these risks are, both individually and all together. I will demonstrate all of the calculations on the various GNZ contributors, and in particular show how I have worked out my own risk.
I’ll measure the risks in terms of odds ratios; you may want to read the introduction to Carl’s post from earlier this year to refresh your memory on what this means. I will also use the disease probability; this is simply the chance of developing Alzheimer’s, or equivalently, the percentage of people with this set of risk factors who will develop the disease.
Also note that an important factor to consider is the baseline lifetime risk, the total proportion of people who will develop Alzheimer’s before they die. I am going to use a lifetime risk of 9% for men and 17% for women, taken from an Alzheimer’s Association report, but getting a good estimate of this is actually very difficult, and will vary from country to country.
Inferring APOE status without genotyping
The most important risk factor for Alzheimer’s disease is a gene called Apolipoprotein E, or APOE. This gene has three common alleles, called ε2, ε3 and ε4. ε2 and ε3 protect against Alzheimer’s, whereas ε4 increases your risk of developing it. Every individual has two copies of this gene, and the combination of alleles you carry determines how likely you are to develop the disease; ε2/ε2 is the lowest risk, and ε4/ε4 is the highest.
The APOE alleles are defined by two genetic variants in the gene, rs429358 and rs7412. The different genotypes, along with odds ratios (taken from this paper) are below:
Now, if you know your genotypes at rs7412 and rs429358, you can get your APOE genotype, and if you are a deCODEme or a 23andMe V3 customer, you can do this easily; look up your SNPs, check the table above, and get your odds ratio. However, most of the GNZ cohort have been genotyped on the 23andMe V2 chip; while this contains rs7412, it does not have rs429358.
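For readers who do have both SNPs, the lookup described above can be sketched in a few lines of Python. This is a minimal sketch, not the method used for the post: it assumes plus-strand alleles as reported in 23andMe-style raw data, and the standard allele definitions (ε4 carries C at rs429358, ε2 carries T at rs7412, ε3 carries neither). The function name `apoe_genotype` is my own, and the odds ratios from the table are not reproduced here.

```python
def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Infer the APOE genotype from unphased genotypes at the two defining SNPs.

    Assumes plus-strand alleles (C or T at both sites). Each C at rs429358
    is counted as an ε4 allele, each T at rs7412 as an ε2 allele, and
    whatever is left over as ε3.
    """
    g1, g2 = rs429358.upper(), rs7412.upper()
    for genotype in (g1, g2):
        if len(genotype) != 2 or not set(genotype) <= {"C", "T"}:
            raise ValueError(f"unexpected genotype: {genotype!r}")

    n_e4 = g1.count("C")
    n_e2 = g2.count("T")
    n_e3 = 2 - n_e4 - n_e2
    if n_e3 < 0:
        # More than two ε2/ε4 alleles would imply a rare haplotype
        # that this simple counting scheme does not handle.
        raise ValueError("genotype combination not covered by this sketch")

    alleles = ["ε2"] * n_e2 + ["ε3"] * n_e3 + ["ε4"] * n_e4
    return "/".join(alleles)


my_apoe = apoe_genotype("TT", "CC")
```

A double heterozygote (CT at both SNPs) is reported as ε2/ε4, which is the usual interpretation when phase is unknown.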
However, all is not lost. Variants in the genome are correlated, and it is possible to make educated guesses about variants that you haven’t genotyped, using variants for which you have genotypes. This process is called genotype imputation. The best way of performing imputation is to use one of the many free imputation programs written by statistical geneticists, such as Beagle or IMPUTE2, and to use a state-of-the-art reference set, such as one from the 1000 Genomes Project.
I will write something next week on how you can do this. However, for right now I will just give the results for the GNZ members:
Imputation is probabilistic, and provides us with a certainty for each prediction. The ε3/ε3 guesses are very confident, but the rarer types are less so. IMPUTE2 is pretty uncertain about Caroline’s somewhat strange (and pretty rare) ε2/ε4 genotype, though it did make the correct guess when compared to her deCODEme results. To compensate for the uncertainty, I’ve shrunk down the odds ratios at uncertain guesses.
My APOE status is definitely ε3/ε3; IMPUTE2 is 100% certain, plus I have checked against other genotyping. My odds ratio is thus 0.56.
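The post does not spell out how the shrinking was done, so the following is only one plausible way to do it, not a description of the actual calculation. The sketch scales the log odds ratio by the imputation certainty, so that a fully certain call keeps its odds ratio and a completely uncertain one collapses to 1 (no information). Both the function name and the choice of shrinkage rule are assumptions.

```python
def shrink_odds_ratio(odds_ratio: float, certainty: float) -> float:
    """Shrink an odds ratio towards 1 according to imputation certainty.

    Hypothetical rule: multiply the log odds ratio by the certainty,
    which is the same as raising the odds ratio to that power.
    certainty = 1.0 leaves the odds ratio unchanged; 0.0 gives 1.0.
    """
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    return odds_ratio ** certainty


# The ε3/ε3 odds ratio of 0.56 is from the text; the 50% certainty is made up.
confident_call = shrink_odds_ratio(0.56, 1.0)
uncertain_call = shrink_odds_ratio(0.56, 0.5)
```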
Other genetic factors
APOE is a very important genetic risk factor for Alzheimer’s, but it is not the only one that we know about. There are at least five other well-supported risk variants (I have taken these from this study):
To get your odds ratio, you just count the number of risk alleles you have for each variant, look up the corresponding odds ratio in the table, and multiply them together across all the variants. For instance, my genotypes are GG, CT, CT, TT and CC. This is 0, 1, 1, 2 and 2 risk alleles for each variant, and my odds ratio at these sites is therefore 0.89 x 0.97 x 0.92 x 1.13 x 1.08 = 0.97.
I can combine this with my APOE odds ratio, to give 0.56 x 0.97 = 0.543. The odds of a man developing Alzheimer’s are 0.09 to 0.81, or 0.111 to 1. My odds ratio of 0.543 changes this to 0.543 x 0.111 = 0.0603 to 1, which corresponds to a probability of 0.0603 / 1.0603 = 5.7%. Doing this for all GNZ members gives us these odds ratios and disease probabilities across all the genetic factors:
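The multiplication above is easy to reproduce. This small sketch uses only the five per-variant odds ratios quoted for my own genotypes; the function name is my own choice.

```python
import math


def combine_odds_ratios(odds_ratios):
    """Multiply per-variant odds ratios, treating the variants as independent."""
    return math.prod(odds_ratios)


# Odds ratios for my genotypes at the five variants, as quoted in the text.
my_variant_ors = [0.89, 0.97, 0.92, 1.13, 1.08]
my_non_apoe_or = combine_odds_ratios(my_variant_ors)
print(round(my_non_apoe_or, 2))
```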
|Name||Combined OR||Alzheimer Probability|
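The step from a combined odds ratio to a disease probability can be sketched as follows. The sketch takes the male baseline odds of 0.111 to 1 quoted in the text as given, scales them by the odds ratio, and converts the resulting odds back to a probability with odds / (1 + odds). The function name is hypothetical.

```python
def probability_from_odds_ratio(odds_ratio: float, baseline_odds: float) -> float:
    """Scale baseline odds by an odds ratio and convert the result to a probability."""
    odds = odds_ratio * baseline_odds
    return odds / (1.0 + odds)


MALE_BASELINE_ODDS = 0.111  # "0.111 to 1", as quoted in the text

# APOE odds ratio times the odds ratio from the other five variants.
my_genetic_or = 0.56 * 0.97
my_genetic_probability = probability_from_odds_ratio(my_genetic_or, MALE_BASELINE_ODDS)
print(f"{100 * my_genetic_probability:.1f}%")
```

An odds ratio of exactly 1 leaves the baseline odds unchanged, which is a handy sanity check.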
We can also visualise how much each individual’s risk changes. In the plot below, red arrows show an increase in risk, and green arrows show a decrease:
Notice that Kate’s overall risk is higher than Don’s, but she is still “luckier” in some sense, because her genetic factors substantially decrease her risk from her elevated baseline as a woman, whereas Don’s genetic factors slightly increase his risk from his lower male baseline.
While there aren’t any environmental factors that influence Alzheimer’s as strongly as APOE, there are a number of known and suspected risks. Most of these risk factors can only really be applied reliably to older people, and include things like vascular disease, head trauma and the like. However, there are three well-associated traits that can apply to everyone: level of education, physical activity and alcohol consumption:
There has been some research on the interaction between genetics and environment in Alzheimer’s, but not enough for me to attempt to model it. Making the (probably wrong) assumption that we can consider genetics and environment separately, we can just multiply the odds ratios together. So as an alcohol-imbibing PhD student who exercises regularly, my environmental odds ratio is 0.72 x 0.89 x 0.73 = 0.47, and my overall environment+genetics odds ratio is thus 0.47 x 0.543 = 0.26. My Alzheimer’s probability is thus 2.8%.
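Putting the whole calculation together under that independence assumption looks something like this. It is a sketch rather than the exact script behind the numbers: it rounds the intermediate values the way the text does, and it reuses the male baseline odds of 0.111 to 1 quoted earlier. The variable names are my own.

```python
import math

MALE_BASELINE_ODDS = 0.111  # "0.111 to 1", as quoted earlier in the text

# Environmental odds ratios quoted in the text (education, activity, alcohol).
environmental_or = round(math.prod([0.72, 0.89, 0.73]), 2)

# Combined genetic odds ratio from the previous section.
genetic_or = 0.543

# Independence assumption: genetics and environment simply multiply.
overall_or = environmental_or * genetic_or

# Scale the baseline odds, then convert odds to a probability.
odds = overall_or * MALE_BASELINE_ODDS
overall_probability = odds / (1.0 + odds)

print(round(overall_or, 2), f"{100 * overall_probability:.1f}%")
```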
How predictive are these factors
So how predictive are these risk factors, when taken together? Well, assuming that they are independent, we can make a good guess.
So these factors can capture just over a fifth of the total variance in disease risk. Notice that the majority of this risk comes from APOE, with sex, other genetics and environment factors accounting for just a third of the total explained variance.
We can also calculate an AUC value using the GENROC calculator. The value is 0.75. This means that, given two individuals, one of whom has Alzheimer’s, one of whom doesn’t, both of whom have calculated their odds ratio using the method above, the person with the disease will have a larger total odds ratio than the one without the disease 75% of the time.
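That pairwise interpretation of the AUC can be made concrete with a toy calculation. This is not how the GENROC calculator works internally; it is just a direct count over every case/control pair, with ties counted as half. The scores below are invented purely for illustration.

```python
from itertools import product


def pairwise_auc(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case has the higher score.

    Ties count as half a win, which is the usual convention.
    """
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical total odds ratios for three cases and four controls.
cases = [2.0, 1.5, 0.8]
controls = [1.0, 0.5, 0.9, 1.6]
toy_auc = pairwise_auc(cases, controls)
```

An AUC of 0.5 would mean the odds ratios are no better than a coin toss at ranking cases above controls, and 1.0 would mean they always rank correctly.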
To look at it a final way, of those who take this test, 5% will find that they have a risk above 31%, and a further 5% will find that they have a risk below 3%.
Prediction, privacy and prevention
Via genotype imputation, we can accurately assess APOE risk using 23andMe’s v2 chip, which does not actually genotype the APOE allele. Thus a strong medically predictive factor can be inferred using entirely “non-medical” variants taken from the surrounding region. This really illustrates how the distinction between medically and non-medically relevant genetic variants is pretty artificial. This has been observed before, and raises some real issues about privacy, and about the attempt to separate (and separately regulate) medical and non-medical genetic information.
So what can we do with this information? A lot of people (myself included) would just want to know, as health information about themselves, regardless of what can be done with that information. For Alzheimer’s, there are no strong preventative measures, and treatment options are currently limited. If you aren’t doing regular exercise, you can knock 30% off your risk by taking up a sport (which, of course, would have lots of other benefits), and eating less fat and red meat may knock another few percent off. While these things are worth doing, they are worth doing for everyone regardless of risk, and the effects are pretty small compared to the overall risk anyway. Perhaps the most useful thing that a high-risk prediction can do is prompt you to look out for the early signs of the disease; if you have a 2% lifetime risk, you may not be that worried if you start forgetting your keys, but if your risk is 50%, you should consider seeing a medical professional.
The personal utility of this information will vary from individual to individual. What the above risk factors can give you is a large (though by no means complete) insight into your personal risk. What you do with this is up to you.
Amyloid plaque image taken from this paper in PLoS Medicine
Update 29/05/2015: Updated some numbers due to a numerical error in the APOE odds ratios