Last week, scientists at the European Molecular Biology Laboratory reported that they had sequenced the genome of the Henrietta Lacks (“HeLa”) cell line. The report was met with considerable consternation by those who (justifiably, in my opinion) wondered why scientists are still experimenting on a cell line obtained without consent in the 1950s. In response to the backlash, the researchers removed the HeLa sequence from the public internet, and the paper itself may yet disappear from the formal scientific literature.
However, it is unfair to treat the authors of this paper as scapegoats for the systematic failure of scientists to deal with issues surrounding genomic “privacy”. Consider this important piece of information: the genome sequence of the HeLa cell line has been publicly available for years (and remains so).
Continue reading ‘Henrietta Lacks’s genome sequence has been publicly available for years’
This is a guest post by Peter Cheng and Eliana Hechter from the University of California, Berkeley.
Suppose that you’ve had your DNA genotyped by 23andMe or some other DTC genetic testing company. Then an article shows up in your morning newspaper or journal (like this one) and suddenly there’s an additional variant you want to know about. You check your raw genotypes file to see if the variant is present on the chip, but it isn’t! So what next? [Note: the most recent 23andMe chip does include this variant, although older versions of their chip do not.]
Genotype imputation is a process used for predicting, or “imputing”, genotypes that are not assayed by a genotyping chip. The process compares the genotyped data from a chip (e.g. your 23andMe results) with a reference panel of genomes (supplied by big genome projects like the 1000 Genomes or HapMap projects) in order to make predictions about variants that aren’t on the chip. If you want a technical review of imputation (and the program IMPUTE in particular), we recommend Marchini & Howie’s 2010 Nature Reviews Genetics article. However, the following figure provides an intuitive understanding of the process.
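The core matching idea can be sketched in a few lines of Python. To be clear, this toy example is my own illustration, not the IMPUTE algorithm itself (which uses a probabilistic hidden Markov model over haplotypes and returns genotype probabilities); the reference panel, SNP positions, and genotypes below are invented.

```python
# Toy illustration of the idea behind genotype imputation: match the
# genotypes observed at typed SNPs against pairs of reference haplotypes,
# then read the untyped allele off the best-matching pair.
from itertools import combinations_with_replacement

# Reference haplotypes over 5 SNPs; the SNP at index 2 is NOT on the chip.
reference_haplotypes = ["01011", "10100", "00000"]

typed_positions = [0, 1, 3, 4]   # SNPs assayed by the chip
typed_genotypes = [1, 1, 1, 1]   # counts of the '1' allele at those SNPs
untyped_position = 2

def mismatch(pair):
    """Total disagreement between a haplotype pair and the typed genotypes."""
    h1, h2 = pair
    return sum(
        abs(int(h1[p]) + int(h2[p]) - g)
        for p, g in zip(typed_positions, typed_genotypes)
    )

# The pair of panel haplotypes that best explains the chip genotypes...
best_pair = min(combinations_with_replacement(reference_haplotypes, 2), key=mismatch)
# ...also carries a prediction for the SNP the chip never assayed.
imputed = int(best_pair[0][untyped_position]) + int(best_pair[1][untyped_position])
print("best-matching haplotype pair:", best_pair)
print("imputed genotype at the untyped SNP:", imputed)
```

Real imputation software does this probabilistically over thousands of reference haplotypes, which is why it reports a confidence for each imputed genotype rather than a single best guess.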
Continue reading ‘Learning more from your 23andMe results with Imputation’
UPDATE 3/17/12: A more extensive analysis of the paper discussed in this post is here. Several groups have concluded that at least 90% of the sites identified are technical artifacts.
The “central dogma” of molecular biology holds that the information present in DNA is transferred to RNA and then to protein. In a paper published online at Science yesterday, Li and colleagues report a potentially extraordinary observation: they show evidence that, within any given individual, there are tens of thousands of places where transcribed RNA does not match the template DNA from which it is derived. This phenomenon, called RNA editing, is generally thought to be limited (in humans) to conversions of the base adenosine to the base inosine (which is read as guanine by DNA sequencers), and occasionally from cytosine to uracil. In contrast, these authors report that any type of base can be converted to any other type of base.
If these observations are correct, they represent a fundamental change in how we view the process of gene regulation. However, in this post I am going to point out a couple of technical issues that, if not properly taken into account, have the potential to cause a large number of false positives in this type of data. The main point can be summarized like this: RNA editing involves the production of two different RNA and/or protein sequences from a single DNA sequence. To infer RNA editing from the presence of two different RNA and/or protein sequences, then, one must be very sure that they derive from the same DNA sequence, rather than from two different copies of the DNA (due to, for example, paralogs or copy number variants). This issue alone has the potential to be a large source of false positives in a study like this, and I will also discuss an additional technical problem that could likewise result in false positives.
Continue reading ‘Notes on the evidence for extensive RNA editing in humans’
[Editor’s Note: This guest post is contributed by Blaine Bettinger. Blaine is the author of The Genetic Genealogist, a blog that examines the intersection of genetics and ancestry, and a patent attorney at Bond, Schoeneck & King in Syracuse, NY.]
As you may have heard, I recently made my 23andMe and Family Tree DNA autosomal testing results available for download online at “mygenotype,” and dedicated the information to the public domain (if dedicating DNA sequence to the public domain is even possible – I’m currently doing some research in this area and expect to write more in the future). [Editor’s Note: see additional comments on personal genomics data in the public domain at the end of this post.]
At “mygenotype” you can download the following:
My Family Tree DNA Results:
- Affymetrix Autosomal DNA Results (2010)
- Affymetrix X-Chromosome DNA Results (2010)
- Illumina Autosomal DNA Results (2011)
- Illumina X-Chromosome DNA Results (2011)
My 23andMe Results:
- V2 Results (2008)
- V3 Results (2010)
- Y-DNA Results (2010)
- mtDNA Results (2010)
You can also find my SNPedia Promethease reports there.
In addition to my genome, Razib Khan of Gene Expression has a spreadsheet of approximately 48 other genomes that are available for download online.
A Challenge To YOU
Now that the information is out there, available to anyone, it remains to be seen who will actually make use of it.
Continue reading ‘My Genome Online – A Challenge To You’
For many diseases we have very little ability to determine who is at high or low risk; the risk factors are unreplicated, complicated, or understudied. However, for other diseases we can do much better. Alzheimer’s disease is a form of senile dementia that is characterised by abnormal clustering of proteins in the brain (right). We know a number of important risk factors for Alzheimer’s, and knowing your own risk factors may seriously change your estimate of the chance of developing the disease. But how can you calculate this risk?
This is going to be somewhat of an information deluge, as I go through everything you need to think about when predicting a complex disease: how to calculate genetic and environmental risks, and how important those risks are, both individually and in combination. I will demonstrate all of the calculations on the various GNZ contributors, and in particular show how I worked out my own risk.
I’ll measure the risks in terms of odds ratios; you may want to read the introduction to Carl’s post from earlier this year to refresh your mind on what this means. I will also use the disease probability; this is simply the chance of developing Alzheimer’s, or equally, the percentage of people with this set of risk factors who will develop the disease.
An important factor to consider is the baseline lifetime risk: the total proportion of people who will develop Alzheimer’s before they die. I am going to use a lifetime risk of 9% for men and 17% for women, taken from an Alzheimer’s Association report, but getting a good estimate of this is actually very difficult, and it will vary from country to country.
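The basic arithmetic of combining a baseline lifetime risk with odds ratios looks like this. To be clear, the odds ratios in the example are illustrative placeholders, not the values used later in the post, and the simple multiplication assumes the risk factors act independently of one another.

```python
# Convert a baseline probability to odds, multiply in each risk factor's
# odds ratio, and convert back to a probability.
def update_risk(baseline_probability, odds_ratios):
    """Return the disease probability after applying a set of odds ratios."""
    odds = baseline_probability / (1 - baseline_probability)
    for ratio in odds_ratios:
        odds *= ratio
    return odds / (1 + odds)

# Example: 9% male lifetime risk, with two hypothetical risk factors,
# one increasing risk (OR = 2.0) and one protective (OR = 0.7).
risk = update_risk(0.09, [2.0, 0.7])
print(f"updated lifetime risk: {risk:.1%}")
```

Note that working in odds rather than probabilities is what allows the risk factors to be multiplied together without ever producing a probability above 100%.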
If you want to know more about Alzheimer’s, including prevention, diagnosis and treatment, you can read about the disease on the Mayo Clinic or NHS Choices websites.
Continue reading ‘Calculating your Alzheimer’s risk’
When Daniel first asked me if I wanted to be involved in Genomes Unzipped, I was one of the more hesitant participants. I weighed up the pros and cons, but in the end what sold me was that after almost a decade of curiosity I finally had the opportunity to find out my genotype for the hereditary haemochromatosis (HH) variants in the gene HFE. But things didn’t unfold quite how I’d expected, and I’m still left with some unanswered questions about HH in my family.
Continue reading ‘Digging deeper into my disease risk’
For some people, genetic information is formidably powerful. It can reveal that you have inherited a debilitating disease which lies unavoidably in your future, that you have a massively increased susceptibility to a particular cancer which can only be mitigated by surgery, or that you are not biologically related to your parents and siblings.
But, for many people, it’s actually quite mundane and uninformative. I’m one of those people. Undergoing genome-wide profiling was interesting, educational and worthwhile, and I would certainly recommend it as a voyage of exploration. But it hasn’t really been useful for my health. (It’s perhaps worth noting that a SNP profile covers only a fraction of the potential disease-causing variants, but nonetheless I doubt there’s anything I need to be scared of lurking in the rest of my genome.)
So, what did I learn?
Continue reading ‘My delightfully uninteresting genome’
In my last post, I discussed how I used 23andMe data to test hypotheses about my ancestry. In particular, I was intrigued by Dienekes Pontikos’s result suggesting that I (and my colleague Vincent) might be partly Ashkenazi Jewish. Ultimately, however, I concluded that his algorithm was not properly modeling my southern European ancestry (inherited from one Italian grandparent), and that this was leading to a spurious result.
I was wrong.
Continue reading ‘Am I partly Jewish? An unexpected turn of events’
The story behind this post is that my wife recently gave birth to our first son, and the day after the birth we had an amusing run-in with genetics. Before I start, I should reassure the reader that I have no doubt that I am indeed the father of my child. But as you will see, a non-geneticist might have become worried when faced with the same situation.
Firstly, my wife is rhesus negative. This has important medical implications: if the baby were rhesus positive, she could develop antibodies against this marker, which could be life-threatening for any subsequent rhesus-positive child. This is a relatively big deal, but there are ways to manage it, and knowing the baby’s blood type is therefore essential.
The day after the birth, while we are both lying on our bed, very tired, a midwife comes by and asks us whether we know the rhesus status of the baby. We answer negatively, she checks her notes and says, “Ah, good news, the baby is rhesus negative. The father must also be rhesus negative then!” Well, I am not…
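The resolution, of course, is basic Mendelian genetics: being rhesus positive only requires one copy of the dominant D allele, so a rhesus-positive father can be heterozygous (Dd) and pass the recessive d allele to his child. A quick sketch of the cross (using the simplified two-allele D/d notation, which glosses over the fact that rhesus status really reflects presence or absence of the RHD gene):

```python
# Enumerate the possible offspring of a heterozygous Rh-positive father
# and an Rh-negative mother, Punnett-square style.
from itertools import product

father_alleles = ("D", "d")   # Rh-positive, but heterozygous
mother_alleles = ("d", "d")   # Rh-negative

genotypes = ["".join(sorted(pair)) for pair in product(father_alleles, mother_alleles)]
rh_negative_fraction = genotypes.count("dd") / len(genotypes)
print("possible offspring genotypes:", genotypes)
print(f"chance of an Rh-negative baby: {rh_negative_fraction:.0%}")
```

So a rhesus-negative baby is entirely consistent with a rhesus-positive father, half the time, whenever the father carries one d allele.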
Continue reading ‘Rhesus, paternity tests and 23andMe’
In a previous post I discussed copy number variation, a form of genetic variation not broadly reported by DTC companies. In today’s post I provide a very simple program that allows one to identify potential deletions on the basis of high density SNP genotypes from a parent-offspring trio, and report on the results of running this program on data from my own family.
The program uses an approach that I applied as a graduate student to mine deletions from the very first release of data from the International HapMap Project in 2004. The idea, explained in my last post, is to look for stretches of homozygous genotypes interspersed with mendelian errors, which might indicate the transmission of a large deletion. To be clear, this is a simple analysis that most programmers and computational biologists would find straightforward to implement. It is probably a good practice problem for graduate students and would-be DIY personal genomicists.
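To make the idea concrete, here is a minimal sketch of that logic. This is my own illustration rather than the program described in the post: genotypes are coded as counts of one allele (0, 1, 2), and the toy data and window thresholds are invented.

```python
# Scan a parent-offspring trio for runs where the child looks homozygous
# and some genotypes are Mendelian-inconsistent, a signature of a
# transmitted deletion (a hemizygous child is called as homozygous).

def mendelian_error(child, mom, dad):
    """True if the child's genotype is impossible given the parents'."""
    possible_sums = {
        m + d
        for m in ({0} if mom == 0 else {1} if mom == 2 else {0, 1})
        for d in ({0} if dad == 0 else {1} if dad == 2 else {0, 1})
    }
    return child not in possible_sums

def candidate_deletions(child, mom, dad, min_run=5, min_errors=2):
    """Yield (start, end) index ranges of homozygous runs in the child
    containing at least min_errors Mendelian inconsistencies."""
    start = None
    for i, c in enumerate(child + [1]):   # sentinel heterozygote closes a final run
        if c in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                errors = sum(
                    mendelian_error(child[j], mom[j], dad[j])
                    for j in range(start, i)
                )
                if errors >= min_errors:
                    yield (start, i)
            start = None

# Toy trio: SNPs 2-6 are homozygous in the child and inconsistent
# with the parents' genotypes, as expected for an inherited deletion.
child = [1, 1, 0, 0, 0, 0, 0, 1, 2]
mom   = [1, 0, 2, 2, 1, 2, 2, 1, 2]
dad   = [2, 1, 2, 1, 2, 2, 1, 0, 2]
print(list(candidate_deletions(child, mom, dad)))
```

A real implementation would also have to handle genotyping error, which is why requiring several Mendelian errors within a long homozygous run, rather than flagging single inconsistencies, is the key filtering step.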
I obtained 23andMe data from both my mom and dad, and, with their consent, ran the three of us through the program. I was mildly surprised to find only two potential deletions; I had previously speculated that one would find 5-10 deletions per trio with the 550K platform used by 23andMe.
Continue reading ‘Finding the holes in our genomes’