In May of last year, Li and colleagues reported that they had observed over 10,000 sequence mismatches between messenger RNA (mRNA) and DNA from the same individuals (RDD sites, for RNA-DNA differences). This week, Science has published three technical comments on this article (one that I wrote with Yoav Gilad and Jonathan Pritchard; one by Wei Lin, Robert Piskol, Meng How Tan, and Billy Li; and one by Claudia Kleinman and Jacek Majewski). We conclude that at least ~90% of the Li et al. RDD sites are technical artifacts [2,3,4]. A copy of the comment I was involved in is available here, and Li et al. have responded to these critiques.
In this post, I’m going to describe how we came to the conclusion that nearly all of the RDD sites are technical artifacts. For a full discussion, please read the comments themselves.
Position biases in alignments around RDD sites. For each RDD site with at least five reads mismatching the genome, we calculated the fraction of reads with the mismatch (or the match) at each position in the alignment of the RNA-seq read to the genome (on the + DNA strand). Plotted is the average of this fraction across all sites, separately for the alignments which match and mismatch the genome.
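For readers who want to try this kind of check on their own data, here is a minimal sketch of the calculation described in the caption. It is not the code used for the comment. The input layout (a dictionary mapping each site to a list of `(position_in_read, is_mismatch)` pairs), the function name `position_profile`, and the 0-based positions are all assumptions made for illustration, and the positions are assumed to be already oriented along the + DNA strand.

```python
from collections import Counter


def position_profile(sites, read_length, min_mismatch_reads=5):
    """Average, across sites, the fraction of reads covering the site
    at each read position, separately for mismatching and matching reads.

    sites: dict mapping a site id to a list of (pos_in_read, is_mismatch)
           tuples, one per read overlapping the site (hypothetical layout).
    read_length: number of positions in a read (positions are 0-based).
    """
    totals = {True: [0.0] * read_length, False: [0.0] * read_length}
    n_sites = {True: 0, False: 0}

    for reads in sites.values():
        n_mismatch = sum(1 for _, is_mm in reads if is_mm)
        # Keep only sites with enough reads that mismatch the genome.
        if n_mismatch < min_mismatch_reads:
            continue
        for is_mm in (True, False):
            positions = [p for p, mm in reads if mm == is_mm]
            if not positions:
                continue
            # Per-site fraction of reads with the site at each position.
            for pos, count in Counter(positions).items():
                totals[is_mm][pos] += count / len(positions)
            n_sites[is_mm] += 1

    def average(is_mm):
        n = n_sites[is_mm]
        return [t / n if n else 0.0 for t in totals[is_mm]]

    return {"mismatch": average(True), "match": average(False)}


if __name__ == "__main__":
    # Toy data: at siteA the mismatches pile up at the first read position.
    toy = {
        "siteA": [(0, True)] * 4 + [(3, True), (1, False), (2, False)],
        "siteB": [(0, True), (1, True), (2, False)],  # too few mismatches
    }
    print(position_profile(toy, read_length=4))
```

If the mismatches are real, the two profiles should look roughly alike. A mismatch profile that spikes at the ends of reads, while the match profile stays flat, is the kind of position bias that points to an alignment or sequencing artifact.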
Continue reading ‘Questioning the evidence for non-canonical RNA editing in humans’
UPDATE 3/17/12: A more extensive analysis of the paper discussed in this post is here. Several groups have concluded that at least 90% of the sites identified are technical artifacts.
The “central dogma” of molecular biology holds that the information present in DNA is transferred to RNA and then to protein. In a paper published online at Science yesterday, Li and colleagues report a potentially extraordinary observation: they show evidence that, within any given individual, there are tens of thousands of places where transcribed RNA does not match the template DNA from which it is derived. This phenomenon, called RNA editing, is generally thought to be limited (in humans) to conversions of the base adenosine to the base inosine (which is read as guanine by DNA sequencers), and occasionally from cytosine to uracil. In contrast, these authors report that any type of base can be converted to any other type of base.
If these observations are correct, they represent a fundamental change in how we view the process of gene regulation. However, in this post I am going to point out a couple of technical issues that, if not properly taken into account, have the potential to cause a large number of false positives in this type of data. The main point can be summarized like this: RNA editing involves the production of two different RNA and/or protein sequences from a single DNA sequence. To infer RNA editing from the presence of two different RNA and/or protein sequences, then, one must be very sure that they derive from the same DNA sequence, rather than from two different copies of the DNA (due to, for example, paralogs or copy number variants). This issue has the potential to be a large source of false positives in a study like this one. I will also discuss an additional technical problem that could result in false positives.
Continue reading ‘Notes on the evidence for extensive RNA editing in humans’