In May 2011, Li et al. reported over ten thousand sequence mismatches, termed RNA-DNA differences (RDDs), between messenger RNA and DNA from the same individuals.
More recently, Science published three technical comments on this article. Their conclusion: at least 90% of the Li et al. RDD sites are technical artifacts. Here, we explain how that conclusion was drawn.
For each RDD site with at least five reads mismatching the genome, the fraction of reads carrying the mismatch, or the match, was calculated at each position in the alignment of the RNA-seq read to the genome on the + DNA strand.
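The per-position calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in any of the analyses: the function name `position_fractions`, the input format (one `(read_pos, is_mismatch)` pair per read overlapping an RDD site) and the toy data are all hypothetical, and the sketch assumes that filtering to sites with at least five mismatching reads and orienting positions to the + strand have already been done upstream.

```python
from collections import Counter


def position_fractions(observations, read_length):
    """Return, for mismatching and matching reads separately, the fraction
    of reads in which the site falls at each position of the read.

    observations: iterable of (read_pos, is_mismatch) pairs, one per read
    overlapping an RDD site, with read_pos a 0-based offset into the read.
    """
    counts = {"mismatch": Counter(), "match": Counter()}
    for read_pos, is_mismatch in observations:
        if not 0 <= read_pos < read_length:
            raise ValueError(
                f"read_pos {read_pos} outside read of length {read_length}"
            )
        counts["mismatch" if is_mismatch else "match"][read_pos] += 1

    fractions = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        # With no reads in a class, report zeros rather than dividing by zero.
        fractions[label] = [
            counter[pos] / total if total else 0.0 for pos in range(read_length)
        ]
    return fractions


# Toy data, invented for illustration: mismatches pile up at the read ends,
# while matching reads cover the site at interior positions.
toy = [(0, True), (0, True), (1, True), (9, True),
       (2, False), (4, False), (5, False), (5, False)]
result = position_fractions(toy, read_length=10)
print(result["mismatch"])
print(result["match"])
```

If true RNA-DNA differences were being sequenced, the two profiles would be expected to look alike; a mismatch profile concentrated at the first and last positions is the signature discussed in the rest of this post.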
Li et al. found over 10,000 exonic RDD sites, including thousands predicted to change protein sequence. These results implied the existence of at least one novel mechanism of gene regulation.
This called into question some of the basic assumptions used daily in genetics.
It was not the existence of RDD sites that was so surprising; it was their apparent biological impact and the implication that there are regulatory pathways previously unknown to us.
Why do some think the RDD sites in Li et al. are false positives? Two groups have raised issues with the reported sites, and both claimed that the majority of them were false positives.
Mismatches to the genome at RDD sites occur almost exclusively at the ends of sequencing reads. All three technical comments published after the paper included this observation.
In response to the comments, a plausible explanation was proposed for these observations. To generate the cDNA, Li et al. added random short DNA sequences to each sample, which acted as primers for the DNA synthesis reaction.
At some sites, the random primers were not perfect matches to the mRNA but were still able to bind. During synthesis, the mismatches from the primers were incorporated into the cDNA, producing a false signal of RNA editing. Because a primer sits at the start of the cDNA fragment it initiates, such errors fall at the ends of reads, which explains the positional pattern.
Another attempt to validate the findings involved identifying peptide sequences that correspond to ‘edited’ RDD sites. It was pointed out that many of these peptides are in fact good matches to multiple genes.
The conclusion was that these RDD sites are false positives caused by mismapped reads from paralogous genes.
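The paralog argument can be illustrated with a small check of whether a peptide is a good match to more than one protein. The sketch below is a simplification under stated assumptions: "good match" is reduced to an ungapped comparison with a mismatch threshold, and the function names, gene names and sequences are all hypothetical toy values, not data from Li et al. or the technical comments.

```python
def best_mismatch_count(peptide, protein):
    """Smallest number of mismatching residues between the peptide and any
    same-length window of the protein (ungapped); None if protein is shorter."""
    k = len(peptide)
    if len(protein) < k:
        return None
    return min(
        sum(a != b for a, b in zip(peptide, protein[i:i + k]))
        for i in range(len(protein) - k + 1)
    )


def genes_matching_peptide(peptide, proteins, max_mismatches=1):
    """Names of proteins containing a window within max_mismatches of the peptide."""
    hits = []
    for name, sequence in proteins.items():
        distance = best_mismatch_count(peptide, sequence)
        if distance is not None and distance <= max_mismatches:
            hits.append(name)
    return sorted(hits)


# Toy proteome, invented for illustration. GENE_A and its paralog differ at
# a single residue (A versus S).
proteins = {
    "GENE_A": "MKTLLVAGSR",
    "GENE_A_PARALOG": "MKTLLVSGSR",
    "GENE_B": "MPEPTIDEQW",
}

# A peptide that looks like an 'edited' form of GENE_A (A changed to S).
peptide = "TLLVSG"
print(genes_matching_peptide(peptide, proteins, max_mismatches=1))
print(genes_matching_peptide(peptide, proteins, max_mismatches=0))
```

In this toy case the peptide is one residue away from GENE_A but an exact match to the paralog, so it offers no evidence that GENE_A is edited; the unedited paralog explains it equally well.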
In conclusion, if you search the genome for sites that look strange, you will find strange and unexpected results. This holds even when a systematic error affects only 0.001% of the bases in the genome.
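To put the 0.001% figure in perspective, here is a back-of-the-envelope calculation. The genome size used (about 3.1 billion bases for the human genome) is an assumption added for illustration, as are the constant and function names; the error fraction is the one quoted above.

```python
# Assumption: approximate size of the human genome in bases.
GENOME_SIZE_BP = 3_100_000_000

# 0.001% expressed as a fraction.
ERROR_FRACTION = 0.001 / 100


def expected_affected_bases(n_bases, fraction):
    """Expected number of bases hit by an error affecting the given fraction."""
    return n_bases * fraction


affected = expected_affected_bases(GENOME_SIZE_BP, ERROR_FRACTION)
print(f"about {round(affected):,} bases affected")
```

Under these assumptions, roughly 31,000 bases would be affected, which is more than the number of RDD sites reported. A very rare systematic error is therefore enough, in principle, to generate a list of this size.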
The most interesting findings are, as a rule, the least likely to be real. That lesson emerged here from working through the false positives and mismatches.
However, it remains possible that previously unknown forms of RNA editing are active in humans, and RNA sequencing technology can help determine whether they exist.