Guest post by Ben Neale: Evaluating the impact of de novo coding mutation in autism

[Dr. Neale is currently an Assistant in Genetics in the Analytic and Translational Genetics Unit at Massachusetts General Hospital and Harvard Medical School and an affiliate of the Broad Institute of Harvard and MIT. Dr. Neale’s research centers on statistical genetics and how to apply those methods to complex traits, with a particular focus on childhood psychiatric illness such as autism and ADHD.]

Today, in Nature, three letters (1, 2, 3) were published on the role of de novo coding mutations in the development of autism. I am lead author on one of these manuscripts, working in collaboration with the ARRA Autism Consortium. In this post, I’ll describe the main findings of our work as they relate to autism and how we approached the interpretation of de novo mutations. In essence, de novo point mutation is likely relevant to autism in ~10% of cases, but a single de novo event is not likely to be sufficient to cause autism. Underscoring this, fewer than half of the cases had an obviously functional point mutation in the exome. However, three genes, SCN2A, KATNAL2 and CHD8, have emerged as likely candidates for contributing to autism pathogenesis.

De novo is Latin for “from the beginning,” and when describing genetic variation or mutation means that the variant has spontaneously arisen and was not inherited from either parent. In autism, de novo copy number variants are among the earliest clearly identified genetic risk factors (see Sanders et al. and Pinto et al. for reviews). Given that these events are novel, natural selection has not acted on them, except for instances where the point mutation is lethal in early life. With next generation sequencing (NGS), we now have the opportunity to identify these events directly.

In this study we explored the impact of de novo mutations on autism by performing targeted sequencing of the protein-coding regions of the genome (known collectively as the exome, and comprising just 1.5% of the genome as a whole) in 175 mother-father-child trios in which the child was diagnosed as autistic. Having sequence from all three members of each family allowed us to find mutations that had arisen spontaneously in a patient’s genome, rather than being inherited from their parents.

We have made a pre-formatted version of our manuscript available here. In this post I just wanted to highlight some of the key lessons emerging from our study.

We must carefully calibrate our prior expectation to evaluate de novo mutation

To evaluate the observed de novo events, we calculated the expected number of events in the exome taking into account the sequence context. Basically, different sequences of bases have different levels of mutability. A key driver of this variation in mutation rate is the amount of GC content (the proportion of DNA that is C-G rather than A-T base pairs). The GC content of the exome is approximately 50% compared to the 40% genome-wide average. As a consequence, protein-coding sequences are inherently more mutable. Taking this into account, the expected number of de novo events per person in the exome is a shade over 1. However, current exome sequencing technologies do not capture all regions equally well (and some regions aren’t captured at all), which revises down the expectation to 0.87 per person.

It’s worth emphasizing these numbers: that means the majority of people who have their exome sequenced will be found to carry at least one de novo mutation in a protein-coding gene, even if they are perfectly healthy. That means that human geneticists must be extremely cautious in assigning disease-causing status to such mutations.
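
As a rough check on the "majority" claim, here is a minimal sketch that assumes the number of de novo coding events per person follows a Poisson distribution with the mean quoted above (0.87). The Poisson assumption and the function name are illustrative choices, not something taken from the paper.

```python
import math


def prob_at_least_one(mean_events: float) -> float:
    """P(X >= 1) for a Poisson-distributed count with the given mean."""
    # P(X = 0) for a Poisson variable is exp(-mean), so P(X >= 1) is its complement.
    return 1.0 - math.exp(-mean_events)


# Expected de novo events per captured exome, as quoted in the post.
expected_per_exome = 0.87

p_any = prob_at_least_one(expected_per_exome)
print(f"P(at least one de novo coding event) = {p_any:.2f}")
```

Under that assumption the probability comes out a little under 60%, which is consistent with saying that most sequenced individuals will carry at least one such event.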

We observe only a modest increase in rate, suggesting a limited role of de novo coding mutation

Overall we observed an average of 0.92 events per trio, slightly higher than expected, but not significantly so. Furthermore, a majority of cases did not have an obviously deleterious point mutation in the exome. However, we did observe more nonsense mutations than expected, suggesting that some of the nonsense events are relevant. We also observed a significant excess of protein-protein interaction for genes that harbor de novo missense, splice site or nonsense mutations.
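
To illustrate why a rate of 0.92 against an expectation of 0.87 is not a significant excess, the sketch below runs a one-sided Poisson test on the total event count across 175 trios. The totals are reconstructed from the rounded per-trio rates quoted in this post, so they are assumptions rather than the exact counts used in the paper, and the Poisson model is itself a simplification.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), with each term computed in log space."""
    cdf_below = 0.0
    for k in range(observed):
        # log of the Poisson pmf at k: -lambda + k*log(lambda) - log(k!)
        log_pmf = -expected + k * math.log(expected) - math.lgamma(k + 1)
        cdf_below += math.exp(log_pmf)
    return 1.0 - cdf_below


n_trios = 175
expected_total = 0.87 * n_trios          # about 152 events expected
observed_total = round(0.92 * n_trios)   # about 161 events, reconstructed from the rounded rate

p_value = poisson_upper_tail(observed_total, expected_total)
print(f"observed={observed_total}, expected={expected_total:.1f}, one-sided p={p_value:.2f}")
```

With these reconstructed totals the one-sided p-value is far above 0.05, in line with the statement that the overall rate is only slightly, and not significantly, elevated.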

Few genes are hit multiple times, highlighting the complex genetic basis of psychiatric illness

When we combine the events identified in our paper with those from the two companion papers, we identify 18 genes that have de novo functional mutations in two separate individuals, where we expect ~12 by chance. These results reinforce the idea that many different genes are involved in the causation of autism, which has long been hypothesized for this disease and for other psychiatric traits such as schizophrenia.

We observe two loss-of-function (LoF) de novo events in each of three genes (for a nice overview of LoF mutations, Daniel’s previous post is a great resource). These three genes are SCN2A, CHD8 and KATNAL2. While our paper was in review, we also performed additional trio sequencing, identifying a third de novo LoF allele in SCN2A. To put this in perspective, across approximately 600 trios, we observe only one gene hit independently by three likely functional mutations. In other words, de novo mutations contributing to autism risk are not concentrated in just a few critical genes – they are spread across many genes, each contributing just a small proportion of the overall genetic risk of this disease.

We explored these three genes in an expanded exome sequencing dataset of 935 cases and 870 controls and the Exome Variant Server (EVS). The EVS contains approximately 3,500 European Americans and 1,850 African Americans. For SCN2A no additional LoF alleles were observed in cases, controls, or the EVS. So across approximately 1,500 autism patients, we observe 3 cases with LoF mutations in SCN2A, which works out to be 0.2% of cases. For CHD8 we observe an additional 3 LoF alleles in the case cohort, but none in any control sample, bringing the total to 5 LoF alleles in 1,500 cases (0.33%). For KATNAL2 we observe 3 additional LoF alleles in cases, but also observe 3 LoF alleles in the control and EVS data, which works out to be an odds ratio of approximately 5, with again 0.33% of cases having such an allele. All three of these genes are now strong candidates for playing a role in autism, and the identification of these three genes is certainly progress for understanding the biological basis of autism. The interpretation of these results was strongly informed by the inclusion of additional exome sequencing data, suggesting that further trio and case-control sequencing will inform gene identification efforts for autism.
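
The carrier percentages in the paragraph above can be reproduced with a few lines of arithmetic. This sketch uses the rounded figure of 1,500 cases and the LoF carrier totals as described in this post (de novo events plus the additional alleles found in the case cohort); the dictionary and function names are illustrative only.

```python
def carrier_percent(lof_carriers: int, n_cases: int) -> float:
    """Percentage of cases carrying a loss-of-function allele in a gene."""
    return 100.0 * lof_carriers / n_cases


# Approximate number of autism cases across trios and the case cohort (rounded in the post).
n_cases = 1500

# LoF carrier totals per gene, as tallied in the post:
#   SCN2A:   3 de novo alleles, no additional alleles in the case cohort
#   CHD8:    2 de novo alleles + 3 additional alleles in cases
#   KATNAL2: 2 de novo alleles + 3 additional alleles in cases
lof_counts = {"SCN2A": 3, "CHD8": 5, "KATNAL2": 5}

for gene, count in lof_counts.items():
    print(f"{gene}: {count}/{n_cases} cases = {carrier_percent(count, n_cases):.2f}%")
```

Even taken together, LoF alleles in these three genes are found in fewer than 1% of cases, which is why each gene explains only a small share of the overall risk.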

Future directions

The identification of gene candidates for autism is still a clear priority for gaining insight into the biological basis of the disease. The genes highlighted by this work are just the first few pieces of the complex puzzle. Further efforts to fully integrate all of the sequencing data of cases, controls and trios are currently being facilitated by the Autism Sequencing Consortium (ASC), a collaboration organized by the NIMH. Clearly, more sequencing data must be generated to identify additional genetic effects that predispose to this disease.

(1) Neale et al. (2012) Patterns and rates of exonic de novo mutations in autism spectrum disorders. doi:10.1038/nature11011

(2) Sanders et al. (2012) De novo mutations revealed by whole-exome sequencing are strongly associated with autism. doi:10.1038/nature10945.

(3) O’Roak et al. (2012) Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. doi:10.1038/nature10989.


15 Responses to “Guest post by Ben Neale: Evaluating the impact of de novo coding mutation in autism”

  • Dr John Allen Berger

    Dear Dr. Ben:
    Really enjoyed your article and your research domain, i.e. de novo coding mutations. I too have been working on de novo research but on the other side of the Genetic Code equation, i.e. purine and pyrimidine nucleotides, which as you know are the “real” not abstract molecular structures for the RNA and DNA nucleic acids which encode the structure-sequence instruction sets to make the functional tools of organic life, i.e. proteins, enzymes, and nearest neighbor amino acids.
    I am especially targeting the purine metabolic pathways for causality and consequences of the (hypoxanthine, xanthine) families: inosine, xanthosine, adenosine, guanosine and their mono, duo, trio phosphates, e.g. IMP (the parent purine nucleotide – first instance of closed purine ring).
    There are many, many inborn errors of metabolism which include DSM IV symptoms of autism, i.e. Scids (hpgrt) hypoxanthine-guanine salvage enzyme plus the omission of the entire inosine-xanthosine family metabolic intermediates because of the RNA “Tie club” substituted Uracil family from Thymine family in DNA to RNA to Protein linear central dogma.
    My work of 12 years has been on why every man-made therapeutic molecule, prescription drugs in particular, has multiple toxic side effects. We believe it is because Inosine = I and xanthosine = X should have been added to the existing DNA (ATGC) and RNA (AUGC) genetic codes. Addition, not substitution, is nature’s main evolutionary mathematical/geometric operator. Base Pairing (A+T)(G+C) (Purine + Pyrimidine), Male + Female, are all examples of the theory of threes which seems to be a universal pattern for quantum atomic molecular wave/particles to replicate, reproduce, and growth-development etc.
    If you check your three protein exome genes, I am quite certain you will find links to IMPDH 1,2,3, Adar, 1,2,3 Adat,xanthine oxidase, xanthine dehydrogenase, HPGRT, nucleotide kinases, and a whole host of tRNA cell cycle regulators P53, IL6, and many sulfur (thioester) compounds which have been excluded from the conversation.
    I would like to discuss our mutual shared interests on the topic at hand. We are both working on different sides of the coin, code or external/internal mirror images.
    Hope to hear from you soon.
    Best regards, drjab699

  • Ben – congrats to you, Mark, and the other members of your team as well as the State and Eichler groups on this landmark set of papers – landmark in the sense that it plants a new flag farther along the trail to a complete understanding of autism than perhaps ever before.

    Was hoping you might comment on a couple of things.

    First, were you and your colleagues surprised by the findings? Based on the de novo CNV lit and by extension from the many rare variants for mental retardation, I seem to recall that some commentators fully expected that exome sequencing would completely crack autism open, that its complexities would reduce to a couple of handfuls of genes.

    Second, unless I am mistaken, the rare exon variants identified by these studies can’t account for the heritability of autism (which is probably in excess of 70%, save for one divergent study), particularly if the penetrance is incomplete.

    Third, you did look at chrX and chrY, yes?

    As with all outstanding science, some questions get answered and new questions are raised. Best, pfs

  • I’m also interested in things along the lines of the questions raised by PF Sullivan. Ben, your paper seems somewhat less optimistic about the de novo paradigm than the other two papers, especially the one by Sanders et al. Am I reading this correctly?

  • It’s great that technology enables the addition of so much more genetic data to these studies. However, without knowing more about the subphenotypes of autism, isn’t this study design lacking a very important component?

    A new phenotyping tool is being developed here in the Bay Area called ChARM tracker. Hopefully resources like this will be widely adopted by parents and physicians and provide critically important information to autism research.

  • Thanks for a really clear explanation of the findings. But as a non-geneticist, I’m struggling to grasp their significance.

    Overall, you don’t find significantly more mutations in the autistic individuals than you’d expect by chance.

    Nevertheless, between the 3 studies, you’ve identified 18 mutations that are found in more than one autistic individual. You’d expect at least 12 by chance, but presumably 13 or 14 wouldn’t have been *significantly* above chance.

    Even if some of these mutations are actually causative of autism in a tiny minority of cases, you have no way of knowing which of the 18 these might be.

    What am I missing? Genuine question.

  • I’d like to pick up on Linda Avey’s point about subphenotypes of autism.

    In your post you refer to autism as a ‘disease’. I’m aware that this term can be used to encompass anything that is remotely pathological, but am I wrong to infer from it that you view autism as a homogeneous condition?

    Strictly speaking, autism began as, and continues to be, a set of behavioural signs and symptoms. It will probably remain so until we find the cause(s) of those signs and symptoms. Autism is associated with a wide range of somatic disorders, including viral and bacterial infections; we don’t know the causal relationship, but it’s a pretty safe bet that at least some of them are the cause of autistic characteristics. A genetic predisposition might or might not be involved. I suppose what I’m asking is what your definition of autism is.


  • Genetics definitely not my area. But I’ll add a question about your sample.

    Of the 175 trios (350 parents), only 3 of the parents had any autistic traits. Were trios selected for parents free of autistic traits?

    By the way, the supplementary info seems to say there were 174 trios, which is a bit confusing.

  • The research into gene mutations and de novo gene mutations could learn a lot about this area by studying the severe genetic syndromes and the implications for idiopathic cases.

    For a full discussion, including the comments section with references see:

  • Benjamin Neale


    A couple of questions have been posted and I’ll do my best to answer accordingly:

    1) Was I surprised by the results – difficult to say. I didn’t expect de novo mutation to explain a huge fraction of autism. Related to your second question Pat – de novo mutations with respect to heritability is challenging. For the twin method there may be sharing in MZs, though no guarantee because of when the mutation might have arisen and how that relates to mosaicism. For parent offspring estimates of heritability then the answer is no, in terms of the estimate. For siblings there might be sharing of de novo events, but that sharing is unlikely, depending on when the event arose and how frequent it is in the progenitor cells in the father. The approach did identify a handful of genes – perhaps not as many as some had hoped/promised, but some nonetheless which is definitely an improvement for autism genetics.

    2) Yes, we looked at the sex chromosomes. Nothing exciting in our data, though remember that point mutations more often than not come from the father, and most cases with autism in the sample are male – and so the rate of de novo point mutation on the X is lower.

    3) Our paper is less optimistic for a couple of reasons, I think. Firstly, our results show the smallest effect size in terms of rate and functional categorization. Secondly, I’m only comfortable with claiming the genes because of the additional loss of function alleles observed in the case control sample and the rate of those events in larger additional control samples. The Sanders claim about the risk of de novo events restricts to brain expressed loss of function alleles and argues for a 13:3 ratio in probands vs. siblings, so larger sample sizes are needed to validate and confirm this observation. In terms of the overall rate, the relative increase is fairly similar in the Sanders MS as in ours, suggesting that a great many of these de novo events are not contributing to disease.

    4) On the topic of confidence of the genes being claimed – the restriction to loss of function alleles coupled with the lookup in the case control samples really helps validate the observations.

    5) The definition of autism that we used in the work is based on standard diagnostic instruments administered by research-reliable personnel, the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule-Generic (ADOS); cases also had a DSM-IV diagnosis of a pervasive developmental disorder by a clinician and received a medical screen. These are the standard instruments used for autism research. Whether this reflects a single disorder or a collection of disorders is very much an open question, but there is pretty good evidence that genetic effects play a role in some of the cases we’ve analyzed.

  • Thanks very much to Ben for taking the time to post to Genomes Unzipped about his very nice paper in Nature, which I just read. One question: can you direct us to the best source of information on what fraction of all autistic cases are definitely de novo, where an ADI/ADOS ruled out any manifestations of autism, Asperger’s or any other major psychiatric illness in the parents? I assume that this was done rigorously for the Simons Simplex collection, but I’d like to know what fraction of “autism” this actually includes? From my own experience as a child psychiatrist, it seems to be often the case that the father or mother of children with autism or other severe mental illness themselves have subthreshold symptoms, which of course might be due to decreased expressivity of the disorder in these parents, due to environment, genetic background, or other effects. Also, if one extends the family to grandparents, uncles, aunts, cousins, etc…. one finds often that there are other family members with psychiatric symptoms, which I believe has also been documented epidemiologically. One criticism would therefore be that a de novo design automatically eliminates these inherited variants, so one must of course be absolutely certain that the parents are unaffected and that there isn’t evidence for a recessive disorder manifesting itself but with variable expressivity, hence not found easily in low-resolution linkage studies, particularly if the rare variants are unique to single families. Anyway, I’d be very interested to have you direct us to the best sources about the fraction of autism arising from parents and extended family members with zero evidence of major psychiatric illness, as you have focused on in your paper. Of course, this has relevance for all major psychiatric illness.

    Just as an aside, I will post below something I am writing now of relevance to the above discussion. Any comments on this are welcome:
    It is very likely that there will be a continuum of disease, with a blending of oligogenic into polygenic modes of inheritance. This is in part simply a semantic argument, given that the “penetrance” or “effect size” of any particular mutation will obviously vary according to genetic background and environment, as demonstrated repeatedly in model organisms (1). Thus, while a mutation causing hemochromatosis or breast cancer might have high penetrance in one particular pedigree or clan, that same mutation may have very low penetrance in another pedigree, clan or group of unrelated individuals (2). The reasons for variable penetrance can be quite variable and are currently a mystery in many instances. For example, although tri-allelism has been advocated to explain variable penetrance in Bardet-Biedl syndrome, this has recently been strongly called into question due to what we consider to have been low penetrance (or, as per those authors, low expressivity) of the disease in certain members of families (3). This is due to the fact that in the original report two brothers with the same two mutations in one gene were classified as one “affected” and the other “unaffected”, prompting a search for a third mutation in another allele to account for this perceived penetrance issue (4). However, variable penetrance could easily explain this, if one of the brothers maybe had retinitis pigmentosa or some other subthreshold symptom of Bardet-Biedl syndrome, leading to an incorrect classification as “unaffected”. Only time will tell on this, however, as the families in the two reports are different, so we await further phenotypic clarification.

    Thanks again for posting to this great blog forum! and I hope you can take the time to respond to this question too.

    1 Casanueva, M. O., Burga, A. & Lehner, B. Fitness trade-offs and environmentally induced mutation buffering in isogenic C. elegans. Science 335, 82-85, doi:10.1126/science.1213491 (2012).
    2 Kohane, I. S., Hsing, M. & Kong, S. W. Taxonomizing, sizing, and overcoming the incidentalome. Genet Med, doi:10.1038/gim.2011.68 (2012).
    3 Abu-Safieh, L. et al. In search of triallelism in Bardet-Biedl syndrome. Eur J Hum Genet, doi:10.1038/ejhg.2011.205 (2012).
    4 Katsanis, N. et al. Triallelic inheritance in Bardet-Biedl syndrome, a Mendelian recessive disorder. Science 293, 2256-2259, doi:10.1126/science.1063525 (2001).

  • Hi Ben

    Thanks for the response (I’m assuming 4 was directed at me). Re-reading the post you did pretty much answer my question there, although it was unclear (in my mind at least) whether a “de novo functional mutation” (of which you reported 18) was the same thing as a “loss-of-function (LoF) de novo event”, of which you focused on 3.

    But I’m still unsure about what it all means in terms of understanding autism. There are three mutations that you’re excited about, but even if they are causative, together they account for less than 1% of the autism sample. If the history of autism genetic research is anything to go by, the same genes will crop up in studies of epilepsy, schizophrenia, intellectual disability.

    Is this not further evidence that genetics is the wrong level of analysis to be focusing on to try and understand autism? Either that or, as other commenters have implied, “having an ASD diagnosis” is the wrong target phenotype.

    Thanks again for posting.


  • Jon;
    There is no single mechanism that can predict ‘autism’. Large chromosomal mutations, common polymorphisms, unfavorable pre-, peri- and neonatal events and environmental hazards have all been linked to ‘autism’, but the same events have also been linked to all the major developmental disorders.

    As far as causation is concerned, a number of studies have linked environmental hazards to risk for de novo sperm mutations:

  • I would like to remind Linda Avey that children with autism grow up to be adults with autism, and stay autistic long after their parents have perished.

    I don’t quite understand the need there seems to exist for people to find the magic bullet for autism. It is a natural human variation. Of course, my point of view is influenced by being an Asperger, which I would not change for all the money this universe could give me or all your lives. It has benefits. Seeing things others don’t see, being able to shut irritants off, a lack of emotional neurotic behaviors and a high intelligence. Yes I understand having a severely autistic child is a hardship for parents especially as they cannot love you or even see you as anything more than mobile, humanoid gnats but that is how they are. But going the neurotic way of Lorenzo’s Oil is not the way forward.

  • Benjamin Neale


    Gholson had a couple of questions about the work that I think deserve a response:

    1) For the family history piece, there’s a pretty large chunk in the supplement about the ascertainment. In almost all instances the parents have been assessed at some level. We don’t see a significant difference between family history positive and negative families, in terms of rate, but power is low.

    More generally, there is an implicit assertion that the family history positive individuals do not have any de novo contribution to disease. I don’t think this is necessarily accurate – there are certainly instances of families with other affected individuals for whom cases have de novo CNVs that predispose to psychiatric illness. That is to say, if de novo mutations predispose to disease then they should do so generally, unless a de novo derived disease is totally different from an inherited derived disease.

    On the other side, it is no guarantee that a parent with a given complex disease will give rise to a child with said same disease. Therefore exclusion of patients with family history would miss risk-conferring de novo events [where again the CNV literature shows no significant difference between family history positive and negative offspring].

    2) On the topic of penetrance – the definition of penetrance according to wikipedia is the probability of disease conditional on the variant [and just the variant]. This is a property of the population and the variant, so this concept of variable expressivity is built into penetrance [e.g. incomplete penetrance]. For almost all complex traits there is pretty good evidence that the environment plays some role – so incomplete penetrance is a matter of course, I’m not sure we can gain much further traction than that. There are open questions about the extent to which nonlinear risk exists [e.g. epistasis etc.] but I think we’re a ways off understanding that especially for rare variants.

  • Daniel MacArthur

    David – there’s no problem with being critical here, but keep it civil and substantive. Comment deleted.

