This guest post was contributed by Taru Tukiainen, a postdoctoral research fellow in the Analytic and Translational Research Unit at Massachusetts General Hospital and the Broad Institute of MIT and Harvard.
The X chromosome contains around 5% of the DNA in the human genome, but it has remained largely unexplored in genome-wide association studies (GWAS): to date, roughly two thirds of GWAS have excluded the X-chromosomal data from their analyses. In a paper published in PLOS Genetics yesterday, we dig into X chromosome associations and demonstrate why this stretch of DNA warrants particular attention in genetic association and sequencing studies. This post focuses on one of our key results: the possibility that some X chromosome loci contribute to sexual dimorphism, i.e. biological differences between men and women.
Let’s start with the basics. The X chromosome is part of the sex-determination system: women carry two copies of it, whereas men carry a single X paired with the male-specific Y chromosome. The Y chromosome is roughly one third the size of the X and has counterparts for only a small subset of X chromosome genes. This would obviously create an imbalance in X chromosome gene dosage between men and women were it not for X chromosome inactivation, a fascinating process (see this recent NY Times piece) that randomly silences one of the two X chromosomes in every cell of a female body. This dosage compensation is, however, sometimes incomplete: up to 15% of X-chromosomal genes may be consistently expressed from the inactive (silenced) X chromosome, and a further 10% may escape X inactivation in at least some women. These ‘escape’ loci are among the very few regions of the human genome where men and women differ in gene dosage, and they are therefore strong candidates for a role in sexually dimorphic phenotypes.
In our study, we assessed the contribution of X chromosome SNPs to twelve quantitative anthropometric (e.g. BMI and height) and cardiometabolic (e.g. blood pressure and fasting insulin) phenotypes in more than 20,000 individuals from Finnish and Swedish population cohorts. We took advantage of the recent update from the 1000 Genomes Project, which provides the first comprehensive reference panel for X chromosome genotype imputation. Our X chromosome-wide association analysis identified three significantly associated loci, the first X-chromosomal loci reported for fasting insulin and height in European populations, and one of them stood out. The height-associated SNPs lay right next to a strong candidate gene, ITM2A, which has been linked to early cartilage development, and the lead SNP was associated with ITM2A expression. More strikingly, unlike at the other two loci, the genetic effects here were not equal between the sexes when we assumed full dosage compensation between men and women: the effect was twice as large in women. Because we found no sexually dimorphic associations for height on the autosomes, we suspected that X chromosome inactivation, or more precisely the lack of it, could be involved.
But how could we be sure? To the best of our knowledge, no ‘escape’ locus had previously been linked with a phenotype in a population-level study. To convince ourselves that the observation could indeed be due to incomplete dosage compensation at the ITM2A locus, we gathered further pieces of evidence. A statistical comparison showed that, given the effect estimates and standard errors in each sex, escape from inactivation is a much more likely model than full dosage compensation at the ITM2A locus, but not at the other two associated loci. In our expression data, women had higher ITM2A expression than men, consistent with both X chromosomes being actively transcribed at this locus. The difference in genetic effects was also already present in boys and girls aged 8-10 years, suggesting that pubertal changes in sex hormones do not explain the observed sex difference. Lastly, and probably the most compelling support for our hypothesis, a study that profiled gene expression from the inactive X chromosome in women (published in 2005, but probably still the most detailed one to date) showed that our candidate gene, ITM2A, escapes X inactivation in most, though not all, women. Putting these pieces together, we are fairly confident that this is a locus where the sex difference in the phenotype association is largely, if not entirely, due to escape from inactivation.
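The post does not spell out how the two models were compared. Purely as an illustration, here is a minimal sketch of one simple way to compare them from sex-specific estimates. It is not the method used in the paper. The sketch makes three assumptions. First, men's hemizygous genotypes are coded 0/2, as under full dosage compensation, so escape predicts a female effect twice the male one. Second, the two estimates are independent and normally distributed. Third, because each model has a single free parameter, their maximized likelihoods can be compared directly. The function names and input numbers are hypothetical, not the study's data.

```python
import math


def max_loglik(beta_f, se_f, beta_m, se_m, ratio):
    """Maximized Gaussian log-likelihood (up to a constant) of the observed
    sex-specific effect estimates under a model where the expected female
    effect is `ratio` times the expected male effect.

    ratio = 1.0: full dosage compensation (equal effects, men coded 0/2)
    ratio = 2.0: escape from X inactivation (female effect twice the male
                 one when men are coded 0/2)
    """
    wf, wm = 1.0 / se_f**2, 1.0 / se_m**2  # inverse-variance weights
    # Closed-form weighted least-squares estimate of the shared male effect b,
    # under E[beta_f] = ratio * b and E[beta_m] = b.
    b_hat = (ratio * beta_f * wf + beta_m * wm) / (ratio**2 * wf + wm)
    z_f = (beta_f - ratio * b_hat) / se_f  # standardized female residual
    z_m = (beta_m - b_hat) / se_m          # standardized male residual
    return -0.5 * (z_f**2 + z_m**2)


def escape_vs_full_compensation(beta_f, se_f, beta_m, se_m):
    """Likelihood ratio favouring escape (>1) or full compensation (<1)."""
    ll_full = max_loglik(beta_f, se_f, beta_m, se_m, ratio=1.0)
    ll_escape = max_loglik(beta_f, se_f, beta_m, se_m, ratio=2.0)
    return math.exp(ll_escape - ll_full)


# Hypothetical estimates (NOT from the study): female effect about twice the male one.
lr = escape_vs_full_compensation(beta_f=-0.60, se_f=0.10, beta_m=-0.30, se_m=0.10)
print(f"likelihood ratio (escape : full compensation) = {lr:.1f}")
```

With equal female and male estimates, the same function returns a ratio below 1, which favours full compensation. A real analysis would also need to account for sampling correlation, allele coding choices and multiple loci.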
The obvious next question is: how much of the height difference between men and women does this locus explain? Escape loci can contribute to a sex difference in height, or in whatever phenotype they are associated with, in at least two key ways. First, with two actively transcribed X chromosomes, gene expression in these regions is higher in women. Estimating how much this contributes is tricky and likely requires larger and more diverse gene expression data sets than we had available. Second, the causal variant at such a locus shifts mean height differently in men and women: men can carry only one ‘risk’ allele, while some women carry two, and the resulting difference in means depends on the allele frequency and the per-allele effect of the variant. Assuming that our lead SNP (MAF = 0.36) is the causal variant and that its effect estimate (-0.55 cm per minor allele) is unbiased, we calculate that the ITM2A locus accounts for 1.5% of the height difference between men and women in Finland. Clearly much of the difference remains unexplained, but 1.5% is not bad for a single SNP.
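One simple way to reproduce this kind of calculation is sketched below. It assumes an additive escape model in which every copy of the minor allele shifts height by the same amount in both sexes. Women then carry on average 2 × MAF minor alleles and men MAF, so the locus shifts the female-minus-male mean by MAF × effect. The MAF and effect size come from the text above. The overall sex gap in height (13 cm) is a hypothetical round figure chosen for illustration, not a value reported in the study. The function name is also our own.

```python
def female_minus_male_mean(maf: float, beta: float) -> float:
    """Expected female-minus-male difference in mean trait value contributed
    by one X-linked variant that escapes X inactivation (hypothetical helper).

    Assumes an additive model where each minor-allele copy shifts the trait by
    `beta` in both sexes: expected minor-allele count is 2 * maf in women
    (two active X copies) and maf in men (one X copy).
    """
    female_mean = 2 * maf * beta
    male_mean = maf * beta
    return female_mean - male_mean  # simplifies to maf * beta


MAF = 0.36              # lead SNP minor allele frequency (from the post)
BETA_CM = -0.55         # height effect per minor allele, in cm (from the post)
ASSUMED_GAP_CM = 13.0   # HYPOTHETICAL male-minus-female mean height gap, in cm

diff_cm = female_minus_male_mean(MAF, BETA_CM)  # negative: women shorter at this locus
share = -diff_cm / ASSUMED_GAP_CM               # fraction of the gap, same direction
print(f"locus contribution: {diff_cm:.3f} cm, {share:.1%} of a {ASSUMED_GAP_CM:.0f} cm gap")
```

With these inputs the locus lowers mean female height relative to male height by about 0.2 cm, roughly 1.5% of the assumed 13 cm gap. The share scales inversely with whatever gap is plugged in.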
Sexual dimorphisms are widespread in human complex phenotypes. They range from morphological traits such as height to diseases and disorders such as autism, cardiovascular disease and rheumatoid arthritis, whose prevalences differ widely between men and women, and the biological bases of these differences are often poorly understood. As the example from our study shows, X-chromosomal regions where gene dosage is not fully compensated between men and women likely contribute to the multifactorial interplay of genetics and environment that gives rise to these diverse sex differences. There may be dozens, or even more than a hundred, such X chromosome loci, offering intriguing ground for mining links between genes and sexual dimorphisms.
While the X chromosome is likely to be highly enriched for sexually dimorphic associations, its role in harboring “standard” complex trait loci should not be underestimated. Given our estimate that on average approximately 2.6% of phenotype heritability is accounted for by the X chromosome, there should be dozens of associated X chromosome loci for GWAS and sequencing efforts to harvest. An increasing number of association studies are beginning to include the X chromosome in their analysis pipelines, now that imputation and analysis tools that also handle the X chromosome have become available. These studies would be further strengthened by more comprehensive panels of X chromosome variants and by transcriptome studies that elucidate the regulatory properties of these variants.
Finally, let’s not forget that the X chromosome is not the only chunk of DNA neglected in studies of human complex traits. Maybe we’ll soon be including the Y chromosome and mitochondrial variants in our association studies, too?
Taru Tukiainen is a postdoctoral research fellow in the Analytic and Translational Research Unit at Massachusetts General Hospital and the Broad Institute of MIT and Harvard. Her work focuses on exploring RNA sequencing data in population and rare disease patient samples. The work described here was done as part of her previous research at the Institute for Molecular Medicine Finland.
Interesting piece. X chromosome CNV data are similarly lacking in public databases, probably because they are generated as a “by-product” of primary GWAS studies. This makes clinical interpretation of X chromosome findings particularly tricky, especially given the “escapee” issue you describe. Another example of why sharing high-quality, clinical-grade, curated data is becoming a moral responsibility, especially in a publicly funded health care system. Free the data!