OPEN ACCESS SOURCE: bioRxiv
Abstract
Aging clocks dissociate biological from chronological age. The estimation of biological age is important for identifying gerontogenes and assessing environmental, nutritional or therapeutic impacts on the aging process. Recently, methylation markers were shown to allow estimation of biological age based on age-dependent somatic epigenetic alterations. However, DNA methylation is absent in some species such as Caenorhabditis elegans and it remains unclear whether and how the epigenetic clocks affect gene expression. Aging clocks based on transcriptomes have suffered from considerable variation in the data and relatively low accuracy. Here, we devised an approach that uses temporal scaling and binarization of C. elegans transcriptomes to define a gene set that predicts biological age with an accuracy that is close to the theoretical limit. Our model accurately predicts the longevity effects of diverse strains, treatments and conditions. The involved genes support a role of specific transcription factors as well as innate immunity and neuronal signaling in the regulation of the aging process. We show that this transcriptome clock can also be applied to human age prediction with high accuracy. This transcriptome aging clock could therefore find wide application in genetic, environmental and therapeutic interventions in the aging process.
Introduction
Aging is the driving factor for several diseases, the decline of organ function and the overall progressive loss of physiological integrity1. Aging biomarkers that predict the biological age of an organism are important for identifying genetic and environmental factors that influence the aging process and for accelerating studies examining potential rejuvenating treatments. Initial studies have provided evidence that methods predicting the biological age are indeed sensitive enough to detect the effect of geroprotective therapies2–5. Diverse studies have tried to identify biomarkers and predict the age of individuals, using approaches ranging from proteomics, transcriptomics, the microbiome and frailty index assessments to neuroimaging and DNA methylation6–16. Several age predictors based on easy-to-obtain medical records or hematological data have shown promising results but still lack overall accuracy17–21.
DNA methylation clocks
Currently, the most common predictors are based on DNA methylation. An elastic net regression model based on 71 methylation sites derived from whole blood could predict a validation cohort with a correlation coefficient of 0.91 and a root-mean-square error (RMSE) of 4.9 years (for details on parameters reported from the literature, see methods). Genes with nearby age-associated methylation markers predicted the age of 488 publicly available whole blood gene expression samples with an RMSE of 7.22 years22. The first multi-tissue predictor of age, comprising 51 distinct tissue and cell types, utilized 353 CpG methylation sites and resulted in a correlation of 0.96 and a median absolute difference of 3.6 years23. Consequently, a variety of epigenetic aging clocks were devised in humans24,25, with as few as 8 CpG sites26, and also in other organisms3,27. Despite these advances, there are still challenges and weaknesses28. A recent report showed that the improved prediction of chronological age from DNA methylation might limit its use as a biomarker of aging: a hypothetical perfect prediction of the chronological age would not give any information about the biological age of the organism. It is the deviation of the prediction from the chronological age (the so-called age acceleration residual) that can give insight into the probable mortality. The improvement of the chronological age predictor is thereby accompanied by a decreased association between mortality and the bias of prediction29. The same study found a potential confounding effect of cellular composition in those epigenetic clocks. After correction for this confounder, no significant association between the age acceleration residual and mortality was found, suggesting that the biggest effect is driven by differences in the cellular composition, which might limit the usage of the DNA methylation marks themselves as biomarkers.
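The error measures and the age acceleration residual discussed above can be illustrated with a minimal Python sketch. It assumes the residual is simply the predicted age minus the chronological age, as described in the text; the function names and the sample values are hypothetical and are not taken from any of the cited studies.

```python
import math
import statistics


def age_acceleration(predicted, chronological):
    """Per-sample deviation: predicted age minus chronological age.

    A positive value means the sample is predicted to be older than its
    chronological age (assumed definition, following the text above).
    """
    return [p - c for p, c in zip(predicted, chronological)]


def rmse(predicted, chronological):
    """Root-mean-square error of the prediction."""
    residuals = age_acceleration(predicted, chronological)
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def median_absolute_difference(predicted, chronological):
    """Median of the absolute deviations between prediction and true age."""
    residuals = age_acceleration(predicted, chronological)
    return statistics.median(abs(r) for r in residuals)


# Illustrative ages in years (made-up values, not data from any study).
chronological = [30.0, 40.0, 50.0, 60.0]
predicted = [33.0, 38.0, 50.0, 64.0]

residuals = age_acceleration(predicted, chronological)
print(residuals)
print(rmse(predicted, chronological))
print(median_absolute_difference(predicted, chronological))

# A hypothetical perfect chronological predictor leaves no residual at all,
# and therefore nothing that could be associated with mortality.
print(age_acceleration(chronological, chronological))
```

The last call makes the argument of the paragraph concrete: as a predictor approaches perfect chronological accuracy, every residual shrinks towards zero and carries less information about biological age.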
Transcriptomics clocks
The DNA methylation marks themselves might influence the transcriptional response22,30,31, but aging also affects the transcriptional network by altering histone abundance32, histone modifications33–37 and the 3D organization of chromatin38,39. Differences in RNA abundance thereby integrate a variety of regulatory influences, resulting in notable gene expression changes during the lifespan of an organism40–46, aging-associated changes in transcriptional elongation47 and a systemic length-driven transcriptome imbalance48. A recent study identified six gene expression hallmarks of cellular aging across eukaryotes from yeast to humans49, and the suppression of transcriptional drift has been shown to extend the lifespan of C. elegans50. These studies sparked interest in the identification of transcriptomic aging biomarkers, an RNA expression signature for age classification and the development of transcriptomic aging clocks.
Peters et al. extended previous classification approaches51–55 to a regression, which allows the computation of the predicted age, and developed a transcriptional aging clock based on whole-blood microarray samples for half of the human genome. They reported an r² of up to 0.6, an average difference of 7.8 years and an association of the predicted age with blood pressure as well as smoking status56. Similarly, Mamoshina et al. built a transcriptomic aging clock of human muscle tissue, in which a deep feature selection model performed best, with an r² of 0.83 and a mean absolute error of 6.24 years57. Recently, Affymetrix samples of the cortex, the hippocampus and the cerebellum were used to train a deep learning predictor to an r² of 0.91 and an RMSE of 7.76 years58.
However, microarray data have the drawbacks of a limited range of detection, high background levels and the detection of just a subset of the transcriptome. To overcome these limitations, a transcriptional age predictor was built on human RNA-seq data from the GTEx project59, yielding Spearman correlation coefficients of up to 0.84, depending on the tissue60. By applying an ensemble of linear discriminant analysis classifiers, a model with an r² of 0.81, a mean absolute error of 7.7 years and a median absolute error of 4.0 years was obtained in a dataset derived from cell culture of healthy donors61. The same model also predicted an accelerated age in 10 patients with the premature aging disease Hutchinson-Gilford progeria syndrome (HGPS). The first across-tissue transcriptional age calculator, using a LASSO regression, showed that splicing events could predict age with an overall accuracy of 71%62, while an across-tissue prediction on gene expression data showed that an elastic net regression was the most accurate, with an average Pearson correlation coefficient across tissues of 0.33 63. Apart from mRNA sequencing, a study of human peripheral blood microRNA samples was able to predict the age with an r² of 0.49 64. Moreover, it has been shown that for lung tissue of mice, transcriptional clocks are more accurate than epigenetic clocks65.
Proteomics clocks
Proteomics has been used in encouraging recent studies on human aging clocks and biomarker discovery. The first proteomics aging clock based on human blood plasma identified age-associated proteins that were used to build an elastic net prediction with an r² of 0.88 66, which could later be improved to an r² of 0.94 67. Recently, a study based on blood proteins predicted human age with a Pearson correlation of 0.88, more accurately than a combination of these proteins with metabolites and several clinical lab tests, showing that even a subset of proteins can serve as a versatile predictor68. A review of 32 different human proteomics and aging studies aimed to identify common age-associated proteins that could be robustly identified regardless of the technique or population diversity used. It resulted in an age predictor based on 83 proteins that were reported in three or more of the 32 studies and reported a Spearman correlation of 0.91 69. This study also showed the current limitation of the usage of proteomics for age prediction: there is no standard in the generation of proteomic data and different techniques detect different subsets of proteins, which might lead to a measurement bias, i.e. some important age-related proteins might have been overlooked, while others were reported too frequently.
In summary, a large variety of data, techniques and analyses have been used to identify aging biomarkers and clocks in humans. However, these analyses also showed a pronounced variability and difficulties in replicability. Indeed, a recent analysis70 of gene expression, plasma protein, blood metabolite, blood cytokine, microbiome and clinical marker data71,72 showed that individual age slopes diverged among the participants over the longitudinal measurement time and, subsequently, that individuals have different molecular aging patterns, called ageotypes73. Moreover, a recent 10-year longitudinal study74 showed that individuals are more similar to themselves than to others of the same age, and a twin study75 demonstrated that the global effect of age on gene expression is small. These interindividual differences are even more pronounced between different ethnicities and sexes19,76 and show that it is still difficult to pinpoint biomarkers for aging in humans.
C. elegans aging clocks
Model organisms, instead, can give a more controllable view on the aging process and biomarker discovery. Several studies have been conducted in mice and rats3,27,65,77–79, and similarities between model organism and human aging have been described80–82. C. elegans has revolutionized the aging field and has vast advantages as a model organism83–87. Even isogenic nematodes in precisely controlled homogeneous environments have surprisingly diverse lifespans; however, the underlying causes are still not completely understood88. Several predictive biomarkers of C. elegans aging have been described89–92, and the measurements of physiological processes, such as movement, pharyngeal pumping and reproduction, have been used to predict lifespan93 and the age with an RMSE of 1.7 days94. A first transcriptomic clock of C. elegans aging using microarray data of 104 single wildtype worms predicted the chronological age with 71% accuracy95. When the prediction was based on modular genetic subnetworks inferred from microarray data with support vector regression, the age of sterile fer-15 mutants at 4 timepoints was predicted with an r² of 0.91. The same approach on the 104 individual N2 wildtype worms yielded an r² of 0.77, indicating that for microarray data, subnetworks of genes result in better prediction compared to single gene predictors, likely due to the noisiness of the data type96. Although the accuracy of this model is reasonable, it is limited by the fact that no lifespan-affecting genotypes or treatments were tested and that the validation dataset, although tested on single worms, resulted in an increased prediction error. Recently, an initial age prediction based on microarray data predicted 60 RNA-seq samples with a Pearson correlation of 0.54 and was improved to an r of 0.86 when the chronological age was rescaled by the median lifespan of the corresponding sample97. Even though this model predicted the biological rather than the chronological age of a variety of C. elegans genotypes, it is limited by the accuracy of the prediction. Moreover, the biological age is not reported in days, but as a variable with values between 0 and ~2.5, which makes it harder to interpret.
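The rescaling of chronological age by the median lifespan can be sketched as follows. This is a minimal illustration that assumes the rescaling is a simple division, which is consistent with the reported range of 0 to ~2.5 but is not confirmed by the text. The conversion back to days through a reference lifespan is a hypothetical addition for interpretability, and all names and numbers are made up.

```python
def rescale_age(chronological_days, median_lifespan_days):
    """Express age as a fraction of the sample's median lifespan (assumed form)."""
    if median_lifespan_days <= 0:
        raise ValueError("median lifespan must be positive")
    return chronological_days / median_lifespan_days


def to_reference_days(rescaled_age, reference_median_lifespan_days):
    """Map a unitless rescaled age back to days of a chosen reference population.

    Hypothetical helper: the reference lifespan is an assumption made here
    so that the result can be read in days again.
    """
    return rescaled_age * reference_median_lifespan_days


# Two made-up samples, both 10 days old chronologically.
control = rescale_age(10.0, 20.0)      # median lifespan of 20 days
long_lived = rescale_age(10.0, 40.0)   # median lifespan of 40 days

print(control, long_lived)
print(to_reference_days(control, 20.0), to_reference_days(long_lived, 20.0))
```

Under these assumptions, two worms of the same chronological age receive different rescaled ages when their populations differ in median lifespan, which is what lets a regression target capture lifespan-affecting genotypes or treatments.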
.../...