FULL TITLE: An accurate aging clock developed from the largest dataset of microbial and human gene expression reveals molecular mechanisms of aging.
OPEN ACCESS SOURCE: bioRxiv
Abstract
Accurate measurement of the biological markers of the aging process could provide an “aging clock” measuring predicted longevity and allow for the quantification of the effects of specific lifestyle choices on healthy aging. Using modern machine learning techniques, we demonstrate that chronological age can be predicted accurately from (a) the expression level of human genes in capillary blood, and (b) the expression level of microbial genes in stool samples. The latter uses the largest existing metatranscriptomic dataset, stool samples from 90,303 individuals, and is the highest-performing gut microbiome-based aging model reported to date. Our analysis suggests associations between biological age and lifestyle/health factors, e.g., people on a paleo diet or with IBS tend to be biologically older, and people on a vegetarian diet tend to be biologically younger. We delineate the key pathways of systems-level biological decline based on the age-specific features of our model; targeting these mechanisms can aid in development of new anti-aging therapeutic strategies.
Introduction
Biological age refers to biological markers of the aging process, and may be accelerated or slowed in some individuals relative to their chronological age. Recent research has proposed computational aging clocks based on various biomarkers, including metabolites, blood cell count and other routine lab tests (Earls et al., 2019; Momoshina, Kochetov et al., 2018), DNA methylation (Fraga & Esteller, 2007; Horvath & Raj, 2018; Bell et al., 2019), gene expression in tissue (Momoshina, Volosnikova et al., 2018) or blood (Harries et al., 2011; Lin et al., 2019), and taxonomic composition of the gut microbiome (Galkin et al., 2020), among others. Aging clocks use a signal derived from these biomarkers as a health-related metric for aging. In this paper we present two biological age metrics, one derived from the metatranscriptome of the gut microbiome, and one from the transcriptome of capillary blood. Together, these two metrics arguably capture the most comprehensive view of human biology.
Molecular markers from both microbial and human cells have been used to develop aging clocks. The composition and function of the gut microbiome change with age, and may modulate healthy aging through multiple mechanisms. The increased dysbiosis associated with age can lead to innate proinflammatory immune responses, and the small molecules secreted by the gut microbiome affect host metabolism and signalling pathways that vary with age (see review in Kim & Jazwinski, 2018, i.a.). There is evidence that these microbiome changes over time are directly implicated in human healthspan. Maffei et al. (2017) show that certain properties of the gut microbiome, notably taxonomic diversity, are more predictive of a frailty index measuring mortality risk than is chronological age. Similarly, on the human side, several molecular markers may modulate healthy aging. Perhaps the strongest aging clocks proposed so far have relied on biomarkers related to the epigenome, such as DNA methylation. While these can act as an estimator of biological age, they are not comprehensive and they have limited ability to pinpoint the regulators of the biological clock. Both the gut microbiome and the human molecular mechanisms are known to participate in widespread epigenetic interactions (see survey in Watson & Søreide, 2017), so a biological clock based on both of these functions can potentially inform specific therapeutic avenues to slow down aging. These may include personalized diets, supplements (vitamins, minerals, prebiotics, probiotics, food extracts, etc.), pharmaceuticals, phages, immunotherapies (vaccines, antibodies), etc.
While there are many ways to define biological age and operationalize the development of an aging clock (Jia et al., 2017), a common approach is to fit a machine-learned model to predict the chronological age of the human subject from the biomarker. This model’s predictions will deviate from chronological age to some extent: for example, if the subject’s biomarker profile is more similar to the biomarker profiles of older people than to those of their peers, the model will overpredict age. The model’s predictions can be interpreted as a biological age in the sense that they approximate the age of a typical subject with the given biomarker profile. In this paper we show that the gut microbiome metatranscriptome and the blood transcriptome both display associations with age strong enough to allow the creation of an aging clock.
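The idea of reading a model's prediction as a biological age can be illustrated with a minimal sketch. It assumes a single hypothetical biomarker value per subject and an ordinary least-squares fit; the models in this paper use many features, so the names `fit_line`, `biological_age`, and the toy data below are illustrative assumptions only.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one feature: y ~ intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def biological_age(model, biomarker):
    """Predicted age: the age of a typical subject with this biomarker value."""
    intercept, slope = model
    return intercept + slope * biomarker


# Hypothetical training data: one biomarker value and chronological age per subject.
train_biomarker = [1.0, 2.0, 3.0, 4.0, 5.0]
train_age = [20.0, 30.0, 40.0, 50.0, 60.0]
model = fit_line(train_biomarker, train_age)

# A 40-year-old whose biomarker resembles that of typical 50-year-olds.
subject_biomarker, subject_age = 4.0, 40.0
predicted = biological_age(model, subject_biomarker)
age_gap = predicted - subject_age  # positive: biologically older than peers
```

A positive `age_gap` corresponds to the overprediction case described above: the subject's profile looks like that of older people.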
.../...
Results
Figure 1 presents descriptive statistics of the discovery cohort. Ages of sample donors range from <1 year to 104 years, with 2,686 donors below 18 years of age (included with parental consent). Study participants come from over 60 countries (86% US, 8% Canada, 3% Australia, 1.5% EU, 1% UK, rest from other countries). We do not observe any differences in taxonomic richness by age (Fig 1b-c); nor do we find differences in taxonomic diversity or active function richness. None of these four measures were found to increase predictive accuracy when included in our models. Fig 1d-e shows the species-level taxa and the KOs that vary the most with age. To identify these, we calculated the mean CLR for each feature in each decade of age in 70% of the discovery cohort, and chose those with the highest variance across ages. Then we plotted the trend in mean CLR by decade in the remaining 30% of the data, grouped by Viome Functional Category (VFC). Notably, all of the KOs with the highest positive association with age are part of Methanogenesis Pathways resulting in production of methane gas.
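The feature-ranking step described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the pseudocount used to handle zeros, the function names, and the toy data are all assumptions, and the 70%/30% split is omitted.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean log over the sample.

    The pseudocount is an assumption made here to avoid log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    center = mean(logs)
    return [v - center for v in logs]


def decade_variance(samples, ages):
    """For each feature, the variance across decades of its per-decade mean CLR."""
    by_decade = defaultdict(list)
    for counts, age in zip(samples, ages):
        by_decade[int(age // 10)].append(clr(counts))
    n_features = len(samples[0])
    result = []
    for j in range(n_features):
        decade_means = [mean([row[j] for row in rows]) for rows in by_decade.values()]
        result.append(pvariance(decade_means))
    return result


# Hypothetical expression counts: rows are samples, columns are features.
samples = [[10, 10, 10], [10, 10, 10], [100, 10, 1]]
ages = [25, 28, 71]
variances = decade_variance(samples, ages)
most_variable = variances.index(max(variances))  # feature varying most with age
```

Features would then be ranked by `variances`, and the top-ranked ones plotted by decade on held-out data.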
Figure 1.
Descriptive statistics for the microbiome discovery cohort of Table 1.
(a) age distribution (b) richness and Shannon diversity of active microbial richness by decade (c) richness and Pielou’s evenness index for active functions (d-e) mean CLR transformed expression levels of species/KOs by age for most variable species/KOs grouped by genus/VFC.
Our biological age model’s performance is presented in Table 1. For the independent validation cohort, the model predicts chronological age with lower error than the baseline MAE of the datasets, and accounts for around 46% of the variance in age by R2, the standard metric of quality of fit in regression tasks. Our biological age model’s predictions and most important predictors are shown in Figure 2. Figure 2b shows the features with the highest absolute coefficients (above 0.3 for taxa and 0.15 for KOs) grouped by Viome Functional Categories (VFCs) and further grouped into themes (see Supplementary Materials).
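For reference, the three quantities used to report performance can be computed as in the sketch below. The definition of the baseline is an assumption made here (predicting the mean age of the dataset for every subject); the text does not spell out how the baseline MAE was obtained. The toy ages are hypothetical.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ages."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def baseline_mae(actual):
    """MAE of a constant predictor (assumed here: the mean age of the dataset)."""
    center = mean(actual)
    return mean([abs(a - center) for a in actual])


def r_squared(actual, predicted):
    """Fraction of variance in age explained: 1 - SS_res / SS_tot."""
    center = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - center) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical chronological ages and model predictions.
actual = [20.0, 40.0, 60.0, 80.0]
predicted = [30.0, 40.0, 50.0, 70.0]

model_mae = mae(actual, predicted)
naive_mae = baseline_mae(actual)
fit = r_squared(actual, predicted)
```

A model is informative when `model_mae` falls below `naive_mae`; `fit` is the proportion of variance in age that the predictions account for.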
Figure 2.
Biological aging model using the microbiome discovery cohort of Table 1.
(a) Predicted vs. actual age in held-out validation data (for clarity, only a random subset of points is shown) (b) Coefficients for the microbial taxonomic features (circles) and KO features (triangles) grouped into curated Viome Functional Categories (VFCs)
Table 1 also presents the performance of the human blood transcriptome model under 5-fold cross-validation. This model accounts for around 53% of the variance in ages in the dataset. Figure 3 presents the predictions of our aging model based on the human blood transcriptome.
Table 1.
Model performance by cohort.
- The stool microbiome cohorts consist of samples obtained from unique customers of Viome’s Gut Intelligence product. These samples were divided into a discovery cohort of 78,637 samples, and a validation cohort of 11,666 samples.
- The Galkin et al. matched microbiome cohorts are intended to allow comparison of this model to the one presented in that work, and were constructed by randomly choosing one Viome customer from our validation set with the same age as each person in the Galkin et al. datasets. One person in the matched CV cohort could not be paired with a unique sample in our data, so our cohort has 1164 samples rather than Galkin et al.’s 1165. For the HC cohort we additionally matched on sex, which was impossible in the larger CV cohort.
- The human blood transcriptome cohort consists of samples obtained from 1494 unique customers of Viome’s Health Intelligence product and associated research studies.
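The construction of the age-matched cohorts described above can be sketched as random selection without replacement. The function name, the tuple layout, and the sample identifiers are hypothetical; matching on sex, as done for the HC cohort, would extend the lookup key from age to an (age, sex) pair.

```python
import random
from collections import defaultdict


def match_by_age(reference_ages, candidates, seed=0):
    """Pick one unused candidate of the same age for each reference subject.

    candidates: list of (sample_id, age) pairs.
    Returns one sample_id per reference age, or None where no unused
    candidate of that age remains.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for sample_id, age in candidates:
        pool[age].append(sample_id)
    for ids in pool.values():
        rng.shuffle(ids)  # random choice among same-age candidates
    matched = []
    for age in reference_ages:
        matched.append(pool[age].pop() if pool[age] else None)
    return matched


# Hypothetical data: three reference subjects aged 45, but only two candidates.
reference_ages = [30, 30, 45, 45, 45]
candidates = [("a", 30), ("b", 30), ("c", 30), ("d", 45), ("e", 45)]
matched = match_by_age(reference_ages, candidates)
```

In this toy example one reference subject cannot be paired with a unique sample, so the matched cohort ends up one sample smaller than the reference cohort.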
Figure 3.
Biological aging model using the human blood transcriptome discovery cohort of Table 1.
(a) Actual vs. Predicted (b) Top coefficients grouped by Viome Functional Categories (VFCs)
Cohort comparisons
To explore the biological age of specific populations of interest, we present summaries of paired sample t-tests in Figure 4a, which depicts the difference in mean biological age for each cohort comparison, together with p-values from the corresponding t-tests. Figure 4b shows the difference in chronological age for these same populations within the discovery cohort. We note that the model picks up several interesting differences between these populations and their age-matched controls. Vegetarians and vegans both tend to have a lower biological age than omnivores, while those following the ketogenic or paleo diets are biologically older than omnivores. Heavy drinkers are biologically older than non-drinkers. People with diabetes or IBS appear older than healthy controls. Some of these results may reflect chance patterns in the training data, as discussed below.
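A paired comparison of this kind can be sketched as below, assuming each cohort member has already been paired with one age-matched control. The function name and the biological ages are hypothetical. The sketch stops at the t statistic: converting it to a p-value needs the Student t distribution, which the Python standard library does not provide, so in practice a routine such as `scipy.stats.ttest_rel` would typically be used.

```python
import math
from statistics import mean, stdev


def paired_t(cohort, controls):
    """Paired t statistic for per-pair differences (cohort minus control)."""
    diffs = [a - b for a, b in zip(cohort, controls)]
    n = len(diffs)
    mean_diff = mean(diffs)
    std_err = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff, std_err, mean_diff / std_err, n - 1


# Hypothetical biological ages for four cohort members and their matched controls.
cohort_bio_age = [52.0, 47.0, 61.0, 39.0]
control_bio_age = [50.0, 44.0, 60.0, 36.0]
mean_diff, std_err, t_stat, dof = paired_t(cohort_bio_age, control_bio_age)
```

The mean difference and its standard error are the two quantities plotted per cohort in Figure 4a.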
Figure 4.
Cohort comparisons.
(a) Mean and standard error of biological age differences between cohorts and age-matched controls where p-values < 0.1 from paired t-tests. (b) Mean and standard error of chronological age differences between cohorts and controls in the discovery cohort
Discussion
Model performance
The models presented here predict chronological age with lower error than the baseline MAE of the datasets, and account for around 46% (stool) and 53% (blood) of the variance in age by R2. We note that some discrepancy between predicted and actual age is expected in a useful biological age candidate. If age were perfectly predicted, it would indicate either that the aspects of health captured by the biomarker decline in lockstep with chronological age, or that the biomarker is statistically associated with properties that vary systematically with age but are irrelevant to health.
We present an in-depth comparison of our microbiome model with the gut metagenomic aging clock reported by Galkin et al. (2020). Galkin et al. report MAE of 10.60 and R2 of 0.21 (vs. our 9.49 and 0.42) in a dataset with a baseline MAE of 13.03 (vs. our 12.98). In a secondary validation exercise, their model obtains MAE of 6.81 and R2 of 0.134 when applied to a separate dataset (HC) with a lower baseline MAE of 9.272. Since metatranscriptomic data is unavailable for that cohort, we created an additional validation cohort with exactly the same age distribution, shown in Table 1. In this cohort, our model attains MAE of 7.64 (vs. their 6.81) and R2 of 0.31 (vs. their 0.13). R2 is the standard metric used for quality of fit in a regression task, and these numbers suggest that our metatranscriptomic model provides a better overall fit to this distribution of ages.
Interestingly, Galkin et al. report that an Elastic Net model was unable to extract significant signal from the data, and that they achieved their best performance using a deep neural net (DNN). In contrast, we found similar performance across model types, including when using a neural network architecture modeled after the one they report. Using a linear model is advantageous in terms of interpretability and actionability: the influence of each biological feature on the model’s age prediction is transparent, making it straightforward to determine a set of candidate targets to act on.
Although we report separate models for the microbiome and human gene cases, these models could be combined to give a single prediction. One straightforward way to do this would be to weight the predictions of each by the models’ precision. In future work we hope to collect a large dataset with both stool microbiome and human gene expression data collected simultaneously for the same users, which will allow us to fit a single joint model.
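The precision-weighted combination mentioned above can be sketched as follows. It assumes that a model's precision is the inverse of the variance of its prediction errors, which is one common reading and is not specified in the text; the function name and the numbers are hypothetical.

```python
def combine_predictions(pred_stool, var_stool, pred_blood, var_blood):
    """Precision-weighted average of two age predictions.

    Assumption: precision = 1 / variance of each model's prediction error.
    """
    w_stool = 1.0 / var_stool
    w_blood = 1.0 / var_blood
    return (w_stool * pred_stool + w_blood * pred_blood) / (w_stool + w_blood)


# Hypothetical example: the blood model has the smaller error variance,
# so the combined estimate lies closer to its prediction.
combined = combine_predictions(pred_stool=50.0, var_stool=100.0,
                               pred_blood=40.0, var_blood=25.0)
```

The more precise model dominates the combined estimate, and two equally precise models reduce to a simple average.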
The resolution of the data supplied by our clinical grade and fully-automated lab analysis method allows identification of microorganisms at the strain level, although for this analysis we aggregate data to the species level. This contrasts with 16S gene sequencing, which does not discriminate between species within most genera. This additional resolution appears to be important to capture age-related variation. In several cases, some species of a genus are associated with older age, and others associated with younger age (shown in Figure S1 of supplementary materials).
Contrary to much published literature (de la Cuesta-Zuluaga et al., 2019; Hopkins et al., 2002; Mariat et al., 2009; Koenig et al., 2011; Yatsunenko et al., 2012), chronological age was not associated with significant changes in alpha diversity (richness or evenness) of taxa or KOs across decades (Figure 1b and Figure 1c) despite significant changes in individual taxa and KOs over time (Figure 1d and Figure 1e). This difference may be due to our RNA-based approach, whereas previous studies have used DNA-based approaches (amplicon or metagenomics). It is intriguing that the richness of active gene expression captured in this RNA-based data remains steady throughout life. It is possible that changes in taxonomic diversity observed in DNA-based approaches might help to retain a certain level of functional stability (Kang et al., 2015) obtained early in life.
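The alpha diversity measures referred to here (richness, Shannon diversity, and Pielou’s evenness) follow standard definitions, sketched below. Treating a feature as present when its expression is non-zero is an assumption of this sketch; the text does not state the threshold used to call a taxon or function active.

```python
import math


def richness(abundances):
    """Number of features with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon(abundances):
    """Shannon diversity H = -sum(p * ln p) over non-zero proportions."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)


def pielou_evenness(abundances):
    """Pielou's J = H / ln(S); defined as 0.0 here when fewer than two features are present."""
    s = richness(abundances)
    return shannon(abundances) / math.log(s) if s > 1 else 0.0


# Hypothetical samples: one perfectly even, one dominated by a single feature.
even_sample = [5, 5, 5, 5]
skewed_sample = [97, 1, 1, 1]
even_j = pielou_evenness(even_sample)
skewed_j = pielou_evenness(skewed_sample)
```

Both samples have the same richness but very different evenness, which is why the two measures are reported separately.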