About AI4AD
Artificial Intelligence for Alzheimer’s Disease (AI4AD) machine learning initiative
Mission:
- Genomics: use AI to identify genomic motifs and features in whole genome sequence data that are associated with AD, clinical resilience, and decline;
- Imaging Harmonization and Disease Subtyping: use AI to merge and calibrate MRI, amyloid- and tau-PET and vascular imaging across cohorts to identify AD subtypes, and relate these subtypes to specific genomic predictors and outcomes;
- Imaging and Genomic Predictors of Cognition: use AI to identify genomic motifs that predict brain imaging signatures of AD and decline in specific cognitive domains;
- Genome-Guided Drug Repurposing: create a drug prioritization system to discover drugs to repurpose based on genomic markers discovered in the other Aims;
- Train the AD community in easy-to-use AI methods to accelerate AD research.
In response to PAR-19-269 “Cognitive Systems Analysis of Alzheimer’s Disease Genetic and Phenotypic Data (U01 Clinical Trial Not Allowed)”, our project unites experts in AD genomics, machine learning and AI (including deep learning), large-scale data integration, and international data harmonization to work in a carefully designed Consortium Structure in close partnership with the NIH, ADSP, and NIAGADS. We will develop a suite of complementary big data analytic approaches for ultra-scale analysis of Alzheimer’s Disease (AD) genomic and phenotypic data. The vast data volumes now generated by the Alzheimer’s Disease Sequencing Project (ADSP), National Alzheimer’s Coordinating Center (NACC), Alzheimer’s Disease Neuroimaging Initiative (ADNI), Accelerating Medicines Partnership AD (AMP-AD), and UK Biobank (UKBB) far exceed the capacity of current analytic methods, which have not kept pace with the scale and speed of data collection. This volume of genetic and phenotypic data demands new and more powerful algorithms to: (1) store, manage, and manipulate whole-genome sequences and associated data on an ever-growing scale; (2) discover novel AD risk and protective loci by merging informatics and AD genomics databases; and (3) relate whole-genome changes to the ATN(v) biomarkers that now define biological AD. Our Ultra-scale Machine Learning Initiative, or “AI4AD”, will offer new AI and deep learning tools to discover features in massive-scale genomics data, relating whole genome data to biomarker features by merging all relevant data sources.
Our team of experienced PIs will coordinate efforts across the U.S. to create these large-scale data analytic tools. Our MPI team and 6 Core Leads have decades of experience working together and with the AD community in pioneering machine learning methods for AD genetics and neuroimaging, including leadership of international neuroimaging consortia. Dedicated Cores focus on Genomic, Imaging, and Cognitive Data Harmonization. Curated data will then be imported into AI approaches and informatics pipelines that will allow the AD research community to leverage ultra-scale, multidimensional genomic and phenotypic data from the ADSP, NACC, ADNI, AMP-AD, and others. Our work is organized by a carefully designed and coordinated Consortium guided by all stakeholders, clinical leaders, and pioneering analysts in AD genomics and neuroimaging. Our ultra-scale AI tools will advance AD genomics research, and the initiative will include training efforts and a dedicated Drug Repurposing Core. This team effort will accelerate understanding of the genetic, molecular, and neurobiological mechanisms of AD, yielding significant translational impact on disease and drug development.
Progress to Date
Drs Sarah and Sasha Zaranek (Curii and Harvard University) have pioneered a method to represent whole genome sequences using a standardized and curated library of ‘tiles’ that can then be read into AI and machine learning methods to identify AD risk and protective factors.
This reduces massive-scale whole genome sequencing (WGS) data to tractable inputs for feature detection and integration. Tiling is a way to break up the genome into shorter sequences called tiles, each of which is defined by a pair of tags (24-mers). Pilot work on whole genome tiling shows that sparse regression methods, such as adaptive LASSO, can discover genomic predictors of AD and can combine them into weighted risk scores that merge the predictive effects of multiple genetic variants. In 4,000+ tiled genomes from ADSP and ADNI, the best-fitting model so far (GLM Adaptive Lasso) has identified 411 tile variants that help to predict AD status. Encouragingly, the top two coefficients were phase 0 and phase 1 of the APOE ε4 variant (rs429358). Ongoing work includes generating a ranked variant list and comparing discovered loci to those from standard mass-univariate GWAS.
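To make the tiling idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Zaranek pipeline: the function names `tile_sequence` and `encode_variants` are hypothetical, the tags are 4-mers rather than 24-mers to keep the toy data readable, and each tile is assumed to run from the start of one tag through the end of the next. Each distinct sequence seen at a tile position is given an integer variant id, which is the kind of compact input a machine learning method could consume.

```python
from typing import Dict, List, Tuple


def tile_sequence(seq: str, tags: List[str]) -> List[Tuple[int, str]]:
    """Split seq into tiles, each spanning one tag through the next tag.

    Assumption for illustration: adjacent tiles overlap by one tag, so every
    tile begins and ends with a tag. Returns (tile_position, tile_sequence).
    """
    positions = []
    cursor = 0
    for tag in tags:
        # Search only to the right of the previously matched tag.
        idx = seq.find(tag, cursor)
        if idx == -1:
            raise ValueError(f"tag {tag!r} not found after position {cursor}")
        positions.append(idx)
        cursor = idx + len(tag)

    tiles = []
    for i in range(len(tags) - 1):
        start = positions[i]
        end = positions[i + 1] + len(tags[i + 1])
        tiles.append((i, seq[start:end]))
    return tiles


def encode_variants(genomes: Dict[str, str], tags: List[str]):
    """Give each distinct sequence at each tile position an integer variant id."""
    library: Dict[int, Dict[str, int]] = {}   # position -> {tile sequence -> id}
    encoded: Dict[str, List[int]] = {}        # sample -> variant id per position
    for name, seq in genomes.items():
        row = []
        for pos, tile in tile_sequence(seq, tags):
            variants = library.setdefault(pos, {})
            # A new sequence receives the next unused id at this position.
            row.append(variants.setdefault(tile, len(variants)))
        encoded[name] = row
    return library, encoded


# Toy data: hypothetical 4-mer tags and three short made-up "genomes".
tags = ["ACGT", "TTGA", "CCAT"]
genomes = {
    "sample_1": "ACGTAAATTGAGGCCAT",
    "sample_2": "ACGTACATTGAGGCCAT",  # differs from sample_1 inside tile 0
    "sample_3": "ACGTAAATTGAGGCCAT",
}
library, encoded = encode_variants(genomes, tags)
print(encoded)
```

On this toy input, sample_2 receives variant id 1 at tile position 0 while the other two samples receive id 0, and all three share id 0 at position 1. A table of variant ids like this, or a one-hot encoding of it, could then serve as the feature matrix for a sparse regression model.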
In Lam (2021), we supplemented a standard convolutional neural network (CNN; upper left) with an attention mechanism (upper right) to identify imaging regions, and features within them, that can make predictions for an individual patient (here, estimating their age and AD diagnosis). Automated AI-driven feature discovery, trained on diverse AD biobanks, is also now being tested for identifying brain signatures associated with AD genomic risk signatures.
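As a rough illustration of how an attention mechanism can highlight informative image regions, the Python sketch below scores each region's feature vector against a query vector, converts the scores to weights with a softmax, and pools the features using those weights. This is a generic dot-product attention pooling step, not the architecture from Lam (2021). The names `softmax` and `attention_pool`, the feature values, and the query vector are all hypothetical, and in a trained network the features would come from CNN layers and the query would be learned.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    region_features: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights per region, attention-weighted feature vector)."""
    # Dot-product relevance score of each region against the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, pooled


# Toy example: three image regions, each with a 2-dimensional feature vector.
regions = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
query = [1.0, 0.0]  # hypothetical query; learned during training in practice
weights, pooled = attention_pool(regions, query)
print(weights, pooled)
```

The weights sum to one and can be read as a per-region importance map for a single patient, which is what makes this kind of mechanism useful for inspecting which regions drive a prediction.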
The second approach, developed by Dr. Davatzikos and his laboratory (Bashyam 2020), was trained on a large cohort spanning the adult lifespan and was tested in various cross-validation paradigms. Critically, that study revealed that both very tight and very loose brain age models produce suboptimal clinical value when brain age is used as a disease-related biomarker.
The Shen lab (U Penn) pioneered the application of a causal mediation model for AD imaging genetics studies, which can detect features in brain scans that mediate the effect of polygenic AD risk on clinical outcomes. In Eng et al. 2020, they performed a polygenic mediation analysis in an amyloid imaging genetic study of AD, and identified multiple imaging mediators linking genetic variants and polygenic risk scores (PRS) to AD outcomes. In the AI4AD initiative, the same strategy will be used to prioritize and identify imaging biomarkers (e.g., multimodal MRI and PET measures) that can best explain the association between sets of genetic variants and standard measures of clinical disease burden. This method will be integrated with deep learning to detect biomarker patterns in brain scans that mediate (explain) the effects of specific sets of genomic features on clinical disease burden. This will enable disease subtyping and genome-to-brain mapping, helping to reveal previously unknown effects of risk genes and to disentangle the mechanistic complexity of AD phenotypic outcomes.
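The logic of mediation can be sketched with the classic single-mediator, product-of-coefficients calculation in Python. This is a deliberately simplified linear example and not the polygenic mediation model used by the Shen lab: the function name `simple_mediation` and the toy numbers are hypothetical. Here x stands for a polygenic risk score, m for an imaging measure, and y for a clinical outcome. The path a is the effect of x on m, the path b is the effect of m on y after adjusting for x, and their product a*b is the indirect (mediated) effect.

```python
from statistics import mean
from typing import Dict, Sequence


def _cov(u: Sequence[float], v: Sequence[float]) -> float:
    """Population covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def simple_mediation(
    x: Sequence[float], m: Sequence[float], y: Sequence[float]
) -> Dict[str, float]:
    """Ordinary least squares estimates for a single-mediator linear model."""
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)

    a = sxm / sxx          # slope of m regressed on x
    total = sxy / sxx      # slope of y regressed on x alone

    # Coefficients of the two-predictor regression y ~ x + m.
    det = sxx * smm - sxm ** 2
    direct = (sxy * smm - smy * sxm) / det   # effect of x holding m fixed
    b = (smy * sxx - sxy * sxm) / det        # effect of m holding x fixed

    return {
        "a": a,
        "b": b,
        "indirect": a * b,
        "direct": direct,
        "total": total,
    }


# Toy data constructed so that m = 2*x + noise and y = 0.5*x + 3*m exactly.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
m = [1.0, 1.0, 4.0, 6.0, 7.0, 11.0]
y = [3.0, 3.5, 13.0, 19.5, 23.0, 35.5]
result = simple_mediation(x, m, y)
print(result)
```

With these made-up numbers the estimates come out at approximately a = 2, b = 3, an indirect effect of 6, a direct effect of 0.5, and a total effect of 6.5, so most of the effect of x on y passes through m. In ordinary least squares the total effect always equals the direct effect plus the indirect effect, which is the decomposition that mediation analysis exploits. Real analyses would add covariates, multiple mediators, and significance testing.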
The Jun Lab (Boston University) is building a bioinformatics system that encodes and learns from the results of genomics studies (ADSP and other GWAS data) and multi-omics studies of AD risk, as well as imaging and fluid biomarkers. The system aims to determine relations between druggable targets, genomic patterns, signature profiles (including transcriptomic, proteomic, methylomic, and metabolomic data), and multimodal brain imaging. Dr. Jun leads our AI4AD Drug Repurposing Core, which is coordinated with ADSP Functional Genomics efforts including AMP-AD. Her team has been adding existing data from drug informatics databases to the PSBO system (relating predictors, signatures, biomarkers, and outcomes) to identify and prioritize drug candidates by storing information on their known effects on biomarkers.