Artificial Intelligence for Alzheimer’s Disease (AI4AD) Machine Learning Initiative
Mission
In response to PAR-19-269, “Cognitive Systems Analysis of Alzheimer’s Disease Genetic and Phenotypic Data (U01 Clinical Trial Not Allowed)”, our project unites experts in AD genomics, machine learning and AI (including deep learning), large-scale data integration, and international data harmonization in a carefully designed Consortium Structure that works in close partnership with the NIH, ADSP, and NIAGADS. We will develop a suite of complementary big data analytic approaches for ultra-scale analysis of Alzheimer’s Disease (AD) genomic and phenotypic data. The data volumes now generated by the Alzheimer’s Disease Sequencing Project (ADSP), National Alzheimer’s Coordinating Center (NACC), Alzheimer’s Disease Neuroimaging Initiative (ADNI), Accelerating Medicines Partnership AD (AMP-AD), and UK Biobank (UKBB) far exceed the capacity of current analytic methods, which have not kept pace with the scale and speed of data collection. This volume of genetic and phenotypic data demands new and more powerful algorithms to: (1) store, manage, and manipulate whole-genome sequences and associated data on an ever-growing scale; (2) discover novel AD risk and protective loci by merging informatics and AD genomics databases; and (3) relate whole-genome changes to the ATN(v) biomarkers that now define biological AD. Our Ultra-scale Machine Learning Initiative, “AI4AD”, will offer new AI and deep learning tools to discover features in massive-scale genomics data, relating whole-genome data to biomarker features by merging all relevant data sources.
Our team of experienced PIs will coordinate efforts across the U.S. to create these large-scale data analytic tools. Our MPI team and 6 Core Leads have decades of experience working together and with the AD community in pioneering machine learning methods for AD genetics and neuroimaging, including leadership of international neuroimaging consortia. Dedicated Cores focus on Genomic, Imaging, and Cognitive Data Harmonization. Curated data will then be imported into AI approaches and informatics pipelines that allow the AD research community to leverage ultra-scale, multidimensional genomic and phenotypic data from the ADSP, NACC, ADNI, AMP-AD, and others. Our work is organized as a carefully designed and coordinated Consortium guided by stakeholders, clinical leaders, and pioneering analysts in AD genomics and neuroimaging. Our ultra-scale AI tools will advance AD genomics research, and the project will include training efforts and a dedicated Drug Repurposing Core. This team effort will accelerate understanding of the genetic, molecular, and neurobiological mechanisms of AD, yielding significant translational impact on disease and drug development.
Progress to Date
Drs Sarah and Sasha Zaranek (Curii and Harvard University) have pioneered a method to represent whole genome sequences using a standardized and curated library of ‘tiles’ that can then be read into AI and machine learning methods to identify AD risk and protective factors.
This reduces massive-scale whole-genome sequencing (WGS) data to tractable inputs for feature detection and integration. Tiling breaks the genome into shorter sequences called tiles, each of which is delimited by a pair of tags (24-mers). Pilot work on whole-genome tiling shows that sparse regression methods, such as the adaptive LASSO, can discover genomic predictors of AD and combine them into weighted risk scores that merge the predictive effects of multiple genetic variants. In 4,000+ tiled genomes from ADSP and ADNI, the best-fitting model so far (a GLM with adaptive LASSO) has identified 411 tile variants that help to predict AD status. Encouragingly, the top two coefficients were phase 0 and phase 1 of the APOE ε4 variant (rs429358); ongoing work includes generating a ranked variant list and comparing discovered loci to those from standard mass-univariate GWAS.
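The tiling idea described above can be illustrated with a toy sketch. This is not the project's implementation: the function names (tile_sequence, encode_tiles), the 4-base tags used in place of real 24-mers, and the choice to let each tile include both of its bounding tags are all assumptions made for illustration. The sketch cuts a sequence at known tags and then replaces each tile with an integer variant ID drawn from a shared library, which is the kind of compact per-position encoding a sparse regression model could take as input.

```python
from typing import Dict, List

def tile_sequence(seq: str, tags: List[str]) -> Dict[int, str]:
    """Cut `seq` into tiles bounded by consecutive tags found in order.

    Assumption: a tile runs from the start of one tag to the end of the
    next tag found, so neighbouring tiles overlap by one tag.
    Tags that are not found are skipped.
    """
    found = []  # (tag index, start position, tag length)
    cursor = 0
    for tag_index, tag in enumerate(tags):
        pos = seq.find(tag, cursor)
        if pos == -1:
            continue  # tag absent in this genome; the tile spans to the next tag found
        found.append((tag_index, pos, len(tag)))
        cursor = pos + len(tag)

    tiles = {}
    for (idx, start, _), (_, next_start, next_len) in zip(found, found[1:]):
        tiles[idx] = seq[start:next_start + next_len]
    return tiles

def encode_tiles(tiles: Dict[int, str], library: Dict[int, List[str]]) -> Dict[int, int]:
    """Replace each tile sequence with an integer variant ID.

    `library` maps a tile position to the distinct sequences seen there so
    far; a sequence not seen before is appended and receives the next ID.
    """
    encoded = {}
    for idx, tile in tiles.items():
        variants = library.setdefault(idx, [])
        if tile not in variants:
            variants.append(tile)
        encoded[idx] = variants.index(tile)
    return encoded

# Toy data: 4-base tags stand in for the 24-mer tags described in the text.
toy_tags = ["AAAA", "CCCC", "GGGG"]
genome_a = "AAAATTCCCCTAGGGG"
genome_b = "AAAATGCCCCTAGGGG"  # differs from genome_a by one base in the first tile

tile_library: Dict[int, List[str]] = {}
encoded_a = encode_tiles(tile_sequence(genome_a, toy_tags), tile_library)
encoded_b = encode_tiles(tile_sequence(genome_b, toy_tags), tile_library)
print(encoded_a)  # {0: 0, 1: 0}
print(encoded_b)  # {0: 1, 1: 0}
```

In this toy encoding, each genome becomes a short vector of variant IDs indexed by tile position. Expanding those IDs into indicator features (one per tile variant, per phase) would give a design matrix on which a penalized regression could select a small set of predictive tile variants.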