Software & Data
Software
- DeepBrainNet (PI: Davatzikos)
- Brain age prediction from T1 brain scans using a convolutional neural network trained on MRI scans (n=11,729). Input: skull-stripped T1 brain images, registered to a common space, in NIfTI format. Output: predicted brain age.
- Contact: Vishnu Bashyam (Vishnu.Bashyam@pennmedicine.upenn.edu)
- Smile-GAN (PI: Davatzikos)
- Deep semi-supervised clustering method designed to identify disease-related neuroanatomical heterogeneity. Inputs: CSV/TSV files containing image ROI as features, as well as covariates to be adjusted. Output: categorical clustering membership and continuous probability scores
- Contact: Zhijian Yang (Zhijian.Yang@Pennmedicine.upenn.edu)
- Surreal-GAN (PI: Davatzikos)
- Deep semi-supervised representation learning method designed to dissect disease-related neuroanatomical heterogeneity. Inputs: CSV/TSV files containing image ROI as features, as well as covariates to be adjusted. Output: continuous dimensional scores
- Contact: Zhijian Yang (Zhijian.Yang@Pennmedicine.upenn.edu)
- SOPNMF (PI: Davatzikos)
- Uses an unsupervised factorization method to parcellate the brain into patterns of structural covariance (PSCs) and correlates these PSCs with common genetic variants to depict the genetic architecture of the human brain. Input: CSV/TSV containing the path to each MRI. Output: the parcellated PSCs and the loading coefficients, the ROI values.
- Contact: Junhao Wen (junhao.wen@pennmedicine.upenn.edu)
- Longitudinal Disease Progression Modeling (PI: Tosun)
- The trajectory of a given AD biomarker may unfold over decades; however, ADNI and other AD-focused studies usually provide much shorter follow-up. This poses a challenge when estimating the full course of AD progression. Extending the methodology described in Budgeon et al., we can estimate a full-term disease pathology curve from short-term follow-up data for multivariate imaging markers.
- Contact: Tamar Schaap (tamar.schaap@ucsf.edu)
- Clinical Trial Simulation (PI: Tosun)
- Clinical Trial Simulation allows a user to estimate the statistical power to detect a pre-defined treatment effect at a set of sample sizes. This may be useful in informing recruitment for future clinical trials enriched based on different biomarker signature criteria. Using a linear mixed-effects (LME) model trained on pilot data, a user can estimate the power to detect a treatment effect at a given sample size via Monte Carlo simulation.
- Contact: Tamar Schaap (tamar.schaap@ucsf.edu)
- MultimodalGAN (PI: Huang)
- GAN- and knowledge-distillation-based machine learning algorithm that integrates multimodal genotype and phenotype data, handles missing data modalities in prediction, and can also predict longitudinal outcomes.
- Contact: Alireza Ganjdanesh (alireza.ganjdanesh@pitt.edu)
- SWAT-CNN (PI: Saykin)
- A novel three-step deep learning approach that identifies phenotype-related SNPs, which can then be used to develop accurate disease classification models.
- Contact: Taeho Jo (tjo@iu.edu)
- CommPool (PI: Zhan, Huang)
- Takes brain network data as input and hierarchically learns a representation of the entire brain graph. The learned latent-space features can be used for clinical classification or prediction.
- Contact: Haoteng Tang (haoteng.tang@pitt.edu)
- DisentangledVGAE (PI: Huang, Thompson, Zhan)
- A new multi-view graph auto-encoder method that learns disentangled representations from different brain data modalities (inputs can be imaging features, brain networks, or genetic features) to support downstream analyses such as outcome or cognitive decline prediction.
- Contact: Yanfu Zhang (yaz91@pitt.edu)
- GLM Tiled Data Example (PI: Zaranek)
- Example for running a GLM on Tiled Data.
- Contact: Sarah Zaranek (ai4ad@support.curii.com)
- Lightning (PI: Zaranek)
- Lightning is the system that performs tiling. Genomes are split into small segments (tiles), on average roughly 250 bp long. All unique tile variants are collected into a tile library, so that a genome can be stored as position references into the Lightning tile library.
- Contact: Tom Clegg (ai4ad@support.curii.com)
- ML with Tiled Data (PI: Zaranek)
- Examples and source code for ML with Tiled Data. Also includes an imputation workflow for imputing gVCFs before tiling.
- Contact: Sarah Zaranek (ai4ad@support.curii.com)
- adomics (PI: Shen)
- Mendelian randomization (MR) is a versatile tool for identifying possible causal relationships between an omics biomarker and a disease outcome, using genetic variants as instrumental variables. This work provides a framework for summary-data-based MR analysis in which multiple omics biomarkers are treated as multiple exposures, with an emphasis on combination tests and the special handling required by correlated P-values from single-exposure MR tests.
- Contact: Chong Jin (Chong.Jin@Pennmedicine.upenn.edu)
- MTS2CCA (PI: Shen)
- MTS2CCA is a novel imaging genomics association algorithm that delivers interpretable results and improves the integration of imaging and genomics datasets. This work revealed promising single-nucleotide polymorphism and brain region features related to sleep. The identified features can be used as imaging genetic biomarkers to improve clinical score prediction.
- Contact: Mansu Kim (mansu.kim@catholic.ac.kr)
- temporal-GNN (PI: Shen)
- Temporal-GNN is an interpretable graph neural network (GNN) model for AD prognostic prediction based on longitudinal neuroimaging data that also incorporates knowledge of structural brain connectivity.
- Contact: Mansu Kim (mansu.kim@catholic.ac.kr)
- xQTL-protocol (PI: Wang)
- A reproducible molecular QTL analysis for the NIH/NIA Alzheimer’s Disease Sequencing Project Functional Genomics Consortium.
- Contact: Gao Wang (wang.gao@columbia.edu)
- Style Transfer Harmonization (PI: Jahanshad)
- Harmonizes T1w images to a reference by attempting to match the “style” of the reference without altering anatomical features.
- Contact: Mengting Liu (liumt55@mail.sysu.edu.cn)
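The Monte Carlo power estimate described under Clinical Trial Simulation can be illustrated with a minimal sketch. This is not the tool's implementation. All parameter values (mean slope, variance components, visit schedule, effect size) are hypothetical stand-ins for what an LME fitted to pilot data would provide, and each simulated trial is analyzed with a simpler two-stage approach (per-subject least-squares slopes compared by a Welch t-test) rather than by refitting an LME.

```python
import numpy as np
from scipy import stats


def simulate_power(n_per_arm, effect=0.3, n_sims=500, alpha=0.05, seed=0):
    """Estimate power to detect a proportional slowing of biomarker decline.

    Every numeric value below is hypothetical; in practice these would be
    taken from an LME fitted to pilot data.
    """
    rng = np.random.default_rng(seed)
    visits = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # years from baseline
    mean_slope = -1.0    # placebo annual change (fixed effect)
    sd_intercept = 2.0   # random-intercept SD
    sd_slope = 0.8       # random-slope SD
    sd_resid = 0.5       # residual SD

    # With centered visit times, each subject's OLS slope is a dot product.
    t_centered = visits - visits.mean()
    denom = np.sum(t_centered ** 2)

    def arm_slopes(slope_mean):
        intercepts = rng.normal(0.0, sd_intercept, size=(n_per_arm, 1))
        slopes = rng.normal(slope_mean, sd_slope, size=(n_per_arm, 1))
        noise = rng.normal(0.0, sd_resid, size=(n_per_arm, visits.size))
        y = intercepts + slopes * visits + noise
        return y @ t_centered / denom

    rejections = 0
    for _ in range(n_sims):
        placebo = arm_slopes(mean_slope)
        treated = arm_slopes(mean_slope * (1.0 - effect))
        _, p_value = stats.ttest_ind(treated, placebo, equal_var=False)
        rejections += int(p_value < alpha)
    # Power is the fraction of simulated trials that reject the null.
    return rejections / n_sims


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, simulate_power(n))
```

Running the loop over several sample sizes gives a power curve from which a recruitment target could be read off. An enrichment criterion would enter through the pilot-data parameters rather than through the simulation loop itself.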
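The tile-library idea described under Lightning (unique tile variants collected once, genomes stored as references into the library) can be sketched with a toy example. This is an illustration only, not Lightning's algorithm or API. The `TileLibrary` class, the fixed tile length, and the coordinate-based splitting are all assumptions made for brevity; the source describes tiles averaging roughly 250 bp, and fixed-coordinate splitting would not handle insertions or deletions.

```python
from typing import List

TILE_LEN = 4  # toy value chosen for readability, not a Lightning setting


class TileLibrary:
    """Hypothetical toy library: one list of unique variants per tile position."""

    def __init__(self) -> None:
        self.variants: List[List[str]] = []

    def encode(self, genome: str) -> List[int]:
        """Store a genome as one variant index per tile position."""
        refs = []
        for pos, start in enumerate(range(0, len(genome), TILE_LEN)):
            tile = genome[start:start + TILE_LEN]
            if pos == len(self.variants):
                self.variants.append([])  # first genome to reach this position
            known = self.variants[pos]
            if tile not in known:
                known.append(tile)        # new variant at this position
            refs.append(known.index(tile))
        return refs

    def decode(self, refs: List[int]) -> str:
        """Rebuild the sequence from its variant indices."""
        return "".join(self.variants[pos][idx] for pos, idx in enumerate(refs))


if __name__ == "__main__":
    library = TileLibrary()
    first = library.encode("ACGTACGTACGT")
    second = library.encode("ACGTACCTACGT")  # differs only in the middle tile
    print(first, second)                      # [0, 0, 0] [0, 1, 0]
    print(library.decode(second))             # ACGTACCTACGT
```

Because the two genomes share their first and last tiles, only one extra variant is added to the library, and each genome reduces to a short list of integers that is convenient as a feature vector for downstream models.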
Datasets
- ADNI, WHICAP (PI: Davatzikos)
- Types of Processing: MUlti-atlas region Segmentation (MUSE), deep-learning-based intracranial volume (DLICV), harmonization, SPARE scores; visualization of segmentation results; scripts to calculate z-scores, rank and check extreme cases, and plot ROI trends to detect possible outliers
- Measures: Harmonized MUSE ROI volumes, Harmonized RAVENS voxel-based maps, DLICV, Canonical SPARE-AD and SPARE-BA, Disentangled SPARE-AD and SPARE-BA, Smile-GAN MCI/AD patterns
- ADNI, ADSP, WHICAP (PI: Zaranek)
- Types of Processing: Tiled data (filtered by QC>20), annotation of tile variants, GLM on tiled data
- Measures: Filtered tile data, tile data annotations, top features of GLM from tiled data
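The z-score screening listed under Types of Processing for the ADNI, WHICAP imaging data can be sketched as follows. This is a minimal illustration, not the project's scripts. The function name `rank_extreme_cases`, the threshold of 3, and the synthetic volumes are assumptions.

```python
import numpy as np


def rank_extreme_cases(roi_volumes, threshold=3.0):
    """Return (subject_index, roi_index, z) for |z| > threshold, most extreme first.

    roi_volumes is a subjects-by-ROIs array; z-scores are computed per ROI.
    """
    x = np.asarray(roi_volumes, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    rows, cols = np.nonzero(np.abs(z) > threshold)
    order = np.argsort(-np.abs(z[rows, cols]))
    return [(int(rows[i]), int(cols[i]), float(z[rows[i], cols[i]])) for i in order]


if __name__ == "__main__":
    # Synthetic data with one planted segmentation failure (subject 17, ROI 1).
    rng = np.random.default_rng(0)
    volumes = rng.normal(1000.0, 50.0, size=(200, 3))
    volumes[17, 1] = 2000.0
    for subject, roi, z in rank_extreme_cases(volumes):
        print(f"subject {subject}, ROI {roi}: z = {z:.1f}")
```

Cases at the top of the ranking are candidates for visual inspection of the segmentation. A flag on its own does not establish that the value is wrong.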