SynthA1c: Towards Clinically Interpretable Patient Representations for Diabetes Risk Stratification

Michael S. Yao1,2, Allison Chae2, Matthew T. MacLean3, Anurag Verma4, Jeffrey Duda3, James Gee3, Drew A. Torigian3, Daniel Rader4, Charles Khan2,3, Walter R. Witschey2,3 & Hersh Sagreiya2,3,*

1Department of Bioengineering, University of Pennsylvania

2School of Medicine, University of Pennsylvania

3Department of Radiology, University of Pennsylvania

4Department of Medicine, University of Pennsylvania

Equal contribution. {michael.yao, jisoo.chae}

Equal contribution. {witschey, hersh.sagreiya}

*Corresponding Author.


Early diagnosis of Type 2 Diabetes Mellitus (T2DM) is crucial to enable timely therapeutic interventions and lifestyle modifications. As the time available for clinical office visits shortens and medical imaging data become more widely available, patient image data could be used to opportunistically identify patients for additional T2DM diagnostic workup by physicians. We investigated whether image-derived phenotypic data could be leveraged in tabular learning classifier models to predict T2DM risk and automatically flag high-risk patients without the need for additional blood laboratory measurements. In contrast to traditional binary classifiers, we use neural networks and decision tree models to represent patient data as 'SynthA1c' latent variables that mimic empirical blood hemoglobin A1c laboratory measurements and achieve sensitivities as high as 87.6%. To evaluate how SynthA1c models may generalize to other patient populations, we introduce a novel generalizable metric that uses vanilla data augmentation techniques to predict model performance on out-of-domain input covariates. We show that image-derived phenotypes and physical examination data together can accurately predict diabetes risk, enabling opportunistic risk stratification through artificial intelligence and medical imaging. Our code is available at

Abstract Figure


  title={SynthA1c: {Towards} Clinically Interpretable Patient Representations for Diabetes Risk Stratification},
  author={Yao, Michael S and Chae, Allison and MacLean, Matthew T and Verma, Anurag and Duda, Jeffrey and Gee, James C and Torigian, Drew A and Rader, Daniel and Khan, Charles E and Witschey, Walter R and Sagreiya, Hersh},