These authors share first authorship
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The lack of sex-specific cardiovascular disease criteria contributes to the underdiagnosis of women compared to men. For more than half a century, the Framingham Risk Score has been the gold standard to estimate an individual’s risk of developing cardiovascular disease based on age, sex, cholesterol levels, blood pressure, diabetes status, and smoking status. Now, machine learning can offer a much more nuanced insight into predicting the risk of cardiovascular diseases. The UK Biobank is a large database that includes traditional risk factors and tests related to the cardiovascular system: magnetic resonance imaging, pulse wave analysis, electrocardiograms, and carotid ultrasounds. Here, we leverage data from 20,542 UK Biobank participants to build more accurate cardiovascular risk models than the Framingham Risk Score and quantify the underdiagnosis of women compared to men. Strikingly, for first-degree atrioventricular block and dilated cardiomyopathy, two conditions with non-sex-specific diagnostic criteria, our study shows that women are underdiagnosed 2× and 1.4× more than men. Similarly, our results demonstrate the need for sex-specific criteria in essential primary hypertension and hypertrophic cardiomyopathy. Our feature importance analysis reveals that out of the top 10 features across three sex groups (both sexes, female only, and male only) and four disease categories, traditional Framingham factors made up between 40% and 50%; electrocardiogram, 30%–33%; pulse wave analysis, 13%–23%; and magnetic resonance imaging and carotid ultrasound, 0%–10%. Improving the Framingham Risk Score by leveraging big data and machine learning allows us to incorporate a wider range of biomedical data and prediction features, enhance personalization and accuracy, and continuously integrate new data and knowledge, with the ultimate goal of improving accurate prediction, early detection, and early intervention in cardiovascular disease management.
Our analysis pipeline and trained classifiers are freely available at
Historically, women have been excluded from the biomedical literature, and clinical and animal trials have been biased toward male-only or male-dominated populations (
Cardiovascular disease is underdiagnosed in women compared to that in men; the lack of sex-specific diagnostic criteria contributes to this issue (
The need for sex-specific diagnostic criteria is also visible in heart failure with preserved left ventricular ejection fraction, where the cut-off is an ejection fraction of ≥50% (
Clinically used risk prediction models for cardiovascular diseases typically include components of the Framingham Risk Score: age, sex, total cholesterol, high-density lipoprotein (HDL) cholesterol, systolic blood pressure, treatment with blood pressure medication, diabetes status, and smoking status (
Tabular data consist of features that can be entered into a spreadsheet, including continuous variables, like age, and binary variables, like smoking status, which are coded with zero for negative and one for positive. The state-of-the-art approaches for supervised learning on tabular data are gradient-boosted tree ensembles (
Although gradient boosting frameworks have been shown to be the best-performing approaches for tabular data in the past decade (
Machine learning models can effectively utilize a large number of input features, which allows them to discover new, more accurate risk models to better identify people at risk (
First, we investigate whether women are underdiagnosed for cardiovascular diseases in the UK Biobank cohort. Then, we compare the performance of three models, a multilayer perceptron deep learning baseline, XGBoost, and the novel deep learning framework, SAINT, on their ability to predict whether a person can be diagnosed with a cardiovascular disease. Lastly, we identify the top sex- and disease-specific risk factors from four cardiovascular-related tests, pulse wave analysis, electrocardiograms, magnetic resonance imaging, and carotid ultrasounds, against the traditional Framingham Risk Score.
The UK Biobank comprises data from half a million individuals from the UK who were over the age of 40 (
Demographic data. Sex differences for the population of 20,542 individuals used in this study categorized by the Framingham Risk Score features.
Sex, n | Age (years) | BMI (kg/m²) | Total cholesterol (mmol/L) | HDL cholesterol (mmol/L) | Smokers (n) | Diabetics (n) | Systolic blood pressure (mmHg) |
---|---|---|---|---|---|---|---|
Female, 10,585 | 62.82 ± 7.41 | 25.88 ± 4.51 | 5.83 ± 1.07 | 1.64 ± 0.37 | 3,586 | 340 | 113.74 ± 18.98 |
Male, 9,957 | 63.95 ± 7.61 | 26.72 ± 3.64 | 5.60 ± 1.07 | 1.31 ± 0.30 | 3,991 | 662 | 114.05 ± 16.38 |
The mean and standard deviation are reported for age, body mass index (BMI), total cholesterol, HDL cholesterol, and end-systolic blood pressure. Fisher’s exact test is used for the categorical features, while the Wilcoxon rank-sum test is used for all other features to determine if the distributions of the male and female populations are significantly different; † indicates
Features for inclusion in the risk prediction analysis. The traditional Framingham Risk Score and body mass index are commonly used factors in clinical risk prediction for cardiovascular diseases. Features from magnetic resonance imaging, carotid ultrasounds, electrocardiogram recordings, and pulse wave analysis are also extracted from the UK Biobank.
Method | Features |
---|---|
Framingham Risk Score + body mass index (8) | Sex, age, high-density lipoprotein cholesterol, total cholesterol, end systolic blood pressure, smoking status, diabetes status, body mass index |
Magnetic resonance imaging (7) | Average heart rate, cardiac index, cardiac output, left ventricle ejection fraction, left ventricle end-diastolic volume, left ventricle end-systolic volume, left ventricle stroke volume |
Carotid ultrasound (12) | Max/mean/min carotid intima-media thickness 120/150/210/240 |
Electrocardiogram (12) | Ventricular rate, P duration, PP interval, PQ interval, QRS number, QRS duration, QT interval, QTC interval, RR interval, P axis, R axis, T axis |
Pulse wave analysis (19) | Position of pulse wave notch, position of pulse wave peak, position of shoulder on pulse waveform, pulse rate, pulse wave arterial stiffness index, pulse wave peak to peak time, pulse wave reflection index, augmentation index, central augmentation pressure, central pulse pressure, central systolic blood pressure, diastolic blood pressure, end-systolic pressure index, mean arterial pressure index, number of beats in waveform average, peripheral pulse pressure, stroke volume, systolic brachial blood pressure |
We created two feature groups: 1) eight features, including Framingham Risk Score features and the body mass index only, and 2) all 57 features. We labeled each person to be in the positive class if they were diagnosed with a given cardiovascular disease (
Disease classification criteria. Clinical diagnostic ICD-10 codes for different subsets of cardiovascular disease (
Disease | ICD10 codes |
---|---|
Any cardiovascular disease | I00-I78.9, G95.1, H334.1-2, O10.0-9, S06.60-61, Z95.1, Z95.5 |
Hypertensive diseases | I10-I15.9 |
Ischemic diseases | I20-I25.9 |
Conduction disorders | I44-I49.9 |
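
To make the labeling step concrete, the following is a minimal sketch of how a person's ICD-10 codes could be mapped to the three disease subcategories in the table above. It is an illustration under stated assumptions, not the authors' pipeline: the function names `icd10_categories` and `label_person` are hypothetical, the codes are assumed to be strings such as `I25.1` (with or without the dot), and the broader "any cardiovascular disease" set is left out because it mixes several code families.

```python
import re

# Ranges of the numeric part of the ICD-10 category, taken from the table above.
# Example: "ischemic" covers I20 through I25.9, i.e. letter "I" and numbers 20..25.
CATEGORY_RANGES = {
    "hypertensive": ("I", 10, 15),
    "ischemic": ("I", 20, 25),
    "conduction": ("I", 44, 49),
}


def icd10_categories(code):
    """Return the set of disease subcategories that one ICD-10 code falls into."""
    # Accept "I25", "I25.1", or "I251" (assumed input formats, not confirmed by the source).
    match = re.fullmatch(r"([A-Z])(\d{2})(?:\.?\d+)?", code.strip().upper())
    if match is None:
        return set()
    letter, number = match.group(1), int(match.group(2))
    return {
        name
        for name, (cat_letter, low, high) in CATEGORY_RANGES.items()
        if letter == cat_letter and low <= number <= high
    }


def label_person(codes, disease):
    """Binary label: 1 if any of the person's codes belongs to the disease subcategory."""
    return int(any(disease in icd10_categories(code) for code in codes))


# Hypothetical record with a diabetes code and a myocardial infarction code.
print(label_person(["E11.9", "I21.0"], "ischemic"))      # positive class
print(label_person(["E11.9", "I21.0"], "hypertensive"))  # negative class
```

A range check on the three-character category keeps the mapping readable; a production pipeline would more likely use an explicit code list.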
Using the three input groups (both sexes, female only, and male only) together with the four binary label sets (any disease, hypertensive disease, ischemic heart disease, and conduction disorders), we constructed 12 dataset variants to train the binary classifiers, as shown in
Dataset overview.
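As a small illustration of how the 12 dataset variants and, with the five model types evaluated later, the 60 classifiers arise, the combinations can be enumerated as follows. The string labels are placeholders chosen for this sketch and are not identifiers from the authors' code.

```python
from itertools import product

# Three input groups and four binary label sets, as described in the text.
SEX_GROUPS = ["both sexes", "female only", "male only"]
LABEL_SETS = ["any disease", "hypertensive", "ischemic", "conduction"]

# Five model types, as listed in the ROC figure and comparison table captions.
MODEL_TYPES = [
    "MLP",
    "XGBoost (untuned)",
    "XGBoost (tuned)",
    "SAINT",
    "XGBoost (Framingham only)",
]

# Each dataset variant is one (input group, label set) pair.
dataset_variants = list(product(SEX_GROUPS, LABEL_SETS))

# Each classifier is one model type trained on one dataset variant.
classifiers = [(sex, label, model) for (sex, label) in dataset_variants for model in MODEL_TYPES]

print(len(dataset_variants), len(classifiers))
```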
Using the cardiovascular and Framingham Risk Score features, we implemented three distinct model types: 1) a multilayer perceptron (MLP); 2) an XGBoost ensemble model, which is a state-of-the-art approach for tabular data learning (
We implemented and evaluated an MLP network under TensorFlow (
We used XGBoost (
SAINT is a novel approach for tabular data modeling that employs self-attention, intersample attention, an enhanced embedding framework, and a contrastive pre-training phase (
The receiver operating characteristic (ROC) curve is used to evaluate the performance of a diagnostic test where the predictors of the outcome are not binary, so there are many possible cut-points to classify a person with a positive or negative diagnosis (
The area under the curve (AUC) is a summary metric for the ROC curve that reports the overall accuracy of the test (
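The construction of the ROC curve and its AUC can be sketched in a few lines of plain Python. This is a generic textbook implementation, not the evaluation code used in the study: every distinct predicted score serves as a cut-point, and the area is obtained with the trapezoidal rule. The function names and the four-person example are illustrative assumptions.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs for all cut-points."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sweep the cut-point from the highest score downward.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all samples that share this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: two negatives and two positives with predicted risk scores.
labels = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(auc(roc_points(labels, risk)))
```

An AUC of 0.5 corresponds to a random classifier and 1.0 to a perfect ranking of positives above negatives.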
SHAP is a unified framework designed to interpret model predictions by giving a value for the importance of each feature to a specific prediction (
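To illustrate what a per-feature importance value means, the sketch below computes exact Shapley values for a single prediction by enumerating all feature subsets. It is a simplification made for this example: features that are "absent" from a subset are replaced by one fixed baseline value, whereas the SHAP library averages over background data and uses faster tree-specific algorithms. The toy risk model, its coefficients, and the baseline person are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for the prediction predict(x)."""
    n = len(x)

    def value(subset):
        # Features in the subset take the person's value, the rest the baseline value.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


def toy_risk(features):
    """Hypothetical linear risk score over age, smoking status, and HDL cholesterol."""
    age, smoker, hdl = features
    return 0.03 * age + 0.2 * smoker - 0.1 * hdl


person = [65, 1, 1.2]     # 65-year-old smoker with HDL of 1.2 mmol/L
reference = [55, 0, 1.5]  # baseline person used for "absent" features
contributions = shapley_values(toy_risk, person, reference)
print(contributions)
```

The contributions always sum to the difference between the person's prediction and the baseline prediction, which is the additivity property that makes the values comparable across features.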
Underdiagnosis is calculated as a ratio: the number of people who, at a single time point, met the criteria for a given disease diagnosis but were never diagnosed with that disease across the entire timespan of their medical record, divided by the number of people who were diagnosed with the same disease at any point across the entire span of their medical records. The UK Biobank ICD-10 medical records start between 1981 and 1988, depending on the country in the UK, and last until 2022. Only 4% of the UK Biobank population has ICD-9 records, so we did not include these. Blood pressure was measured between 2006 and 2010, and data from the first time point of the imaging visits (ECG, MRI, carotid ultrasound, and pulse wave analysis) were collected after 2014. The
We first investigate whether women are underdiagnosed relative to men for cardiovascular diseases in the UK Biobank cohort. We chose diseases where the diagnosis is a simple, non-sex-specific cut-off.
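The underdiagnosis ratio defined in the methods can be sketched as follows. This is a minimal illustration rather than the study's code: the function name is hypothetical, the cohort is made-up toy data, and the 200 ms cut-off on the PQ interval is assumed here only to have a concrete non-sex-specific threshold.

```python
def underdiagnosis_ratio(meets_criteria, ever_diagnosed):
    """People who meet the criterion but were never diagnosed, divided by people ever diagnosed."""
    if len(meets_criteria) != len(ever_diagnosed):
        raise ValueError("inputs must have one entry per person")
    undiagnosed = sum(
        1 for meets, diagnosed in zip(meets_criteria, ever_diagnosed) if meets and not diagnosed
    )
    diagnosed_total = sum(1 for diagnosed in ever_diagnosed if diagnosed)
    if diagnosed_total == 0:
        raise ValueError("ratio is undefined when nobody was diagnosed")
    return undiagnosed / diagnosed_total


CUTOFF_MS = 200.0  # assumed cut-off for this toy example

# Hypothetical PQ intervals (ms) and diagnosis flags; not UK Biobank data.
women_pq = [210, 215, 220, 205, 230, 190, 180]
women_dx = [False, False, False, True, True, False, False]
men_pq = [210, 215, 220, 205, 225, 230, 190, 185]
men_dx = [False, False, False, True, True, True, False, False]

women_ratio = underdiagnosis_ratio([pq > CUTOFF_MS for pq in women_pq], women_dx)
men_ratio = underdiagnosis_ratio([pq > CUTOFF_MS for pq in men_pq], men_dx)

# Relative underdiagnosis of women compared to men in the toy cohort.
print(women_ratio, men_ratio, women_ratio / men_ratio)
```

Dividing the female ratio by the male ratio gives the kind of relative factor reported in the abstract, here computed on invented numbers.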
Diagnosing cardiovascular disease via simple, non-sex-specific cut-offs. The red line indicates the diagnostic cut-off. The truncated violin plots show the distribution of men and women for each color-coded population, with the box plot inside showing the mean in white and the 25th and 75th percentiles.
The
ROC curves and AUC scores for the 60 classifiers evaluated on 12 test sets. The rows correspond to (1) both sexes, (2) female-only, and (3) male-only datasets. The columns correspond to the (1) any, (2) hypertensive, (3) ischemic, and (4) conduction diseases. The colors of the curves indicate the different model types: MLP deep learning baseline (blue), untuned XGBoost (orange), tuned XGBoost baseline for the state-of-the-art model (green), SAINT (red), and XGBoost trained and tuned on Framingham Risk Score features only (purple). The true positive rate is plotted versus the false positive rate.
Comparison of 60 classifiers. For each test set, we report the accuracy (Acc), precision (Prec), recall (Rec), and area under the curve (AUC) scores for each of the five evaluated model types: MLP, untuned XGBoost, tuned XGBoost, SAINT, and XGBoost trained and tuned with Framingham Risk Score features only.
Dataset | Metric | MLP | XGBoost (untuned) | XGBoost (tuned) | SAINT | XGBoost (Fram. only) |
---|---|---|---|---|---|---|
Both sexes, any disease | Acc | 0.652 | 0.704 | | 0.682 | |
Both sexes, any disease | Prec | 0.426 | | | 0.455 | 0.469 |
Both sexes, any disease | Rec | | 0.450 | 0.268 | | 0.291 |
Both sexes, any disease | AUC | | 0.692 | 0.696 | | 0.656 |
Both sexes, hypertension | Acc | 0.663 | 0.747 | | 0.633 | |
Both sexes, hypertension | Prec | 0.354 | 0.410 | | 0.343 | |
Both sexes, hypertension | Rec | | 0.420 | 0.212 | | 0.226 |
Both sexes, hypertension | AUC | | 0.713 | 0.726 | | 0.699 |
Both sexes, ischemic | Acc | 0.765 | 0.906 | | 0.677 | |
Both sexes, ischemic | Prec | 0.147 | | | 0.137 | 0.207 |
Both sexes, ischemic | Rec | | 0.162 | 0.060 | | 0.056 |
Both sexes, ischemic | AUC | 0.686 | 0.680 | | | 0.630 |
Both sexes, conduction | Acc | 0.765 | 0.934 | | 0.739 | |
Both sexes, conduction | Prec | 0.085 | | | 0.101 | 0.050 |
Both sexes, conduction | Rec | | 0.107 | 0.027 | | 0.007 |
Both sexes, conduction | AUC | 0.638 | 0.633 | | | 0.599 |
Female, any disease | Acc | 0.634 | | | 0.692 | 0.734 |
Female, any disease | Prec | 0.336 | | | 0.378 | 0.341 |
Female, any disease | Rec | | 0.287 | 0.182 | | 0.157 |
Female, any disease | AUC | 0.639 | 0.646 | | | 0.630 |
Female, hypertension | Acc | 0.716 | 0.797 | | 0.770 | |
Female, hypertension | Prec | 0.283 | 0.335 | | | 0.298 |
Female, hypertension | Rec | | 0.237 | 0.111 | | 0.160 |
Female, hypertension | AUC | 0.688 | 0.696 | | | 0.669 |
Female, ischemic | Acc | 0.857 | | | 0.700 | 0.948 |
Female, ischemic | Prec | 0.089 | | | 0.075 | 0.000 |
Female, ischemic | Rec | | 0.029 | 0.029 | | 0.000 |
Female, ischemic | AUC | 0.621 | | 0.649 | | 0.609 |
Female, conduction | Acc | 0.880 | | | 0.945 | 0.964 |
Female, conduction | Prec | 0.070 | | 0.000 | | 0.000 |
Female, conduction | Rec | | 0.019 | 0.000 | | 0.000 |
Female, conduction | AUC | 0.567 | 0.593 | | | 0.567 |
Male, any disease | Acc | 0.603 | 0.675 | | 0.669 | |
Male, any disease | Prec | 0.434 | 0.518 | | 0.505 | |
Male, any disease | Rec | | 0.426 | 0.296 | | 0.348 |
Male, any disease | AUC | 0.646 | 0.690 | | | 0.679 |
Male, hypertension | Acc | 0.662 | 0.718 | | 0.690 | |
Male, hypertension | Prec | 0.406 | 0.459 | | 0.444 | |
Male, hypertension | Rec | | 0.356 | 0.222 | | 0.285 |
Male, hypertension | AUC | 0.693 | 0.705 | | | 0.658 |
Male, ischemic | Acc | 0.723 | 0.878 | | 0.751 | |
Male, ischemic | Prec | 0.167 | | 0.257 | 0.209 | |
Male, ischemic | Rec | | 0.154 | 0.063 | | 0.084 |
Male, ischemic | AUC | 0.661 | 0.681 | | | 0.626 |
Male, conduction | Acc | 0.807 | | | 0.612 | 0.922 |
Male, conduction | Prec | 0.100 | | | 0.114 | 0.087 |
Male, conduction | Rec | | 0.092 | 0.020 | | 0.020 |
Male, conduction | AUC | 0.586 | 0.653 | | | 0.606 |
Although all 60 models were measurably better than a random classifier, none of the models demonstrated a high AUC score. Since we did not observe training set overfitting, this might be indicative of a high Bayes error rate and low feature-output correlations in the datasets. The ischemic disease and conduction disorder models performed rather poorly, most likely caused by the small training set sizes and the increasingly significant class imbalance in their test sets.
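The effect of class imbalance on these metrics can be illustrated with a short sketch. It uses a hypothetical test set of 1,000 people with 52 positives and a classifier that always predicts the negative class; the numbers are invented for illustration, although a pattern of high accuracy with zero precision and recall is consistent with what such a degenerate classifier would produce.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall from binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Convention used here: precision and recall are 0 when their denominator is 0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, precision, recall


# Hypothetical imbalanced test set: 52 people with the disease, 948 without.
y_true = [1] * 52 + [0] * 948
always_negative = [0] * 1000

accuracy, precision, recall = classification_metrics(y_true, always_negative)
print(accuracy, precision, recall)
```

This is why accuracy alone is a poor summary for the rarer disease categories, and why the AUC, which is insensitive to the choice of cut-point, is the more informative column when comparing models.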
Cross-evaluation results using tuned XGBoost classifiers. The classifiers trained on both sexes are colored blue, the classifiers trained on only female data are colored orange, and the classifiers trained on only male data are colored green. The rows show the ROC and AUC for a given trained classifier in predicting a given disease for only-female data, top, or only-male data, bottom. The columns correspond to any cardiovascular disease, hypertensive diseases, ischemic diseases, and conduction diseases, from left to right. The true positive rate is plotted versus the false positive rate.
Top 10 features from the tuned XGBoost classifiers trained on both sexes, female only, and male only for any cardiovascular disease. For both sexes, the top four features for the prediction of cardiovascular disease are traditional risk factors, while ECG features and a blood pressure feature from pulse wave analysis make up the rest of the top 10. For the female-only dataset, in addition to the top four traditional risk factors, there is a mix of ECG, pulse wave, and carotid ultrasound features. For the male-only dataset, six of the features are traditional risk factors while the rest are ECG features. Each dot corresponds to a person in the SHAP analysis dataset. A positive SHAP value indicates the contribution to a diagnosis of cardiovascular disease. Bright red corresponds to a high feature value, e.g., old age, while bright blue corresponds to a low feature value, e.g., young age. The binary categories of sex, smoking status, and diabetes status are red for male, smoker, and diabetic, respectively, while blue represents the opposite.
For both sexes combined, six of the top 10 features, namely, age, body mass index, cholesterol, HDL cholesterol, sex, and blood pressure, are factors that are traditionally associated with an increased cardiovascular risk. The other four features are ECG features. For the female-only dataset, five of the top 10 features are traditional risk factors, while the rest include a mix of ECG, pulse wave analysis, and carotid ultrasound features. For the male-only dataset, six of the top 10 features are traditional risk factors. The other four features are ECG features. Interestingly, the male-only dataset is the only one where smoking status and diabetes status appear in the top 10 features. Age, body mass index, and cholesterol, either HDL or total, are consistently the top three features, regardless of sex. The ECG features of the PQ interval and T-axis appear in all three categories as well.
Top 10 features to predict the risk of cardiovascular diseases for each sex and disease group. Rankings are reported using SHAP on the XGBoost classifiers trained and tuned on each of the 12 datasets with all features from
Dataset | Rank | Any disease | Hypertension | Ischemic | Conduction |
---|---|---|---|---|---|
Both sexes | 1 | Age | Age | Age | Age |
Both sexes | 2 | BMI | BMI | HDL | PQ interval |
Both sexes | 3 | Cholesterol | Cholesterol | Sex | QRS duration |
Both sexes | 4 | HDL | HDL | Cholesterol | T-axis |
Both sexes | 5 | Sex | Central systolic BP | Pulse rate | QTC interval |
Both sexes | 6 | T-axis | Sex | T-axis | LV end-systolic volume |
Both sexes | 7 | QTC interval | R-axis | BMI | P duration |
Both sexes | 8 | PQ interval | Mean arterial pressure | LV ejection fraction | HDL |
Both sexes | 9 | Systolic brachial BP | Diabetes status | R-axis | BMI |
Both sexes | 10 | QRS duration | QTC interval | LV end-systolic volume | Pulse rate |
Female only | 1 | Age | Age | Age | Age |
Female only | 2 | BMI | BMI | HDL | HDL |
Female only | 3 | HDL | Central systolic BP | Max carotid IM thickness 240 | QTC interval |
Female only | 4 | Cholesterol | HDL | Cholesterol | P duration |
Female only | 5 | QRS duration | Mean arterial pressure | T-axis | T-axis |
Female only | 6 | T-axis | R-axis | BMI | QRS duration |
Female only | 7 | Central systolic BP | Cholesterol | Pulse rate | Central augmentation pressure |
Female only | 8 | Mean carotid IM thickness 150 | QRS duration | PW arterial stiffness index | PQ interval |
Female only | 9 | PQ interval | Systolic brachial BP | Mean carotid IM thickness 120 | Augmentation index |
Female only | 10 | PW reflection index | End-systolic BP | Mean carotid IM thickness 210 | P-axis |
Male only | 1 | Age | Age | Age | Age |
Male only | 2 | BMI | BMI | Cholesterol | T-axis |
Male only | 3 | Cholesterol | Cholesterol | HDL | QTC interval |
Male only | 4 | QTC interval | Central systolic BP | PW stroke volume | PQ interval |
Male only | 5 | T-axis | QTC interval | Pulse rate | QRS duration |
Male only | 6 | HDL | Systolic brachial BP | T-axis | PW stroke volume |
Male only | 7 | PQ interval | Diabetes status | BMI | BMI |
Male only | 8 | Smoking status | R-axis | LV stroke volume | Pulse rate |
Male only | 9 | R-axis | PQ interval | Max carotid IM thickness 210 | LV end-systolic volume |
Male only | 10 | Diabetes status | Central augmentation pressure | Diastolic BP | P duration |
HDL cholesterol is the only other feature to appear in all categories, while the body mass index, total cholesterol, and T-axis appear three times. The smoking and diabetes status do not appear at all. Hypertension is heavily predicted by four different measures of blood pressure, while ischemic heart disease is predicted by several measures of intima–media thickness from carotid ultrasounds. Lastly, the ECG feature, PQ interval, appears in two categories.
The body mass index is the only other feature to appear in all categories, while T-axis, total cholesterol, and the QTC interval appear in three of the disease categories. Six of the features for any disease are traditional risk factors. The other four are ECG features. For hypertension, blood pressure measures and diabetes status, along with traditional risk factors, like the age, body mass index, and cholesterol, contribute to risk prediction. Interestingly, three ECG features also make up the top 10 features. For ischemic diseases, the stroke volume and a carotid ultrasound feature add to the traditional risk factors of age, cholesterol, HDL cholesterol, body mass index, and blood pressure. Pulse rate and T-axis conclude the top 10. For conduction disorders, five of the features are ECG features. Age and body mass index are the only traditional risk factors. Magnetic resonance imaging and pulse wave analysis features of the pulse rate, stroke volume, and LV end-systolic volume are the rest of the top 10.
Traditional risk factors make up 47.5% of the top 10 features, while magnetic resonance imaging features make up 7.5%, carotid ultrasound 0%, ECG 32.5%, and pulse wave analysis 12.5%. When broken down by sex, for women, traditional risk factors contribute 37.5%, magnetic resonance imaging 0%, carotid ultrasound 10%, ECG 30%, and pulse wave analysis 22.5% to the top 10. For men, the breakdown is traditional risk factors 40%, magnetic resonance imaging 5%, carotid ultrasound 2.5%, ECG 32.5%, and pulse wave analysis 20%.
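The both-sexes percentages above can be reproduced by counting how often each test modality appears among the 40 top-10 entries (10 features × 4 disease categories) of the both-sexes classifiers. The sketch below does this from the ranking table; the assignment of each feature to a modality follows the feature table in the methods and is an interpretation made for this example, with blood pressure and pulse rate features counted under pulse wave analysis.

```python
from collections import Counter

# Top 10 features of the both-sexes classifiers, copied from the ranking table.
TOP10_BOTH_SEXES = {
    "any": ["Age", "BMI", "Cholesterol", "HDL", "Sex", "T-axis", "QTC interval",
            "PQ interval", "Systolic brachial BP", "QRS duration"],
    "hypertension": ["Age", "BMI", "Cholesterol", "HDL", "Central systolic BP", "Sex",
                     "R-axis", "Mean arterial pressure", "Diabetes status", "QTC interval"],
    "ischemic": ["Age", "HDL", "Sex", "Cholesterol", "Pulse rate", "T-axis", "BMI",
                 "LV ejection fraction", "R-axis", "LV end-systolic volume"],
    "conduction": ["Age", "PQ interval", "QRS duration", "T-axis", "QTC interval",
                   "LV end-systolic volume", "P duration", "HDL", "BMI", "Pulse rate"],
}

# Assumed modality of each feature, following the feature table in the methods.
MODALITY = {
    "Age": "traditional", "BMI": "traditional", "Cholesterol": "traditional",
    "HDL": "traditional", "Sex": "traditional", "Diabetes status": "traditional",
    "T-axis": "ECG", "R-axis": "ECG", "QTC interval": "ECG", "PQ interval": "ECG",
    "QRS duration": "ECG", "P duration": "ECG",
    "Systolic brachial BP": "pulse wave", "Central systolic BP": "pulse wave",
    "Mean arterial pressure": "pulse wave", "Pulse rate": "pulse wave",
    "LV ejection fraction": "MRI", "LV end-systolic volume": "MRI",
}

counts = Counter(MODALITY[f] for features in TOP10_BOTH_SEXES.values() for f in features)
total = sum(counts.values())

# Share of each modality in percent; carotid ultrasound never appears, so it is 0.
shares = {m: 100 * counts.get(m, 0) / total
          for m in ["traditional", "ECG", "pulse wave", "MRI", "carotid ultrasound"]}
print(shares)
```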
Women are traditionally underdiagnosed for cardiovascular diseases. The lack of sex-specific criteria is one factor contributing to the underdiagnosis of cardiovascular diseases in women compared to that in men (
For essential primary hypertension, based on the current sex-neutral criteria, women and men are equally underdiagnosed. Yet, as
The novel SAINT model outperforms XGBoost in predicting the risk for cardiovascular disease. Using a dataset of UK Biobank patients who underwent cardiovascular clinical tests, we designed 60 classifiers based on relevant features, sex, and disease categories. We compared the new deep learning model SAINT to the state-of-the-art approach for tabular data, XGBoost, and to an MLP deep learning baseline. We found that SAINT showed the highest cardiovascular disease prediction AUC in nearly every case, XGBoost typically achieved the second-best AUC, and MLP, the lowest AUC.
SAINT is specifically designed for tabular data, which makes its purpose-driven architecture significantly better for our task than an out-of-the-box MLP. The best performance of SAINT classifiers can be attributed, at least in part, to the fact that our data are composed of numerical, continuous features, which are known to favor the performance of SAINT over classical approaches, such as XGBoost (
To date, the SAINT architecture has not been applied for risk analysis in cardiovascular diseases. Its remarkable performance not only holds promise for further clinical studies of both cardiovascular diseases and other conditions but also suggests that deep learning approaches could re-surface as viable methods for tabular clinical data modeling. Since deep learning frameworks require large datasets for effective training, we expect SAINT to improve even more as the size of the available medical dataset increases.
Not all traditional risk factors are equally important. Age, body mass index, HDL and total cholesterol, and systolic blood pressure were the most common factors across sex and disease, with the smoking and diabetes status present only for men, as
When broken down by disease category, for hypertension, measures of blood pressure, body mass index, age, and cholesterol ranked highly. Diabetes status was an important feature for men but not for women. Interestingly, only total cholesterol, not HDL cholesterol, ranked in the top 10 for men, while both appeared for women. For ischemic diseases, age, HDL and total cholesterol, and body mass index were in the top 10 for both women and men. For conduction disorders, age and HDL cholesterol were among the top features for women, while age and body mass index appeared for men. The traditional risk factors in the Framingham Risk Score appear to be the most important for a general calculation of cardiovascular risk, but our study suggests re-evaluating the score by taking into account sex- and disease-specific categories. However, the underlying distributions of the traditional risk factors differ significantly between the male and female populations for all factors except systolic blood pressure, which likely affects the feature rankings.
ECG recordings are the most effective feature to augment the Framingham Risk Score. For women, traditional risk factors made up 37.5% of the top 10 features, while for men, they made up 40%, as we conclude from
ECG features have previously been shown to be powerful predictors of cardiovascular disease (
Central blood pressure is more predictive of the cardiovascular disease risk than brachial blood pressure. Pulse wave analysis provides multiple measures of blood pressure. In multiple sex and disease categories, as shown in
Carotid ultrasounds provide an accessible way to monitor ischemic heart diseases. Carotid ultrasounds measure the carotid intima–media thickness; a thicker intima–media may indicate atherosclerosis of the carotid artery, which supplies the brain (
Limitations and future work: Our study provides a first step toward rethinking risk indicators for cardiovascular diseases in view of big data and machine learning. Although our results provide encouraging evidence of the added value of leveraging both technologies, it is important to be aware of the limitations of our current approach and, ideally, address them in future follow-up studies. First, the Framingham score was designed to provide a 10-year prediction of risk for developing cardiovascular disease (
Women are underdiagnosed for cardiovascular diseases compared to men. Unarguably, there is an urgent need for sex-specific diagnostic criteria. Deep learning provides powerful tools to precisely quantify how well traditional risk factors, like those in the Framingham Risk Score, predict the risk of cardiovascular diseases for females, males, or both sexes combined. Alarmingly, our deep learning study revealed that, for first-degree atrioventricular block and dilated cardiomyopathy, women are underdiagnosed 2× and 1.4× more than men. In addition, without much extra work, our deep learning approach allows us to identify and rank the most predictive features for different types of cardiovascular diseases, both for each sex separately and for both sexes combined. We found that, out of the four commonly used clinical tests (electrocardiograms, magnetic resonance imaging, carotid ultrasounds, and pulse wave analysis), electrocardiogram features showed the most promise for improving cardiovascular disease prediction. A more accurate individualized risk prediction of cardiovascular diseases would enable personalized treatment and prevention strategies, a more effective allocation of medical resources, and an early and precise identification of high-risk individuals, toward the ultimate goal of improving patient outcomes, reducing morbidity and mortality, and improving the quality of life.
The datasets presented in this article are not readily available because researchers must apply to access the UK Biobank dataset. Requests to access the datasets should be directed to
SS: formal analysis, methodology, software, visualization, and writing–original draft. BK: formal analysis, methodology, software, visualization, and writing–original draft. MP: conceptualization, supervision, visualization, and writing–original draft. EK: conceptualization, funding acquisition, supervision, and writing–original draft.
The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This project was supported by an NSF Graduate Research Fellowship to SS and by the NSF CMMI grants 2318188 and 2320933 to EK.
This research was conducted using the October 2022 release of the UK Biobank Resource under the application number 89726. It uses data provided by patients and collected by the National Health Service, England, as part of their care and support; copyright 2022 National Health Service, England; re-used with the permission of the UK Biobank; all rights reserved. This project was carried out in part for the class CS230 Deep Learning, and the authors would like to thank Andrew Ng and the teaching assistants for their helpful feedback.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
MLP: A deep learning model for baseline comparison. We apply Bayesian optimization using the Keras Tuner to tune the hyperparameters of the multilayer perceptron over the validation data for the dataset with both sexes and any disease. The optimization shows that a multilayer perceptron architecture with six hidden layers, 30 units in each hidden layer, and an
XGBoost: A state-of-the-art analysis of tabular data. Since XGBoost training is significantly faster than multilayer perceptron training and is readily parallelizable over the 12 dataset variants, we apply a random search with five-fold cross-validation to tune the hyperparameters of each of the XGBoost models individually. Specifically, we tune the XGBoost hyperparameters for each of the 12 dataset variants, and for the two input feature groups, all available features (cardiovascular and the Framingham Risk Score), and Framingham Risk Score features only. Based on prior work (
SAINT: A novel approach for tabular learning. We did not have access to sufficient computational resources to perform full-scale hyperparameter tuning for the 12 SAINT classifiers. We did, however, adjust the hyperparameters that proved to have the most significant effect on the validation performance. Specifically, we increased the weight decay parameter of the AdamW optimization scheme (
Here, we present a brief synthesis of the SAINT framework structure, as described by