Frontiers in Physiology (Front. Physiol.), ISSN 1664-042X, Frontiers Media S.A.
Original Research, article 1339866, doi: 10.3389/fphys.2024.1339866

Sex-specific cardiovascular risk factors in the UK Biobank

Skyler R. St. Pierre 1*, Bartosz Kaczmarski 1, Mathias Peirlinck 2, Ellen Kuhl 1

1 Department of Mechanical Engineering, Stanford University, Stanford, CA, United States
2 Department of BioMechanical Engineering, Delft University of Technology, Delft, Netherlands

Edited by: Dan Wu, Chinese Academy of Sciences (CAS), China

Reviewed by: Julia Ramírez, University of Zaragoza, Spain

Ling Xiao, Massachusetts General Hospital, Harvard Medical School, United States

*Correspondence: Skyler R. St. Pierre, sstpie@stanford.edu

These authors share first authorship

Received: 16 November 2023; Accepted: 26 February 2024; Published: 23 April 2024. Volume 15, article 1339866.

Copyright © 2024 St. Pierre, Kaczmarski, Peirlinck and Kuhl.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The lack of sex-specific cardiovascular disease criteria contributes to the underdiagnosis of women compared to men. For more than half a century, the Framingham Risk Score has been the gold standard to estimate an individual’s risk of developing cardiovascular disease based on age, sex, cholesterol levels, blood pressure, diabetes status, and smoking status. Now, machine learning can offer a much more nuanced insight into predicting the risk of cardiovascular diseases. The UK Biobank is a large database that includes traditional risk factors and tests related to the cardiovascular system: magnetic resonance imaging, pulse wave analysis, electrocardiograms, and carotid ultrasounds. Here, we leverage 20,542 datasets from the UK Biobank to build more accurate cardiovascular risk models than the Framingham Risk Score and quantify the underdiagnosis of women compared to men. Strikingly, for first-degree atrioventricular block and dilated cardiomyopathy, two conditions with non-sex-specific diagnostic criteria, our study shows that women are underdiagnosed 2× and 1.4× more than men, respectively. Similarly, our results demonstrate the need for sex-specific criteria in essential primary hypertension and hypertrophic cardiomyopathy. Our feature importance analysis reveals that, out of the top 10 features across three sex groups and four disease categories, traditional Framingham factors made up between 40% and 50%; electrocardiogram, 30%–33%; pulse wave analysis, 13%–23%; and magnetic resonance imaging and carotid ultrasound, 0%–10%. Improving the Framingham Risk Score by leveraging big data and machine learning allows us to incorporate a wider range of biomedical data and prediction features, enhance personalization and accuracy, and continuously integrate new data and knowledge, with the ultimate goal to improve accurate prediction, early detection, and early intervention in cardiovascular disease management.
Our analysis pipeline and trained classifiers are freely available at https://github.com/LivingMatterLab/CardiovascularDiseaseClassification.

Keywords: cardiovascular, sex differences, risk factors, heart disease, UK Biobank

Section at acceptance: Computational Physiology and Medicine


      1 Motivation

      Historically, women have been excluded from the biomedical literature, and clinical and animal trials have been biased toward male-only or male-dominated populations (Garcia-Sifuentes and Maney, 2021). Including sex as a biological variable is increasingly recognized as being essential to decrease health inequities (Clayton and Collins, 2014; Cirillo et al., 2020). Sex is typically reported as a binary variable, but sex is inherently complex and relates to hormones, chromosomes, and physical characteristics, all of which follow distributions that overlap between the traditional male and female categories (Morgenroth and Ryan, 2021). In the UK Biobank, sex is reported as a binary variable and the language used in this study reflects that limitation (Sudlow et al., 2015).

      1.1 Women are underdiagnosed and undertreated compared to men

      Cardiovascular disease is underdiagnosed in women compared to men; the lack of sex-specific diagnostic criteria contributes to this issue (St. Pierre et al., 2022). With the currently used non-sex-specific criteria, the men-to-women prevalence ratios of dilated and hypertrophic cardiomyopathy are 3:1 and 3:2, respectively, indicating that men are diagnosed more frequently than women for these cardiomyopathies (Olivotto et al., 2005; Cannatà et al., 2020). On average, women have a smaller wall thickness than men. For hypertrophic cardiomyopathy, the lack of sex-specific criteria implies that female hearts must thicken disproportionately more than male hearts to reach the diagnostic wall-thickness threshold of 15 mm (van Driel et al., 2019). Strikingly, women are half as likely as men to be diagnosed with hypertrophic cardiomyopathy during a routine examination (Olivotto et al., 2005). Women are also diagnosed at an older age and with more symptoms than men for hypertrophic and dilated cardiomyopathies (Olivotto et al., 2005; Halliday et al., 2018; Cannatà et al., 2020).

      The need for sex-specific diagnostic criteria is also visible in heart failure with preserved left ventricular ejection fraction, where the cut-off is an ejection fraction of ≥50% (Ponikowski et al., 2016). However, women have a higher baseline ejection fraction than men on average (Rutkowski et al., 2020). Studies have already shown that women benefit from therapies at a higher range of ejection fractions than men (McMurray et al., 2020; Solomon and McMurray, 2021). Clearly, there is an urgent need for sex-specific research to understand the different impacts of heart failure on men and women (Peirlinck et al., 2021b; Lala et al., 2022; Yang et al., 2023).

      1.2 Risk prediction enables early detection

      Clinically used risk prediction models for cardiovascular diseases typically include components of the Framingham Risk Score: age, sex, total cholesterol, high-density lipoprotein (HDL) cholesterol, systolic blood pressure, blood pressure treated through medicine, diabetes status, and smoking status (Wilson et al., 1998). The body mass index (BMI) is also commonly included in risk models (Alaa et al., 2019). These risk models are easy to use: they require only a handful of easy-to-measure variables, and the risk evaluation is a simple score based on discrete thresholds for each of these variables. Clinicians use these risk models to determine if an otherwise asymptomatic person would benefit from medical intervention (Alaa et al., 2019; Kremers et al., 2008; D’Agostino et al., 2008; Sjöström et al., 2004).

      1.3 Machine learning models have historically outperformed deep learning for tabular data

      Tabular data consist of features that can be input into a spreadsheet, including continuous variables, like age, and binary variables, like smoking status, which are coded with zero for negative and one for positive. The state-of-the-art approaches for supervised learning on tabular data are gradient-boosted tree ensembles (Borisov et al., 2022), which are conventional machine learning methods. The top gradient-boosted models based on benchmark performance for five independent datasets (Borisov et al., 2022) are XGBoost (eXtreme Gradient Boosting) (Chen and Guestrin, 2016), LightGBM (Ke et al., 2017), and CatBoost (Prokhorenkova et al., 2018). The strengths of tree ensemble methods include robustness against outliers and noisy data (Aceña et al., 2022) and the fast exact extraction of feature importance via methods such as SHapley Additive exPlanations (SHAP) (Lundberg and Lee, 2017). A weakness of decision trees is that they are unstable and tend to overfit the training data (Aceña et al., 2022).

      Although gradient boosting frameworks have been shown to be the best-performing approaches for tabular data in the past decade (Shwartz-Ziv and Armon, 2022), deep learning architectures are becoming increasingly prevalent (Gorishniy et al., 2021), sometimes outperforming the state-of-the-art approaches (Somepalli et al., 2021; Borisov et al., 2022). In particular, deep learning frameworks, such as TabTransformer (Huang et al., 2020), DeepFM (Guo et al., 2017), TabNet (Arik and Pfister, 2020), and SAINT (Self-Attention and Intersample Attention Transformer) (Somepalli et al., 2021), showed significant promise for effective tabular data modeling, with the SAINT model outperforming the gradient-boosted frameworks on some learning tasks using the power of representation learning (Somepalli et al., 2021; Borisov et al., 2022). Nonetheless, a well-known downside of deep learning models, as compared to tree-based frameworks, is that they are generally much slower to train (Grinsztajn et al., 2022).

      1.4 Machine learning can discover the most predictive features for risk models

      Machine learning models can effectively utilize a large number of input features, which allows them to discover new, more accurate risk models to better identify people at risk (Madani et al., 2018; Alaa et al., 2019; Alber et al., 2019). XGBoost has been applied extensively for cardiovascular disease diagnoses (Rajliwall et al., 2018; Alaa et al., 2019; Athanasiou et al., 2020; Rajadevi et al., 2021; Papadopoulou et al., 2022). Prior statistical or deep learning models of cardiovascular diseases focused on the lifestyle factors (Sjöström et al., 2004; Alaa et al., 2019; Papadopoulou et al., 2022; Sharma et al., 2022), medical history (Alaa et al., 2019; Papadopoulou et al., 2022), sociodemographics (Alaa et al., 2019; Papadopoulou et al., 2022), dietary and nutritional information (Alaa et al., 2019; Papadopoulou et al., 2022), genetics (Papadopoulou et al., 2022; Sharma et al., 2022), and/or one of the four clinical tests: pulse wave analysis (Davies and Struthers, 2005; Said et al., 2018), electrocardiograms (Attia et al., 2019; Ramírez et al., 2021; Papadopoulou et al., 2022), carotid ultrasounds (Zhao et al., 2016; Sharma et al., 2022), or magnetic resonance imaging (Chung et al., 2006; Gilbert et al., 2019; Bai et al., 2020), but not all four.

      1.5 Objectives of this study

      First, we investigate whether women are underdiagnosed for cardiovascular diseases in the UK Biobank cohort. Then, we compare three models (a multilayer perceptron deep learning baseline, XGBoost, and the novel deep learning framework SAINT) on their ability to predict whether a person can be diagnosed with a cardiovascular disease. Lastly, we identify the top sex- and disease-specific risk factors from four cardiovascular-related tests (pulse wave analysis, electrocardiograms, magnetic resonance imaging, and carotid ultrasounds) and compare them against the traditional Framingham Risk Score.

      2 Materials and methods

      2.1 Dataset and features

      The UK Biobank comprises data from half a million individuals from the UK who were over the age of 40 (Sudlow et al., 2015). From these, we selected individuals who underwent ECG testing, magnetic resonance imaging, carotid ultrasounds, and pulse wave analysis, resulting in a population of 20,542 individuals. For all participants, we also pulled the features associated with the Framingham Risk Score (sex, age, total cholesterol, HDL cholesterol, smoking status, diabetes status, and end-systolic blood pressure), the body mass index, and the medical diagnoses. Table 1 shows the demographic data on this population. We did not include treatment for blood pressure as a feature in our models as this directly reflects one of the diagnostic outcomes, hypertension, that we are trying to predict. All 57 features are shown in Table 2.

      Demographic data. Sex differences for the population of 20,542 individuals used in this study categorized by the Framingham Risk Score features.

      | Sex, n | Age (years) | BMI (−) | Total cholesterol (mmol/L) | HDL cholesterol (mmol/L) | Smoker (−) | Diabetic (−) | Systolic blood pressure (mmHg) |
      |---|---|---|---|---|---|---|---|
      | Female, 10,585 | 62.82 ± 7.41 | 25.88 ± 4.51 | 5.83 ± 1.07 | 1.64 ± 0.37 | 3,586 | 340 | 113.74 ± 18.98 |
      | Male, 9,957 | 63.95 ± 7.61 | 26.72 ± 3.64 | 5.60 ± 1.07 | 1.31 ± 0.30 | 3,991 | 662 | 114.05 ± 16.38 |

      The mean and standard deviation are reported for age, body mass index (BMI), total cholesterol, HDL cholesterol, and end-systolic blood pressure. Fisher’s exact test is used for the categorical features, while the Wilcoxon rank-sum test is used for all other features to determine if the distributions of the male and female populations are significantly different; indicates p <0.05.

      Features for inclusion in the risk prediction analysis. The traditional Framingham Risk Score and body mass index are commonly used factors in clinical risk prediction for cardiovascular diseases. Features from magnetic resonance imaging, carotid ultrasounds, electrocardiogram recordings, and pulse wave analysis are also extracted from the UK Biobank.

      | Method (number of features) | Features |
      |---|---|
      | Framingham Risk Score + body mass index (8) | Sex, age, high-density lipoprotein cholesterol, total cholesterol, end systolic blood pressure, smoking status, diabetes status, body mass index |
      | Magnetic resonance imaging (7) | Average heart rate, cardiac index, cardiac output, left ventricle ejection fraction, left ventricle end-diastolic volume, left ventricle end-systolic volume, left ventricle stroke volume |
      | Carotid ultrasound (12) | Max/mean/min carotid intima-media thickness 120/150/210/240 |
      | Electrocardiogram (12) | Ventricular rate, P duration, PP interval, PQ interval, QRS number, QRS duration, QT interval, QTC interval, RR interval, P axis, R axis, T axis |
      | Pulse wave analysis (19) | Position of pulse wave notch, position of pulse wave peak, position of shoulder on pulse waveform, pulse rate, pulse wave arterial stiffness index, pulse wave peak to peak time, pulse wave reflection index, augmentation index, central augmentation pressure, central pulse pressure, central systolic blood pressure, diastolic blood pressure, end-systolic pressure index, mean arterial pressure index, number of beats in waveform average, peripheral pulse pressure, stroke volume, systolic brachial blood pressure |

      We created two feature groups: 1) eight features, including Framingham Risk Score features and the body mass index only, and 2) all 57 features. We labeled each person to be in the positive class if they were diagnosed with a given cardiovascular disease (Said et al., 2018). We split the datasets based on four disease categories, as shown in Table 3, namely, any disease, hypertension (ICD-10 codes I10–I15), ischemic (I20–I25), and conduction disorders (I44–I49) (World Health Organization, 2004), such that the detection of each disease can be posed as a binary classification problem. We chose these disease categories because they had the largest number of participants who had both these diagnoses and data from all four studies: ECG, heart MRI, pulse wave analysis, and carotid ultrasounds. To train the sex-specific classifiers, we further designated three input groups, i.e., both sexes, female only, and male only.

      Disease classification criteria. Clinical diagnostic ICD-10 codes for different subsets of cardiovascular disease (Said et al., 2018).

      | Disease | ICD-10 codes |
      |---|---|
      | Any cardiovascular disease | I00-I78.9, G95.1, H334.1-2, O10.0-9, S06.60-61, Z95.1, Z95.5 |
      | Hypertensive diseases | I10-I15.9 |
      | Ischemic diseases | I20-I25.9 |
      | Conduction disorders | I44-I49.9 |
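The labeling rule in Table 3 can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the dotted code format (for example `I25.9`), and the restriction to the three single-range categories are assumptions made for illustration, and the "any cardiovascular disease" category is omitted for brevity.

```python
# Hypothetical helper names; ranges are taken from Table 3.
CATEGORY_RANGES = {
    "hypertensive": (10.0, 15.9),  # I10-I15.9
    "ischemic": (20.0, 25.9),      # I20-I25.9
    "conduction": (44.0, 49.9),    # I44-I49.9
}


def icd10_to_number(code):
    """Map an I-chapter code such as 'I25.9' to 25.9; return None for other codes."""
    code = code.strip().upper()
    if not code.startswith("I"):
        return None
    try:
        return float(code[1:])
    except ValueError:
        return None


def label_participant(codes):
    """Return one binary label per disease category for a list of ICD-10 codes."""
    numbers = [n for n in (icd10_to_number(c) for c in codes) if n is not None]
    return {
        name: int(any(lo <= n <= hi for n in numbers))
        for name, (lo, hi) in CATEGORY_RANGES.items()
    }


labels = label_participant(["I10", "I48.0", "E11.9"])
print(labels)
```

Each category yields its own 0/1 label, so one participant can be positive in several binary classification tasks at once.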

      Using the three input groups (both sexes, female only, and male only) together with the four binary label sets (any disease, hypertensive disease, ischemic heart disease, and conduction disorders), we constructed 12 dataset variants to train the binary classifiers, as shown in Figure 1. We generated each dataset by directly slicing the randomly pre-shuffled data frame. Since the datasets are relatively small, we applied a 70–15–15 split to create the training, validation, and test sets for each of the variants. The positive class in all 12 dataset variants is significantly underrepresented relative to the negative class, so we applied oversampling to approximately equalize the number of negative and positive samples in the training sets of the corresponding 12 datasets.
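The split and oversampling steps can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the synthetic data, the helper names, and the choice of random duplication as the oversampling scheme are hypothetical, since the text does not specify which oversampling method was used.

```python
import random


def split_70_15_15(rows):
    """Slice an already shuffled list into training, validation, and test sets."""
    n = len(rows)
    n_train = (70 * n) // 100
    n_val = (15 * n) // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def oversample_minority(rows, label_of, seed=0):
    """Duplicate random minority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [r for r in rows if label_of(r) == 1]
    neg = [r for r in rows if label_of(r) == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Synthetic example: 1,000 participants with 10% positives (made-up data).
data = [(i, 1 if i % 10 == 0 else 0) for i in range(1000)]
random.Random(42).shuffle(data)
train, val, test = split_70_15_15(data)
balanced_train = oversample_minority(train, label_of=lambda r: r[1])
print(len(train), len(val), len(test), len(balanced_train))
```

Only the training set is rebalanced here; the validation and test sets keep the original class ratio so that the reported metrics reflect the real prevalence.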

      Dataset overview. (A) Out of 500,000+ participants in the UK Biobank study, we selected a group of 20,542 participants who underwent magnetic resonance imaging, carotid ultrasounds, ECG, and pulse wave analysis. We also selected participants with available data for all of the Framingham Risk Score features. (B) Data are separated in 12 variants with three sex groups and four cardiovascular disease categories, where $n_{\mathrm{tot}}$ is the total number of people in the dataset and $n_{\mathrm{diag}}$ is the number of people in that dataset who have been diagnosed with the corresponding condition.

      2.2 Models

      Using the cardiovascular and Framingham Risk Score features, we implemented three distinct model types: 1) a multilayer perceptron (MLP); 2) an XGBoost ensemble model, which is a state-of-the-art approach for tabular data learning (Chen and Guestrin, 2016); and 3) the SAINT model (Somepalli et al., 2021). We used the MLP as a baseline for deep learning performance and XGBoost as a baseline for state-of-the-art performance. For each model type, we trained and evaluated 12 individual classifiers, according to our 12 dataset variants. For evaluation purposes, we consider both untuned and tuned XGBoost ensemble models and introduce an additional set of tuned XGBoost ensembles trained only on the Framingham Risk Score features. The 12 cardiovascular disease datasets, three model types, and two additional XGBoost variants result in a total of 60 individually trainable classifiers.

      2.2.1 MLP: a deep learning model for a baseline comparison

      We implemented and evaluated an MLP network under TensorFlow (Abadi et al., 2016) for each of the 12 dataset variants. All 12 MLP classifiers were trained using the binary cross-entropy function with $L_2$ regularization of the cost. To accelerate and stabilize training, the features are first passed through a standardization layer, and each hidden layer is followed by a batch normalization layer. Each hidden layer uses a ReLU non-linearity, and the output uses a sigmoid activation function. The tunable hyperparameters of the MLP are the number of hidden layers, the number of units in each hidden layer, the $L_2$-regularization parameter, and the parameters of the training procedure. Appendix A provides additional details about the MLP architecture and hyperparameter tuning.
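The training objective named above, binary cross-entropy with an L2 penalty on the weights, can be written out as a small sketch. It is plain Python rather than the TensorFlow implementation used in the study, and the function name and the default regularization strength are assumptions for illustration only.

```python
import math


def bce_with_l2(y_true, y_prob, weights, l2=1e-3, eps=1e-12):
    """Mean binary cross-entropy plus l2 times the sum of squared weights."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    penalty = l2 * sum(w * w for w in weights)
    return total / len(y_true) + penalty


# An uninformative prediction of 0.5 costs log(2) per sample before the penalty.
loss = bce_with_l2([1, 0], [0.5, 0.5], weights=[1.0, 2.0], l2=0.1)
print(loss)
```

The penalty term grows with the magnitude of the weights, which discourages the network from fitting noise in the relatively small training sets.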

      2.2.2 XGBoost: a state-of-the-art analysis of tabular data

      We used XGBoost (Chen and Guestrin, 2016) as a benchmarking baseline for the novel SAINT model. XGBoost trains an ensemble of decision tree models using an efficient second-order gradient boosting framework (Chen and Guestrin, 2016). We trained an individual XGBoost ensemble for each of the 12 dataset variants using the binary cross-entropy loss and training-test splits consistent with those used for MLP models. Since XGBoost is a state-of-the-art approach for tabular data learning, we include both the tuned and untuned XGBoost ensembles for each dataset. The untuned XGBoost models represent the out-of-the-box performance of the current state-of-the-art method, while the performance of the tuned XGBoost models represents the best-case learning result in each dataset. Appendix A provides details of hyperparameter tuning for XGBoost models.

      2.2.3 SAINT: a novel approach for tabular data learning

      SAINT is a novel approach for tabular data modeling that employs self-attention, intersample attention, an enhanced embedding framework, and a contrastive pre-training phase (Somepalli et al., 2021). Transformers are a recent machine learning development that utilizes a multi-head attention mechanism, allowing the parallelized computation of the contextual representations of the input data. This architecture is employed in cutting-edge generative machine learning applications, such as ChatGPT-4 (OpenAI, 2023). Transformers have been shown to significantly outperform the previous state-of-the-art machine learning architectures for language modeling and machine translation tasks (Vaswani et al., 2017). Rather than using transformers for language processing purposes, SAINT adapts this architecture and the concept of self-attention to perform efficient learning on tabular data, such as the clinical UK Biobank data analyzed in our study. The SAINT architecture consists of multiple stages, each of which includes a self-attention block and an intersample attention block. The self-attention block applies attention across the features of a given sample, while the intersample attention block applies row-wise attention across different samples for a given feature. As a result, the final SAINT stage outputs a contextual representation of the input embedding. Appendix B provides further information about the definitions, structure, and implementation of the SAINT framework (Somepalli et al., 2021). Although SAINT has been shown to outperform the state-of-the-art methods on some datasets (Borisov et al., 2022), it has not yet been used for cardiovascular data learning. Here, we applied SAINT to investigate its performance in cardiovascular disease classification tasks, in addition to the more established MLP and XGBoost methods.
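The difference between the two attention blocks can be made concrete with a heavily simplified sketch. It is not the SAINT implementation: it uses a single head, identity query/key/value projections, no learned parameters, and applies intersample attention feature by feature as described in the paragraph above. All function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(tokens):
    """Scaled dot-product attention with identity projections (an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def self_attention(sample):
    """Attend across the feature embeddings of one sample (n_features x d)."""
    return attention(sample)


def intersample_attention(batch):
    """For each feature, attend across the samples of a batch (n_samples x n_features x d)."""
    n_features = len(batch[0])
    per_feature = [attention([s[f] for s in batch]) for f in range(n_features)]
    return [[per_feature[f][i] for f in range(n_features)] for i in range(len(batch))]


# Three samples, two features each, embedding dimension 2 (made-up numbers).
batch = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 1.0], [0.5, 0.5]],
]
within_sample = [self_attention(sample) for sample in batch]
across_samples = intersample_attention(batch)
```

In the first call, every feature embedding is updated from the other features of the same person; in the second, it is updated from the same feature in the other people of the batch.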

      2.3 Model evaluation

      2.3.1 ROC: receiver operating characteristic curve

      The receiver operating characteristic (ROC) curve is used to evaluate the performance of a diagnostic test where the predictors of the outcome are not binary, so there are many possible cut-points to classify a person with a positive or negative diagnosis (Mandrekar, 2010). The ROC curve is a plot of sensitivity (true positive rate) vs. 1−specificity (false positive rate). Sensitivity is the probability that an individual who is truly positive gets a positive test result, while specificity is the probability that an individual who is truly negative gets a negative test result (Parikh et al., 2008). The diagonal line corresponds to a test that assigns diagnoses completely at random.

      2.3.2 AUC: area under the curve

      The area under the curve (AUC) is a summary metric for the ROC curve that reports the overall accuracy of the test (Mandrekar, 2010). The AUC ranges from 0, completely inaccurate, to 1, completely accurate; an AUC of 0.5 means that the test result is random. We used the ROC curve and AUC metric to compare how accurately our different models can predict cardiovascular disease as the ROC curve does not depend on the scale of the test results and provides a helpful visual comparison (Mandrekar, 2010).
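The construction of the ROC curve and its AUC can be sketched in a few lines. This is a generic illustration with made-up labels and scores, not the evaluation code of the study; the function names are hypothetical, and the area is computed with the trapezoidal rule.

```python
def roc_curve(y_true, scores):
    """Sweep the decision threshold from high to low and collect (FPR, TPR) points."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all samples that share the same score in one step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
        for i in range(1, len(fpr))
    )


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve(y_true, scores)
print(auc(fpr, tpr))  # 0.75 for this toy example
```

Because only the ranking of the scores enters the calculation, rescaling the model outputs leaves the curve and the AUC unchanged.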

      2.3.3 Feature importance rankings

      SHAP is a unified framework designed to interpret model predictions by giving a value for the importance of each feature to a specific prediction (Lundberg and Lee, 2017). A positive SHAP value indicates that a feature has a positive impact on the prediction of the positive class, which, in our case, is a diagnosis of a cardiovascular disease, while a negative SHAP value indicates the opposite. The magnitude indicates the strength of the effect. We can easily integrate the SHAP pipeline with XGBoost using the TreeExplainer class.
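To illustrate what a SHAP value represents, the following sketch computes exact Shapley values by brute force for a toy model. It is not the TreeExplainer algorithm, which obtains these attributions efficiently for tree ensembles; here, absent features are simply replaced by a single baseline value, and the toy risk score, its coefficients, and all names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take the baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    # Hypothetical linear risk score with made-up coefficients.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the sign of each value shows whether the feature pushes the prediction toward or away from the positive class.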

      2.4 Calculation of an underdiagnosis

      An underdiagnosis is calculated as the ratio of the number of people who, at a single time point, met the criteria for a given disease diagnosis but were never diagnosed for that disease across the entire timespan of the medical record divided by the number of people who had been diagnosed for the same disease at any point in time across the entire span of their medical records. The UK Biobank ICD-10 medical records start between 1981 and 1988, depending on the country in the UK, and last until 2022. Only 4% of the UK Biobank population has ICD-9 records, so we have not included these. The blood pressure was measured between 2006 and 2010, and data from the first time point of imaging visits, ECG, MRI, carotid ultrasound, and pulse wave analysis were collected after 2014. The numerator represents the minimum possible value; it implies that at a single time point between 1981 and 2022, people met the disease criteria but never got a diagnosis. It is a minimum because there are likely people from the healthy category who, at any other time point, would have met the criteria but were not measured at that precise time and were also never diagnosed. The denominator represents the maximum possible value; it reflects whether people have ever been diagnosed across the entire timeline from 1981 to 2022. So, this metric of underdiagnosis is in itself an underestimation.

      3 Results

      3.1 Women are underdiagnosed relative to men

      We first investigate whether women are underdiagnosed relative to men for cardiovascular diseases in the UK Biobank cohort. We chose diseases where the diagnosis is a simple, non-sex-specific cut-off.

      Figure 2 shows the cut-off criteria in red. Plots (A–D) show individuals who have not been diagnosed with the disease in orange and those who have in purple. Plots (E, F) are divided into four categories. The truncated violin plots show the distribution of each sex for each category with the box plots showing the mean in white and the 25th–75th percentiles. Each dot represents a single person in the Biobank dataset. There may be comorbidities or alternate medical diagnoses that result in similar presentations, so the magnitude of an underdiagnosis, in the following examples, should be understood as a first approximation.

      Diagnosing cardiovascular disease via simple, non-sex-specific cut-offs. The red line indicates the diagnostic cut-off. The truncated violin plots show the distribution of men and women for each color-coded population, with the box plot inside showing the mean in white and the 25th and 75th percentiles. (A, B) Essential primary hypertension is diagnosed with a systolic blood pressure greater than or equal to 140 mmHg and/or diastolic blood pressure greater than or equal to 90 mmHg (Williams et al., 2018). Women who are not diagnosed with hypertension, on average, have a lower systolic and diastolic blood pressure compared to men. (C) Hypertrophic cardiomyopathy is diagnosed with a wall thickness greater than 15 mm (Elliott et al., 2014). None of the individuals in this cohort met the condition. Healthy women have a notably lower wall thickness on average than men. (D) First-degree AV block is diagnosed with a PQ interval greater than 200 ms (Holmqvist and Daubert, 2013). Healthy women have a lower PQ interval on average than men. (E, F) Dilated cardiomyopathy is diagnosed by a left ventricle ejection fraction less than 45% and a left ventricle end-diastolic diameter greater than 112% of the diameter predicted based on the body surface area and age (Arora et al., 2010; Orphanou et al., 2022). Women have a slightly higher ejection fraction and lower left ventricle end-diastolic diameter on average than men, which is represented in orange.

      Essential primary hypertension is diagnosed by a systolic blood pressure of ≥140 mmHg and/or diastolic blood pressure of ≥90 mmHg (Williams et al., 2018). The corresponding ICD-10 code is I10. Figures 2A,B show that women have, on average, a lower systolic and diastolic blood pressure than men. In the Biobank cohort, 35.2% of men are diagnosed, while 52.3% of men meet the cut-off criteria. For women, 26.6% are diagnosed, while 40.2% meet the cut-off criteria. This means that women and men are underdiagnosed for essential primary hypertension at the same rate, 1.5×, when non-sex-specific criteria are used.

      Hypertrophic cardiomyopathy is diagnosed with a wall thickness of > 15 mm (Elliott et al., 2014); the ICD-10 codes are I42.1 and I42.2. Figure 2C shows that none of the approximately 900 people with cardiac magnetic resonance images met the criteria for hypertrophic cardiomyopathy or were diagnosed. Women, on average, have a distinctly smaller wall thickness than men.

      The first-degree AV block is diagnosed with a PQ interval of > 200 ms (Holmqvist and Daubert, 2013); the ICD-10 code is I44.0. As shown in Figure 2D, women have a smaller PQ interval than men on average. The first-degree AV block is generally asymptomatic but is no longer considered entirely benign, with nearly double the risk of developing atrial fibrillation and triple the risk of needing a pacemaker (Holmqvist and Daubert, 2013). As such, the current recommendation is to monitor patients regularly to see if the conduction delay continues to widen or if they are developing atrial fibrillation (Oldroyd et al., 2022). In the UK Biobank, 0.81% of men are diagnosed and 12.6% of men meet the cut-off. For women, 0.18% of them are diagnosed, while 5.4% meet the cut-off. So, women are underdiagnosed 30×, while men are underdiagnosed 15.6×, meaning that women are nearly 2× more underdiagnosed relative to men for a first-degree AV block with the given non-sex-specific criteria.
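The underdiagnosis factors quoted for hypertension and first-degree AV block can be retraced with a short calculation. The sketch assumes that each factor is the share of people meeting the cut-off divided by the share diagnosed, which is how the reported numbers appear to be obtained; the function and variable names are hypothetical.

```python
def underdiagnosis_factor(pct_meeting_cutoff, pct_diagnosed):
    """Share of people meeting the cut-off divided by the share diagnosed."""
    return pct_meeting_cutoff / pct_diagnosed


# (percent meeting the cut-off, percent diagnosed), as reported in the text.
reported = {
    "essential primary hypertension": {"men": (52.3, 35.2), "women": (40.2, 26.6)},
    "first-degree AV block": {"men": (12.6, 0.81), "women": (5.4, 0.18)},
}

factors = {}
for disease, by_sex in reported.items():
    men = underdiagnosis_factor(*by_sex["men"])
    women = underdiagnosis_factor(*by_sex["women"])
    factors[disease] = {"men": men, "women": women, "women_vs_men": women / men}
    print(f"{disease}: men {men:.1f}x, women {women:.1f}x, ratio {women / men:.2f}")
```

For hypertension, both sexes come out at about 1.5×, while for first-degree AV block the factor for women is roughly twice that for men.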

      Dilated cardiomyopathy is diagnosed by a left ventricle ejection fraction of < 45% and a left ventricle end-diastolic diameter of > 112% of the diameter predicted based on the body surface area (BSA) and age (Orphanou et al., 2022). Left ventricle fractional shortening less than 25% can be used in place of the ejection fraction criterion, but these data were not available in the Biobank. Because the left ventricle end-diastolic volume is reported, we used the Teichholz formula (Arora et al., 2010), $\mathrm{LVEDV} = 7\,(\mathrm{LVEDD}_{\mathrm{cal}})^{3} / (2.4 + \mathrm{LVEDD}_{\mathrm{cal}})$, to calculate the end-diastolic diameter from the volume and the formula $\mathrm{LVEDD}_{\mathrm{pre}} = 45.3\,(\mathrm{BSA})^{0.3} - 0.03\,(\mathrm{age}) - 7.2$ to predict the end-diastolic diameter from the BSA and age. If $\mathrm{LVEDD}_{\mathrm{cal}} / \mathrm{LVEDD}_{\mathrm{pre}} > 1.12$, the individual would meet the criterion and be assigned either a red or a purple dot, as shown in Figures 2E,F, depending on whether they had also been diagnosed with dilated cardiomyopathy or not, respectively. If they did not meet this criterion, they were assigned an orange or blue dot, where blue indicates that they had been diagnosed and orange indicates that they had not been. In the Biobank cohort, 55 people were diagnosed with dilated cardiomyopathy, but only 35 met the cut-off using these calculations; nearly all of these discrepancies arose from not meeting the ejection fraction criterion, as shown by the blue dots. Women, on average, have a slightly lower end-diastolic diameter and higher ejection fraction than men. Out of the men in the cohort, 0.23% were diagnosed, while 5.71% met the cut-off criterion. For women, 0.06% were diagnosed, while 2.02% met the cut-off criterion. As such, women are underdiagnosed 33.7×, men are underdiagnosed 24.8×, and women are 1.4× more underdiagnosed than men when non-sex-specific criteria are used.
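The diameter calculation can be sketched as follows. The Teichholz relation gives the volume from the diameter, so recovering the diameter from a reported volume requires a numerical inversion; bisection is used here as one possible choice and is not necessarily what the authors did. The units are assumptions not stated in the text (diameter in cm and volume in mL for the Teichholz formula, predicted diameter in mm), and the example participants are made up.

```python
def lvedv_teichholz(lvedd_cm):
    """Teichholz formula: end-diastolic volume (assumed mL) from diameter (assumed cm)."""
    return 7.0 * lvedd_cm ** 3 / (2.4 + lvedd_cm)


def lvedd_from_lvedv(lvedv_ml, lo=0.0, hi=20.0, tol=1e-10):
    """Invert the Teichholz formula by bisection; the volume increases with diameter."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lvedv_teichholz(mid) < lvedv_ml:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lvedd_predicted_mm(bsa_m2, age_years):
    """Predicted end-diastolic diameter (assumed mm) from body surface area and age."""
    return 45.3 * bsa_m2 ** 0.3 - 0.03 * age_years - 7.2


def meets_dcm_criteria(lvedv_ml, lvef_pct, bsa_m2, age_years):
    """Ejection fraction below 45% and calculated/predicted diameter above 1.12."""
    lvedd_cal_mm = 10.0 * lvedd_from_lvedv(lvedv_ml)  # cm to mm (unit assumption)
    ratio = lvedd_cal_mm / lvedd_predicted_mm(bsa_m2, age_years)
    return lvef_pct < 45.0 and ratio > 1.12


# Hypothetical participants, not Biobank data.
print(meets_dcm_criteria(lvedv_ml=300.0, lvef_pct=35.0, bsa_m2=1.9, age_years=60))
print(meets_dcm_criteria(lvedv_ml=100.0, lvef_pct=60.0, bsa_m2=1.9, age_years=60))

# Underdiagnosis factors from the percentages reported above.
men_factor = 5.71 / 0.23
women_factor = 2.02 / 0.06
print(round(men_factor, 1), round(women_factor, 1), round(women_factor / men_factor, 1))
```

Both conditions must hold at once, which is why a person with an enlarged ventricle but a preserved ejection fraction does not meet the cut-off in this sketch.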

      3.2 The SAINT model performs the best in predicting risks for cardiovascular diseases

      Figure 3 shows the ROC curves and AUC values for the five model types (MLP, untuned XGBoost, tuned XGBoost, SAINT, and XGBoost with only Framingham Risk Score features) across the 12 sex and disease categories. A large AUC score corresponds to few false negatives (predicted healthy but actually diseased) and many true positives (predicted diseased and actually diseased). Table 4 summarizes the performance metrics for all classifiers, where we report the test set accuracy, precision, and recall in addition to the AUC score. In terms of the AUC metric, the SAINT model performed best on all datasets except the female-only conduction disorder dataset, where only the corresponding tuned XGBoost model performed better. For accuracy and precision, the tuned XGBoost models were the best performing in 11/12 cases and 8/12 cases, respectively. SAINT had the best-performing recall in 9/12 cases.
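The three threshold-dependent metrics reported alongside the AUC can be computed from the confusion counts, as in the sketch below. The labels and predictions are made up to mimic an imbalanced test set and do not come from the study; the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up imbalanced example: 3 diseased and 7 healthy people,
# with a cautious classifier that flags only one person.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case, the accuracy is high even though two of the three diseased people are missed, which is why recall is reported next to accuracy for imbalanced datasets.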

      Figure 3. ROC curves and AUC scores for the 60 classifiers evaluated on 12 test sets. The rows correspond to (1) both sexes, (2) female-only, and (3) male-only datasets. The columns correspond to (1) any, (2) hypertensive, (3) ischemic, and (4) conduction diseases. The colors of the curves indicate the different model types: MLP deep learning baseline (blue), untuned XGBoost (orange), tuned XGBoost as the state-of-the-art baseline (green), SAINT (red), and XGBoost trained and tuned on Framingham Risk Score features only (purple). The true positive rate is plotted versus the false positive rate.

      Table 4. Comparison of 60 classifiers. For each test set, we report the accuracy (Acc), precision (Prec), recall (Rec), and area under the curve (AUC) scores for each of the five evaluated model types: MLP, untuned XGBoost, tuned XGBoost, SAINT, and XGBoost trained and tuned with Framingham Risk Score features only.

      Dataset                   Metric  MLP    XGBoost (untuned)  XGBoost (tuned)  SAINT  XGBoost (Fram. only)
      Both sexes, any disease   Acc     0.652  0.704              0.732            0.682  0.707
                                Prec    0.426  0.473              0.551            0.455  0.469
                                Rec     0.667  0.450              0.268            0.651  0.291
                                AUC     0.716  0.692              0.696            0.733  0.656
      Both sexes, hypertension  Acc     0.663  0.747              0.786            0.633  0.767
                                Prec    0.354  0.410              0.498            0.343  0.417
                                Rec     0.695  0.420              0.212            0.780  0.226
                                AUC     0.747  0.713              0.726            0.758  0.699
      Both sexes, ischemic      Acc     0.765  0.906              0.931            0.677  0.919
                                Prec    0.147  0.243              0.542            0.137  0.207
                                Rec     0.491  0.162              0.060            0.681  0.056
                                AUC     0.686  0.680              0.729            0.742  0.630
      Both sexes, conduction    Acc     0.765  0.934              0.952            0.739  0.946
                                Prec    0.085  0.184              0.500            0.101  0.050
                                Rec     0.396  0.107              0.027            0.557  0.007
                                AUC     0.638  0.633              0.680            0.732  0.599
      Female, any disease       Acc     0.634  0.741              0.761            0.692  0.734
                                Prec    0.336  0.417              0.462            0.378  0.341
                                Rec     0.591  0.287              0.182            0.504  0.157
                                AUC     0.639  0.646              0.653            0.690  0.630
      Female, hypertension      Acc     0.716  0.797              0.837            0.770  0.799
                                Prec    0.283  0.335              0.527            0.354  0.298
                                Rec     0.469  0.237              0.111            0.477  0.160
                                AUC     0.688  0.696              0.730            0.740  0.669
      Female, ischemic          Acc     0.857  0.955              0.951            0.700  0.948
                                Prec    0.089  0.333              0.167            0.075  0.000
                                Rec     0.243  0.029              0.029            0.514  0.000
                                AUC     0.621  0.657              0.649            0.683  0.609
      Female, conduction        Acc     0.880  0.965              0.965            0.945  0.964
                                Prec    0.070  0.250              0.000            0.075  0.000
                                Rec     0.204  0.019              0.000            0.056  0.000
                                AUC     0.567  0.593              0.636            0.603  0.567
      Male, any disease         Acc     0.603  0.675              0.709            0.669  0.687
                                Prec    0.434  0.518              0.641            0.505  0.552
                                Rec     0.609  0.426              0.296            0.668  0.348
                                AUC     0.646  0.690              0.736            0.745  0.679
      Male, hypertension        Acc     0.662  0.718              0.755            0.690  0.722
                                Prec    0.406  0.459              0.603            0.444  0.461
                                Rec     0.589  0.356              0.222            0.677  0.285
                                AUC     0.693  0.705              0.748            0.750  0.658
      Male, ischemic            Acc     0.723  0.878              0.893            0.751  0.890
                                Prec    0.167  0.262              0.257            0.209  0.267
                                Rec     0.476  0.154              0.063            0.573  0.084
                                AUC     0.661  0.681              0.718            0.743  0.626
      Male, conduction          Acc     0.807  0.925              0.932            0.612  0.922
                                Prec    0.100  0.281              0.250            0.114  0.087
                                Rec     0.245  0.092              0.020            0.724  0.020
                                AUC     0.586  0.653              0.660            0.733  0.606

      Although all 60 models were measurably better than a random classifier, none of the models demonstrated a high AUC score. Since we did not observe training set overfitting, this might be indicative of a high Bayes error rate and low feature-output correlations in the datasets. The ischemic disease and conduction disorder models performed rather poorly, most likely because of the small training set sizes and the increasingly severe class imbalance in their test sets.
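
      The effect of class imbalance on the reported accuracies can be illustrated with a trivial baseline. The sketch below uses hypothetical cohort sizes chosen for illustration only, not counts from the study, and shows that a classifier that labels every individual as healthy reaches an accuracy of one minus the disease prevalence while its recall is zero.

```python
def all_negative_baseline(n_total, n_diseased):
    """Accuracy and recall of a classifier that labels everyone as healthy."""
    true_negatives = n_total - n_diseased
    accuracy = true_negatives / n_total
    recall = 0.0  # no diseased individual is ever flagged
    return accuracy, recall


# Hypothetical test sets of 1,000 people with decreasing disease prevalence.
for n_diseased in (300, 100, 35):
    accuracy, recall = all_negative_baseline(1000, n_diseased)
    print(f"prevalence {n_diseased / 1000:.1%}: accuracy {accuracy:.3f}, recall {recall:.3f}")
```

      For rare diseases, a high accuracy paired with a recall near zero, as seen for several ischemic and conduction models in Table 4, is therefore not evidence of a useful classifier, and the AUC and recall are more informative.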

      3.3 Additional features improve cardiovascular disease prediction

      Figure 3 suggests that including features from ECG, magnetic resonance imaging, pulse wave analysis, and carotid ultrasound, along with the Framingham Risk Score features, significantly increased the AUC score of the corresponding 48 models, as compared to the AUC scores for the Framingham-only XGBoost models. In terms of the AUC, the XGBoost classifiers trained and tuned on the Framingham-only features were always the lowest or second-lowest performing models for a given dataset.
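
      This ranking can be checked directly against the AUC rows of Table 4. The following sketch copies those values and counts, for each dataset, how many of the other four models have a strictly lower AUC than the Framingham-only XGBoost model; the variable and function names are illustrative.

```python
MODELS = ("MLP", "XGBoost (untuned)", "XGBoost (tuned)", "SAINT", "XGBoost (Fram. only)")

# AUC values copied from Table 4, in the order given by MODELS.
AUC = {
    "Both sexes, any disease": (0.716, 0.692, 0.696, 0.733, 0.656),
    "Both sexes, hypertension": (0.747, 0.713, 0.726, 0.758, 0.699),
    "Both sexes, ischemic": (0.686, 0.680, 0.729, 0.742, 0.630),
    "Both sexes, conduction": (0.638, 0.633, 0.680, 0.732, 0.599),
    "Female, any disease": (0.639, 0.646, 0.653, 0.690, 0.630),
    "Female, hypertension": (0.688, 0.696, 0.730, 0.740, 0.669),
    "Female, ischemic": (0.621, 0.657, 0.649, 0.683, 0.609),
    "Female, conduction": (0.567, 0.593, 0.636, 0.603, 0.567),
    "Male, any disease": (0.646, 0.690, 0.736, 0.745, 0.679),
    "Male, hypertension": (0.693, 0.705, 0.748, 0.750, 0.658),
    "Male, ischemic": (0.661, 0.681, 0.718, 0.743, 0.626),
    "Male, conduction": (0.586, 0.653, 0.660, 0.733, 0.606),
}


def models_below_framingham(aucs):
    """Number of full-feature models with a strictly lower AUC than Framingham-only."""
    framingham = aucs[-1]
    return sum(a < framingham for a in aucs[:-1])


def best_model(aucs):
    """Name of the model with the highest AUC."""
    return MODELS[aucs.index(max(aucs))]


below = {name: models_below_framingham(aucs) for name, aucs in AUC.items()}
best = {name: best_model(aucs) for name, aucs in AUC.items()}
for name in AUC:
    print(f"{name}: best = {best[name]}, models below Framingham-only = {below[name]}")
```

      At most one full-feature model falls below the Framingham-only model in any dataset, and SAINT has the highest AUC in 11 of the 12 datasets, consistent with the statements above.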

      3.4 Predicting the risk of cardiovascular disease for women is less accurate than that for men

      Figure 4 shows the performance of the 12 individually trained XGBoost classifiers when evaluated on each sex separately. First, the classifiers trained on both sexes perform the best for all female-only datasets (top row). The best AUC values for the female-only data are lower than the best AUC values for the male-only data for all disease categories, except conduction disorders. Second, the male-only classifiers perform the best for the male-only datasets for the any disease and hypertension categories, while the both-sex classifiers perform the best for ischemic and conduction diseases. Third, the performance of all classifiers is fairly similar for most sex- and disease-specific categories, except for three cases: 1) the female-only classifier is significantly worse at predicting male cases of ischemic diseases, 2) the male-only classifier is worse at predicting female cases of ischemic diseases, and 3) the male-only classifier is also worse at predicting female cases of conduction diseases, as compared to the female-only and both-sex classifiers.
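
      The cross-evaluation procedure amounts to scoring every trained classifier on every sex-specific test set. The sketch below illustrates only this bookkeeping: the three scoring functions are hypothetical stand-ins for the trained XGBoost models, the two tiny test sets are synthetic, and the resulting AUC values say nothing about the study data.

```python
from itertools import product


def auc(y_true, scores):
    """Rank-based AUC over all (diseased, healthy) pairs; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_evaluate(classifiers, test_sets):
    """Return {(classifier name, test set name): AUC} for every combination."""
    results = {}
    for (clf_name, score), (set_name, (features, labels)) in product(
        classifiers.items(), test_sets.items()
    ):
        results[(clf_name, set_name)] = auc(labels, [score(x) for x in features])
    return results


# Hypothetical stand-ins for trained models; each maps (age, marker) to a risk score.
classifiers = {
    "both-sex": lambda x: x[0] + 10 * x[1],
    "female-only": lambda x: x[0],
    "male-only": lambda x: x[1],
}

# Synthetic test sets: (list of (age, marker) features, list of labels).
test_sets = {
    "female": ([(70, 1), (65, 0), (50, 1), (45, 0)], [1, 1, 0, 0]),
    "male": ([(60, 1), (55, 1), (62, 0), (50, 0)], [1, 1, 0, 0]),
}

results = cross_evaluate(classifiers, test_sets)
for key, value in sorted(results.items()):
    print(key, value)
```

      In an actual analysis, each scoring function would be replaced by the predicted probability of a classifier trained on the corresponding training set, and each grid entry would correspond to one ROC curve in the cross-evaluation figure.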

      Figure 4. Cross-evaluation results using tuned XGBoost classifiers. The classifiers trained on both sexes are colored blue, the classifiers trained on only female data are colored orange, and the classifiers trained on only male data are colored green. The rows show the ROC and AUC for a given trained classifier in predicting a given disease for female-only data (top) or male-only data (bottom). The columns correspond to any cardiovascular disease, hypertensive diseases, ischemic diseases, and conduction diseases, from left to right. The true positive rate is plotted versus the false positive rate.

      3.5 A subset of the Framingham Risk Score and ECG features is the most predictive for cardiovascular disease

      Figure 5 shows the 10 most predictive features for any type of cardiovascular disease for both sexes combined, women only, and men only. A more positive SHAP value indicates a larger contribution to the positive class, diagnosed with a cardiovascular disease, while a negative SHAP value indicates the opposite. Each dot represents an individual person in the dataset, while the red color means that a person had a high value of that feature, e.g., older, while blue means a lower value, e.g., younger. For the binary categories of sex, smoking status, and diabetes status, red represents a male subject, a person who smokes, and a person with diabetes, respectively. Traditional risk factors refer to the Framingham Risk Score features plus body mass index, as shown in Table 2.
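
      To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values for a small, made-up risk score by averaging the marginal contribution of each feature over all feature orderings. It is a brute-force illustration under stated assumptions: the toy model, its coefficients, and the single reference individual are hypothetical, and practical SHAP implementations rely on faster model-specific algorithms and a background dataset instead of this enumeration.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of the features of x relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from its baseline to its actual value
            updated = model(current)
            phi[i] += updated - previous
            previous = updated
    return [value / len(orderings) for value in phi]


def toy_risk_score(features):
    """Hypothetical risk score with an age-smoking interaction (illustration only)."""
    age, bmi, smoker = features
    return 0.02 * age + 0.01 * bmi + 0.3 * smoker + 0.005 * age * smoker


person = [70, 30, 1]     # older smoker with a higher body mass index
reference = [50, 25, 0]  # hypothetical reference individual

phi = shapley_values(toy_risk_score, person, reference)
print(dict(zip(("age", "bmi", "smoker"), phi)))
```

      The contributions add up to the difference between the score of the person and the score of the reference, and a positive value, as for age and smoking here, pushes the prediction toward the diseased class, which is how the horizontal axis of a SHAP summary plot is read.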

      Figure 5. Top 10 features from the tuned XGBoost classifiers trained on both sexes, female only, and male only for any cardiovascular disease. For both sexes, the top four features for the prediction of cardiovascular disease are traditional risk factors, while ECG features and a blood pressure feature from pulse wave analysis make up the rest of the top 10. For the female-only dataset, in addition to the top four traditional risk factors, there is a mix of ECG, pulse wave, and carotid ultrasound features. For the male-only dataset, six of the features are traditional risk factors while the rest are ECG features. Each dot corresponds to a person in the SHAP analysis dataset. A positive SHAP value indicates the contribution to a diagnosis of cardiovascular disease. Bright red corresponds to a high feature value, e.g., old age, while bright blue corresponds to a low feature value, e.g., young age. The binary categories of sex, smoking status, and diabetes status are red for male, smoker, and diabetic, respectively, while blue represents the opposite.

      For both sexes combined, six of the top 10 features, namely, age, body mass index, cholesterol, HDL cholesterol, sex, and blood pressure, are factors that are traditionally associated with an increased cardiovascular risk. The other four features are ECG features. For the female-only dataset, five of the top 10 features are traditional risk factors, but the rest include a mix of ECG, pulse wave analysis, and carotid ultrasound features. For the male-only dataset, six of the top 10 features are traditional risk factors. The other four features are ECG features. Interestingly, the male-only dataset is the only one where the smoking status and the diabetes status appear in the top 10 features. Age, body mass index, and cholesterol, either HDL or total, are consistently the top three features, regardless of sex. The ECG features of the PQ interval and T-axis appear in all three categories as well.

      Table 5 shows the top 10 features based on the SHAP value for the tuned XGBoost model prediction of cardiovascular disease. Across all groups, age is the most important feature in predicting risk. When trained on both sexes, the body mass index and HDL cholesterol also appear for all disease groups. Sex is the most important for ischemic heart disease, but interestingly, it is not in the top 10 for conduction disorders. Out of the traditional risk factors in the Framingham Risk Score, the diabetes status appears only once for the prediction of hypertension and the smoking status does not appear at all. A measure of blood pressure also only appears for any disease and hypertension. The ECG, pulse wave analysis, and magnetic resonance imaging features of the PQ interval, T-axis, pulse rate, R-axis, QRS duration, and LV ejection fraction also appear in two disease categories, each in the top 10.

      Table 5. Top 10 features to predict the risk of cardiovascular diseases for each sex and disease group. Rankings are reported using SHAP on the XGBoost classifiers trained and tuned on each of the 12 datasets with all features from Table 2. BMI, body mass index; BP, blood pressure; HDL, high-density lipoprotein; IM, intima–media; LV, left ventricle; PW, pulse wave.

      Group        Any disease                    Hypertension            Ischemic                       Conduction
      Both sexes   Age                            Age                     Age                            Age
                   BMI                            BMI                     HDL                            PQ interval
                   Cholesterol                    Cholesterol             Sex                            QRS duration
                   HDL                            HDL                     Cholesterol                    T-axis
                   Sex                            Central systolic BP     Pulse rate                     QTC interval
                   T-axis                         Sex                     T-axis                         LV end-systolic volume
                   QTC interval                   R-axis                  BMI                            P duration
                   PQ interval                    Mean arterial pressure  LV ejection fraction           HDL
                   Systolic brachial BP           Diabetes status         R-axis                         BMI
                   QRS duration                   QTC interval            LV end-systolic volume         Pulse rate
      Female only  Age                            Age                     Age                            Age
                   BMI                            BMI                     HDL                            HDL
                   HDL                            Central systolic BP     Max carotid IM thickness 240   QTC interval
                   Cholesterol                    HDL                     Cholesterol                    P duration
                   QRS duration                   Mean arterial pressure  T-axis                         T-axis
                   T-axis                         R-axis                  BMI                            QRS duration
                   Central systolic BP            Cholesterol             Pulse rate                     Cent. augment. press
                   Mean carotid IM thickness 150  QRS duration            PW arterial stiff. index       PQ interval
                   PQ interval                    Systolic brachial BP    Mean carotid IM thickness 120  Augmentation index
                   PW reflection index            End systolic BP         Mean carotid IM thickness 210  P-axis
      Male only    Age                            Age                     Age                            Age
                   BMI                            BMI                     Cholesterol                    T-axis
                   Cholesterol                    Cholesterol             HDL                            QTC interval
                   QTC interval                   Central systolic BP     PW stroke volume               PQ interval
                   T-axis                         QTC interval            Pulse rate                     QRS duration
                   HDL                            Systolic brachial BP    T-axis                         PW stroke volume
                   PQ interval                    Diabetes status         BMI                            BMI
                   Smoking status                 R-axis                  LV stroke volume               Pulse rate
                   R-axis                         PQ interval             Max carotid IM thickness 210   LV end-systolic volume
                   Diabetes status                Cent. augment. press    Diastolic BP                   P duration

      3.5.1 Female-only dataset

      Besides age, HDL cholesterol is the only feature to appear in all disease categories, while the body mass index, total cholesterol, and T-axis appear three times. The smoking and diabetes status do not appear at all. Hypertension is heavily predicted by four different measures of blood pressure, while ischemic heart disease is predicted by several measures of intima–media thickness from carotid ultrasounds. Lastly, the ECG feature, PQ interval, appears in two categories.

      3.5.2 Male-only dataset

      Besides age, the body mass index is the only feature to appear in all disease categories, while the T-axis, total cholesterol, and the QTC interval appear in three of the disease categories. Six of the features for any disease are traditional risk factors. The other four are ECG features. For hypertension, blood pressure measures and the diabetes status, along with traditional risk factors like age, body mass index, and cholesterol, contribute to risk prediction. Interestingly, three ECG features also appear in the top 10 features. For ischemic diseases, the stroke volume and a carotid ultrasound feature add to the traditional risk factors of age, cholesterol, HDL cholesterol, body mass index, and blood pressure. Pulse rate and T-axis conclude the top 10. For conduction disorders, five of the features are ECG features. Age and body mass index are the only traditional risk factors. Magnetic resonance imaging and pulse wave analysis features of the pulse rate, stroke volume, and LV end-systolic volume are the rest of the top 10.

      3.5.3 Both sexes combined

      Traditional risk factors make up 47.5% of the top 10 features, while magnetic resonance imaging features make up 7.5%, carotid ultrasound 0%, ECG 32.5%, and pulse wave analysis 12.5%. When broken down by sex, for women, traditional risk factors contribute 37.5%, magnetic resonance imaging 0%, carotid ultrasound 10%, ECG 30%, and pulse wave analysis 22.5% to the top 10. For men, the breakdown is traditional risk factors 40%, magnetic resonance imaging 5%, carotid ultrasound 2.5%, ECG 32.5%, and pulse wave analysis 20%.

      4 Discussion

      Women are traditionally underdiagnosed for cardiovascular diseases. The lack of sex-specific criteria is one factor contributing to the underdiagnosis of cardiovascular diseases in women compared to that in men (St. Pierre et al., 2022). From Figure 2, we conclude that in the UK Biobank database, women are nearly 2× more underdiagnosed than men for a first-degree AV block and 1.4× more for dilated cardiomyopathy when using standard sex-neutral criteria. When accounting for average sex differences in the PQ interval, left ventricle diameter, and ejection fractions, the fraction by which women are underdiagnosed would increase even further.

      For essential primary hypertension, based on the current sex-neutral criteria, women and men are equally underdiagnosed. Yet, as Figure 2 suggests, women, on average, have lower systolic and diastolic blood pressures than men. If sex-specific criteria were used, women would be underdiagnosed for hypertension. Lastly, women have a smaller wall thickness than men, but the criteria for diagnosing hypertrophic cardiomyopathy are the same. Here, women would again benefit from sex-specific criteria.

      The novel SAINT model outperforms XGBoost in predicting the risk for cardiovascular disease. Using a dataset of UK Biobank patients who underwent cardiovascular clinical tests, we designed 60 classifiers based on relevant features, sex, and disease categories. We compared the new deep learning model SAINT to the state-of-the-art approach for tabular data, XGBoost, and to an MLP deep learning baseline. We found that SAINT showed the highest cardiovascular disease prediction AUC in nearly every case, XGBoost typically achieved the second-best AUC, and MLP, the lowest AUC.

      SAINT is specifically designed for tabular data, which makes its purpose-driven architecture significantly better for our task than an out-of-the-box MLP. The best performance of SAINT classifiers can be attributed, at least in part, to the fact that our data are composed of numerical, continuous features, which are known to favor the performance of SAINT over classical approaches, such as XGBoost (Borisov et al., 2022). The MLP trained with all cardiovascular and Framingham Risk Score features outperformed the state-of-the-art XGBoost method trained with Framingham-only features for all but two dataset variants, which indicates that having access to more features significantly increases model fidelity for this dataset.

      To date, the SAINT architecture has not been applied for risk analysis in cardiovascular diseases. Its remarkable performance not only holds promise for further clinical studies of both cardiovascular diseases and other conditions but also suggests that deep learning approaches could re-surface as viable methods for tabular clinical data modeling. Since deep learning frameworks require large datasets for effective training, we expect SAINT to improve even more as the size of the available medical dataset increases.

      Not all traditional risk factors are equally important. Age, body mass index, HDL and total cholesterol, and systolic blood pressure were the most common factors across sex and disease, with the smoking and diabetes status present only for men, as Table 5 suggests. A previous cardiovascular risk prediction study from the UK Biobank used a machine learning pipeline that included 423,604 participants and 473 features, including the Framingham Risk Score, health and medical history, lifestyle and environment, blood assays, physical activity, family history, physical measures, psychosocial factors, dietary and nutritional information, and sociodemographics (Alaa et al., 2019). This group did not have access to cholesterol levels at the time of their analysis, but they did find that the top feature for men and women was age (Alaa et al., 2019), which we reported as well in Table 5. The previous study reported the smoking status and systolic blood pressure in the top 10 for both women and men (Alaa et al., 2019), while we found the smoking status only for men and central systolic blood pressure for women.

      When broken down by the disease category, for hypertension, measures of blood pressure, body mass index, age, and cholesterol ranked highly. The diabetes status was an important feature for men but not for women. Interestingly, only total cholesterol, not HDL cholesterol, ranked in the top 10 for men, while both appeared for women. For ischemic diseases, age, HDL and total cholesterol, and body mass index were in the top 10 for both women and men. For conduction disorders, age and HDL cholesterol were in the top features for women, while age and body mass index appeared for men. The traditional risk factors in the Framingham Risk Score appear to be the most important for a general calculation of cardiovascular risk, but our study suggests re-evaluating it by taking into account the sex- and disease-specific categories. However, the underlying distributions for the traditional risk factors for the male and female populations are significantly different for all factors, except for systolic blood pressure, likely impacting the feature rankings.

      ECG recordings are the most effective feature to augment the Framingham Risk Score. For women, traditional risk factors made up 37.5% of the top 10 features, while for men, they made up 40%, as we conclude from Table 5. ECG features appeared next in the top 10, making up 30% and 32.5% for women and men, respectively, followed by pulse wave analysis with 22.5% and 20%, respectively.

      ECG features have previously been shown to be powerful predictors of cardiovascular disease (De Bacquer et al., 1998; Raghunath et al., 2021; Khurshid et al., 2022). For instance, the measurement of T-wave morphological variations only requires a single-beat, single-lead ECG and is fast, safe, and shown to identify individuals at risk for sudden cardiac death and life-threatening ventricular arrhythmias (Ramírez et al., 2022). ECG features, such as the QRS duration, QT duration, and T-wave morphology, are associated with increased cardiovascular mortality (Sahli Costabal et al., 2020; Siegersma et al., 2022; Hughes et al., 2023). Women are known to have a shorter PQ interval and QRS duration, longer QTC, and different T-wave morphology than men (Peirlinck et al., 2021a; Siegersma et al., 2022). All of these features appeared in the top 10 from our feature importance analysis across several sex and disease categories, as shown in Table 5. Adding the ECG features with high SHAP values to the traditional Framingham Risk Score features would be a simple yet effective strategy to increase the predictive potential of cardiovascular disease models.

      Central blood pressure is more predictive of the cardiovascular disease risk than brachial blood pressure. Pulse wave analysis provides multiple measures of blood pressure. In multiple sex and disease categories, as shown in Table 5, the central systolic blood pressure ranked higher than the systolic brachial blood pressure. Central blood pressure relates closely to the load on the coronary and cerebral arteries and, as such, is more strongly correlated with vascular diseases and negative outcomes than brachial blood pressure (Roman et al., 2007). Other pulse wave features, like the pulse rate, arterial stiffness index, reflection index, and mean arterial pressure, that made up the top 10, as shown in Table 5, have also previously been linked to an increased risk of cardiovascular disease (Kengne et al., 2009; Mitchell, 2009; Benetos et al., 2012; Cecelja and Chowienczyk, 2012).

      Carotid ultrasounds provide an accessible way to monitor ischemic heart diseases. Carotid ultrasounds measure the carotid intima–media thickness; a thicker intima–media may indicate atherosclerosis of the carotid artery, which supplies the brain (Bots et al., 1997). Increasing evidence suggests that atherosclerosis in the carotid artery is associated with atherosclerosis in the coronary artery, leading to an increased risk of stroke, myocardial infarction, and other ischemic heart diseases (Bots et al., 1997; Zhao et al., 2016; Bytyçi et al., 2021). Because the carotid artery is easily accessible compared to the coronary artery, carotid ultrasounds provide a non-invasive, simple way to screen patients for an increased risk of cardiovascular diseases (Bytyçi et al., 2021). In Table 5, we found that three features from the carotid ultrasound for women and one for men appeared in the prediction of ischemic diseases. One carotid ultrasound feature also appeared for women for the prediction of any cardiovascular disease. As such, carotid ultrasounds provide valuable insights into an individual’s risk for ischemic diseases, regardless of sex, and may be especially useful for monitoring the cardiovascular disease risk of women in general.

      Limitations and future work: Our study provides a first step toward rethinking risk indicators for cardiovascular diseases in view of big data and machine learning. Although our results provide encouraging evidence of the added value of leveraging both technologies, it is important to be aware of the limitations of our current approach and, ideally, address them in future follow-up studies. First, the Framingham score was designed to provide a 10-year prediction of risk for developing cardiovascular disease (Lloyd-Jones et al., 2004). Instead, here, we have used these features for the detection of cardiovascular diseases, that is, to identify whether a disease is currently present in an individual. A follow-up study with later time points would be needed to determine which features are best for a 10-year prediction of cardiovascular disease. Second, although the population of individuals with ICD-9 data is only 4% of the entire UK Biobank population, by not including these additional medical data along with the ICD-10 data that are used in this study, we may be missing valuable information on the prevalence of cardiovascular diseases in certain populations. Third, and most notably, the outcome of our approach is only as good as the clinical diagnoses that define our classification. It would be highly beneficial, and actually very feasible with modern machine learning techniques, to perform a comprehensive study of the human-level error by evaluating expert clinician performance on the utilized datasets. This would provide an estimate of the Bayes error rates for our 12 datasets that could then be compared to the SAINT model performance. Fourth, the population of the UK Biobank is fairly homogeneous, so our classifiers might not be generalizable to participants outside the United Kingdom who are from more diverse racial and ethnic backgrounds. Fifth, since SAINT was the best-performing approach for our classification tasks, future studies could focus on integrating a comprehensive feature importance pipeline, such as SHAP, into the SAINT model evaluation. This would leverage the high performance of the SAINT method and could translate to even more informative and credible feature significance rankings. Finally, a direct comparison between the XGBoost and SAINT feature analyses would provide further insights into the sensitivity of feature identification with respect to the specifics of a given learning architecture.

      5 Conclusion

      Women are underdiagnosed for cardiovascular diseases compared to men. Unarguably, there is an urgent need for sex-specific diagnostic criteria. Deep learning provides powerful tools to precisely quantify how well traditional risk factors, like the Framingham Risk Score, predict the risk of cardiovascular diseases for females, males, or both sexes combined. Alarmingly, our deep learning study revealed that, for a first-degree atrioventricular block and dilated cardiomyopathy, women are underdiagnosed 2× and 1.4× more than men. Inversely, without much extra work, our deep learning approach allows us to identify and rank the most predictive features for different types of cardiovascular diseases, both sex-specifically and sex-neutrally. We found that, out of the four commonly used clinical tests (electrocardiograms, magnetic resonance imaging, carotid ultrasounds, and pulse wave analysis), electrocardiogram features showed the most promise in increasing cardiovascular disease prediction. A more accurate individualized risk prediction of cardiovascular diseases would enable personalized treatment and prevention strategies, a more effective allocation of medical resources, and an early and precise identification of high-risk individuals, with the ultimate goal of improving patient outcomes, reducing morbidity and mortality, and improving the quality of life.

      Data availability statement

      The datasets presented in this article are not readily available because researchers must apply to access the UK Biobank dataset. Requests to access the datasets should be directed to https://www.ukbiobank.ac.uk/.

      Author contributions

      SS: formal analysis, methodology, software, visualization, and writing–original draft. BK: formal analysis, methodology, software, visualization, and writing–original draft. MP: conceptualization, supervision, visualization, and writing–original draft. EK: conceptualization, funding acquisition, supervision, and writing–original draft.

      Funding

      The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This project was supported by an NSF Graduate Research Fellowship to SS and by the NSF CMMI grants 2318188 and 2320933 to EK.

      Acknowledgments

      This research was conducted using the October 2022 release of the UK Biobank Resource under the application number 89726. It uses data provided by patients and that collected by the National Health Service, England, as part of their care and support; copyright 2022 National Health Service, England; re-used with the permission of the UK Biobank; all rights reserved. This project was carried out in part for the class CS230 Deep Learning, and the authors would like to thank Andrew Ng and the teaching assistants for their helpful feedback.

      Conflict of interest

      The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

      The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

      Publisher’s note

      All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

      References

      Abadi M. Agarwal A. Barham P. Brevdo E. Chen Z. Citro C. (2016). TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv:10.48550/arXiv.1603.04467
      Aceña V. Martín de Diego I. Fernández R. R. Moguerza M. J. (2022). Minimally overfitted learners: a general framework for ensemble learning. Knowledge-Based Syst. 254, 109669. 10.1016/j.knosys.2022.109669
      Alaa A. M. Bolton T. Angelantonio E. D. Rudd J. H. F. van der Schaar M. (2019). Cardiovascular disease risk prediction using automated machine learning: a prospective study of 423,604 UK biobank participants. PLOS ONE 14, e0213653. 10.1371/journal.pone.0213653
      Alber M. Buganza Tepole A. Cannon W. De S. Dura-Bernal S. Garikipati K. (2019). Integrating machine learning and multiscale modeling: perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. npj Digit. Med. 2, 115. 10.1038/s41746-019-0193-y
      Arik S. O. Pfister T. (2020). Tabnet: attentive interpretable tabular learning. arXiv:10.48550/arXiv.1908.07442
      Arora G. Morss A. M. Piazza G. Ryan J. W. Dinwoodey D. L. Rofsky N. M. (2010). Differences in left ventricular ejection fraction using Teichholz formula and volumetric methods by CMR: implications for patient stratification and selection of therapy. J. Cardiovasc. Magnetic Reson. 12, P202. 10.1186/1532-429X-12-S1-P202
      Athanasiou M. Sfrintzeri K. Zarkogianni K. Thanopoulou A. C. Nikita K. S. (2020). “An explainable xgboost–based approach towards assessing the risk of cardiovascular disease in patients with type 2 diabetes mellitus,” in Proceeding of the 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE), 859–864. 10.1109/BIBE50027.2020.00146
      Attia Z. I. Noseworthy P. A. Lopez-Jimenez F. Asirvatham S. J. Deshmukh A. J. Gersh B. J. (2019). An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 394, 861–867. 10.1016/S0140-6736(19)31721-0
      Bai W. Suzuki H. Huang J. Francis C. Wang S. Tarroni G. (2020). A population-based phenome-wide association study of cardiac and aortic structure and function. Nat. Med. 26, 1654–1662. 10.1038/s41591-020-1009-y
      Benetos A. Gautier S. Labat C. Salvi P. Valbusa F. Marino F. (2012). Mortality and cardiovascular events are best predicted by low central/peripheral pulse pressure amplification but not by high blood pressure levels in elderly nursing home subjects: the PARTAGE (Predictive Values of Blood Pressure and Arterial Stiffness in Institutionalized Very Aged Population) study. J. Am. Coll. Cardiol. 60, 1503–1511. 10.1016/j.jacc.2012.04.055
      Borisov V. Leemann T. Seßler K. Haug J. Pawelczyk M. Kasneci G. (2022). “Deep neural networks and tabular data: a survey,” in Proceeding of the IEEE Transactions on Neural Networks and Learning Systems, 1–21.
      Bots M. L. Hofman A. Grobbee D. E. (1997). Increase common carotid intima-media thickness. Stroke 28, 2442–2447. 10.1161/01.STR.28.12.2442
      Bytyçi I. Shenouda R. Wester P. Henein M. Y. (2021). Carotid atherosclerosis in predicting coronary artery disease: a systematic review and meta-analysis. Arteriosclerosis, Thrombosis, Vasc. Biol. 41, e224–e237. 10.1161/ATVBAHA.120.315747
      Cannatà A. Fabris E. Merlo M. Artico J. Gentile P. Pio Loco C. (2020). Sex differences in the long-term prognosis of dilated cardiomyopathy. Can. J. Cardiol. 36, 37–44. 10.1016/j.cjca.2019.05.031
      Cecelja M. Chowienczyk P. (2012). Role of arterial stiffness in cardiovascular disease. JRSM Cardiovasc. Dis. 1, 1–10. 10.1258/cvd.2012.012016
      Chen T. Guestrin C. (2016). “XGBoost: a scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
      Chung A. K. Das S. R. Leonard D. Peshock R. M. Kazi F. Abdullah S. M. (2006). Women have higher left ventricular ejection fractions than men independent of differences in left ventricular volume: the Dallas Heart Study. Circulation 113, 1597–1604. 10.1161/CIRCULATIONAHA.105.574400
      Cirillo D. Catuara-Solarz S. Morey C. Guney E. Subirats L. Mellino S. (2020). Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare. npj Digit. Med. 2, 81. 10.1038/s41746-020-0288-5
      Clayton J. A. Collins F. S. (2014). Policy: NIH to balance sex in cell and animal studies. Nature 509, 282–283. 10.1038/509282a
      D’Agostino R. B. Vasan R. S. Pencina M. J. Wolf P. A. Cobain M. Massaro J. M. (2008). General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 117, 743–753. 10.1161/CIRCULATIONAHA.107.699579
      Davies J. I. Struthers A. D. (2005). Beyond blood pressure: pulse wave analysis – a better way of assessing cardiovascular risk? Future Cardiol. 1, 69–78. 10.1517/14796678.1.1.69
      De Bacquer D. De Backer G. Kornitzer M. Blackburn H. (1998). Prognostic value of ECG findings for total, cardiovascular disease, and coronary heart disease death in men and women. Heart 80, 570–577. 10.1136/hrt.80.6.570
      Elliott P. M. Anastasakis A. Borger M. A. Borggrefe M. Cecchi F. Charron P. (2014). 2014 ESC Guidelines on diagnosis and management of hypertrophic cardiomyopathy: the task force for the diagnosis and management of hypertrophic cardiomyopathy of the European Society of Cardiology (ESC). Eur. Heart J. 35, 2733–2779. 10.1093/eurheartj/ehu284
      Garcia-Sifuentes Y. Maney D. L. (2021). Reporting and misreporting of sex differences in the biological sciences. eLife 10, e70817. 10.7554/eLife.70817
      Gilbert K. Bai W. Mauger C. Medrano-Gracia P. Suinesiaputra A. Lee A. M. (2019). Independent left ventricular morphometric atlases show consistent relationships with cardiovascular risk factors: a UK biobank study. Sci. Rep. 9, 1130. 10.1038/s41598-018-37916-6
      Gorishniy Y. Rubachev I. Khrulkov V. Babenko A. (2021). Revisiting deep learning models for tabular data. Adv. Neural Inf. Process. Syst. 34, 18932–18943. 10.48550/arXiv.2106.11959
      Grinsztajn L. Oyallon E. Varoquaux G. (2022). Why do tree-based models still outperform deep learning on tabular data? arXiv.
      Guo H. Tang R. Ye Y. Li Z. He X. (2017). DeepFM: a factorization-machine based neural network for CTR prediction. arXiv:10.48550/arXiv.1703.04247
      Gupta A. Sharma S. Goyal S. Rashid M. (2020). “Novel XGBoost tuned machine learning model for software bug prediction,” in Proceeding of the 2020 International Conference on Intelligent Engineering and Management (ICIEM). June 2020, London, UK (IEEE), 376–380.
      Halliday B. P. Gulati A. Ali A. Newsome S. Lota A. Tayal U. (2018). Sex- and age-based differences in the natural history and outcome of dilated cardiomyopathy. Eur. J. Heart Fail. 20, 1392–1400. 10.1002/ejhf.1216
      Holmqvist F. Daubert J. P. (2013). First-degree AV block–An entirely benign finding or a potentially curable cause of cardiac disease? Ann. Noninvasive Electrocardiol. 18, 215–224. The Official Journal of the International Society for Holter and Noninvasive Electrocardiology, Inc. 10.1111/anec.12062
      Huang X. Khetan A. Cvitkovic M. Karnin Z. (2020). TabTransformer: tabular data modeling using contextual embeddings. arXiv:10.48550/arXiv.2012.06678
      Hughes J. W. Tooley J. Torres Soto J. Ostropolets A. Poterucha T. Christensen M. K. (2023). A deep learning-based electrocardiogram risk score for long term cardiovascular death and disease. npj Digit. Med. 6, 169. 10.1038/s41746-023-00916-6
      Ke G. Meng Q. Finley T. Wang T. Chen W. Ma W. (2017). LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30.
      Kengne A.-P. Czernichow S. Huxley R. Grobbee D. Woodward M. Neal B. (2009). Blood pressure variables and cardiovascular risk: new findings from ADVANCE. Hypertension 54, 399–404.
10.1161/HYPERTENSIONAHA.109.133041 Khurshid S. Friedman S. Reeder C. Di Achille P. Diamant N. Singh P. (2022). ECG-based deep learning and clinical risk factors to predict atrial fibrillation. Circulation 145, 122133. 10.1161/circulationaha.121.057480 Kremers H. M. Crowson C. S. Therneau T. M. Roger V. L. Gabriel S. E. (2008). High ten-year risk of cardiovascular disease in newly diagnosed rheumatoid arthritis patients: a population-based cohort study. Arthritis and Rheumatism 58, 22682274. 10.1002/art.23650 Lala A. Tayal U. Hamo C. E. Youmans Q. Al-Khatib S. M. Bozkurt B. (2022). Sex differences in heart failure. J. Cardiac Fail. 28, 477498. 10.1016/j.cardfail.2021.10.006 Lloyd-Jones D. M. Wilson P. W. F. Larson M. G. Beiser A. Leip E. P. D’Agostino R. B. (2004). Framingham Risk Score and prediction of lifetime risk for coronary heart disease. Am. J. Cardiol. 94, 2024. 10.1016/j.amjcard.2004.03.023 Loshchilov I. Hutter F. (2019). Decoupled weight decay regularization. arXiv:10.48550/arXiv.1711.05101 Lundberg S. Lee S.-I. (2017). A unified approach to interpreting model predictions. arXiv:10.48550/arXiv.1705.07874 Madani A. Ong J. R. Tibrewal A. Mofrad M. R. K. (2018). Deep echocardiography: data-efficient supervised and semi-supervised deep learning towards automated diagnosis of cardiac disease. npj Digit. Med. 1, 59. 10.1038/s41746-018-0065-x Mandrekar J. N. (2010). Receiver operating characteristic curve in diagnostic test assessment. J. Thorac. Oncol. 5, 13151316. 10.1097/JTO.0b013e3181ec173d McMurray J. J. Jackson A. M. Lam C. S. Redfield M. M. Anand I. S. Ge J. (2020). Effects of sacubitril-valsartan versus valsartan in women compared with men with heart failure and preserved ejection fraction: insights from PARAGON-HF. Circulation 141, 338351. 10.1161/CIRCULATIONAHA.119.044491 Mitchell G. F. (2009). Arterial stiffness and wave reflection: biomarkers of cardiovascular risk. Artery Res. 3, 5664. 10.1016/j.artres.2009.02.002 Morgenroth T. Ryan M. K. (2021). 
The effects of gender trouble: an integrative theoretical framework of the perpetuation and disruption of the gender/sex binary. Perspect. Psychol. Sci. 16, 11131142. 10.1177/1745691620902442 Oldroyd S. H. Quintanilla Rodriguez B. S. Makaryus A. N. (2022). First degree heart block. Treasure Island (FL): StatPearls Publishing. Olivotto I. Maron M. S. Adabag A. S. Casey S. A. Vargiu D. Link M. S. (2005). Gender-related differences in the clinical presentation and outcome of hypertrophic cardiomyopathy. J. Am. Coll. Cardiol. 46, 480487. 10.1016/j.jacc.2005.04.043 OpenAI (2023). GPT-4 technical report. arXiv 10.48550/arXiv.2303.08774 Orphanou N. Papatheodorou E. Anastasakis A. (2022). Dilated cardiomyopathy in the era of precision medicine: latest concepts and developments. Heart Fail. Rev. 27, 11731191. 10.1007/s10741-021-10139-0 Papadopoulou A. Harding D. Slabaugh G. Marouli E. Deloukas P. (2022). Prediction of atrial fibrillation and stroke using machine learning models in UK Biobank. medRxiv 10.1101/2022.10.28.22281669 Parikh R. Mathai A. Parikh S. Chandra Sekhar G. Thomas R. (2008). Understanding and using sensitivity, specificity and predictive values. Indian J. Ophthalmol. 56, 4550. 10.4103/0301-4738.37595 Peirlinck M. Sahli Costabal F. Kuhl E. (2021a). Sex differences in drug-induced arrhythmogenesis. Front. Physiology 12, 708435. 10.3389/fphys.2021.708435 Peirlinck M. Sahli Costabal F. Yao J. Guccione J. M. Tripathy S. Wang Y. (2021b). Precision medicine in human heart modeling. Perspectives, challenges and opportunities. Biomechanics Model. Mechanobiol. 20, 803831. 10.1007/s10237-021-01421-z Ponikowski P. Voors A. A. Anker S. D. Bueno H. Cleland J. G. F. Coats A. J. S. (2016). 
2016 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure: the Task Force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC)Developed with the special contribution of the Heart Failure Association (HFA) of the ESC. Eur. Heart J. 37, 21292200. 10.1093/eurheartj/ehw128 Prokhorenkova L. Gusev G. Vorobev A. Dorogush A. V. Gulin A. (2018). CatBoost: unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 31. Putatunda S. Rama K. (2018). “A comparative analysis of hyperopt as against other approaches for hyper-parameter optimization of XGBoost,” in Proceedings of the 2018 International Conference on Signal Processing and Machine Learning, 610. Raghunath S. Pfeifer J. M. Ulloa-Cerna A. E. Nemani A. Carbonati T. Jing L. (2021). Deep neural networks can predict new-onset atrial fibrillation from the 12-lead ECG and help identify those at risk of atrial fibrillation–related stroke. Circulation 143, 12871298. 10.1161/circulationaha.120.047829 Rajadevi R. Devi E. Shanthakumari R. Latha R. Anitha N. Devipriya R. (2021). “Feature selection for predicting heart disease using black hole optimization algorithm and xgboost classifier,” in Proceedings of the 2021 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, January 2021 (IEEE), 17. Rajliwall N. S. Davey R. Chetty G. (2018). “Cardiovascular risk prediction based on XGBoost,” in Proceedings of the 2018 5th Asia-Pacific World Congress on Computer Science and Engineering (APWC on CSE), 246252. 10.1109/APWConCSE.2018.00047 Ramírez J. Kiviniemi A. van Duijvenboden S. Tinker A. Lambiase P. D. Junttila J. (2022). ECG T-wave morphologic variations predict ventricular arrhythmic risk in low- and moderate-risk populations. J. Am. Heart Assoc. 11, e025897. 10.1161/JAHA.121.025897 Ramírez J. van Duijvenboden S. Young W. J. Orini M. Jones A. R. Lambiase P. D. (2021). 
Analysing electrocardiographic traits and predicting cardiac risk in UK Biobank. JRSM Cardiovasc. Dis. 10, 20480040211023664. 10.1177/20480040211023664 Roman M. J. Devereux R. B. Kizer J. R. Lee E. T. Galloway J. M. Ali T. (2007). Central pressure more strongly relates to vascular disease and outcome than does brachial pressure: the Strong Heart Study. Hypertension 50, 197203. 10.1161/HYPERTENSIONAHA.107.089078 Rutkowski D. R. Barton G. P. François C. J. Aggarwal N. Roldán-Alzate A. (2020). Sex differences in cardiac flow dynamics of healthy volunteers. Radiol. Cardiothorac. Imaging 2, e190058. 10.1148/ryct.2020190058 Sahli Costabal F. Seo K. Ashley E. Kuhl E. (2020). Classifying drugs by their arrhythmogenic risk using machine learning. Biophysical J. 118, 112. 10.1016/j.bpj.2020.01.012 Said M. A. Eppinga R. N. Lipsic E. Verweij N. van der Harst P. (2018). Relationship of arterial stiffness index and pulse pressure with cardiovascular disease and mortality. J. Am. Heart Assoc. Cardiovasc. Cerebrovasc. Dis. 7, e007621. 10.1161/JAHA.117.007621 Sharma D. Gotlieb N. Farkouh M. E. Patel K. Xu W. Bhat M. (2022). Machine learning approach to classify cardiovascular disease in patients with nonalcoholic fatty liver disease in the UK Biobank cohort. J. Am. Heart Assoc. 11, e022576. 10.1161/JAHA.121.022576 Shwartz-Ziv R. Armon A. (2022). Tabular data: deep learning is not all you need. Inf. Fusion 81, 8490. 10.1016/j.inffus.2021.11.011 Siegersma K. R. van de Leur R. R. Onland-Moret N. C. Leon D. A. Diez-Benavente E. Rozendaal L. (2022). Deep neural networks reveal novel sex-specific electrocardiographic features relevant for mortality risk. Eur. Heart J. - Digital Health 3, 245254. 10.1093/ehjdh/ztac010 Sjöström L. Lindroos A.-K. Peltonen M. Torgerson J. Bouchard C. Carlsson B. (2004). Lifestyle, diabetes, and cardiovascular risk factors 10 years after bariatric surgery. N. Engl. J. Med. 351, 26832693. 10.1056/NEJMoa035622 Solomon S. D. McMurray J. J. (2021). 
Making the case for an expanded indication for Sacubitril/Valsartan in heart failure. J. Cardiac Fail. 27, 693695. 10.1016/j.cardfail.2021.04.008 Somepalli G. Goldblum M. Schwarzschild A. Bruss C. B. Goldstein T. (2021). SAINT: improved neural networks for tabular data via row attention and contrastive pre-training. arXiv:10.48550/arXiv.2106.01342 St. Pierre S. R. Peirlinck M. Kuhl E. (2022). Sex matters: a comprehensive comparison of female and male hearts. Front. Physiology 13, 831179. 10.3389/fphys.2022.831179 Sudlow C. Gallacher J. Allen N. Beral V. Burton P. Danesh J. (2015). UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779. 10.1371/journal.pmed.1001779 van Driel B. Nijenkamp L. Huurman R. Michels M. van der Velden J. (2019). Sex differences in hypertrophic cardiomyopathy: new insights. Curr. Opin. Cardiol. 34, 254259. 10.1097/HCO.0000000000000612 Vaswani A. Shazeer N. Parmar N. Uszkoreit J. Jones L. Gomez A. N. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst. 30. 10.48550/arXiv.1706.03762x Williams B. Mancia G. Spiering W. Agabiti Rosei E. Azizi M. Burnier M. (2018). 2018 ESC/ESH Guidelines for the management of arterial hypertension: the task force for the diagnosis and management of hypertrophic cardiomyopathy of the European Society of Cardiology (ESC) and the European Society of Hypertension (ESH). Eur. Heart J. 39, 30213104. 10.1093/eurheartj/ehy339 Wilson P. W. F. D’Agostino R. B. Levy D. Belanger A. M. Silbershatz H. Kannel W. B. (1998). Prediction of coronary heart disease using risk factor categories. Circulation 97, 18371847. 10.1161/01.CIR.97.18.1837 World Health Organization (2004). ICD-10: international statistical classification of diseases and related health problems: tenth revision. Tech. report. Geneva, Switzerland: World Health Organization. Yang H. Luo Y. M. Ma C. Y. Zhang T. Y. Zhou T. Ren X. L. (2023). 
A gender specific risk assessment of coronary heart disease based on physical examination data. npj Digit. Med. 6, 136. 10.1038/s41746-023-00887-8 Zhao W. Wu Y. Shi M. Bai L. Tu J. Guo Z. (2016). Sex differences in prevalence of and risk factors for carotid plaque among adults: a population-based cross-sectional study in rural China. Sci. Rep. 6, 38618. 10.1038/srep38618

      Appendix A Hyperparameter tuning

      MLP: A deep learning model for baseline comparison. We apply Bayesian optimization using the Keras Tuner to tune the hyperparameters of the multilayer perceptron over the validation data for the dataset with both sexes and any disease. The optimization shows that a multilayer perceptron architecture with six hidden layers, 30 units in each hidden layer, and an L2-regularization parameter λ = 10⁻² in each layer provides robust model performance. We observe that a batch size of 32 yields the fastest convergence since all datasets are small-to-medium-sized. A maximum of 100 epochs allows us to reach convergence while keeping the computation feasible. We further find that the performance of the multilayer perceptron on the validation data is not highly sensitive to changes in other hyperparameters, so we use the default values of the learning rate and of the coefficients in the Adam update rule. In principle, we should apply this hyperparameter study to each of the 12 multilayer perceptrons individually. However, the Bayesian tuning process itself is parallelized on a single GPU, so we decided not to tune all 12 multilayer perceptron models individually. Instead, we applied the hyperparameters tuned on the both-sexes, any-disease dataset to the remaining 11 models.
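      The reported architecture can be illustrated with a minimal NumPy sketch. It is not the Keras implementation used in the study: the ReLU hidden activations, the sigmoid output, the He-style initialization, and the number of input features are assumptions made purely for illustration. Only the six hidden layers, the 30 units per layer, the regularization parameter, the batch size, and the epoch limit are taken from the text above.

```python
import numpy as np

# Values quoted in the appendix text.
N_HIDDEN_LAYERS = 6
UNITS_PER_LAYER = 30
L2_LAMBDA = 1e-2
BATCH_SIZE = 32
MAX_EPOCHS = 100


def init_mlp(n_features, rng):
    """Create (weights, bias) pairs for 6 hidden layers of 30 units and 1 output unit.

    The He-style initialization is an assumption, not taken from the source.
    """
    sizes = [n_features] + [UNITS_PER_LAYER] * N_HIDDEN_LAYERS + [1]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """Return the predicted disease probability for each row of x."""
    h = x
    for weights, bias in params[:-1]:
        h = np.maximum(h @ weights + bias, 0.0)  # ReLU (assumed activation)
    weights, bias = params[-1]
    logits = h @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))  # sigmoid output (assumed)


def l2_penalty(params):
    """L2 regularization term: lambda times the sum of squared weights over all layers."""
    return L2_LAMBDA * sum(float(np.sum(weights ** 2)) for weights, _ in params)
```

      A call such as init_mlp(n_features=20, rng=np.random.default_rng(0)) builds seven weight matrices (six hidden layers plus the output layer); the value 20 is a placeholder, not the feature count of the study.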

      XGBoost: A state-of-the-art analysis of tabular data. Since XGBoost training is significantly faster than multilayer perceptron training and is readily parallelizable over the 12 dataset variants, we apply a random search with five-fold cross-validation to tune the hyperparameters of each of the XGBoost models individually. Specifically, we tune the XGBoost hyperparameters for each of the 12 dataset variants and for each of the two input feature groups: all available features (cardiovascular and Framingham Risk Score features) and Framingham Risk Score features only. Based on prior work (Putatunda and Rama, 2018; Gupta et al., 2020), the most valuable hyperparameters for tuning include the maximum tree depth, the learning rate, the subsample ratio of the training instances, the subsample ratio of columns for each tree, the subsample ratio of columns for each level, and the number of tree estimators. We tune these hyperparameters in a parallelized random search routine. The tuning process maximizes the AUC score over the validation data. We compare two different XGBoost ensembles: one set that has access to all features (cardiovascular and Framingham Risk Score features) and another set that can train on Framingham Risk Score features and the body mass index only. Each set of XGBoost models is applied to all 12 dataset variants, and hyperparameter tuning is performed for each XGBoost model individually. Thus, we compute 24 optimal sets of XGBoost hyperparameters for the resulting 24 population–disease–feature combinations (three input population groups, four disease label sets, and two feature groups). For evaluation purposes, we consider both the 24 tuned and the 12 untuned XGBoost models, which yields a total of 36 XGBoost ensembles for performance analysis. The 12 untuned XGBoost models for the Framingham Risk Score-only feature group are not included in the evaluation due to redundancy.
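      The bookkeeping of the model combinations and the random search itself can be sketched in plain Python. This is an illustration under stated assumptions, not the tuning code of the study: the sampling ranges, the number of trials, and three of the four disease label names are placeholders, and the scoring callable cv_auc is a hypothetical stand-in for training an XGBoost model on five folds and returning the per-fold AUC. The dictionary keys follow the parameter names of the XGBoost scikit-learn interface, and the search keeps the configuration with the highest mean AUC.

```python
import itertools
import random

POPULATIONS = ["both_sexes", "female", "male"]
# Only "any_disease" is named in this appendix; the other labels are placeholders.
DISEASE_LABELS = ["any_disease", "label_set_2", "label_set_3", "label_set_4"]
FEATURE_GROUPS = ["all_features", "framingham_only"]

# Hypothetical sampling ranges for the six tuned hyperparameters.
SEARCH_SPACE = {
    "max_depth": lambda r: r.randint(2, 10),
    "learning_rate": lambda r: 10 ** r.uniform(-3.0, -0.5),
    "subsample": lambda r: r.uniform(0.5, 1.0),
    "colsample_bytree": lambda r: r.uniform(0.5, 1.0),
    "colsample_bylevel": lambda r: r.uniform(0.5, 1.0),
    "n_estimators": lambda r: r.randint(50, 500),
}


def sample_config(rng):
    """Draw one random hyperparameter configuration."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}


def random_search(cv_auc, n_trials, seed=0):
    """Return the sampled configuration with the highest mean cross-validated AUC.

    cv_auc is a caller-supplied function mapping a configuration to a list of
    per-fold AUC values (five values for five-fold cross-validation).
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        fold_scores = cv_auc(config)
        score = sum(fold_scores) / len(fold_scores)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# 3 populations x 4 disease label sets x 2 feature groups = 24 tuned models.
tuned_models = list(itertools.product(POPULATIONS, DISEASE_LABELS, FEATURE_GROUPS))
# Untuned models are evaluated for the all-features group only: 12 models.
untuned_models = [
    (pop, label, "all_features")
    for pop, label in itertools.product(POPULATIONS, DISEASE_LABELS)
]
```

      With these definitions, len(tuned_models) + len(untuned_models) reproduces the 36 ensembles considered in the evaluation.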

      SAINT: A novel approach for tabular learning. We did not have access to sufficient computational resources to perform full-scale hyperparameter tuning for the 12 SAINT classifiers. We did, however, adjust the hyperparameters that proved to have the most significant effect on the validation performance. Specifically, we increased the weight decay parameter of the AdamW optimization scheme (Loshchilov and Hutter, 2019) to w = 10 since lower values (such as the default of w = 10⁻²) led to significant overfitting on the training set. We also found that the default choices of the learning rate and other AdamW optimization parameters provided the best validation set performance.
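      To make the role of the weight decay parameter concrete, the following NumPy sketch performs a single AdamW update with decoupled weight decay. It is a simplified illustration, not the optimizer code used for SAINT: the learning rate of 1e-4 and the remaining Adam coefficients are assumed placeholder values, and only weight_decay = 10 mirrors the value of w quoted above.

```python
import numpy as np


def adamw_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=10.0):
    """One AdamW update; returns the new parameters and moment estimates.

    lr, beta1, beta2, and eps are assumed values for illustration only.
    """
    # Decoupled weight decay: shrink the weights directly, independent of the gradient.
    theta = theta * (1.0 - lr * weight_decay)
    # Standard Adam moment estimates with bias correction (t counts from 1).
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

      Under these assumed values, each step multiplies the weights by 1 − lr · w = 0.999 before the gradient-based update, so a larger w pulls the weights toward zero more strongly, which is the regularizing effect exploited to counter overfitting.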

      Appendix B Computational structure of the SAINT architecture

      Here, we present a brief synthesis of the SAINT framework structure, as described by Somepalli et al. (2021). From a high-dimensional embedding of the input features, a single stage of the SAINT unit computes the output through a series of multi-head self-attention blocks, intersample attention blocks, feed-forward layers, and layer-normalization layers. Several of these stages are stacked sequentially before the final contextual representation is generated. In each stage, the self-attention block applies attention among the features of a given sample, while the intersample attention block applies row-wise attention across different samples for a given feature. Tabular data learning in SAINT is divided into two phases: self-supervised pre-training and supervised fine-tuning. The pre-training phase consists of minimizing a combined contrastive and denoising cost function without considering the example labels. The fine-tuning phase evaluates a deviation metric between the ground truth and the model prediction. Throughout this study, we perform a binary classification of cardiovascular disease presence and use the binary cross-entropy loss function as our deviation metric of choice.
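      The difference between the two attention types, and the loss used for fine-tuning, can be sketched in NumPy. This is a deliberately reduced illustration and not the SAINT implementation: it assumes single-head attention without learned query, key, and value projections, omits the feed-forward and layer-normalization layers, and, for the intersample step, treats the flattened feature embeddings of each sample as one token so that attention runs across the samples of a batch. All function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(x):
    """Scaled dot-product attention over the token axis (second to last).

    Queries, keys, and values are all x itself (identity projections, assumed).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x


def self_attention(x):
    """Attention among the features of each sample; x has shape (batch, features, dim)."""
    return attention(x)


def intersample_attention(x):
    """Attention across the samples of a batch; x has shape (batch, features, dim)."""
    b, n, d = x.shape
    flat = x.reshape(1, b, n * d)  # each sample becomes one token
    return attention(flat).reshape(b, n, d)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy between labels in {0, 1} and predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))
```

      In this sketch, the output of self_attention for one sample does not depend on the other samples in the batch, whereas the output of intersample_attention does, which is the distinction drawn in the paragraph above.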
