Edited by: Mark Trombetta, Allegheny Health Network, United States
Reviewed by: Lorenzo Ugga, University of Naples Federico II, Italy; Ghazaleh Amjad, Iran University of Medical Sciences, Iran
*Correspondence: Valeria Landoni,
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
This study aimed to develop a clinical–radiomic model based on radiomic features extracted from digital breast tomosynthesis (DBT) images and clinical factors that may help to discriminate between benign and malignant breast lesions.
A total of 150 patients were included in this study. DBT images acquired in the setting of a screening protocol were used. Lesions were delineated by two expert radiologists. Malignancy was always confirmed by histopathological data. The data were randomly divided into training and validation sets with an 80:20 ratio. A total of 58 radiomic features were extracted from each lesion using the LIFEx software. Three different feature selection methods were implemented in Python: (1) K best (KB), (2) sequential (S), and (3) Random Forest (RF). A model was then produced for each subset of seven variables using a machine-learning algorithm that exploits RF classification based on the Gini index.
All three clinical–radiomic models showed significant differences (p < 0.05) between malignant and benign tumors. The area under the curve (AUC) values of the models obtained with the three feature selection methods were 0.72 [0.64, 0.80], 0.72 [0.64, 0.80], and 0.74 [0.66, 0.82] for KB, S, and RF, respectively.
The clinical–radiomic models developed using radiomic features from DBT images showed good discriminating power and hence may help radiologists in diagnosing breast tumors as early as the first screening.
Breast cancer (BC) is the tumor with the highest incidence worldwide. With 2.3 million new cases estimated in 2020, it represents 11.7% of new cancer diagnoses and is therefore the most frequently diagnosed cancer according to Global Cancer Statistics 2020 (
In current radiological practice, mammographic, ultrasonographic, or magnetic resonance imaging (MRI) evaluation of tumors is largely qualitative and includes subjective evaluations such as tumor aspect (spiculated, rounded, with necrosis, microcalcification), density, type of enhancement and anatomic relationship to the surrounding tissues in order to inform further treatment (
The inclusion of standard digital imaging among the possible sources of big data for precision medicine represents one of the new frontiers of research. Particularly, radiomics (
Few studies have addressed the analysis of DBT images and the possible introduction into clinical practice of automatic cancer detection methods (
In the present study, patients who were subjected to tomosynthesis exams were enrolled; DBT imaging was performed at the Breast Unit in the Department of Radiology and Diagnostic Imaging by using the Giotto® CLASS mammography unit. Images were transferred from the picture archiving communication system (PACS) to a dedicated MIM-Maestro system (MIM Software INC.) in which the lesion was identified by the radiologists. This study was approved by the IRCCS Regina Elena Cancer Institute Ethics Committee (CEI number: RS1414/20(2408)). The requirement for obtaining informed consent was waived as it was a retrospective study.
In this study, 150 patients who underwent DBT scans were enrolled, 80 of whom had lesions classified as malignant and 70 benign. Lesions radiologically classified as malignant were subsequently confirmed by pathologic analysis.
Patients were randomly selected from those undergoing DBT at our hospital from May 2021 to May 2022, and their characteristics covered a wide range, as shown in
Clinical characteristics.
| Age (mean ± SD) | Benign (70) | Malignant (80) |
|---|---|---|
| Density: | | |
| | 15 | 27 |
| | 26 | 33 |
| | 25 | 17 |
| | 4 | 3 |
| BI-RADS: | | |
| | 38 | 0 |
| | 26 | 12 |
| | 6 | 50 |
| | 0 | 18 |
Parameters for performing DBT scans were selected automatically by the automatic exposure control (AEC) at a fixed target/filter combination (W/Ag 50 ± 5 µm). The image resolution was 2925 × 1342 pixels for each 1 mm reconstructed slice. The initial image reading was performed on a workstation with diagnostic-quality monitors (BARCO 5 MP).
The data were randomly divided into training and validation sets in a ratio of 80:20. In the subdivision process, care was taken to preserve the proportion between the two patient groups (benign and malignant) in each set.
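The 80:20 split can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the relationship to be maintained is the benign-to-malignant proportion, uses scikit-learn's `train_test_split` with `stratify`, and runs on synthetic random values standing in for the 60 candidate variables. The function name `stratified_split` and the seed are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(X, y, test_size=0.2, seed=0):
    """Split into training/validation sets while preserving the class ratio of y."""
    return train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )


# Synthetic stand-in for the cohort (hypothetical values): 70 benign (0) and
# 80 malignant (1) lesions, each described by 60 candidate variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 60))
y = np.array([0] * 70 + [1] * 80)

X_train, X_val, y_train, y_val = stratified_split(X, y)
print(len(y_train), len(y_val))        # sizes of the two sets
print(y_train.mean(), y_val.mean())    # fraction of malignant lesions in each
```

With stratification, the fraction of malignant lesions is essentially the same in both sets, which is what keeps the validation AUC comparable to the training population.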
Lesions were always identified by two expert radiologists (more than 10 years of experience). In the case of a very irregular shape, often a malignant lesion, the radiologist manually performed the contouring. Otherwise, a semi-automatic contouring method was applied. In both cases, the most representative DBT slice was chosen according to the radiologist’s indication.
An example of delineated lesions is shown in
The algorithm, called “2D Edge”, is part of the MIM-Maestro system (MIM Software INC.). The density gradient method was used to draw a particular region of the image previously identified by the operator. To assess the robustness of the algorithm, a specific lesion of 2.84 cm² manually delineated by an expert radiologist was also automatically contoured 20 times, resulting in a median area value of 2.83 cm² (range 2.69–2.99). Furthermore, a qualitative validation regarding the shape was performed.
Patient clinical data, such as age, breast density, and Breast Imaging–Reporting and Data System (BI-RADS) scores were collected in a dedicated database specifically created with Microsoft Access software.
The lesion-associated anatomical and pathological data were obtained from the characterization following the biopsy, and data, such as estrogen, progesterone, human epidermal growth factor receptor 2 (HER2) and Ki67 (
Malignant or benign status of the lesions was defined according to a breast screening report [NHSBSP
B1 not adequate or not representative; the lesion was probably not sampled.
B2 benign.
B3 with atypia but probably benign.
B4 with suspected atypia but not diagnostic for malignancy.
B5 malignant (B5a carcinoma
The images and the contours of the lesion exported from MIM were transferred to an open access software that allowed the extraction of radiomic features: LIFEx (
A total of 58 features were extracted from the original images: (1) five in the shape category, (2) 22 first-order statistical features, and (3) 31 textural features (n = 6 Gray Level Co-occurrence Matrix [GLCM] + 11 Gray Level Run Length Matrix [GLRLM] + 3 Neighboring Gray Level Dependence Matrix [NGLDM] + 11 Gray Level Size Zone Matrix [GLZLM]). All extracted features were obtained from the original image without any kind of filter. The LIFEx output was an Excel file containing, in each row, all the variables extracted from the analysis of one lesion.
The clinical–radiomic model was constructed by combining age and density with the 58 cited features associated with each lesion. The ratio between training and validation set was chosen trying to balance the two groups (
The number of features suitable for representing the population was chosen considering the size and variability of the sample. It is usually considered good practice to keep the number of features at a ratio of 1:9 with respect to the sample size; to avoid possible overfitting, seven features were selected from the initial 60 to build the model. Three different feature selection methods were implemented in Python: K best (KB), sequential (S), and Random Forest (RF).
The KB is based on a filter method (
To estimate the degree of linearity between the input features (the candidate predictors of malignancy) and the output feature, the analysis of variance (ANOVA) F-value method was implemented. To avoid issues with outliers and violations of distributional assumptions, all features were first normalized using a normal transformation of the ranks. However, non-linear relationships cannot be detected by the ANOVA F-value. Hence, in the S method, to capture non-linear relationships between input and output features as well, a mutual information (MI) algorithm was implemented (
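The two scoring functions described here can be sketched with scikit-learn. This is an illustration under stated assumptions rather than the authors' implementation: the rank-to-normal transformation uses one common variant (ranks shifted by 0.5, divided by the sample size, then passed through the inverse normal CDF), the helper names `rank_normal_transform` and `top_k` are hypothetical, and the data are synthetic.

```python
from functools import partial

import numpy as np
from scipy.stats import norm, rankdata
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif


def rank_normal_transform(X):
    """Replace each column by normal scores of its ranks (one common variant)."""
    n = X.shape[0]
    ranks = np.apply_along_axis(rankdata, 0, X)  # ranks 1..n within each column
    return norm.ppf((ranks - 0.5) / n)           # the 0.5 offset is an assumption


def top_k(X, y, k=7, score="f"):
    """Return the sorted column indices of the k best-scoring features."""
    if score == "f":
        score_func = f_classif  # ANOVA F-value: responds to shifts in class means
    else:
        # Mutual information: also responds to non-linear dependence.
        score_func = partial(mutual_info_classif, random_state=0)
    selector = SelectKBest(score_func=score_func, k=k).fit(X, y)
    return selector.get_support(indices=True)


# Synthetic data (hypothetical): column 0 differs in mean between classes,
# column 1 differs only in spread, and the remaining columns are pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 70 + [1] * 80)
X = rng.normal(size=(150, 20))
X[:, 0] += 2.0 * y
X[:, 1] *= np.where(y == 1, 5.0, 1.0)

Z = rank_normal_transform(X)
kb_idx = top_k(Z, y, k=7, score="f")
mi_idx = top_k(Z, y, k=7, score="mi")
print("F-value selection:", kb_idx)
print("MI selection:     ", mi_idx)
```

Column 1 is built so that its class means coincide while its spread differs, which is the kind of dependence the F-value tends to miss and MI is designed to pick up.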
The main weakness of filter methods is the lack of consideration of the relationships among features. To obtain a robust model but at the same time not overburden it, it is necessary to discard the information that turns out to be overwhelming. In fact, if two characteristics are strongly correlated, it is sufficient to consider only one for the construction of the final model. This information can be derived by creating a correlation matrix between the characteristics.
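A correlation-based pruning step of this kind might look like the sketch below. The threshold of 0.9, the use of Pearson correlation (the pandas default), and the function name `drop_correlated` are assumptions made for illustration; the text does not specify them. The column names only echo feature names used elsewhere in this article, and the values are synthetic.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Keep a column only if its |r| with every already-kept column is <= threshold."""
    corr = df.corr().abs()  # absolute Pearson correlation matrix
    kept = []
    for col in df.columns:
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return df[kept]


# Synthetic example (hypothetical values): excess kurtosis is kurtosis minus 3,
# so the two columns are perfectly correlated and only the first one is kept.
rng = np.random.default_rng(0)
kurtosis = rng.normal(loc=3.0, scale=1.0, size=150)
features = pd.DataFrame(
    {
        "age": rng.normal(loc=60.0, scale=15.0, size=150),
        "HUKurtosis": kurtosis,
        "HUExcessKurtosis": kurtosis - 3.0,
    }
)

reduced = drop_correlated(features)
print(list(reduced.columns))
```

Which member of a correlated pair survives depends on column order here; ordering the columns by their filter score first would keep the more informative one.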
The last feature selection method used is an Embedded method, RF (
Three models were therefore produced, one for each subset of variables, through a machine-learning algorithm implemented in Python, which exploits Random Forest classification based on the Gini index (
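As a rough sketch of the embedded selection and of the final classifier, the snippet below ranks features by the impurity-based (Gini) importance of a scikit-learn `RandomForestClassifier` and then refits a forest on the seven top-ranked columns. The number of trees, the seeds, the helper names, and the synthetic data are all assumptions; the hyperparameters actually used are not stated in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rf_top_features(X, y, k=7, seed=0):
    """Return the sorted indices of the k columns with the highest Gini importance."""
    forest = RandomForestClassifier(
        n_estimators=200, criterion="gini", random_state=seed
    ).fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    return np.sort(order[:k])


def fit_rf_model(X, y, columns, seed=0):
    """Fit a Gini-based random forest on the selected columns only."""
    forest = RandomForestClassifier(
        n_estimators=200, criterion="gini", random_state=seed
    )
    return forest.fit(X[:, columns], y)


# Synthetic data (hypothetical): only column 0 carries class information.
rng = np.random.default_rng(0)
y = np.array([0] * 70 + [1] * 80)
X = rng.normal(size=(150, 20))
X[:, 0] += 2.0 * y

selected = rf_top_features(X, y, k=7)
model = fit_rf_model(X, y, selected)
malignancy_score = model.predict_proba(X[:, selected])[:, 1]
print("selected columns:", selected)
```

In practice the importance ranking and the final fit would use the training set only, with `predict_proba` applied to the held-out validation set.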
The Mann–Whitney (
The goodness of the three models obtained was compared by analyzing the area under the curve (AUC) of the receiver operating characteristic (ROC) curve (
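The two statistical steps can be sketched as follows: a two-sided Mann–Whitney U test on a single feature, and an AUC with a confidence interval. The percentile bootstrap used for the interval is an assumption chosen for illustration, since the method used to derive the intervals reported in this article is not described here; the function names and the synthetic scores are likewise hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.metrics import roc_auc_score


def compare_groups(values, labels):
    """Two-sided Mann-Whitney U test between benign (0) and malignant (1) values."""
    values, labels = np.asarray(values), np.asarray(labels)
    result = mannwhitneyu(
        values[labels == 0], values[labels == 1], alternative="two-sided"
    )
    return result.pvalue


def auc_with_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """AUC with a percentile-bootstrap confidence interval (assumed method)."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when only one class is drawn
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    low, high = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), low, high


# Synthetic scores (hypothetical): malignant cases are shifted upwards.
rng = np.random.default_rng(0)
labels = np.array([0] * 70 + [1] * 80)
scores = rng.normal(size=150) + 1.5 * labels

p_value = compare_groups(scores, labels)
auc, ci_low, ci_high = auc_with_ci(labels, scores)
print(f"p = {p_value:.3g}, AUC = {auc:.3f} [{ci_low:.3f}, {ci_high:.3f}]")
```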
Scheme of the overall pipeline of this study.
In
In
Result of the F-test on the input variables when using the KB filter method.
A high F-value indicates a high degree of linearity, and a low F-value indicates a low degree. The presence of some promising variables (such as age and GLCM_Correlation) and of others not correlated with the dichotomous output variable was immediately visible.
In the S filter method, MI measures the dependence, including non-linear dependence, of one variable on another by quantifying the amount of information obtained about one feature through the other. MI is symmetric and non-negative; it is zero only if the input and output features are independent (
Mutual Information scores between output and input variables when using the S filter method.
A correlation matrix, such as the one shown in
Correlation matrix of the complete set of variables, clinical plus radiomic.
In
Accuracy plotted against the subset of features considered.
Finally, the RF method was used.
In
Correlation matrices of the seven features obtained by the KB, S, and RF selection methods.
Based on the Mann–Whitney test, the following features were significantly different (p < 0.05) between benign and malignant lesions: (1) age, (2) density, (3) CONVENTIONAL_HUKurtosis, (4) CONVENTIONAL_HUExcessKurtosis, (5) GLCM_Correlation, (6) GLRLM_LRLGE, (7) GLRLM_SZE, and (8) GLRLM_SZHGE.
In
Boxplots, together with the p-values, of the four most representative selected features.
In
Radiomic features provided by the three models showed significant differences (p < 0.05) between malignant and benign lesions.
In
The receiver operating characteristic (ROC) curves of all the models. Area under the curve (AUC) is highlighted.
In a recently published paper (
Because it is widely available in hospitals and has a low economic impact, DBT is used in Italy as a first-level screening tool, and patients are subsequently referred to MRI at the radiologist's discretion. So, even if the two methods can probably be complementary (
For this reason, we aimed to develop a model that could help radiologists in their first-level diagnosis and, where appropriate, in referring the patient for further examinations.
With a high-dimensional dataset, a feature selection step is mandatory to avoid overfitting. High-dimensional datasets are undesirable because they require lengthy training times and carry a high risk of overfitting. Feature selection helps to mitigate these problems by retaining the features that are most important to the model, so that the data dimensionality can be reduced without much loss of information.
In this study, three feature selection methods were used, and consequently, three predictive models were derived. The resulting diagnostic performances of the three models are quite similar. The model derived from the RF selector performed slightly better (AUC 0.740 [CI 0.662–0.819]) than those derived from the KB and S selectors (AUC 0.716 [CI 0.635–0.797] and 0.722 [CI 0.641–0.802], respectively).
The best diagnostic performance of the derived models is in accordance with other studies (
It is worth noting that, because the patients enrolled in our study belonged to a screening protocol, the mean age was 60.61 ± 15.51 years. This is probably the reason why patients with benign lesions, who are usually younger, appear to have a higher parenchymal density score. In fact, in our population a significant inverse correlation was found between age and density (p < 0.0001), suggesting that age partly masks the density effect.
Some limitations of this study need to be highlighted.
Although we evaluated the reproducibility of the delineation, feature stability across repeated contours has not been assessed.
Moreover, in the feature selection step the variance has not been taken into account. In fact, due to the characteristics of the studied population some relevant features would have been excluded, as it was retrospectively investigated.
In addition, we derived our models from a relatively small sample, which we hope to enlarge. The choice of selecting exams performed on the same mammography unit is also limiting; more data from patients enrolled in screening protocols in our institute could be exploited, with data harmonization used to overcome differences between DBT scanners. A better model could be constructed using an external validation set. For these reasons, we are designing a wider clinical trial in which, besides including the peritumoral area in the delineation, more hospitals will be involved, with the aim of building a model that can be shared and whose robustness can be proven among different users.
In conclusion, according to the results obtained in our study, we think that the derived models could be considered as an aid to the radiologist in the diagnosis of breast tumor, at least at a first level screening, due to the good performance shown by the constructed models.
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
The studies involving human participants were reviewed and approved by Comitato Etico Centrale IRCCS Lazio Sezione IFO. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.
FM and VL: study design, FM and VL: study conduct, LG and FF: tumor delineation and radiological assessment, AR and LP: anatomo-pathological data support, PO database implementation, FM: data collection, FM and VL: data processing, FM and VL: statistical data analysis, FM and VL: drafting manuscript, AV and PO: manuscript revision. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The Supplementary Material for this article can be found online at: