Benign and malignant vocal fold lesions can alter voice quality and lead to significant morbidity or, in the case of malignancy, mortality. Early, noninvasive identification of these lesions using voice as a biomarker may improve diagnostic access and outcomes. In this study, we analyzed data from the initial release of the Bridge2AI-Voice dataset to evaluate which acoustic features best distinguish laryngeal cancer and benign vocal fold lesions from other vocal pathologies and healthy voice function. Seven diagnostic cohorts were grouped into two analyses: the first included participants with laryngeal cancer, benign lesions, or no voice disorder; the second included those with laryngeal cancer or benign lesions without other voice disorders, as well as individuals with spasmodic dysphonia or vocal fold paralysis. Acoustic features including fundamental frequency, jitter, shimmer, and harmonic-to-noise ratio (HNR) were extracted from standardized speech recordings and compared using nonparametric statistical methods. Among the overall sample, significant differences were identified in HNR and fundamental frequency between benign lesions and both healthy controls and laryngeal cancer. In cisgender men, these distinctions were also observed, particularly in HNR and its variability. No statistically significant differences were observed among cisgender women, likely due to the limited sample size. These findings suggest that HNR, particularly its variability, may hold promise as a voice-based marker for early detection and monitoring of vocal fold lesions. Further research with larger, more diverse populations is needed to refine these features and validate their clinical utility.
As part of the National Institutes of Health (NIH) Bridge to Artificial Intelligence (Bridge2AI) consortium (
Voice disorders are defined as impairments in the pitch, loudness, or quality of voice that interfere with communication and social participation (
Benign vocal fold lesions can affect human voices and cause morbidity, whereas malignant lesions can cause morbidity and mortality if not treated (
When attempting to detect the presence of vocal lesions, it is essential to determine whether or not the participant has a concordant vocal disorder (
The aim of this project is to examine which acoustic features best distinguish laryngeal cancer and benign vocal cord lesions from other vocal pathologies and healthy laryngeal function, using the Bridge2AI-Voice v1.1 dataset (
Closely related is jitter, which is used to measure fluctuations in fundamental frequency. Local jitter is the difference between two consecutive periods (i.e., the length of time to complete one sound wave cycle) divided by the mean period. Higher local jitter percentages correspond to lower control of vocal cord vibration and are regularly found in patients with vocal pathologies (
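The definition of local jitter above can be illustrated with a minimal sketch. It assumes the glottal period lengths have already been estimated from the recording, and it uses the common convention of averaging the absolute differences between consecutive periods; the function name and the period values are hypothetical and are not taken from the dataset or its extraction pipeline.

```python
def local_jitter_percent(periods):
    """Mean absolute difference between consecutive periods, divided by
    the mean period, expressed as a percentage."""
    if len(periods) < 2:
        raise ValueError("need at least two glottal periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_abs_diff / mean_period


# Hypothetical period lengths in seconds (roughly a 100 Hz voice).
steady = [0.0100, 0.0100, 0.0100, 0.0100]
unsteady = [0.0100, 0.0110, 0.0100, 0.0110]

jitter_steady = local_jitter_percent(steady)      # perfectly regular cycles
jitter_unsteady = local_jitter_percent(unsteady)  # cycle length alternates
```

In this toy input, the voice whose cycle length alternates yields a larger jitter percentage than the perfectly regular one, which matches the interpretation given above.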
Similarly, shimmer measures fluctuations in the amplitude of sound waves. High shimmer measurements are perceived as breathiness and are correlated with glottal resistance, which can be caused by lesions that interfere with vocal cord movement. For this analysis, we extracted the mean local shimmer, which is the mean difference in consecutive sound wave amplitudes in decibels (dB).
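As a rough illustration of mean local shimmer in dB, the sketch below averages the absolute level difference between consecutive cycle peak amplitudes. The dB formulation used here (20 times the base-10 logarithm of the amplitude ratio) is an assumption based on the conventional definition, and the function name and amplitude values are hypothetical.

```python
import math


def local_shimmer_db(amplitudes):
    """Mean absolute difference, in dB, between the peak amplitudes of
    consecutive cycles."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two cycle amplitudes")
    if any(a <= 0 for a in amplitudes):
        raise ValueError("amplitudes must be positive")
    steps = [abs(20.0 * math.log10(b / a))
             for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(steps) / len(steps)


# Hypothetical peak amplitudes (arbitrary linear units).
stable = [1.0, 1.0, 1.0, 1.0]
fluctuating = [1.0, 0.8, 1.0, 0.8]

shimmer_stable = local_shimmer_db(stable)
shimmer_fluctuating = local_shimmer_db(fluctuating)
```

A voice with constant cycle amplitude gives 0 dB, while larger cycle-to-cycle amplitude swings raise the value.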
Finally, the harmonic-to-noise ratio (HNR) is the ratio of the periodic to aperiodic component in a speech signal. The periodic component stems from regular glottal pulses during phonation, while the aperiodic component is the noise produced from turbulence as air flows through the glottis. A possible source of this turbulence is the improper closing of the vocal cords (
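One common way to estimate HNR, sketched below, uses the normalized autocorrelation of the signal at a lag of one pitch period: if that value is r, the periodic share of the energy is approximately r and the noise share is 1 - r. This is an illustrative assumption and not necessarily how the features in the dataset were extracted; the function names, the synthetic signal, and the noise level are all hypothetical.

```python
import math
import random


def normalized_autocorr(signal, lag):
    """Normalized autocorrelation of `signal` at `lag` samples."""
    n = len(signal) - lag
    head, tail = signal[:n], signal[lag:lag + n]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return num / den


def hnr_db(r):
    """HNR in dB from the autocorrelation r at the pitch period."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return 10.0 * math.log10(r / (1.0 - r))


# Synthetic "voice": a sine with a 50-sample period plus seeded noise.
period = 50
rng = random.Random(0)
clean = [math.sin(2 * math.pi * i / period) for i in range(500)]
noisy = [s + rng.gauss(0.0, 0.1) for s in clean]

r_clean = normalized_autocorr(clean, period)  # close to 1: no noise
r_noisy = normalized_autocorr(noisy, period)
hnr_noisy = hnr_db(r_noisy)
```

With a sine of power 0.5 and noise of power 0.01, the true ratio is about 17 dB, so the estimate for the noisy signal should land near that value. A perfectly periodic signal has r close to 1, for which the ratio is unbounded, so the sketch rejects it.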
The selection of these features was based on the findings of previous related work. For example, Dr. Tom Karlsen and colleagues found that jitter, shimmer, and noise-to-harmonic ratio were larger among laryngeal cancer patients than among controls using
The dataset used for this project was Bridge2AI-Voice v1.0, the initial release, which provides 12,523 recordings from 306 participants collected across five sites in North America (
In exploring the potential for a biomarker of vocal cord lesions, we had two related but different clinical objectives. First, we wished to identify acoustic features that could distinguish the voices of participants
Because the cohorts with a lesion and no other voice disorder were subsets of the lesion-present cohorts, the cohorts were statistically dependent. Hypothesis testing was therefore conducted in two groups, each containing diagnostic cohorts with mutually independent observations. Group 1 consists of recordings for participants with: laryngeal cancer (
Participant grouping by lesion type and vocal disorder diagnosis.
Prior to comparing distributions of acoustic features among these cohorts, basic demographic information was analyzed and compared across the lesion-absent and lesion-present cohorts to detect potential biases (
Demographics and clinical characteristics, grouped by presence of vocal fold lesions.
Characteristic | Overall | Lesion absent | Lesion present | p-value |
---|---|---|---|---|
n | 176 | 153 | 23 | |
Age (years), median | 59.0 | 59.0 | 60.0 | 0.260 |
Weight (lbs), median | 169.0 | 164.0 | 184.0 | 0.110 |
Gender identity, n (%) | | | | |
Female | 110 (63.2) | 97 (64.2) | 13 (56.5) | 0.546 |
Male | 59 (33.9) | 52 (34.4) | 7 (30.5) | |
Non-binary or genderqueer | 2 (1.1) | 2 (1.3) | 0 (0.0) | |
Sexual orientation, n (%) | | | | |
Bisexual | 8 (4.6) | 8 (5.3) | 0 (0.0) | |
Heterosexual | 138 (79.3) | 119 (78.0) | 19 (82.6) | |
Homosexual | 7 (4.0) | 6 (3.9) | 1 (4.3) | |
Other | 5 (2.9) | 5 (3.3) | | |
No answer | 16 (9.2) | 13 (8.6) | 3 (13.0) | |
Race, n (%) | | | | 0.149 |
American Indian or Alaska Native | 1 (0.6) | 1 (0.7) | | |
Asian | 7 (4.0) | 7 (4.6) | | |
Black or African American | 11 (6.3) | 7 (4.6) | 4 (17.4) | |
White | 140 (80.5) | 124 (82.1) | 16 (69.6) | |
Multiracial | 6 (3.4) | 5 (3.3) | 1 (4.3) | |
No answer | 3 (1.7) | 2 (1.3) | 1 (4.3) | |
Other | 6 (3.4) | 5 (3.3) | 1 (4.3) | |
Ethnicity, n (%) | | | | 0.794 |
Hispanic or Latino | 17 (9.8) | 16 (10.6) | 1 (4.3) | |
Not Hispanic or Latino | 148 (85.1) | 127 (84.1) | 21 (91.3) | |
No answer | 9 (5.2) | 8 (5.3) | 1 (4.3) | |
Acoustic features were extracted from recordings for the Rainbow Passage task, a paragraph containing all phonemes in American English commonly used as an assessment by speech pathologists. Acoustic features for these recordings were pre-extracted and included in the Bridge2AI dataset by default. They were obtained using openSMILE (
Features examined for analysis were mean HNR, the standard deviation of harmonic-to-noise ratio (HNR SD), mean local jitter, mean local shimmer, and mean fundamental frequency. Analysis was initially conducted collectively for all participants. First, a Kruskal–Wallis test was used to assess differences within Group 1 and then within Group 2 for each acoustic feature. If statistically significant differences were detected (
Statistical tests were conducted in Python (3.10.14) using SciPy (1.13.1) (
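The two-step procedure described above (an omnibus Kruskal–Wallis test, followed by Dunn's pairwise test when the omnibus result is significant) can be sketched with the standard library alone. This is a minimal illustration and not the authors' analysis code: the function names, cohort labels, and toy values are hypothetical, the pairwise p-values are left unadjusted because the correction method is not stated here, and in practice a library routine such as `scipy.stats.kruskal` would normally handle the omnibus step.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks with ties sharing their average rank, plus tie sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_sizes, start = [0.0] * len(values), [], 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    q, k = (math.exp(-x / 2), 2) if df % 2 == 0 else (math.erfc(math.sqrt(x / 2)), 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def kruskal_dunn(groups):
    """Return (H, omnibus p, {pair: unadjusted two-sided Dunn p})."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    tie_sum = sum(t ** 3 - t for t in tie_sizes)
    size, mean_rank, rank_term, pos = {}, {}, 0.0, 0
    for name in names:
        n = len(groups[name])
        r = ranks[pos:pos + n]
        pos += n
        size[name], mean_rank[name] = n, sum(r) / n
        rank_term += sum(r) ** 2 / n
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total ** 3 - n_total)  # tie correction
    p_omnibus = chi2_sf(h, len(names) - 1)
    base_var = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    pairwise = {}
    for a, b in combinations(names, 2):
        z = (mean_rank[a] - mean_rank[b]) / math.sqrt(
            base_var * (1 / size[a] + 1 / size[b]))
        pairwise[(a, b)] = math.erfc(abs(z) / math.sqrt(2))
    return h, p_omnibus, pairwise


# Made-up toy values for three hypothetical cohorts (not study data).
toy = {"cohort_a": [1, 2, 3], "cohort_b": [4, 5, 6], "cohort_c": [7, 8, 9]}
h_stat, p_omnibus, pairwise = kruskal_dunn(toy)
```

Dunn's test reuses the ranks from the pooled sample rather than re-ranking each pair, which is why it is a natural follow-up to the Kruskal–Wallis test. In a real analysis the pairwise p-values would usually be adjusted for multiple comparisons.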
For the analysis representing all 176 participants, statistically significant differences were found between the benign CL and NVD cohorts in their distributions of mean HNR (
Dunn's test results group 1 pairings (unstratified data).
Acoustic feature | Pairing | p-value |
---|---|---|
Mean HNR | Laryngeal cancer, benign C.L. | 0.095 |
Mean HNR | Laryngeal cancer, no voice disorder | 0.914 |
Mean HNR | Benign C.L., no voice disorder | |
Standard deviation HNR | Laryngeal cancer, benign C.L. | |
Standard deviation HNR | Laryngeal cancer, no voice disorder | 0.256 |
Standard deviation HNR | Benign C.L., no voice disorder | |
Mean F0 | Laryngeal cancer, benign C.L. | 0.335 |
Mean F0 | Laryngeal cancer, no voice disorder | 0.429 |
Mean F0 | Benign C.L., no voice disorder | |
Bolded values indicate statistical significance at the
The number of recordings for each diagnostic cohort, stratified by gender, is shown in
Number of recordings for cisgender men and women, by diagnostic cohort.
Diagnostic group | # Cisgender women recordings | # Cisgender men recordings |
---|---|---|
Laryngeal cancer | 6 | 4 |
Benign CL | 7 | 6 |
No voice disorder | 77 | 36 |
Laryngeal cancer (NOVD) | 2 | 4 |
Benign CL (NOVD) | 6 | 5 |
Spasmodic Dysphonia + no lesion | 6 | 2 |
UVFP + no lesion | 17 | 9 |
Dunn's test results for group 1 pairings with only cisgender male participants.
Acoustic feature | Pairing | p-value |
---|---|---|
Mean HNR | Laryngeal cancer, benign C.L. | 0.192 |
Mean HNR | Laryngeal cancer, no voice disorder | 0.512 |
Mean HNR | Benign C.L., no voice disorder | |
Standard deviation HNR | Laryngeal cancer, benign C.L. | |
Standard deviation HNR | Laryngeal cancer, no voice disorder | 0.863 |
Standard deviation HNR | Benign C.L., no voice disorder | |
Bolded values indicate statistical significance at the
No statistically significant differences were found among cisgender women for any of the acoustic features examined.
Our preliminary analysis of the Bridge2AI-Voice dataset suggests that certain vocal features may serve as biomarkers for vocal fold lesions. Other recent studies have shown links between benign and malignant vocal fold lesions using principal component analysis (PCA), suggesting the utility of the PCA method in the identification of vibrational alterations in the acoustic characteristics of voice affected by lesions (
Despite the relatively small sample size, we detected statistically significant differences in acoustic features within our Group 1 cohort. Notably, the differences were most pronounced between the benign C.L. cohort and the NVD cohort.
Of particular interest is the difference in HNR SD between benign and malignant lesion groups, which suggests that HNR SD may be a useful measure for monitoring lesion progression and detecting laryngeal cancer at an early stage. This finding should be tested with larger datasets, and future studies could build on it to explain the relationship further. However, no statistically significant differences were found within Group 2, indicating that distinguishing lesions from other vocal pathologies may be more challenging.
The primary limitations of this study were the small sample size and participants' incomplete lesion histories. Despite these limitations, the study provides valuable insights into the potential for voice biomarkers to serve as early indicators of vocal fold lesions.
The most striking barrier to using our selected features as a biomarker of vocal cord lesions is that, when we stratified our data by sex, we found no statistically significant differences among women in Group 1 or Group 2. The power of these statistical tests was, of course, limited by the small sample sizes in some of these cohorts, most noticeably when comparing against the 2 cisgender women participants in the laryngeal cancer + no other vocal disorder cohort, as shown in
Additionally, voice disorders arising from a broader range of laryngeal diseases, such as spasmodic dysphonia, vocal fold paralysis, and functional dysphonia, carry significant morbidity and impair communication and quality of life (
While a definitive diagnosis still requires visualization, a validated AI-based voice screening tool could serve as a triage mechanism. It could identify individuals with subtle voice changes who may not otherwise seek care, especially in primary care or telehealth settings. Such a tool could prompt earlier referrals to voice specialists, help prioritize urgent cases, and reduce diagnostic delays. Unlike the human ear, which may not reliably distinguish between subtle pathologic changes, an AI model can offer consistent and scalable voice analysis across diverse populations.
Future studies should focus on increasing sample sizes and incorporating more nuanced data, such as lesion sizes. Additionally, the sex of participants played a role in the results, which should be considered in future recruitment efforts to prevent biased datasets. Further research should continue to explore different types of benign and malignant lesions by voice feature.
The original contributions presented in the study are included in the article/
The data collection and studies involving humans were approved by the University of South Florida IRB entitled STUDY004890: Bridge2AI Voice Data Acquisition. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was obtained from the participants or the participants' legal guardians/next of kin in accordance with the national legislation and institutional requirements.
PJ: Writing – original draft, Investigation, Writing – review & editing, Formal analysis, Methodology, Data curation, Conceptualization. RH: Formal analysis, Conceptualization, Methodology, Data curation, Writing – original draft, Writing – review & editing, Investigation, Software. SB: Project administration, Supervision, Methodology, Formal analysis, Writing – original draft, Writing – review & editing. LK: Writing – original draft, Formal analysis, Conceptualization, Data curation, Supervision, Writing – review & editing. WH: Methodology, Conceptualization, Writing – review & editing, Supervision, Writing – original draft.
University of South Florida, Tampa, FL, USA: Yael Bensoussan. Weill Cornell Medicine, New York, NY, USA: Olivier Elemento. Weill Cornell Medicine, New York, NY, USA: Anais Rameau. Weill Cornell Medicine, New York, NY, USA: Alexandros Sigaras. Massachusetts Institute of Technology, Boston, MA, USA: Satrajit Ghosh. Vanderbilt University Medical Center, Nashville, TN, USA: Maria Powell. University of Montreal, Montreal, Quebec, Canada: Vardit Ravitsky. Simon Fraser University, Burnaby, BC, Canada: Jean Christophe Belisle-Pipon. Oregon Health & Science University, Portland, OR, USA: David Dorr. Washington University in St. Louis, St. Louis, MO, USA: Phillip Payne. University of Toronto, Toronto, Ontario, Canada: Alistair Johnson. University of South Florida, Tampa, FL, USA: Ruth Bahr. University of Florida, Gainesville, FL, USA: Donald Bolser. Dalhousie University, Toronto, ON, Canada: Frank Rudzicz. Mount Sinai Hospital, Sinai Health, University of Toronto, Toronto, ON, Canada: Jordan Lerner-Ellis. Boston Children's Hospital, Boston, MA, USA: Kathy Jenkins. University of Central Florida, Orlando, FL, USA: Shaheen Awan. University of South Florida, Tampa, FL, USA: Micah Boyer. Oregon Health & Science University, Portland, OR, USA: William Hersh. Washington University in St. Louis, St. Louis, MO, USA: Andrea Krussel. Oregon Health & Science University, Portland, OR, USA: Steven Bedrick. UT Health, Houston, TX, USA: Toufeeq Ahmed Syed. University of South Florida, Tampa, FL, USA: Jamie Toghranegar. University of South Florida, Tampa, FL, USA: James Anibal. New York, NY, USA: Duncan Sutherland. University of South Florida, Tampa, FL, USA: Enrique Diaz-Ocampo. University of South Florida, Tampa, FL, USA: Elizabeth Silberhoz. Boston Children's Hospital, Boston, MA, USA: John Costello. Vanderbilt University Medical Center, Nashville, TN, USA: Alexander Gelbard. Vanderbilt University Medical Center, Nashville, TN, USA: Kimberly Vinson.
University of South Florida, Tampa, FL, USA: Tempestt Neal. Mount Sinai Health, Toronto, ON, Canada: Lochana Jayachandran. The Hospital for Sick Children, Toronto, ON, Canada: Evan Ng. Mount Sinai Health, Toronto, ON, Canada: Selina Casalino. University of South Florida, Tampa, FL, USA: Yassmeen Abdel-Aty. University of South Florida, Tampa, FL, USA: Karim Hanna. University of South Florida, Tampa, FL, USA: Theresa Zesiewicz. Florida Atlantic University, Boca Raton, FL, USA: Elijah Moothedan. University of South Florida, Tampa, FL, USA: Emily Evangelista. Vanderbilt University Medical Center, Nashville, TN, USA: Samantha Salvi Cruz. Weill Cornell Medicine, New York, NY, USA: Robin Zhao. University of South Florida, Tampa, FL, USA: Mohamed Ebraheem. University of South Florida, Tampa, FL, USA: Karlee Newberry. University of South Florida, Tampa, FL, USA: Iris De Santiago. University of South Florida, Tampa, FL, USA: Ellie Eiseman. University of South Florida, Tampa, FL, USA: JM Rahman. Boston Children's Hospital, Boston, MA, USA: Stacy Jo. Hospital for Sick Children, Toronto, ON, Canada: Anna Goldenberg.
The author(s) declare that financial support was received for the research and/or publication of this article. This study was funded in part by the NIH Common Fund through the Bridge2AI program, award OT2OD032720.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author(s) declare that no Generative AI was used in the creation of this manuscript.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The Supplementary Material for this article can be found online at: