Clinical use of the CAPE-V scales: Agreement, reliability & notes on voice quality
Nagle, K.F. (in press). Journal of Voice.
Abstract: The CAPE-V is a widely used protocol developed to help standardize the evaluation of voice. Variability of voice quality ratings has prevented development of training protocols that might themselves improve interrater agreement among new clinicians. As part of a larger mixed methods project, this study examines agreement and reliability for experienced clinicians using the CAPE-V scales. Experienced voice clinicians (N=20) provided ratings of recordings from 12 speakers representing a range of overall voice quality. Participants were instructed to rate the voices as they normally would, using the CAPE-V scales. Descriptive data were recorded and two levels of agreement were calculated. Single rater reliability was calculated using a 2-way random model of absolute agreement for intraclass correlations (ICC [2,1]). Participants’ use of the CAPE-V scales varied considerably, although most rated overall severity, breathiness, roughness and strain. Data from one participant did not meet a priori agreement criteria. Because outcomes were significantly different without their data, agreement and reliability were analyzed based on the reduced data set from 19 participants. Interrater agreement and reliability were comparable to previous research; the mean range of ratings was at least 47 mm for all dimensions of voice quality. Results indicated differential use of the components of the CAPE-V form and scales in evaluating voice quality and severity of dysphonia, including categorical variability among ratings of all of the primary CAPE-V dimensions of voice quality that may complicate the clinical description of a voice as mildly, moderately or severely dysphonic.
Influence of phonatory break duration on auditory-perceptual ratings of speech acceptability and listener comfort in adductor-type laryngeal dystonia.
Doyle, P.C., Woldmo, R., Nagle, K.F., Crews, N. & Jovanovic, N. (2021). Journal of Voice. https://doi.org/10.1016/j.jvoice.2021.10.025
Abstract: This study empirically evaluated the influence of phonatory break duration on auditory-perceptual measures of speech produced by 26 adult speakers diagnosed with adductor-type laryngeal dystonia (AdLD). Fifteen inexperienced, young adult normal-hearing listeners provided ratings of speech acceptability and listener comfort for samples of running speech. Four phonatory break timing conditions were assessed using visual analog scaling methods. All stimuli were randomized for presentation and listeners were presented with experimental stimuli in a counterbalanced manner. Results indicate that the duration of phonatory breaks directly influenced listener ratings of speech acceptability (p<.001) and listener comfort (p<.001), with significant differences between original and modified recordings for both. Speech acceptability and listener comfort ratings were strongly correlated across all timing conditions (r = .85-.97). The duration of phonatory breaks and pauses has a significant influence on judgments of speech acceptability and listener comfort for AdLD. This suggests that temporal factors such as phonatory break duration and pause time in AdLD may carry a substantial negative impact on listeners’ perception relative to other auditory-perceptual features that co-exist in the signal.
Effect of noise on speech intelligibility & perceived listening effort in head & neck cancer.
Eadie, T., Durr, H., Sauder, C., Nagle, K.F., Kapsner-Smith, M. & Spencer, K. (2021). American Journal of Speech-Language Pathology. https://doi.org/10.1044/2020_AJSLP-20-00149
Abstract: This study (a) examined the effect of different levels of background noise on speech intelligibility and perceived listening effort in speakers with impaired and intact speech following treatment for head and neck cancer (HNC) and (b) determined the relative contribution of speech intelligibility, speaker group, and background noise to a measure of perceived listening effort. Ten speakers diagnosed with nasal, oral, or oropharyngeal HNC provided audio recordings of six sentences from the Sentence Intelligibility Test. All speakers were 100% intelligible in quiet: Five speakers with HNC exhibited mild speech imprecisions (speech impairment group), and five speakers with HNC demonstrated intact speech (HNC control group). Speech recordings were presented to 30 inexperienced listeners, who transcribed the sentences and rated perceived listening effort in quiet and two levels (+7 and +5 dB SNR) of background noise. Significant Group × Noise interactions were found for speech intelligibility and perceived listening effort. While no differences in speech intelligibility were found between the speaker groups in quiet, the results showed that, as the signal-to-noise ratio decreased, speakers with intact speech (HNC control) performed significantly better (greater intelligibility, less perceived listening effort) than those with speech imprecisions in the two noise conditions. Perceived listening effort was also shown to be associated with decreased speech intelligibility, imprecise speech, and increased background noise. Speakers with HNC who are 100% intelligible in quiet but who exhibit some degree of imprecise speech are particularly vulnerable to the effects of increased background noise in comparison to those with intact speech. Results have implications for speech evaluations, counseling, and rehabilitation.
Perceptual and acoustic assessment of strain using synthetically modified voice samples.
Park, Y., Diaz-Cadiz, M., Nagle, K.F. & Stepp, C. (2020). Journal of Speech, Language, and Hearing Research. https://doi.org/10.1044/2020_JSLHR-20-00294
Abstract: Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Stimuli were created using recordings of speakers producing /ifi/ with a comfortable voice and with maximum vocal effort. RFF values of the comfortable voice samples were synthetically lowered, and RFF values of the maximum vocal effort samples were synthetically raised. Mid-to-high frequency noise was added to the samples. Twenty listeners rated strain in a visual sort-and-rate task. The effects of RFF modification and added noise on strain were assessed using an analysis of variance; intra- and interrater reliability were compared with and without noise. Lowering RFF in the comfortable voice samples increased their perceived strain, whereas raising RFF in the maximum vocal effort samples decreased their strain. Adding noise increased strain and decreased intra- and interrater reliability relative to samples without added noise. Both RFF and mid-to-high frequency noise contribute to the perception of strain. The presence of dysphonia may decrease the reliability of auditory-perceptual evaluation of strain, which supports the need for complementary objective assessments.
Elements of clinical training with the electrolarynx.
Nagle, K.F. (2019). In P. Doyle (Ed.), Clinical care and rehabilitation in head and neck cancer (pp. 129-143). Cham, Switzerland: Springer.
Abstract: The electromechanical device commonly known as an electrolarynx (EL) is a popular primary or backup mode of postlaryngectomy alaryngeal communication. Learning to efficiently and successfully use an EL requires the acquisition of several skills, including: 1) appropriate placement of the device; 2) control of voice activation; 3) over-articulation and modulation of speech rate; and 4) awareness of paralinguistic behaviors. Mastering such skills can increase comprehensibility, and in turn, the potential for communicative success with the EL. Design features vary among commercially available devices, mostly in the type and degree of pitch modulation they offer. To optimize the ability of newer devices to modulate pitch, users may need specific practice directed toward enhancement of the suprasegmental aspects of their EL speech. This chapter reviews current EL features and outlines how speech-language pathologists (SLPs) can provide valuable training and insight for laryngectomees seeking to use this popular method of postlaryngectomy communication.
Perceived listener effort as an outcome measure for disordered speech.
Nagle, K.F. & Eadie, T.L. (2018). Journal of Communication Disorders, 73, 34-49.
Abstract: Perceived listening effort is a perceptual dimension used to identify the amount of work necessary to understand disordered speech. The purpose of this study was to investigate the utility of perceived listening effort to provide unique information about disordered speech. The relationships between perceived listening effort and two current outcome measures (speech acceptability, intelligibility) were examined for listeners rating electrolaryngeal speech, along with their reliability and intra-rater agreement. Ten healthy male speakers read low-context sentences using an electrolarynx. Twenty-five inexperienced listeners orthographically transcribed and rated the stimuli for perceived listening effort and speech acceptability using a visual analog scale. Strict reliability and agreement criteria were set. Perceived listening effort was moderately to strongly correlated with intelligibility (r = −0.76) and acceptability (r = −0.80), each of which contributed uniquely to ratings of perceived listening effort. However, only 17 listeners met stringent reliability and agreement criteria. Ratings of perceived listening effort may provide unique information about the communicative success of individuals with communication disorders. There is great variability, however, among inexperienced listeners’ perceptual ratings of electrolaryngeal speech. Future research should investigate variables that may affect perceived listening effort specifically and auditory-perceptual ratings in general.
Perceived naturalness of electrolaryngeal speech produced using sEMG-controlled vs. manual pitch modulation.
Nagle, K.F. & Heaton, J.T. (2016). Interspeech 2016. San Francisco, CA.
Abstract: Producing speech with natural prosodic patterns is an ongoing challenge for users of electrolaryngeal (EL) speech. This study describes speech produced using a method currently in development, wherein a prosodic pattern is derived from skin surface electromyographical (sEMG) signals recorded from under the chin (submental surface). Eight laryngectomees who currently use a TruTone EL as their primary or backup mode of speech provided samples of EL speech in two modes: conventional thumb-pressure pitch-modulated control (represented by the TruTone EL; Griffin Laboratories, CA, U.S.A.) and sEMG-based pitch-modulated control (EMG-EL). Ratings of perceived naturalness were obtained from ten listeners unfamiliar with EL speech. Listener ratings indicated that five speakers produced equally natural speech using both devices, and three produced significantly more natural speech using the EMG-EL than the TruTone EL. Mean fundamental frequency (f0) was similar within speakers for both modes; however, mean f0 range and standard deviation were significantly larger for the EMG-EL than for the TruTone EL, despite both devices having similar potential f0 range. This study showed that the EMG-EL provides an intuitive means of controlling f0-based prosodic patterns that are more natural-sounding than push-button control for some EL users.
Emerging Scientist: Challenges to CAPE-V as a Standard.
Nagle, K.F. (2016). Perspectives of the ASHA Special Interest Groups, 1, 47-53.
Abstract: The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V; American Speech-Language-Hearing Association, 2002) outlines a protocol for obtaining voice samples and rating their voice quality. It was developed as a standard voice protocol based on expert consensus and psychophysically appropriate measurement of auditory perceptual qualities of voice. The CAPE-V has since obtained widespread research and clinical use, but research suggests considerable variability in how both expert and new clinicians use its rating scales. In this paper, I review remaining challenges to standardizing voice quality evaluation and describe ongoing research addressing these challenges.
Generating tonal distinctions in Mandarin Chinese using an electrolarynx with preprogrammed tone patterns.
Guo, L., Nagle, K.F. & Heaton, J.T. (2016). Speech Communication, 78, 34-41.
Abstract: An electrolarynx (EL) is a valuable rehabilitative option for individuals who have undergone laryngectomy, but current monotone ELs do not support controlled variations in fundamental frequency for producing tonal languages. The present study examined the production and perception of Mandarin Chinese using a customized hand-held EL driven by computer software to generate tonal distinctions (tonal EL). Four native Mandarin speakers were trained to articulate their speech coincidentally with preprogrammed tonal patterns in order to produce mono- and di-syllabic words with a monotone EL and tonal EL. Three native Mandarin speakers later transcribed and rated the speech samples for intelligibility and acceptability. Results indicated that words produced using the tonal EL were significantly more intelligible and acceptable than those produced using the monotone EL.