Speech Intelligibility Papers
Glossary of Terms
Percentage Articulation Loss of Consonants. This machine measure of intelligibility is closely associated with the TEF sound analyzer. It is computed from measurements of the Direct-to-Reverberant Ratio and the Early Decay Time using a set of correlations defined by SynAudCon, and is specified in percent. Since %ALcons expresses loss of consonant definition, lower values are associated with greater intelligibility. It is generally assumed that the maximum allowable value for typical paging applications is 10%, assuming that the environment is relatively free of masking noise. For learning environments and voice warning systems, the desired value is 5% or less.
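The two limits quoted above can be expressed as a simple acceptance check. This is only a sketch: the function name, the constant names, and the `critical_application` flag are illustrative assumptions, and the check presumes an environment relatively free of masking noise, as the text does.

```python
# Limits taken from the glossary text; all names here are illustrative.
PAGING_MAX_ALCONS = 10.0    # percent, typical paging applications
CRITICAL_MAX_ALCONS = 5.0   # percent, learning environments and voice warning


def alcons_acceptable(alcons_percent: float, critical_application: bool = False) -> bool:
    """Return True if a measured %ALcons value meets the quoted limit.

    Lower %ALcons means fewer lost consonants, so the test is "at or below".
    """
    if not 0.0 <= alcons_percent <= 100.0:
        raise ValueError("%ALcons is a percentage and must lie in 0..100")
    limit = CRITICAL_MAX_ALCONS if critical_application else PAGING_MAX_ALCONS
    return alcons_percent <= limit
```

For example, a reading of 8% would pass for paging but fail for a voice warning system.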
Word Articulation refers to the number of test words correctly identified in an intelligibility test. It is expressed in percent. The term articulation also refers to the quality of a speaking person's enunciation. The greater a given talker's articulation (consonants are crisp and distinct, vowels are clearly articulated and not slurred), the more intelligible his or her speech will be.
One of the earliest attempts to measure by machine the intelligibility of a speech transmission system, the Articulation Index was developed by Bell Telephone Laboratories in the 1940s. AI is based on the idea that the response of a speech communication system can be divided into twenty frequency bands, each of which carries an independent contribution to the intelligibility of the system, and that the total contribution of all the bands is the sum of the contributions of the individual bands. (AI may also be measured using one-third octave or octave bands.) Signal-to-noise ratios are computed for each individual band, then weighted and combined to yield an intelligibility score. The AI varies in value from 0 (completely unintelligible) to 1 (perfect intelligibility). An AI of 0.3 or below is considered unsatisfactory, 0.3 to 0.5 satisfactory, 0.5 to 0.7 good, and greater than 0.7 very good to excellent.
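The band-summing idea can be sketched as follows. The glossary does not give the weighting itself, so two assumptions are made here. First, each band's signal-to-noise ratio is mapped linearly onto 0..1 over a 30 dB range starting at -12 dB, a commonly described simplification. Second, all bands contribute equally unless explicit weights are supplied. The rating labels and boundaries come from the text above.

```python
def articulation_index(band_snr_db, weights=None):
    """Estimate AI as a weighted sum of per-band contributions.

    band_snr_db: signal-to-noise ratio in dB for each frequency band.
    weights: per-band importance weights summing to 1 (assumed equal if omitted).
    """
    n = len(band_snr_db)
    if n == 0:
        raise ValueError("at least one band is required")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match the bands and sum to 1")

    ai = 0.0
    for snr, w in zip(band_snr_db, weights):
        # Assumed mapping: -12 dB or worse contributes nothing, +18 dB or better
        # contributes fully, with a linear ramp in between.
        fraction = min(max((snr + 12.0) / 30.0, 0.0), 1.0)
        ai += w * fraction
    return ai


def rate_ai(ai):
    """Map an AI value to the qualitative ratings quoted in the glossary."""
    if ai <= 0.3:
        return "unsatisfactory"
    if ai <= 0.5:
        return "satisfactory"
    if ai <= 0.7:
        return "good"
    return "very good to excellent"
```

With twenty bands all at 0 dB signal-to-noise, this sketch yields an AI of about 0.4, which falls in the "satisfactory" range.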
A carrier sentence is a sentence that is used to present test words in statistical intelligibility tests (for example, "Would you write <test word> now"). The test word is spoken without emphasis, and the sentence is the same for each test word. The carrier sentence ensures that the reverberant field is excited before the test word is spoken, so that its effects are properly accounted for in the test. It also allows dynamics processors such as automatic gain controls or compressors to activate and stabilize.
The term critical distance refers to the distance from a loudspeaker in an enclosed space at which the reverberation is equal in strength to the direct sound from the speaker. Beyond this distance, the reverberant energy tends to mask the direct sound. In truth, because reflected sound loses energy to boundary absorption (and also travels a longer path to the listener, thus incurring greater air absorption losses), the reverberant energy from a discrete pulse sound stimulus can never equal the direct sound on an instantaneous basis. In highly reflective environments, however, the steady-state reverberation strength can easily exceed that of the direct sound at many locations in the space. This degrades the signal-to-noise ratio and destroys intelligibility.
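For a rough estimate, critical distance is often approximated from room volume and reverberation time. The sketch below assumes a diffuse reverberant field and Sabine-style absorption. The directivity factor `q`, the metric constant 0.057, and the function name are not part of the glossary and are introduced here as assumptions.

```python
import math


def critical_distance_m(q: float, volume_m3: float, rt60_s: float) -> float:
    """Approximate critical distance in metres.

    q: loudspeaker directivity factor (1 for an omnidirectional source).
    volume_m3: room volume in cubic metres.
    rt60_s: reverberation time in seconds.
    Assumes a diffuse field; real rooms can deviate noticeably.
    """
    if q <= 0 or volume_m3 <= 0 or rt60_s <= 0:
        raise ValueError("all inputs must be positive")
    return 0.057 * math.sqrt(q * volume_m3 / rt60_s)
```

Under these assumptions, a loudspeaker with a directivity factor of 5 in a 2000 cubic metre room with a 2 second reverberation time gives a critical distance of roughly 4 metres. A more directional source or a less reverberant room pushes that distance outward.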
The DALT is derived from the Diagnostic Rhyme Test. It employs a list of ninety-six one-syllable word pairs that differ in only their final consonants (for example, art-arc). These differences are organized in six categories, and scores in each category can be used to identify specific problems in a communication system. Averaged together, the six scores provide a single measure of intelligibility. As in the DRT, listeners are shown a word pair, then asked to identify which word is presented by the talker. Carrier Sentences are not used.
The DMCT is derived from the Diagnostic Rhyme Test. It employs a list of ninety-six two-syllable word pairs that differ in only the middle consonant (for example, bobble-bottle). These differences are organized in six categories, and scores in each category can be used to identify specific problems in a communication system. Averaged together, the six scores provide a single measure of intelligibility. As in the DRT, listeners are shown a word pair, then asked to identify which word is presented by the talker. Carrier Sentences are not used.
Similarly to the Modified Rhyme Test, the DRT uses monosyllabic English words that are constructed from a consonant-vowel-consonant sound sequence. In the DRT, one hundred and ninety two words are arranged in ninety-six rhyming pairs which differ only in their initial consonants (the DRT word list may be seen here). Listeners are shown a word pair, then asked to identify which word is presented by the talker. Carrier Sentences are not used. The DRT is based on a number of distinctive features of speech, and its test results reveal errors in discrimination of initial consonant sounds. The test can be presented in a short period of time and may be scored in several different ways.
This term refers to sound arriving on a direct acoustical path from the source to the listener in an enclosed space (i.e. with no intervening reflections from boundaries). The direct sound is the desired signal for a speech reinforcement system. (See also direct-to-reverberant ratio, reverberation, signal-to-noise ratio, masking.)
The direct-to-reverberant ratio is the ratio between the intensities of the direct sound and reverberation. There are several measures for this quantity. C50, one of the most popular, expresses speech clarity as the ratio of the sound energy arriving within the first 50 milliseconds (the direct sound and early reflections) to the energy arriving later, with 0 dB being the minimum acceptable value and +4 dB or above preferred. A similar measure, C7, is used in Germany; C35 is yet another version. Measurements are made in a single frequency band (usually centered on 1 kHz). Each of these measures can be more reliable and repeatable than %ALcons, which also deals with the direct-to-reverberant ratio.
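One common way to compute a clarity measure such as C50 from a measured room impulse response is to compare the energy arriving before and after a time split. The sketch below assumes the impulse response has already been band-filtered and trimmed so that its first sample is the arrival of the direct sound. The function and parameter names are illustrative.

```python
import math


def clarity_db(impulse_response, sample_rate_hz, split_ms=50.0):
    """Early-to-late energy ratio in dB (C50 when split_ms is 50).

    impulse_response: sequence of samples starting at the direct sound.
    sample_rate_hz: sampling rate of the impulse response.
    """
    split = int(round(sample_rate_hz * split_ms / 1000.0))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    if early <= 0.0 or late <= 0.0:
        raise ValueError("both early and late energy must be non-zero")
    return 10.0 * math.log10(early / late)
```

Passing `split_ms=7.0` or `split_ms=35.0` gives the C7 and C35 variants with the same code.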
Discrimination refers to a listener's ability to discern among similar-sounding words or phrases in a speech intelligibility test.
A measure of reverberation, EDT is the time that it takes for the reverberant energy in a room to decrease by 10 dB from its steady-state value. (See RT60.)
Proposed in 1996 by G. Marshall, ELR is similar to C50 but is weighted for speech and incorporates measurements in more than one frequency band. As with other direct-to-reverberant methods, however, factors other than reverberation are not accounted for.
Intelligibility is the degree to which speech can be understood. With specific reference to speech communication system specification and testing, intelligibility denotes the extent to which trained listeners can identify words or phrases that are spoken by trained talkers and transmitted to the listeners via the communication system.
In most practical speech communication systems, unwanted sounds may be introduced by a variety of sources (as shown in this diagram). These unwanted sounds effectively reduce the listener's sensitivity to the transmitted speech, thus degrading intelligibility. The effect is termed masking, and is described in detail in Section II.
A word list for statistical intelligibility testing. The Modified Rhyme Test (MRT) uses 50 six-word lists of rhyming or similar-sounding monosyllabic English words, as shown here. Each word is constructed from a consonant-vowel-consonant sound sequence, and the six words in each list differ only in the initial or final consonant sound. Listeners are shown a six-word list and then asked to identify which of the six is spoken by the talker. A carrier sentence is usually used. MRT test results indicate errors in discrimination of both initial and final consonant sounds. Listener responses can be scored as (1) the number of words heard correctly; (2) the number of words heard incorrectly; or (3) the frequency of particular confusions of consonant sounds.
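The three scoring options can be tallied from a list of trials, as in the sketch below. It records confusions at the word level, as (spoken, chosen) pairs, which is a simplification of consonant-level confusion analysis. The function name and the example words used with it are hypothetical and not taken from the actual MRT lists.

```python
from collections import Counter


def score_mrt(trials):
    """Score MRT responses.

    trials: iterable of (spoken_word, chosen_word) pairs.
    Returns (number correct, number incorrect, Counter of confusion pairs).
    """
    correct = 0
    confusions = Counter()
    for spoken, chosen in trials:
        if spoken == chosen:
            correct += 1
        else:
            # Each distinct (spoken, chosen) mismatch is counted separately.
            confusions[(spoken, chosen)] += 1
    incorrect = sum(confusions.values())
    return correct, incorrect, confusions
```

Calling `confusions.most_common()` on the returned counter lists the most frequent mix-ups first, which helps show whether errors cluster on particular initial or final consonants.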
Noise is any unwanted, introduced signal or sound in a communications system or speaking environment. The sources of noise are many, and can be both acoustical (HVAC, street sounds, crowd noise, reverberation and echoes, etc.) and electronic (thermal noise or hiss, hum, etc.). Noise can be correlated with the desired speech signal (reverberation) or it may be uncorrelated (background noise, babble).
A phoneme is the smallest meaningful unit of speech that, if altered, changes the meaning of the word.
The set of twenty phonetically balanced (PB) word lists was developed during World War II and has been used very widely since then in statistical intelligibility testing. Here are the first four PB word lists. The words in each list are presented in a new, random order each time the list is used, each spoken in the same carrier sentence. The PB intelligibility test requires more training of listeners and talkers than other statistical tests, and is particularly sensitive to the signal-to-noise ratio: a relatively small change in S/N causes a large change in the intelligibility score.
Rapid Speech Transmission Index, a machine method of testing for intelligibility in sound systems that is associated with Brüel and Kjaer, the instrumentation company that manufactures a portable device to implement it. RASTI was developed as a simpler alternative to the more complex STI (Speech Transmission Index). In contrast to STI, RASTI measures in only two octave bands, centered at 500 Hz and 2 kHz, respectively. It uses a speech-like excitation signal and, like STI, correlates reductions in modulation depth to loss of intelligibility.
Reverberation is the persistence of sound in an enclosed space after the original excitation sound has ceased. It consists of a series of very closely spaced reflections, or echoes, whose strength decreases over time due to boundary absorption and air losses.
The standard method for specifying reverberation time, RT60 is the amount of time it takes for the reverberant energy in an enclosed space to drop by 60 dB from its initial, steady-state value after the original sound has ceased. Large rooms with hard, highly reflective surfaces (like cathedrals) have long reverberation times, while smaller rooms with absorptive surfaces have short reverberation times. Here is a diagram that gives preferred RT60 values for various applications.
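In practice a full 60 dB of decay is rarely measurable above the noise floor, so reverberation time is often estimated by fitting a line to part of the decay curve and extrapolating. The sketch below uses backward (Schroeder) integration of an impulse response and a least-squares fit between -5 dB and -35 dB. That fitting range, and all names in the code, are assumptions rather than anything the glossary specifies.

```python
import math


def schroeder_decay_db(impulse_response):
    """Backward-integrated energy decay curve, in dB relative to its start."""
    energy = [x * x for x in impulse_response]
    total = sum(energy)
    if total <= 0.0:
        raise ValueError("impulse response contains no energy")
    remaining = total
    curve = []
    for e in energy:
        # Energy still to arrive from this sample onward, relative to the total.
        curve.append(10.0 * math.log10(remaining / total) if remaining > 0.0
                     else float("-inf"))
        remaining -= e
    return curve


def rt60_from_impulse_response(impulse_response, sample_rate_hz,
                               start_db=-5.0, end_db=-35.0):
    """Estimate RT60 in seconds by extrapolating a fitted decay slope to 60 dB."""
    curve = schroeder_decay_db(impulse_response)
    points = [(i / sample_rate_hz, level) for i, level in enumerate(curve)
              if end_db <= level <= start_db]
    if len(points) < 2:
        raise ValueError("not enough decay range to fit a slope")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_l = sum(l for _, l in points) / n
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second
    return -60.0 / slope
```

Passing `start_db=0.0` and `end_db=-10.0` fits only the first 10 dB of decay, the range that Early Decay Time is based on. The function still extrapolates that slope to 60 dB.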
The signal-to-noise ratio (S/N) is the ratio between the strength of the desired speech signal and that of introduced noise, expressed in decibels. At 0 dB the two are of equal strength; negative values are associated with loss of intelligibility due to masking. Positive values are usually associated with better intelligibility.
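As a worked illustration, the ratio can be computed from the RMS levels of separately recorded speech and noise. The sketch assumes both recordings use the same scale and that neither is silent. The function names are illustrative.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    if len(samples) == 0:
        raise ValueError("no samples")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(speech_samples, noise_samples):
    """Signal-to-noise ratio in dB from speech and noise amplitude samples."""
    # 20*log10 is used because the inputs are amplitudes, not powers.
    return 20.0 * math.log10(rms(speech_samples) / rms(noise_samples))
```

Equal RMS levels give 0 dB, and speech at ten times the noise amplitude gives +20 dB.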
Derived from the Articulation Index (AI), SII is the method for machine measurement of speech intelligibility that is currently proposed in draft form as ANSI Standard S3.5-1997. In the Standard, four measurement procedures are allowed, each using a different number and size of frequency bands. In descending order of accuracy, they are:
The SpAT is a test developed by the United States Navy for statistical intelligibility tests using a word list known as ICAO. Listeners respond by writing the spoken word or digit, or by pressing the first letter of the word or number on a keyboard.
Developed in the early 1970s, the Speech Transmission Index (STI) is a machine measure of intelligibility whose value varies from 0 (completely unintelligible) to 1 (perfect intelligibility). In STI testing, speech is modeled by a special test signal with speech-like characteristics. Following on the concept that speech can be described as a fundamental waveform that is modulated by low-frequency signals, STI employs a complex amplitude modulation scheme to generate its test signal. At the receiving end of the communication system, the depth of modulation of the received signal is compared with that of the test signal in each of a number of frequency bands. Reductions in the modulation depth are associated with loss of intelligibility.
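The step from modulation depth to an index value can be sketched as follows. Each modulation ratio `m` (received depth divided by transmitted depth) is converted to an apparent signal-to-noise ratio, limited to plus or minus 15 dB, and rescaled to 0..1. That conversion is the commonly documented one. The equal band weighting used by default is a simplifying assumption, as are the names, and the full method includes corrections not shown here.

```python
import math


def transmission_index(m: float) -> float:
    """Convert one modulation ratio (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)          # keep the logarithm finite
    apparent_snr_db = 10.0 * math.log10(m / (1.0 - m))
    apparent_snr_db = min(max(apparent_snr_db, -15.0), 15.0)
    return (apparent_snr_db + 15.0) / 30.0


def sti_estimate(modulation_ratios_by_band, band_weights=None):
    """Simplified STI: weighted mean of per-band average transmission indices.

    modulation_ratios_by_band: one list of modulation ratios per frequency band.
    band_weights: weights summing to 1 (assumed equal if omitted).
    """
    n = len(modulation_ratios_by_band)
    if n == 0:
        raise ValueError("at least one band is required")
    if band_weights is None:
        band_weights = [1.0 / n] * n
    if len(band_weights) != n:
        raise ValueError("one weight per band is required")
    total = 0.0
    for ratios, weight in zip(modulation_ratios_by_band, band_weights):
        band_index = sum(transmission_index(m) for m in ratios) / len(ratios)
        total += weight * band_index
    return total
```

A modulation ratio of 0.5 corresponds to an apparent signal-to-noise ratio of 0 dB and therefore an index of 0.5. Fully preserved modulation gives 1.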
The useful-to-detrimental ratio is the logarithmic ratio between the energy of sounds that are useful to intelligibility and those that are detrimental to it, expressed in decibels. Useful sounds are the integrated energy of speech sounds arriving within the first 50 or 80 milliseconds after the direct sound, and detrimental sounds are the sum of later-arriving speech energy and ambient noise. In practice, both quantities may be found by integrating appropriate portions of the room impulse response.
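The integration described above can be sketched as a small function. It assumes the impulse response starts at the direct sound and that the ambient noise energy has already been expressed on the same scale as the impulse response energy, which in practice takes a calibration step not shown here. All names are illustrative.

```python
import math


def useful_to_detrimental_db(impulse_response, sample_rate_hz,
                             noise_energy, split_ms=50.0):
    """Useful-to-detrimental ratio in dB.

    impulse_response: samples starting at the direct sound arrival.
    noise_energy: ambient noise energy on the same scale as the response energy.
    split_ms: boundary between useful and detrimental energy (50 or 80 ms).
    """
    split = int(round(sample_rate_hz * split_ms / 1000.0))
    useful = sum(x * x for x in impulse_response[:split])
    detrimental = sum(x * x for x in impulse_response[split:]) + noise_energy
    if useful <= 0.0 or detrimental <= 0.0:
        raise ValueError("useful and detrimental energy must be non-zero")
    return 10.0 * math.log10(useful / detrimental)
```

With `noise_energy=0.0` the result reduces to a plain early-to-late energy ratio, so any difference between the two figures shows how much the ambient noise costs.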