Meyer Sound

Speech Intelligibility Papers

Glossary of Terms

%ALcons

Articulation

Articulation Index

Carrier Sentence

Critical Distance

Diagnostic Alliteration Test

Diagnostic Medial Consonant Test


Diagnostic Rhyme Test

Direct Sound

Direct-to-Reverberant Ratio

Discrimination

Early Decay Time

Early-to-Late Sound Energy Ratio

Intelligibility


Masking

Modified Rhyme Test

Noise

Phoneme

Phonetically Balanced Word List

RASTI

Reverberation


RT60

Signal-to-Noise Ratio

Speech Intelligibility Index

Spelling Alphabet Test

STI

Useful-to-Detrimental Sound Ratio (U50 or U80)

%ALcons

Percentage Articulation Loss of Consonants. This machine measure of intelligibility is closely associated with the TEF sound analyzer. It is computed from measurements of the Direct-to-Reverberant Ratio and the Early Decay Time using a set of correlations defined by SynAudCon, and is specified in percent.

Since %ALcons expresses loss of consonant definition, lower values are associated with greater intelligibility. It is generally assumed that the maximum allowable value for typical paging applications is 10%, assuming that the environment is relatively free of masking noise. For learning environments and voice warning systems, the desired value is 5% or less.
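The limits quoted above can be expressed as a small helper. This is only an illustrative sketch: the function name `rate_alcons` and the label wording are assumptions, and the limits presume an environment relatively free of masking noise, as noted above.

```python
def rate_alcons(alcons_percent: float) -> str:
    """Classify a %ALcons value against the limits quoted in this glossary.

    Lower values mean less consonant loss and therefore better intelligibility.
    """
    if alcons_percent < 0.0 or alcons_percent > 100.0:
        raise ValueError("%ALcons is a percentage between 0 and 100")
    if alcons_percent <= 5.0:
        # Desired value for learning environments and voice warning systems.
        return "suitable for learning environments and voice warning systems"
    if alcons_percent <= 10.0:
        # Generally assumed maximum for typical paging applications.
        return "acceptable for typical paging"
    return "exceeds the usual 10% limit"
```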

Articulation

“Word Articulation” refers to the number of test words correctly identified in an intelligibility test. It is expressed in percent.

The term “articulation” also refers to the quality of a speaking person’s enunciation. The greater a given talker’s articulation (consonants are crisp and distinct, vowels are clearly articulated and not slurred), the more intelligible his or her speech will be.

Articulation Index (AI)

One of the earliest attempts to measure the intelligibility of a speech transmission system by machine, the Articulation Index was developed by Bell Telephone Laboratories in the 1940s.

AI is based on the idea that the response of a speech communication system can be divided into twenty frequency bands, each of which carries an independent contribution to the intelligibility of the system, and that the total contribution of all the bands is the sum of the contributions of the individual bands. (AI may also be measured using one-third octave or octave bands.) Signal-to-noise ratios are computed for each individual band, then weighted and combined to yield an intelligibility score.

The AI varies in value from 0 (completely unintelligible) to 1 (perfect intelligibility). An AI of 0.3 or below is considered unsatisfactory, 0.3 to 0.5 satisfactory, 0.5 to 0.7 good, and greater than 0.7 very good to excellent.
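The band-summation idea can be sketched as follows. The per-band mapping used here, (SNR + 12) / 30 clipped to the range 0 to 1, and the equal default weights are assumptions taken from the classic twenty-band formulation; the glossary itself does not give them. The rating thresholds are the ones quoted above.

```python
def articulation_index(band_snr_db, weights=None):
    """Weighted sum of per-band contributions computed from band SNRs (dB).

    Assumption: each band contributes (SNR + 12) / 30, clipped to 0..1.
    Assumption: bands contribute equally unless weights are supplied.
    """
    n = len(band_snr_db)
    if n == 0:
        raise ValueError("at least one band is required")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("one weight per band is required")
    total = 0.0
    for snr, weight in zip(band_snr_db, weights):
        contribution = min(max((snr + 12.0) / 30.0, 0.0), 1.0)
        total += weight * contribution
    return total


def rate_ai(ai):
    """Map an AI value to the quality ranges given in the glossary."""
    if ai <= 0.3:
        return "unsatisfactory"
    if ai <= 0.5:
        return "satisfactory"
    if ai <= 0.7:
        return "good"
    return "very good to excellent"
```

For example, twenty bands that each have a signal-to-noise ratio of 3 dB would give an AI of about 0.5 under these assumptions.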

Carrier Sentence

A sentence that is used to present test words in statistical intelligibility tests (for example, “Would you write <test word> now”). The test word is spoken without emphasis, and the sentence is the same for each test word. The carrier sentence assures that the reverberant field is excited prior to the test word being spoken, so that its effects are properly accounted for in the test. It also allows dynamics processors such as automatic gain controls or compressors to activate and stabilize.

Critical Distance

The term “critical distance” refers to the distance from a loudspeaker in an enclosed space at which the reverberation is equal in strength to the direct sound from the speaker. Beyond this distance, the reverberant energy tends to mask the direct sound.

In truth, because reflected sound loses energy to boundary absorption (and also travels a longer path to the listener, thus incurring greater air absorption losses), the reverberant energy from a discrete pulse sound stimulus can never equal the direct sound on an instantaneous basis. In highly reflective environments, however, the steady-state reverberation strength can easily exceed that of the direct sound at many locations in the space. This degrades the signal-to-noise ratio and destroys intelligibility.
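A widely used rule-of-thumb estimate, which this glossary does not state and which is offered here only as an assumption, computes critical distance from the source directivity factor Q, the room volume V in cubic meters, and RT60 in seconds as Dc ≈ 0.057 · sqrt(Q · V / RT60). It presumes a diffuse reverberant field, so real rooms may differ considerably.

```python
import math


def critical_distance_m(directivity_q, room_volume_m3, rt60_s):
    """Estimate critical distance in meters.

    Assumption: diffuse-field approximation Dc = 0.057 * sqrt(Q * V / RT60),
    with V in cubic meters and RT60 in seconds.
    """
    if directivity_q <= 0 or room_volume_m3 <= 0 or rt60_s <= 0:
        raise ValueError("Q, volume, and RT60 must all be positive")
    return 0.057 * math.sqrt(directivity_q * room_volume_m3 / rt60_s)
```

Under this approximation, a longer reverberation time shortens the critical distance, while a more directional loudspeaker (higher Q) lengthens it.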

Diagnostic Alliteration Test (DALT)

The DALT is derived from the Diagnostic Rhyme Test. It employs a list of ninety-six one-syllable word pairs that differ in only their final consonants (for example, art-arc). These differences are organized in six categories, and scores in each category can be used to identify specific problems in a communication system. Averaged together, the six scores provide a single measure of intelligibility. As in the DRT, listeners are shown a word pair, then asked to identify which word is presented by the talker. Carrier Sentences are not used.

Diagnostic Medial Consonant Test (DMCT)

The DMCT is derived from the Diagnostic Rhyme Test. It employs a list of ninety-six two-syllable word pairs that differ in only the middle consonant (for example, bobble-bottle). These differences are organized in six categories, and scores in each category can be used to identify specific problems in a communication system. Averaged together, the six scores provide a single measure of intelligibility. As in the DRT, listeners are shown a word pair, then asked to identify which word is presented by the talker. Carrier Sentences are not used.

Diagnostic Rhyme Test (DRT)

Like the Modified Rhyme Test, the DRT uses monosyllabic English words that are constructed from a consonant-vowel-consonant sound sequence. In the DRT, one hundred and ninety-two words are arranged in ninety-six rhyming pairs which differ only in their initial consonants (the DRT word list may be seen here). Listeners are shown a word pair, then asked to identify which word is presented by the talker. Carrier Sentences are not used.

The DRT is based on a number of distinctive features of speech, and its test results reveal errors in discrimination of initial consonant sounds. The test can be presented in a short period of time and may be scored in several different ways.

Direct Sound

This term refers to sound arriving on a direct acoustical path from the source to the listener in an enclosed space (i.e. with no intervening reflections from boundaries). The direct sound is the desired signal for a speech reinforcement system. (See also direct-to-reverberant ratio, reverberation, signal-to-noise ratio, masking.)

Direct-to-Reverberant Ratio

The ratio between the intensities of the direct sound and reverberation. There are several measures for this quantity. C50, one of the most popular, expresses speech clarity as the energy ratio of the first 50 milliseconds of direct sound to the overall steady-state reverberation, with 0 dB being the minimum acceptable value and +4 dB or above preferred. A similar measure, C7, is used in Germany; C35 is yet another version. Measurements are made in a single frequency band (usually centered on 1 kHz). Each of these measures can be more reliable and repeatable than %ALcons, which also deals with the direct-to-reverberant ratio.
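As a rough sketch, a clarity figure such as C50 can be computed from a measured room impulse response. The code below uses the common early-to-late energy split, which is an assumption about how the measure is implemented: energy arriving before the split time is compared with all energy arriving after it. It also assumes that sample 0 is the arrival of the direct sound and that the response is broadband or already band-filtered. The function name and parameters are hypothetical.

```python
import math


def clarity_db(impulse_response, sample_rate_hz, split_ms=50.0):
    """Early-to-late energy ratio in dB (C50 when split_ms is 50).

    Assumption: impulse_response[0] corresponds to the direct-sound arrival.
    """
    split = int(round(sample_rate_hz * split_ms / 1000.0))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    if early <= 0.0 or late <= 0.0:
        raise ValueError("both early and late energy must be non-zero")
    return 10.0 * math.log10(early / late)
```

Passing `split_ms=7.0` or `split_ms=35.0` would give the C7 and C35 variants under the same assumptions.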

Discrimination

“Discrimination” refers to a listener’s ability to discern among similar-sounding words or phrases in a speech intelligibility test.

Early Decay Time (EDT)

A measure of reverberation, EDT is the time that it takes for the reverberant energy in a room to decrease by 10 dB from its steady-state value. (See RT60.)

Early-to-Late Sound Energy Ratio (ELR)

Proposed in 1996 by G. Marshall, ELR is similar to C50 but is weighted for speech and incorporates measurements in more than one frequency band. As with other direct-to-reverberant methods, however, factors other than reverberation are not accounted for.

Intelligibility

The degree to which speech can be understood. With specific reference to speech communication system specification and testing, intelligibility denotes the extent to which trained listeners can identify words or phrases that are spoken by trained talkers and transmitted to the listeners via the communication system.

Masking

In most practical speech communication systems, unwanted sounds may be introduced by a variety of sources (as shown in this diagram). These unwanted sounds effectively reduce the listener’s sensitivity to the transmitted speech, thus degrading intelligibility. The effect is termed “masking,” and is described in detail in Section II.

Modified Rhyme Test (MRT)

A word list for statistical intelligibility testing. The Modified Rhyme Test uses 50 six-word lists of rhyming or similar-sounding monosyllabic English words, as shown here. Each word is constructed from a consonant-vowel-consonant sound sequence, and the six words in each list differ only in the initial or final consonant sound. Listeners are shown a six-word list and then asked to identify which of the six is spoken by the talker. A carrier sentence is usually used.

MRT test results indicate errors in discrimination of both initial and final consonant sounds. Listener responses can be scored as (1) the number of words heard correctly; (2) the number of words heard incorrectly; or (3) the frequency of particular confusions of consonant sounds.
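The three scoring options can be sketched in a few lines. This is a simplified illustration: it tallies confusions at the word level as a stand-in for consonant confusions, and the function name and data layout (a list of spoken/chosen pairs) are assumptions.

```python
from collections import Counter


def score_mrt(trials):
    """Score a list of (spoken_word, chosen_word) pairs.

    Returns (number correct, number incorrect, Counter of confusions),
    where each confusion is keyed by the (spoken, chosen) pair.
    """
    correct = sum(1 for spoken, chosen in trials if spoken == chosen)
    incorrect = len(trials) - correct
    confusions = Counter(
        (spoken, chosen) for spoken, chosen in trials if spoken != chosen
    )
    return correct, incorrect, confusions
```

Calling `confusions.most_common()` on the result lists the most frequent mix-ups first, which is one way to spot a consonant contrast that the system transmits poorly.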

Noise

Any unwanted, introduced signal or sound in a communications system or speaking environment. The sources of noise are many, and can be both acoustical (HVAC, street sounds, crowd noise, reverberation and echoes, etc.) and electronic (thermal noise or hiss, hum, etc.). Noise can be correlated with the desired speech signal (reverberation) or it may be uncorrelated (background noise, babble).

Phoneme

The smallest meaningful unit of speech that, if altered, changes the meaning of the word.

Phonetically Balanced Word List (PB)

The set of twenty phonetically balanced word lists was developed during World War II and has been used very widely since then in statistical intelligibility testing. Here are the first four PB word lists. The words in each list are presented in a new, random order each time the list is used, each spoken in the same carrier sentence.
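The presentation procedure described above, a fresh random order for each use with every word embedded in the same carrier sentence, might be sketched like this. The carrier sentence is the example given under Carrier Sentence; the function name is hypothetical, and any word list passed in is supplied by the caller.

```python
import random

# Carrier sentence template; {word} marks where the test word is inserted.
CARRIER = "Would you write {word} now"


def build_prompts(word_list, rng=None):
    """Return the carrier sentences for one presentation of a word list.

    The words are shuffled into a new random order on every call. Pass a
    seeded random.Random instance as rng to make the order repeatable.
    """
    rng = rng or random.Random()
    order = list(word_list)  # copy so the caller's list is left unchanged
    rng.shuffle(order)
    return [CARRIER.format(word=word) for word in order]
```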

The PB intelligibility test requires more training of listeners and talkers than other statistical tests, and is particularly sensitive to signal-to-noise ratio: a relatively small change in S/N causes a large change in the intelligibility score.

RASTI

Rapid Speech Transmission Index, a machine method of testing for intelligibility in sound systems that is associated with Brüel and Kjaer, the instrumentation company that manufactures a portable device to implement it.

RASTI was developed as a simpler alternative to the more complex STI (Speech Transmission Index). In contrast to STI, RASTI measures only in two third-octave bands centered at 500 Hz and 2 kHz, respectively. It uses a speech-like excitation signal and, like STI, correlates reductions in modulation depth to loss of intelligibility.

Reverberation

Reverberation is the persistence of sound in an enclosed space after the original excitation sound has ceased. It consists of a series of very closely spaced reflections, or echoes, whose strength decreases over time due to boundary absorption and air losses.

RT60

The standard method for specifying reverberation time, RT60 is the amount of time it takes for the reverberant energy in an enclosed space to drop by 60 dB from its initial, steady-state value after the original sound has ceased. Large rooms with hard, highly reflective surfaces (like cathedrals) have long reverberation times, while smaller rooms with absorptive surfaces have short reverberation times. Here is a diagram that gives preferred RT60 values for various applications.
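One common way to estimate decay times from a measured impulse response is backward (Schroeder) integration, sketched below. The technique is not described in this glossary and is used here as an assumption. The helper `time_to_drop` simply reads off when the decay curve has fallen by a given number of decibels: 60 dB for RT60, or 10 dB for Early Decay Time as defined in this glossary. Function names are hypothetical.

```python
import math


def schroeder_decay_db(impulse_response):
    """Energy decay curve in dB relative to total energy (0 dB at the start).

    Each point is the energy remaining from that sample to the end.
    """
    if not impulse_response:
        raise ValueError("impulse response is empty")
    tail = 0.0
    tails = []
    for sample in reversed(impulse_response):
        tail += sample * sample
        tails.append(tail)
    tails.reverse()
    total = tails[0]
    if total <= 0.0:
        raise ValueError("impulse response contains no energy")
    return [
        10.0 * math.log10(t / total) if t > 0.0 else float("-inf")
        for t in tails
    ]


def time_to_drop(curve_db, sample_rate_hz, drop_db):
    """Seconds until the curve first falls by drop_db, or None if it never does."""
    for index, level in enumerate(curve_db):
        if level <= -drop_db:
            return index / sample_rate_hz
    return None
```

In practice the measurement noise floor often prevents a full 60 dB of decay from being observed, in which case `time_to_drop` returns `None` and the decay time would have to be extrapolated from a shallower portion of the curve.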

Signal-to-Noise Ratio

The ratio between the strength of the desired speech signal and that of introduced noise, expressed in decibels. At 0 dB the two are of equal strength; negative values are associated with loss of intelligibility due to masking. Positive values are usually associated with better intelligibility.
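For reference, the decibel conversion can be written out directly. This is a minimal sketch with hypothetical function names; it assumes signal and noise are measured the same way, either both as power (or energy) or both as RMS amplitude.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB from two power (or energy) values."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def snr_db_from_rms(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from two RMS amplitudes."""
    if signal_rms <= 0 or noise_rms <= 0:
        raise ValueError("RMS values must be positive")
    return 20.0 * math.log10(signal_rms / noise_rms)
```

Equal signal and noise power gives 0 dB, and noise stronger than the signal gives a negative value.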

Speech Intelligibility Index (SII)

Derived from the Articulation Index (AI), SII is the method for measuring speech intelligibility by machine that is currently proposed in draft form as ANSI Standard S3.5-1997.

In the Standard, four measurement procedures are allowed, each using a different number and size of frequency bands. In descending order of accuracy, they are:

  • Critical band (21 bands)
  • One-third octave band (18 bands)
  • Equally-contributing critical band (17 bands)
  • Octave band (6 bands)

The value of SII varies from 0 (completely unintelligible) to 1 (perfect intelligibility).

Spelling Alphabet Test (SpAT)

The SpAT is a test developed by the United States Navy for statistical intelligibility tests using a word list known as ICAO. Listeners respond by writing the spoken word or digit, or by pressing the first letter of the word or number on a keyboard.

STI

Developed in the early 1970s, the Speech Transmission Index (STI) is a machine measure of intelligibility whose value varies from 0 (completely unintelligible) to 1 (perfect intelligibility).

In STI testing, speech is modeled by a special test signal with speech-like characteristics. Following on the concept that speech can be described as a fundamental waveform that is modulated by low-frequency signals, STI employs a complex amplitude modulation scheme to generate its test signal. At the receiving end of the communication system, the depth of modulation of the received signal is compared with that of the test signal in each of a number of frequency bands. Reductions in the modulation depth are associated with loss of intelligibility.
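The link between modulation-depth reduction and the final index can be sketched as follows. The conversion used here is the commonly cited one and is an assumption, since this glossary does not give it: the modulation ratio m becomes an apparent signal-to-noise ratio, 10·log10(m / (1 − m)), which is clipped to ±15 dB and rescaled to the range 0 to 1. The sketch also averages all values equally, whereas the full method applies band weighting, so it should be read as a simplification. Function names are hypothetical.

```python
import math


def modulation_ratio(received_depth, test_depth):
    """Ratio of received modulation depth to that of the test signal."""
    if test_depth <= 0:
        raise ValueError("test signal modulation depth must be positive")
    return received_depth / test_depth


def transmission_index(m):
    """Map one modulation ratio m to a 0..1 transmission index.

    Assumption: apparent SNR = 10*log10(m / (1 - m)), clipped to +/-15 dB.
    """
    if m <= 0.0:
        return 0.0
    if m >= 1.0:
        return 1.0
    apparent_snr = 10.0 * math.log10(m / (1.0 - m))
    apparent_snr = max(-15.0, min(15.0, apparent_snr))
    return (apparent_snr + 15.0) / 30.0


def sti_estimate(modulation_ratios_by_band):
    """Unweighted mean transmission index over all bands and modulation rates."""
    values = [
        transmission_index(m)
        for band in modulation_ratios_by_band
        for m in band
    ]
    if not values:
        raise ValueError("no modulation ratios supplied")
    return sum(values) / len(values)
```

A modulation ratio of 0.5 in every band corresponds to an apparent signal-to-noise ratio of 0 dB and therefore an estimate of 0.5.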

Useful-to-Detrimental Sound Ratio (U50 or U80)

The logarithmic ratio between the energy of sounds that are useful to intelligibility and those that are detrimental to it, expressed in decibels.

“Useful” sounds are the integrated energy of speech sounds arriving within the first 50 or 80 milliseconds after the direct sound, and “detrimental” sounds are the sum of later-arriving speech energy and ambient noise. In practice, both quantities may be found by integrating appropriate portions of the room impulse response.
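The integration described above might look like the sketch below. It assumes that sample 0 of the impulse response is the direct-sound arrival and that the ambient noise energy is expressed on the same scale as the impulse-response energy; both are assumptions, as are the function and parameter names.

```python
import math


def useful_to_detrimental_db(impulse_response, sample_rate_hz,
                             noise_energy, split_ms=50.0):
    """U50 (or U80 with split_ms=80) in dB.

    Useful energy: speech energy arriving before the split time.
    Detrimental energy: later-arriving speech energy plus ambient noise.
    Assumption: noise_energy uses the same scale as the impulse response.
    """
    if noise_energy < 0.0:
        raise ValueError("noise energy cannot be negative")
    split = int(round(sample_rate_hz * split_ms / 1000.0))
    useful = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    detrimental = late + noise_energy
    if useful <= 0.0 or detrimental <= 0.0:
        raise ValueError("useful and detrimental energy must be non-zero")
    return 10.0 * math.log10(useful / detrimental)
```

With no ambient noise the result reduces to a plain early-to-late energy ratio; adding noise can only lower it.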

Copyright © 2008 Meyer Sound Laboratories Inc.