Most people have had this experience:
You're driving along in your car, windows down and the radio
playing. It's a new song, one you've never heard before
by an artist you don't recognize, and you've got to get
the name so you can buy the disc. The music ends, the announcer
comes on and . . .
. . . you can't understand him over the road noise.
As this simple example illustrates, there's an important
difference between music and speech. The brain is capable of filling
in a fair amount of missing information in music, because
there's a high degree of redundancy. (If you didn't get
the bass line in the first four measures, you'll pick it up
when it repeats in the next four.) But speech is rich in constantly changing
information and has less redundancy than music. If even a modest
percentage of the information is garbled or missing, the brain
can't decipher the message.
Speech communication systems therefore are subject to more stringent
requirements than music systems. These pages discuss speech intelligibility
in sound reinforcement - what it is, what affects it and how it's
measured.
The Speech Signal
Human speech is a continuous waveform with a fundamental frequency
in the range of 100-400 Hz. (The average is about 100 Hz for men
and 200 Hz for women.) At integer multiples of the fundamental
are a series of harmonics. Their relative strengths are shaped by
changing resonance peaks called formants, which
are determined by the resonant characteristics of the vocal tract.
Formants create the various vowel sounds and transitions among
them. Consonant sounds, which are impulsive and/or noisy, occur
in the range of 2 kHz to about 9 kHz. (Here is
a vocal spectrum graph for male and female speakers with an idealized human
vocal spectrum superimposed.)
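
The harmonic structure described above can be illustrated with a short sketch. It uses only the average fundamentals quoted in the text (100 Hz and 200 Hz). The 2 kHz upper limit is an assumption chosen for illustration, and the function name `harmonic_series` is hypothetical.

```python
def harmonic_series(fundamental_hz, limit_hz):
    """Return the integer multiples of fundamental_hz up to and including limit_hz."""
    # Number of whole multiples of the fundamental that fit below the limit.
    count = int(limit_hz // fundamental_hz)
    return [fundamental_hz * n for n in range(1, count + 1)]


# Average fundamentals from the text; the 2 kHz limit is an illustrative assumption.
male = harmonic_series(100, 2000)
female = harmonic_series(200, 2000)

print(len(male), male[:5])
print(len(female), female[:5])
```

A higher fundamental spaces the harmonics farther apart, so fewer of them fall within any given frequency band.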
The sound power in speech is carried by the vowels, which average
from 30 to 300 milliseconds in duration. Intelligibility is
imparted chiefly by the consonants, which average from 10 to 100
milliseconds in duration and may be as much as 27 dB lower in amplitude
than the vowels. The strength of the speech signal varies as a
whole, and the strength of individual frequency ranges varies with
respect to the others as the formants change.
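
To put the 27 dB figure in perspective, a level difference in decibels can be converted to a linear ratio. The sketch below assumes the standard conventions of 20·log10 for amplitude and 10·log10 for power. The function names are hypothetical.

```python
def db_to_amplitude_ratio(level_db):
    """Convert a level difference in dB to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (level_db / 20)


def db_to_power_ratio(level_db):
    """Convert a level difference in dB to a linear power ratio (10*log10 convention)."""
    return 10 ** (level_db / 10)


# The 27 dB vowel-to-consonant difference quoted in the text.
print(f"amplitude ratio: {db_to_amplitude_ratio(27):.1f}")
print(f"power ratio: {db_to_power_ratio(27):.0f}")
```

Under these assumptions, a consonant 27 dB below a vowel has roughly one twenty-second of the vowel's amplitude, which suggests why consonants are the first part of the signal to be lost in noise.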
Speech Comprehension
The listener's challenge is to parse speech sounds into
meaningful units of language - a complicated task. Gaps in the
sound don't necessarily correspond to word or syllable breaks.
Speech sounds also are not discrete events: rather, they merge
and overlap in time, and the articulation of a given phoneme differs
in different contexts and with different speakers.
In fact, the precise ways in which the ear-brain mechanism decodes
speech remain something of a mystery. Such factors as loudness,
duration and spectral content certainly affect speech perception,
but how they may interact is not fully understood.
Diminished intelligibility is associated with a loss of information
that is coded in a number of highly interactive elements, and many
factors influence it. Background noises can mask the
speech. Both the direction of the source, relative to the listener,
and the direction of the interfering noise can alter the degree
of masking. Intelligibility is also affected by the predictability
of the message, the speaker's enunciation and, not least, the acuity
of the listener's hearing.