Statistical intelligibility measurements use human beings, rather
than electronic test instruments, to assess speech communication systems.
First proposed in 1910 and refined with the introduction of the
telephone and the advent of electronic communication systems in
World War II, such tests are still considered to be the most accurate
and reliable measures of intelligibility. While many variations
are in use, this discussion deals most directly with the American
National Standards Institute's approved procedure (ANSI S3.2-1989, Method
for Measuring the Intelligibility of Speech Over Communication
Systems).
Method and Applications
The statistical measurement process uses trained, English-fluent
talkers speaking standardized word lists through the communication
system to trained, English-fluent listeners. The word lists are
crafted to evaluate specific aspects of speech transmission; the
ability of the listeners to identify individual words or word pairs
indicates the quality of the transmission.
Such tests are used in a wide variety of applications, from examining
the acoustics of conference rooms to evaluating intercoms for deep-sea
divers. In professional sound reinforcement, statistical tests
provide crucial information for architects and consultants, both
in designing speech reinforcement systems and refining their performance
in the field. They may also be used to evaluate the contributions
that specific microphones, loudspeakers and signal processors make
to speech intelligibility.
Preparation
In order for the results of any intelligibility test to be valid,
those conducting the test must be well versed in experimental design
and statistical data analysis. Since human subjects are central
to the tests, the experimenters must also understand the psychological
factors involved, including the effects of motivation and learning
through repetition. Finally, they must, of course, know how to
operate the sound system properly so as to avoid introducing errors.
For all of these reasons, intelligibility tests are invariably
conducted by trained consultants who specialize in the field.
The tests use a minimum of five talkers and five listeners; larger
subject groups reduce the margin of error. Talkers and listeners
are selected to assure a representative cross-section of age and
gender. All must speak English as their first language and have
normal hearing. Talkers must have good articulation, and
are trained both to speak at a consistent level and to synchronize
their words with timing signals so that the rate of presentation
doesn't skew the test results in any way. Listeners must have
good discrimination, and
are familiarized with all the test words that will be used, the
sound of each talker's voice and the method of recording responses.
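The effect of panel size on the margin of error can be illustrated with a short sketch. It assumes that listener scores are independent and roughly normally distributed, and it uses a normal-approximation 95% interval. The function name and the 8-point spread between listeners are hypothetical, chosen only for illustration; they do not come from the ANSI standard.

```python
import math


def margin_of_error(std_dev, n_listeners, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a mean score.

    std_dev is the assumed spread of individual listener scores.
    z=1.96 is the normal-approximation multiplier (an assumption; a small
    panel would normally call for a larger t-distribution multiplier).
    """
    if n_listeners < 2:
        raise ValueError("need at least two listeners")
    return z * std_dev / math.sqrt(n_listeners)


# Hypothetical spread of 8 percentage points between listeners.
for n in (5, 10, 20):
    print(n, "listeners:", round(margin_of_error(8.0, n), 2), "points")
```

Because the margin shrinks with the square root of the panel size, going from five to twenty listeners only halves it, which is one reason panel size is a trade-off rather than a simple matter of adding subjects.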
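A number of specialized word lists are in common use for testing
various aspects of speech communication. The ANSI standard specifies
three: the Phonetically Balanced (PB) word lists, the Modified Rhyme
Test (MRT) and the Diagnostic Rhyme Test (DRT).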
Other examples of word lists include:
Testing
If at all possible, the sound system should be tested under conditions
of actual use: if there are potential sources of masking noise
such as outside traffic or an HVAC system, these should be present
during the testing and documented for the report. It's also
important that the system gains be set to a representative sound
pressure level. Pre-recorded test material can be used as long
as the recording and playback equipment don't introduce significant
noise or distortion.
At a minimum, each talker is given three PB or MRT word lists
(or the complete DRT list) to read. Where only one sound system
is being tested, the trained subjects are first tested face-to-face
or in similarly ideal conditions to establish a control or
baseline measurement. (Under these circumstances the intelligibility
should be nearly perfect.) This score is then used as a reference
to which the system under test can be compared. During testing,
supplementary information such as the speed and certainty of the listeners' responses
and their subjective opinions about the sound system should be
gathered.
Analyzing the Results
There are many ways of analyzing the test data depending on the
characteristics of the particular word list and the variables being
tested. At the least, a set of percentage scores is calculated
showing the number of times words were identified correctly by
each listener. Taking an average of these can produce a single
overall score. If either the DRT or MRT is used, the results are
adjusted mathematically to account for guessing (no adjustment
is required for the PB test). Deeper statistical analyses can yield
more detailed information about the sound system if undertaken
carefully.
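The basic scoring steps described above can be sketched as follows. The guessing adjustment shown is the commonly used correction-for-guessing formula, adjusted = R - W/(n - 1), where R and W are the numbers of right and wrong responses and n is the number of response alternatives (six for the MRT, two for the DRT). Whether this matches the exact adjustment prescribed in ANSI S3.2-1989 is an assumption and should be checked against the standard. All function names and the example counts are hypothetical.

```python
def percent_correct(n_correct, n_items):
    """Raw percentage of words identified correctly by one listener."""
    return 100.0 * n_correct / n_items


def guess_adjusted_percent(n_correct, n_items, n_choices):
    """Percentage score corrected for guessing on a closed-response test.

    Uses the common correction adjusted = R - W / (n - 1); this is an
    assumed formula, not quoted from the ANSI standard.
    """
    if n_choices < 2:
        raise ValueError("a closed-response test needs at least two choices")
    n_wrong = n_items - n_correct
    adjusted = n_correct - n_wrong / (n_choices - 1)
    return 100.0 * adjusted / n_items


def overall_score(listener_scores):
    """Single overall score: the mean of the per-listener percentages."""
    return sum(listener_scores) / len(listener_scores)


# Hypothetical MRT results: correct responses out of 50 items per listener.
mrt_correct = [45, 47, 42, 48, 44]
adjusted = [guess_adjusted_percent(c, 50, n_choices=6) for c in mrt_correct]
print("per-listener adjusted scores:", adjusted)
print("overall adjusted score:", overall_score(adjusted))
```

Under this correction a listener who performs at chance scores zero, so the penalty is largest for the two-choice DRT and small for the six-choice MRT. The open-response PB test would use `percent_correct` directly.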