Statistical tests using trained talkers and listeners are by far
the most accurate and reliable methods for intelligibility testing.
Unfortunately, they are complicated to set up, time-consuming to conduct
and require extensive statistical analysis to interpret.
Hence, consultants and acousticians have long sought an automated,
machine-based test that could quickly and easily yield meaningful
intelligibility scores for speech systems. A number of methods
have emerged over the past fifty-odd years that fall into two basic
categories: analyses of the reverberant field, and measurements
based on signal-to-noise ratio.
Reverberation Analysis
From at least the ancient Classical period, architects have recognized
that reverberation and echoes hamper intelligibility. Indeed, that
realization resulted in the development of the Greek amphitheater,
a durable architectural model that survives to this day.
Modern acousticians have at their disposal several different
methods to test reverberation in enclosed spaces. The most commonly
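used of these are:
- Percentage Articulation Loss of Consonants (%ALcons)
- Direct-to-reverberant ratio measures such as C50
- ELR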
Each of these tests can tell us something about the reverberant qualities
of a space and, therefore, how intelligible speech could be in that
space. Since they deal predominantly with reverberation, however,
they fail to take into account the majority of the factors that can
affect a speech reinforcement system's performance.
Signal-to-Noise Methods
With the advent of electronic communication systems and their
complex potential problems, acousticians and engineers recognized
that different machine testing approaches were needed. Beginning
as early as the 1940s with telephony research at Bell Laboratories,
several instrument-based tests have evolved, each of which relies
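on signal-to-noise measurements in one form or another. They are:
- Articulation Index (AI)
- Speech Transmission Index (STI)
- Rapid Speech Transmission Index (RASTI)
- Speech Intelligibility Index (SII)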
AI is now of interest chiefly for having demonstrated the relative
importance of different frequency bands in the speech spectrum; because
it doesn't effectively account for reverberation, it has been
largely superseded by the newer methods. Of these, only RASTI is
available in a simple, reasonably priced instrument.
SII (which is proposed as ANSI standard S3.5-1997) is the most
robust of the machine intelligibility measures, but it requires
sophisticated equipment and the calculations that it entails are
quite complex. Given the prodigious computing power that's
now available at reasonable cost, however, a practical, affordable
SII instrument could soon become a reality.
Limitations of Machine Measures
Their relative convenience notwithstanding, all machine-based
intelligibility measures have inherent limitations.
Every machine testing method requires that the operator have
significant experience and analytical skill if the results are
to be accurate and useful. It can be very difficult to identify
inaccurate or misleading scores and determine their causes. Most
significantly, adjustments to the system that improve intelligibility
may not positively affect the measured score - and adjustments
that improve the measurements may not enhance intelligibility.
In addition to these factors, each testing method has its own
particular limitations that must be weighed both when carrying
out the tests and when interpreting the results.
Percentage Articulation Loss of Consonants (%ALcons). This machine
measure of intelligibility is closely associated with the TEF sound
analyzer. It is computed from measurements of the direct-to-reverberant
ratio and the early decay time using a set of correlations defined
by SynAudCon, and is expressed in percent.
Since %ALcons expresses loss of consonant definition,
lower values are associated with greater intelligibility. It is
generally assumed that the maximum allowable value for typical
paging applications is 10%, assuming that the environment is
relatively free of masking noise.
For learning environments and voice warning systems, the desired
value is 5% or less.
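
The limits quoted above can be captured in a small check. This is only a minimal sketch: the function name and the application labels are hypothetical, and the limits assume an environment relatively free of masking noise, as stated in the text.

```python
# Maximum allowable %ALcons by application, as quoted in the text.
# The dictionary keys are illustrative labels, not standard terms.
ALCONS_LIMITS = {
    "paging": 10.0,
    "learning": 5.0,
    "voice_warning": 5.0,
}


def alcons_acceptable(alcons_percent, application="paging"):
    """Return True if a %ALcons score is within the quoted limit.

    Lower %ALcons means less consonant loss, so a score passes
    when it is at or below the limit for the application.
    """
    if not 0.0 <= alcons_percent <= 100.0:
        raise ValueError("%ALcons must lie between 0 and 100")
    return alcons_percent <= ALCONS_LIMITS[application]
```

A score of 8%, for instance, would pass for paging but fail for a learning environment.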
The %ALcons method is widely used by acoustical consultants (particularly
in the United States), but it has significant drawbacks. First,
it is based on measurements in a single one-third octave band centered
on 2 kHz; all other frequencies are ignored, so the system's
frequency response must be verified in some other way for the %ALcons
score to be meaningful.
Moreover, the method does not account for many factors that can
dramatically affect intelligibility, including signal-to-noise
ratio, the background noise spectrum, distortion, late reflections
or echoes, system frequency response, compression, non-linear phase,
equalization and acoustic power. %ALcons measurements of sound
systems therefore often yield overly optimistic scores. Where reverberation
or strong, late-arriving reflections are the primary problem, however,
they can sometimes be more useful and accurate than RASTI.
Direct-to-Reverberant Ratio. The ratio between the intensities of
the direct sound and the reverberation.
There are several measures for this quantity. C50, one of the
most popular, expresses speech clarity as the energy ratio of
the first 50 milliseconds of direct sound to the overall steady-state
reverberation, with 0 dB being the minimum acceptable value and
+4 dB or above preferred. A similar measure, C7, is used in Germany;
C35 is yet another version. Measurements are made in a single
frequency band (usually centered on 1 kHz). Each of these measures
can be more reliable and repeatable than %ALcons,
which also deals with the direct-to-reverberant ratio.
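
As a rough illustration, a clarity figure such as C50 can be estimated from a sampled room impulse response. The sketch below assumes the common definition in which all energy arriving after the split time counts as reverberant, that index 0 of the response is the direct-sound arrival, and that the response has already been filtered to the band of interest. The function name is hypothetical.

```python
import math


def clarity_db(impulse_response, sample_rate, split_ms=50.0):
    """Early-to-late energy ratio in dB (C50 when split_ms is 50).

    impulse_response: sequence of samples, index 0 = direct sound.
    sample_rate: samples per second.
    split_ms: boundary between early and late energy, in ms.
    """
    split = int(round(sample_rate * split_ms / 1000.0))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    if early <= 0.0 or late <= 0.0:
        raise ValueError("need energy on both sides of the split")
    return 10.0 * math.log10(early / late)
```

Passing `split_ms=7.0` or `split_ms=35.0` gives the C7 and C35 variants mentioned above under the same assumptions.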
ELR. The logarithmic ratio between the energy of sounds that are useful
to intelligibility and those that are detrimental to it, expressed
in decibels.
Useful sounds are the integrated energy of speech
sounds arriving within the first 50 or 80 milliseconds after the direct
sound, and detrimental sounds are the sum of later-arriving
speech energy and ambient noise. In practice, both quantities may
be found by integrating appropriate portions of the room impulse
response.
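
The integration described above might be sketched as follows for a single frequency band. This is an assumption-laden illustration: the function name is hypothetical, index 0 is taken as the direct-sound arrival, and `noise_energy` must be expressed on the same energy scale as the impulse response.

```python
import math


def useful_to_detrimental_db(impulse_response, sample_rate,
                             noise_energy, split_ms=50.0):
    """Ratio of useful to detrimental energy, in dB, for one band.

    Useful energy: speech energy arriving before split_ms.
    Detrimental energy: later-arriving speech energy plus noise.
    """
    split = int(round(sample_rate * split_ms / 1000.0))
    useful = sum(x * x for x in impulse_response[:split])
    detrimental = sum(x * x for x in impulse_response[split:]) + noise_energy
    if useful <= 0.0 or detrimental <= 0.0:
        raise ValueError("useful and detrimental energy must be positive")
    return 10.0 * math.log10(useful / detrimental)
```

Setting `split_ms=80.0` gives the 80-millisecond variant. With `noise_energy` set to zero, the result reduces to a plain early-to-late ratio.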
Proposed in 1996 by G. Marshall, ELR is similar to C50 but
is weighted for speech and incorporates measurements in more than
one frequency band. As with other direct-to-reverberant methods,
however, factors other than reverberation are
not accounted for.
One of the earliest attempts to measure the intelligibility of
a speech transmission system by machine, the Articulation Index (AI)
was developed by Bell Telephone Laboratories in the 1940s.
AI is based on the idea that the response of a speech communication
system can be divided into twenty frequency bands, each of which
carries an independent contribution to the intelligibility of the
system, and that the total contribution of all the bands is the
sum of the contributions of the individual bands. (AI may also
be measured using one-third octave or octave bands.) Signal-to-noise
ratios are computed for each individual band, then weighted
and combined to yield an intelligibility score.
The AI varies in value from 0 (completely unintelligible) to
1 (perfect intelligibility). An AI of 0.3 or below is considered
unsatisfactory, 0.3 to 0.5 satisfactory, 0.5 to 0.7 good, and greater
than 0.7 very good to excellent.
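
The weight-and-sum idea can be sketched in a few lines. The mapping of each band's signal-to-noise ratio onto a 0 to 1 scale over a 30 dB range (from -12 dB to +18 dB) is a commonly cited simplification and an assumption here, not a detail given in this paper. Equal band weights are likewise assumed when none are supplied, and the function names are hypothetical.

```python
def articulation_index(band_snr_db, weights=None):
    """Weighted sum of per-band contributions, from 0 to 1.

    band_snr_db: signal-to-noise ratio in dB for each band.
    weights: per-band importance weights summing to 1
             (equal weights are assumed if omitted).
    """
    n = len(band_snr_db)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("one weight is needed per band")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for snr, weight in zip(band_snr_db, weights):
        # Assumed mapping: -12 dB contributes nothing, +18 dB contributes fully.
        contribution = min(max((snr + 12.0) / 30.0, 0.0), 1.0)
        total += weight * contribution
    return total


def ai_rating(ai):
    """Map an AI value to the rating bands quoted in the text.

    Values exactly on a boundary are assigned to the lower band.
    """
    if ai <= 0.3:
        return "unsatisfactory"
    if ai <= 0.5:
        return "satisfactory"
    if ai <= 0.7:
        return "good"
    return "very good to excellent"
```

Under these assumptions, twenty bands that each have a 6 dB signal-to-noise ratio give an AI of about 0.6, which falls in the "good" range.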
Developed in the early 1970s, the Speech Transmission Index
(STI) is a machine measure of intelligibility whose value varies
from 0 (completely unintelligible) to 1 (perfect intelligibility).
In STI testing, speech is modeled by a special test signal with
speech-like characteristics. Following on the concept that speech
can be described as a fundamental waveform that is modulated by
low-frequency signals, STI employs a complex amplitude modulation
scheme to generate its test signal. At the receiving end of the
communication system, the depth of modulation of the received signal
is compared with that of the test signal in each of a number of
frequency bands. Reductions in the modulation depth are associated
with loss of intelligibility.
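
The comparison of modulation depths can be sketched as follows. The conversion from modulation reduction to an apparent signal-to-noise ratio, the clipping range of plus or minus 15 dB, and the unweighted average are assumptions drawn from common descriptions of STI rather than from this paper. The function names are hypothetical.

```python
import math


def transmission_index(m_sent, m_received):
    """Convert a loss of modulation depth into an index from 0 to 1.

    m_sent: modulation depth of the test signal (0 < m_sent <= 1).
    m_received: modulation depth measured at the receiving end.
    """
    if m_sent <= 0.0:
        raise ValueError("m_sent must be positive")
    # Modulation reduction factor, kept strictly inside (0, 1)
    # so that the logarithm below is always defined.
    m = min(max(m_received / m_sent, 1e-6), 1.0 - 1e-6)
    apparent_snr_db = 10.0 * math.log10(m / (1.0 - m))
    # Assumed convention: clip to +/-15 dB, then rescale to 0..1.
    apparent_snr_db = min(max(apparent_snr_db, -15.0), 15.0)
    return (apparent_snr_db + 15.0) / 30.0


def average_index(indices):
    """Unweighted mean of per-band, per-modulation-frequency indices."""
    return sum(indices) / len(indices)
```

A received modulation depth that is half of the transmitted depth yields an index of 0.5 in this sketch. A full STI calculation applies band weightings that are not reproduced here.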
Rapid Speech Transmission Index (RASTI), a machine method of testing
for intelligibility in sound systems that is associated with Brüel
and Kjaer, the instrumentation company that manufactures a portable
device to implement it.
RASTI was developed as a simpler alternative to the more complex STI (Speech
Transmission Index). In contrast to STI, RASTI measures only in
two octave bands centered at 500 Hz and 2 kHz, respectively. It
uses a speech-like excitation signal and, like STI, correlates
reductions in modulation depth to loss of intelligibility.
RASTI has been implemented in a simple, portable instrument that
can make very rapid intelligibility measurements, both acoustically
and with an installed sound system. For this reason, it has been
adopted for a number of European standards and civil system specifications.
Being a radically simplified version of STI, however, it suffers
compromises that have forced reevaluation of those standards.
For example, RASTI tests in only two frequency bands, with the
assumption that the sound system's response actually extends
in a reasonably flat fashion from 100 Hz or lower to 8 kHz or higher.
While this might well be the case in a properly designed auditorium
system, many types of paging systems fall short of such performance.
In these cases, RASTI almost invariably gives an overly optimistic
picture. (In fact, a sound system that reproduced only the two
frequency bands in question could receive a perfect rating.)
Moreover, because it affects modulation depth, any compression
or limiting in the system can cause an artificially low RASTI value
- despite the fact that it may, in actuality, be acting to enhance
intelligibility. RASTI also does not take system distortion or
non-linear amplitude and phase into account.
Derived from and in essence identical to STI, SII is the method for
measuring speech intelligibility by machine that is currently
proposed in draft form as ANSI Standard S3.5-1997.
In the Standard, four measurement procedures are allowed, each
using a different number and size of frequency bands. In descending
order of accuracy, they are:
- Critical band (21 bands)
- One-third octave band (18 bands)
- Equally-contributing critical band (17 bands)
- Octave band (6 bands)
The value of SII varies from 0 (completely unintelligible) to 1 (perfect
intelligibility).
SII is a highly capable testing method that, under the right
conditions, shows good correlation with statistical tests. It features
both wide bandwidth (150 Hz to 8.5 kHz) and, especially in the
critical band procedure, far greater resolution than any other
method. SII properly includes reverberation, noise and distortion,
all of which are accounted for in the modulation transfer function.
Experienced test operators can go beyond generating a single intelligibility
score to diagnosing the source of a loss in intelligibility.
Under certain conditions, however, SII can yield misleading results.
In particular, late-arriving reflections and echoes can distort
the measurement significantly. Like RASTI, SII is susceptible to
giving artificially low intelligibility scores if compression or
limiting is introduced in the system. And because even the critical-band
procedure ignores frequencies below 100 Hz, it may very well miss
significant low-frequency masking sources.
Finally, SII does not take non-linear phase into account. Nonetheless,
when used correctly by a skilled operator, it remains the most
reliable and accurate of the machine methods.