Understanding the Principles of Sound Reproduction--Reflections, Images, and the Precedence Effect [part 2]






2. A REFLECTION IN THE PRESENCE OF OTHER REFLECTIONS

Working with a single reflection allows for intensely analytical investigations, but, inevitably, the tests must include others to be realistic. A long-standing belief in the area of control room design is that early reflections from monitor loudspeakers must be attenuated to allow those in the recordings to be audible.


FIG. 8 Detection and image-shift thresholds as a function of delay for a single reflection auditioned in three very different acoustical circumstances: (a) Anechoic. (b) A normal room in which the first-order reflections were attenuated with 2-in. (50 mm) fiberglass board. (c) The same room in a relatively reverberant configuration (mid-frequency reverberation time = 0.4 s). From Olive and Toole, 1989.

Consequently, several standards and published designs embody schemes to attenuate or eliminate the first reflections from a loudspeaker using deflecting reflectors, absorbers, or scattering surfaces (diffusers).

Olive and Toole (1989) appear to have been the first to test the validity of this idea. FIG. 8 shows the results of experiments that examined the audibility of a single lateral reflection simulated in an anechoic chamber with 3 ft (1 m) wedges. For the second experiment, the same physical arrangement was replicated in a typical small room in which the first wall, floor, and ceiling reflections had been attenuated using 2-in. (5 cm) panels of rigid fiberglass board. A third experiment was conducted in the same room with most of the absorption removed (mid-frequency reverberation time = 0.4 s). The idea was to show the effects, on the perception of a single reflection, of increasing levels of natural reflected sound within a real room.

The large changes in the level of reflected sound had only a modest (1-5 dB) effect on the absolute threshold or the image-shift threshold of an additional lateral reflection occurring within about 30 ms of the direct sound. At longer delays, the threshold shifts were up to about 20 dB, a clear response to elevated late-reflected sounds in the increasingly live rooms. This is a good point to remember, as we will see it again: the threshold curves become more horizontal when the sound (in this case, speech) becomes prolonged by reflected energy (repetitions).


FIG. 9 A comparison of the absolute thresholds shown in FIG. 8, with measured energy-time curves (ETCs) for the three spaces within which the tests were done. All data from Olive and Toole, 1989.

FIG. 9 shows a direct comparison of the thresholds with the ETC (energy-time curve) measured in each of the three test environments. Here the huge variations in level of the reflections can be clearly seen, in contrast with the relatively small changes in the detection thresholds within the first 30 ms or so. Sub-section 6 [below] explains why.

2.1 Real Versus Simulated Rooms

In a large anechoic-chamber simulation of a room of similar size, Bech (1998) investigated the audibility of single reflections in the presence of 16 other reflections, plus a simulated "reverberant" sound field beginning at 22 ms. One of his results is directly comparable with these data. The figure caption in Bech's paper describes the response criterion as "a change in spatial aspects," which seems to match the image shift/image spreading criterion used by Olive and Toole. FIG. 10 shows the image-shift thresholds in the "live" configuration of the IEC room for two subjects (the FT data are from FIG. 8; the SO data were previously unpublished) and thresholds determined in the simulated room, an average of the three listeners from Bech (1998). The similarity of the results is remarkable considering the very different physical circumstances of the tests. It suggests that listeners were responding to the same audible effect and that the real and simulated rooms had similar acoustical properties.

Bech separately examined the influence of several individual reflections on timbral and spatial aspects of perception. In all of the results, it was evident that signal was a major factor: Broadband pink noise was more revealing than male speech. In terms of timbre changes, only the noise signal was able to show any audible effects and then only for the floor reflection; speech revealed no audible effects on timbre.

Looking at the absorption coefficients used in modeling the floor reflection (Bech, 1996, Table II) reveals that the simulated floor was significantly more reflective than would be the case if it had been covered by a conventional clipped pile carpet on a felt underlay. Further investigations revealed that the detection was based mainly on sounds in the 500 Hz-2 kHz range, meaning that ordinary room furnishings are likely to be highly effective at reducing first reflections below threshold, even for the more demanding signal: broadband pink noise.

In terms of spatial aspects, Bech (1998) concluded that those sounds above ~2 kHz contributed to audibility and that "only the first-order floor reflection will contribute to the spatial aspects." The effect was not large, and, as before, speech was less revealing than broadband noise.

Again, this is a case where a good carpet and underlay would appear to be sufficient to eliminate any audible effect. See FIG. 21.3 for data on the acoustical performance of floor coverings.

In conclusion, it seems that the basic audible effects of early reflections in recordings are well preserved in the reflective sound fields of ordinary rooms. There is no requirement to absorb first reflections to allow recorded reflections to be heard.

2.2 The "Family" of Thresholds

FIG. 11 shows a complete set of thresholds like those in FIG. 5, which were determined in an anechoic chamber, but here determined in the "live" IEC listening room.

The curves are slightly irregular because the data were based on a small number of repetitions. As expected, the curves all have a more horizontal appearance than for speech auditioned in an anechoic environment. It is significant that all the curves have the same basic shape from detection at the bottom to the Haas-inspired equal loudness curve at the top.


FIG. 10 Image-shift thresholds as a function of delay for two listeners in the "live" IEC room (FT data from FIG. 8) and averaged results for three listeners in a simulation of an IEC room using multiple loudspeakers set up in a large anechoic chamber (Bech, 1998).


FIG. 11 The full set of thresholds, as shown in FIG. 5, but here obtained while listening in a normally reflective room rather than an anechoic chamber. One listener (SO). Unpublished data acquired during the experiments of Olive and Toole, 1989.


FIG. 12 An examination of how a real and a phantom center image respond to a single lateral reflection simulated by a loudspeaker located at the right side wall. The room was the "live" version of the IEC listening room used in other experiments. Note that the vertical scale has been greatly expanded to emphasize the lack of any consequential effect. The signal was speech.

3. A COMPARISON OF REAL AND PHANTOM IMAGES

A phantom image is a perceptual illusion resulting from summing localization when the same sound is radiated by two loudspeakers. It is natural to think that these directional illusions may be more fragile than those created by a single loudspeaker at the same location. The evidence shown here applies to the simple case of a single lateral reflection, simulated in a normally reflective room with a loudspeaker positioned along a side wall, as shown in FIG. 12. When detection threshold and image-shift threshold determinations were done first with real and then with phantom center images, in the presence of an asymmetrical single lateral reflection, the differences were insignificantly small. It appears that concerns about the fragility of a phantom center image are misplaced.

Examining the phantom image in transition from front to surround loudspeakers (±30° to ±110°), Corey and Woszczyk (2002) concluded that adding simulated reflections of each of the individual loudspeakers did not significantly change image position or blur, but it did slightly reduce the confidence that listeners expressed in the judgment.


FIG. 13 Subjective effects of a single reflection arriving from 40° to the side, adjusted for different delays and sound levels. An important unseen effect is an increase in loudness, which occurs when the reflected sound is within what is colloquially called the "integration interval": about 30 ms for speech and 50 ms or more for music, all depending on the temporal structure of the sound. In this figure, the lowest curve is the hearing threshold. Above this, at short delays, listeners reported various forms of image shift in the direction of the reflection. At all delays larger than about 10 ms, listeners reported "spatial impression" wherein "the source appeared to broaden, the music beginning to gain body and fullness. One had the impression of being in a three-dimensional space" (Barron, 1971, p. 483). Spatial impression increased with increasing reflection level, a fact illustrated in the figure by the increased shading density. The "curve of equal spatial impression" shows that at short delays, levels must be higher to produce the same perceived effect. At high levels and long delays, disturbing echoes were heard (upper right quadrant). At intermediate delays and at all levels, some degree of tone coloration was heard (darkened brushstrokes). The areas identified as exhibiting "image shift" refer to impressions that the principal image has been shifted toward the reflection image. At short delays, this would begin with summing localization, the stereo-image phenomenon in which the image moves to the leading loudspeaker. At longer delays, the image would likely be perceived to be larger and less spatially clear. Finally, at longer delays and higher sound levels, a second image at the location of the reflection would be expected to add to the spatial illusion. From this presentation it is not clear where these divisions occur. From Barron, 1971, Figure 5, redrawn.


FIG. 14 Data from FIG. 6a showing thresholds obtained using speech and data from FIG. 13 showing thresholds obtained using Mozart. The upper curve for music was described as that at which the "apparent source moved from direct sound loudspeaker toward reflection loudspeaker." This could be interpreted as being equivalent to the Olive and Toole "image shift" threshold, but the pattern of the data in the comparison suggests that it is more likely equivalent to the "second image" criterion.


FIG. 15 Detection thresholds for a single lateral reflection, determined in an anechoic chamber for several sounds exhibiting different degrees of "continuity" or temporal extension.

4. EXPERIMENTAL RESULTS WITH MUSIC AND OTHER SOUNDS

A good introduction to investigations that used music is FIG. 13, the widely reproduced illustration from Barron (1971), in which he combines several subjective effects for a single lateral reflection simulated in an anechoic chamber using a "direct sound" loudspeaker at 0° (forward) and a "reflected sound" loudspeaker at 40° to the left, both at 3 m distance. For different electronically introduced delays, listeners adjusted the sound level of the "reflection," reporting what they heard while listening to an excerpt from an anechoic-chamber recording of Mozart's Jupiter symphony. They heard several identifiable effects, as shown in the figure and described in the caption. There is more to this matter, but this important paper provides a good summary of research up to that point and some new data contributed by Barron.

There is a lot of information in this diagram, but most of it is familiar from the discussions of perceptions in experiments using speech. In the Barron paper, much emphasis is placed on spatial impression because of the direct parallel with concert hall experiences. These days, discussions of spatial impression would be separated into listener envelopment (LEV) and apparent source width (ASW). The discussions here appear to relate primarily to ASW, but the quote in the caption includes the remark "the impression of being in a three-dimensional space," indicating that it is not a hard division. In any event, Barron considers spatial impression to be a desirable effect, as opposed to "tone coloration." On the topic of "tone coloration," it was suggested that a contributing factor may be comb filtering, the interference between the direct and reflected sound, but Barron further noted that this is mostly a "monaural effect" because "the effect becomes less noticeable as the direct sound and reflection sources are separated laterally." The "tone coloration ... will frequently be masked in a complex reflection sequence," meaning that in rooms with multiple reflecting surfaces, tone coloration is not a concern. More recent evidence supports this opinion.
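The comb filtering mentioned above can be made concrete with a short derivation. This is only a sketch under idealized assumptions that do not come from Barron's paper: the reflection is treated as a pure delayed and scaled copy of the direct sound, summed at a single point (the "monaural" case). The symbols $g$ (linear amplitude of the reflection relative to the direct sound) and $\tau$ (delay in seconds) are introduced here for illustration.

```latex
\begin{align}
H(f) &= 1 + g\,e^{-j 2\pi f \tau}, \qquad 0 < g < 1 \\
|H(f)|^{2} &= 1 + g^{2} + 2g\cos(2\pi f \tau) \\
f_{\mathrm{peak},n} &= \frac{n}{\tau}, \qquad
f_{\mathrm{dip},n} = \frac{2n+1}{2\tau}, \qquad n = 0, 1, 2, \dots \\
\text{peak-to-dip ratio} &= 20\log_{10}\!\left(\frac{1+g}{1-g}\right)\ \text{dB}
\end{align}
```

Under these assumptions, a reflection delayed by 1 ms would place dips at 500 Hz, 1.5 kHz, 2.5 kHz, and so on, and a reflection 6 dB below the direct sound ($g \approx 0.5$) would give a peak-to-dip ratio of about 9.5 dB. The derivation says nothing about audibility, which, as the text notes, depends on the lateral separation of the sources and on masking by other reflections.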

We will discuss the matter of timbre changes later, and we will see that tone coloration can be either positive or negative, depending on how one asks the question in an experiment. Again, we will go back to the quote in the caption that with the addition of a reflection, "the music [begins] to gain body and fullness," which can readily be interpreted to be tonal coloration but of a possibly desirable form.

4.1 Threshold Curve Shapes for Different Sounds

It is useful to go back now and compare the shapes of the threshold contours determined by Barron for music with those shown earlier for speech, both in anechoic listening conditions. FIG. 14 shows such a comparison, and it is seen that curves obtained using the anechoically recorded Mozart excerpt are much flatter than those for speech.

These data suggest two things. First, it appears that the slower paced, longer notes in the music cause the threshold curves to be flatter than they are for the more compact syllables in speech. This "prolongation" appears to be similar in perceptual effect to that occurring due to reflections in the listening environment (FIG. 8). Second, it appears that the slope of the absolute threshold curve is similar to that of the second-image curve, something that was foreshadowed in FIG. 11.

FIG. 15 shows detection thresholds for sounds chosen to exemplify different degrees of "continuity," starting with continuous pink noise and moving through Mozart, speech, castanets with reverberation, and "anechoic" clicks (brief electronically generated pulses sent to the loudspeakers). The result is that increasing "continuity" produces the kind of progressive flattening seen in Figures 8 and 9. The perceptual effect is similar whether the "continuity" or "prolongation" is due to variations in the structure of the signal itself or due to reflective repetitions added in the listening environment.

In any event, pink noise generated an almost horizontal flat line, Mozart was only slightly different over the 80 ms delay range examined, speech produced a moderate tilt, castanets (clicks) with some recorded reverberation were even more tilted, and isolated clicks generated a very compact, steeply tilted threshold curve.

Assuming that the patterns seen in previous data for speech and Mozart apply to other sounds as well, FIG. 16 shows a compilation and extension of data portraying detection thresholds and second-image thresholds for Mozart, speech, and clicks. To achieve this, the second-image curve for clicks had to be "created" by elevating the click threshold curve by an amount similar to the separation of the speech and music curves. Absolute proof of this must await more experiments, but it is interesting to go out on a (strong) limb and speculate.

Looking at the 0 dB relative level line, where the direct and reflected sounds are identical in level, it can be seen that the precedence-effect interval for clicks appears to be just under 10 ms. According to Litovsky et al. (1999), this is consistent with other determinations (<10 ms), and the approximately 30 ms for speech is also in the right range (<50 ms). They offer no fusion interval data for Mozart, but it is reasonable to speculate based on the Barron data that it might be substantially longer than 50 ms. The short fusion interval for clicks suggests that sounds like close-miked percussion instruments might, in an acoustically dead room, elicit second images.


FIG. 16 Using data from Figures 14 and 15, this is a comparative estimate of the detection thresholds and the second image thresholds (i.e., the boundary of the precedence effect) for clicks, speech, and Mozart. The "typical room reflections" suggest that in the absence of any other reflections, the clicks are approaching the point of being detected as a second image. However, normal room reflections would be expected to prevent this from happening because the threshold curve would be flattened (see Figures 8 and 9).


FIG. 17 A comparison of a single large reflection with a sequence of three lower-level reflections. From Cremer and Müller, 1982, Figure 1.16.

5. SINGLE VERSUS MULTIPLE REFLECTIONS

So far, we have looked at some audible effects of single reflections when they appear in anechoic isolation and when they appear in the presence of room reflections. Now we will look at some evidence of how a sequence of reflections is perceived.

Cremer and Müller (1982) provide a limited but interesting perspective. FIG. 17 shows a microphone picking up the direct sound from a loudspeaker and either a single large or three smaller reflections in rapid sequence. The middle layer of images displays sound pressure, showing the direct sound followed by the reflections. The bottom layer of images portrays what Cremer and Müller call an "ear-imitative" function, which is a simple attempt to show that the ear has a short memory that fades with time: a relaxation time.

The point of this illustration is that events occurring within short intervals of each other can accumulate "effect," whatever that may be. The sequence of three smaller reflections can be seen to cause the "ear-imitative" function to progressively grow, although not to the same level as that for the single reflection.

However, when the authors conducted subjective tests in an anechoic chamber, they found that the sequence of three low-level reflections and the large single reflection were "almost equally loud." The message here is that if we believed the impulse response measurements, we might have concluded that by breaking up the large reflecting surface, we had reduced the audible effects.

This is one of the persistent problems of psychoacoustics. Human perception is usually nonlinear, and technical measurements are remarkably linear.
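The accumulation idea behind the "ear-imitative" function can be sketched numerically. The code below is not Cremer and Müller's function, whose exact form is not given here; it is a plain leaky integrator of squared pressure, and the 30 ms relaxation time, the reflection times, and the amplitudes are all hypothetical values chosen only for illustration.

```python
import math

def ear_imitative(events, tau_ms=30.0, dt_ms=0.1, total_ms=100.0):
    """Leaky integration of squared pressure.

    A hypothetical stand-in for a short auditory memory that fades with a
    relaxation time tau_ms (an assumed value, not taken from the source).
    events is a list of (time_ms, linear_amplitude) impulses.
    """
    decay = math.exp(-dt_ms / tau_ms)
    n = int(round(total_ms / dt_ms))
    # Place each event on the sample grid as an impulse of energy amp**2.
    energy = [0.0] * n
    for t_ms, amp in events:
        energy[int(round(t_ms / dt_ms))] += amp * amp
    state, trace = 0.0, []
    for e in energy:
        state = state * decay + e  # old energy fades, new energy adds
        trace.append(state)
    return trace

# One large reflection versus three reflections at half the amplitude
# (each 6 dB lower), spaced 5 ms apart. All values are illustrative.
single = ear_imitative([(20.0, 1.0)])
triple = ear_imitative([(20.0, 0.5), (25.0, 0.5), (30.0, 0.5)])

single_peak_db = 10.0 * math.log10(max(single))
triple_peak_db = 10.0 * math.log10(max(triple))
print(f"single reflection peak: {single_peak_db:.1f} dB")
print(f"three reflections peak: {triple_peak_db:.1f} dB")
```

With these assumed values the integrated trace for the three small reflections grows with each arrival and ends up well above the level of any one of them, though still below the single large reflection. A simple linear model of this kind is only a picture of the accumulation; it does not predict the "almost equally loud" listening result.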

Angus (1997, 1999) compared large, single lateral reflections from a side wall with diffuse reflections (multiple small reflections) from the same surface covered with scattering elements. There were no subjective tests, but mathematical simulations showed some counterintuitive results, namely that although the amplitudes of individual reflections were attenuated (as seen in an ETC), the variations in frequency responses measured at the listening position were not necessarily reduced. If the Cremer and Müller perceptual-summation effect is incorporated, the multiple smaller reflections seen in the ETC may end up being perceived as louder than anticipated. It is suggested, however, that a diffuse reflecting surface may make the listening position less critical.
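The point that lower spikes need not mean smaller frequency-response variations can be illustrated with a toy calculation. This is not a reproduction of the Angus simulations: the reflections are idealized as pure delayed copies of the direct sound, and the delays and amplitudes below are hypothetical values picked so that three small reflections have the same summed amplitude as one large one.

```python
import cmath
import math

def response_db(reflections, freq_hz):
    """Level in dB of the direct sound (unit amplitude, zero delay) plus
    idealized reflections given as (delay_ms, linear_amplitude) pairs."""
    h = 1.0 + sum(a * cmath.exp(-2j * math.pi * freq_hz * d * 1e-3)
                  for d, a in reflections)
    return 20.0 * math.log10(abs(h))

def ripple_db(reflections, f_lo=100.0, f_hi=5000.0, step=1.0):
    """Peak-to-dip variation of the combined response over a frequency span."""
    n = int((f_hi - f_lo) / step) + 1
    levels = [response_db(reflections, f_lo + i * step) for i in range(n)]
    return max(levels) - min(levels)

# Hypothetical cases: one strong reflection, or three weaker ones
# arriving 0.5 ms apart from a "scattering" surface.
single = [(5.0, 0.6)]
diffuse = [(5.0, 0.2), (5.5, 0.2), (6.0, 0.2)]

# How much lower each individual spike would appear in a time display.
spike_drop_db = 20.0 * math.log10(0.6 / 0.2)

print(f"individual spikes lower by {spike_drop_db:.1f} dB")
print(f"ripple, single reflection: {ripple_db(single):.1f} dB")
print(f"ripple, three reflections: {ripple_db(diffuse):.1f} dB")
```

In this contrived case each spike drops by about 9.5 dB, yet the peak-to-dip variation of the combined response stays within about half a decibel of the single-reflection case, because the three small reflections still add nearly in phase at some frequencies. Real scattering surfaces spread energy in direction as well as time, so this shows only that the ETC alone does not settle the question.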

So there are both subjective and objective perspectives indicating that breaking up reflective surfaces may not yield results that align with our intuitions. It is another of those topics worthy of more investigation.



FIG. 18 The left column of data shows results when the second of a series of reflections was adjusted to the threshold of detection when it was broadband; the right column shows comparable data when the reflection was low-pass filtered at 500 Hz. (a) Shows the waterfall diagram, (b) the spectrum of the second reflection taken from the waterfall, and (c) the ETC measured with a Techron 12 in its default condition (Hamming windowing). The signal was speech. The horizontal dotted lines are "eyeball" estimates of reflection levels. From Olive and Toole, 1989, Figures 18 and 19.

6. MEASURING REFLECTIONS

It seems obvious to look at reflections in the time domain, in a "reflectogram" or impulse response, a simple oscilloscope-like display of events as a function of time or, the currently popular alternative, the ETC (energy-time curve). In such displays, the strength of the reflection would be represented by the height of the spike. However, the height of a spike is affected by the frequency content of the reflection, and time-domain displays are "blind" to spectrum. The measurement has no information about the frequency content of the sound it represents. Only if the spectra of the sounds represented by two spikes are identical can they legitimately be compared.

Let us take an example. In a very common room acoustic situation, suppose a time-domain measurement reveals a reflection that it is believed needs attenuation. Following a common procedure, a large panel of fiberglass is placed at the reflection point. It is respectably thick, 2 in. (50 mm), so it attenuates sounds above about 500 Hz. A new measurement is made, and (behold!) the spike has gone down. Success, right? Maybe not.

In a controlled situation, Olive and Toole (1989) performed a test intended to show how different measurements portrayed reflections that, subjectively, were adjusted to be at the threshold of detection. So from the listener's perspective, the two reflections that are about to be discussed are the same: just at the point of audibility or inaudibility.

The results are shown in FIG. 18. At the top, the (a) graphs are waterfall diagrams displaying events in three dimensions. At the rear is the direct sound, the next event in time is an intermediate reflection, and at the front is the second reflection, the one that we are interested in. It can be seen that the second reflection is broadband in the left-hand diagram and that frequencies above 500 Hz have been eliminated in the right-hand version. When that particular "layer" of the waterfall is isolated, as in the (b) displays, the differences in frequency content are obvious. The amplitudes are rather similar, although the low-pass filtered version is a little higher, which seems to make sense considering that slightly over 5 octaves of the audible spectrum have been removed from the signal. Recall that these signals have been adjusted to produce the same subjective effect, a threshold detection, and it would be logical for a reduced bandwidth signal to be higher in level.
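The "slightly over 5 octaves" figure can be checked with a one-line calculation, assuming (the text does not state it) that the audible spectrum is taken to extend up to 20 kHz:

```latex
N_{\mathrm{oct}} = \log_{2}\!\left(\frac{20\,000\ \mathrm{Hz}}{500\ \mathrm{Hz}}\right) = \log_{2} 40 \approx 5.3
```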

In contrast, the (c) displays, showing the ETC measurements, were telling us that there might be a difference of about 20 dB in the opposite direction; the narrow-band sound is shown to be lower in level. Obviously, this particular form of the measurement is not a good correlate with the audible effect in this test.

The message is that we need to know the spectrum level of reflections to be able to gauge their relative audible effects. This can be done using time-domain representations, like ETC or impulse responses, but it must be done using a method that equates the spectra in all of the spikes in the display, such as bandpass filtering. Examining the "slices" of a waterfall would also be to the point, as would performing FFTs on individual reflections isolated by time windowing of an impulse response. Such processes need to be done with care because of the trade-off between time and frequency resolution, as explained in Section 13.5. It is quite possible to generate meaningless data.
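A small numerical sketch shows why the height of a spike and the spectrum level of a reflection can disagree. It is not the Olive and Toole measurement and not a true ETC: it compares the peak of an idealized impulse with the peak of the same impulse after a 500 Hz low-pass filter, and then compares the level of both at one frequency inside the passband. The sample rate, filter length, Hamming-windowed sinc design, and the 250 Hz probe frequency are all assumptions made for illustration.

```python
import cmath
import math

FS = 16000    # sample rate in Hz (assumed for this sketch)
FC = 500.0    # low-pass corner in Hz, matching the example in the text

def lowpass_fir(fc, fs, taps=257):
    """Hamming-windowed sinc low-pass, normalized to unity gain at DC."""
    mid = (taps - 1) // 2
    h = []
    for n in range(taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc / fs
        else:
            ideal = math.sin(2.0 * math.pi * fc * k / fs) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [x / total for x in h]

def peak_db(x):
    """Height of the tallest spike, as a broadband time display shows it."""
    return 20.0 * math.log10(max(abs(v) for v in x))

def level_db_at(x, freq_hz, fs):
    """Spectrum level at one frequency, from a single-bin DFT."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * n / fs)
              for n, v in enumerate(x))
    return 20.0 * math.log10(abs(acc))

# Idealized broadband reflection: a unit impulse. Filtering a unit impulse
# returns the filter's own impulse response, so that is the filtered case.
broadband = [0.0] * 128 + [1.0] + [0.0] * 128
filtered = lowpass_fir(FC, FS)

spike_change_db = peak_db(filtered) - peak_db(broadband)
band_change_db = (level_db_at(filtered, 250.0, FS)
                  - level_db_at(broadband, 250.0, FS))

print(f"change in spike height:   {spike_change_db:.1f} dB")
print(f"change in 250 Hz level:   {band_change_db:.2f} dB")
```

With these assumed parameters the spike drops by more than 20 dB while the level at 250 Hz is essentially unchanged, so two spikes can be compared fairly only after both have been restricted to the same band. The single-bin DFT here stands in for the band-pass filtering or windowed FFT mentioned above, and the same caution about time and frequency resolution applies to it.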

All of this is especially relevant in room acoustics because acoustical materials, absorbers, and diffusers routinely modify the spectra of reflected sounds.

Whenever the direct and reflected sounds have different spectra, simple broadband ETCs or impulse responses are not trustworthy indicators of audible effects.


Updated: Friday, 2017-05-05 18:53 PST