Understanding the Principles of Sound Reproduction--Reflections, Images, and the Precedence Effect [part 1]



1. AUDIBLE EFFECTS OF A SINGLE REFLECTION

Investigations of these effects go back many decades, and observations of our ability to localize a source of sound in an acoustically hostile (that is, reflective) environment were first recorded more than a century ago.

In audio, the terms Haas effect and law of the first wavefront were once used to identify this effect, but current scientific work has settled on the other original term, precedence effect. Whatever it is called, it describes the well-known phenomenon wherein the first-arrived sound, normally the direct sound from a source, dominates our impression of where sound is coming from.

Within a time interval often called the "fusion zone," we are not aware of reflected sounds that arrive from other directions as separate spatial events. All of the sound appears to come from the direction of the first arrival. Sounds that arrive later than the fusion interval may be perceived as spatially separated auditory images, coexisting with the direct sound, but the direct sound is still perceptually dominant. At very long delays, the secondary images are perceived as echoes, separated in time as well as direction. Terminology in the literature is not consistent: the word echo is often used to describe a delayed sound that is not perceived as separate in either direction or time.

Haas was not the first person to observe the primacy of the first-arrived sound so far as localization in rooms is concerned (Gardner, 1968, 1969, 1973 describes a rich history), but work done for his PhD thesis in 1949, translated from German to English in Haas (1972), has become one of the standard references in the audio field. Sadly, his conclusions are often misconstrued. Let us review his core experiment.

FIG. 1 The approximate frequency range over which reflections appear to influence perceptions of the direction of a sound source and the apparent size of that source. In some circumstances, reflections may be audible as separate "images" of the sound source.

============

FIG. 2 A progression of localization effects observed in the experimental setup used by Haas, including stereo (summing) localization, the precedence effect, and the equal loudness experiment.

Because the experiments were done on a flat roof, Haas placed the loudspeakers directly on the roof, aimed upward toward the listener's ears, to minimize the effect of the roof reflection. He found, though, that raising the loudspeakers to ear level made no significant difference, and that is the configuration used for the experiments.

Setup: a hemi-anechoic listening environment (a flat roof). Test signal: speech, at equal levels in both channels.

(a) Summing localization: delay = 0 ms to 0.6-1.0 ms.

(b) Precedence effect: delay = 1 to 30 ms. This is the precedence-effect fusion interval for speech: there are two sound sources, but only one image is perceived, at the earlier loudspeaker.

(c) Breakdown of the precedence effect: delay > 30 ms, or the delayed sound is higher in level than the direct sound. Two images are heard, one at each loudspeaker; the image at the reference loudspeaker is dominant.

(d) The unique Haas experiment: for each delay, the listener increased the sound level of the delayed channel until both images were judged to have the same loudness.

===========

FIG. 2 shows the essence of the experiment. In the hemi-anechoic space provided by the flat roof of a laboratory building, a listener was positioned facing loudspeakers that had been placed 45° apart. The Haas (1972) translation describes the setup as "at an angle of 45° to the left and right side of the observer" (p. 150). This could be construed in two ways: as 45° to each side (90° in total) or as a total separation of 45°. Gardner (1968), however, in a translation of a different Haas document, reports "loudspeakers . . . at an angle of 45°, half to the right and half to the left of him. . . ." When Lochner and Burger (1958) repeated the Haas experiment, they used loudspeakers that were placed 90° apart. So there is ambiguity about the angular separation.

A recording of running speech was sent to both loudspeakers, and a delay could be introduced into the signal fed to one of them. In all situations except for FIG. 2d, both signals were radiated with the same sound level.

FIG. 2a shows summing localization. When there was no delay, the perceived result was a phantom (stereo) image floating midway between the loudspeakers. When delay was introduced, the image moved toward the loudspeaker that radiated the earlier sound, reaching that location at delays of about 0.6-1.0 ms. This summing localization is the basis for the phantom images that can be positioned between the left and right loudspeakers in stereo recordings, assuming a listener is in the "sweet spot" (Blauert, 1996).

FIG. 2b shows the precedence effect. For delays in excess of 1 ms, the single image remains at the reference loudspeaker up to about 30 ms. This is the precedence effect: there are two (or more) sound sources, but only one sound image is perceived. Note that the 30 ms interval applies only to speech and only to direct and reflected sounds of equal level.

FIG. 3 The sound level of the delayed sound, relative to that of the first arrival, at which listeners judged the two sound images to be equal in loudness.

FIG. 2c shows multiple images, the breakdown of precedence. With delays greater than 30 ms, and certainly by 40 ms, the listener becomes aware of a second sound image at the location of the delayed loudspeaker. The precedence effect has broken down because there are two images, but the second image is a subordinate one; the dominant (louder) localization cue still comes from the loudspeaker that radiated the earlier sound.
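The delay boundaries of FIG. 2a to 2c can be summarized as a simple lookup. This is a minimal sketch, valid only for the conditions of the Haas setup (speech, equal levels in both channels); the function name `haas_regime` is hypothetical, and the sharp boundaries simplify perceptual transitions that vary among listeners.

```python
def haas_regime(delay_ms: float) -> str:
    """Approximate perceptual regime for speech radiated at equal levels
    from two loudspeakers, one of them delayed by delay_ms (Haas setup).
    Hypothetical helper; boundaries are the approximate values in the text."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    if delay_ms == 0:
        return "phantom image midway between the loudspeakers"
    if delay_ms < 1.0:
        # The image moves toward the earlier loudspeaker, reaching it at about 0.6-1.0 ms.
        return "summing localization: image shifted toward the earlier loudspeaker"
    if delay_ms <= 30.0:
        return "precedence effect: one image, at the earlier loudspeaker"
    if delay_ms < 40.0:
        # The breakdown occurs above 30 ms "but certainly by 40 ms".
        return "transition: a second image may begin to be heard"
    return "precedence breakdown: second, subordinate image at the delayed loudspeaker"
```

For example, `haas_regime(10)` falls within the precedence-effect interval, while `haas_regime(45)` reports a second, subordinate image. None of these boundaries apply once the levels differ, as the following discussion shows.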

FIG. 2d shows the Haas equal-loudness experiment. In the first three illustrations, the first and delayed sounds had identical amplitudes. Obviously, this is artificial because if the delayed sound were a reflection, it would be attenuated by having traveled a greater distance. But Haas moved even farther from passive acoustical realities and deliberately amplified the delayed sound, as would happen in a public address situation. His interest was to determine how much higher in sound level the delayed sound could be before it became the dominant localization cue (in other words, subjectively louder). To do this, he asked his listeners to adjust the sound level of the delayed loudspeaker until both of the perceived images appeared to be equally loud. This is the balance point, beyond which the delayed loudspeaker would be perceived as being dominant. The objective was to prevent an audience from seeing a person speaking in one direction and being distracted by a louder voice coming from a different direction. As shown in FIG. 3, over a wide range of delays, the later loudspeaker can be as much as 10 dB higher in level before it is perceived to be equally loud and therefore a major distraction to the audience. Naturally, this would depend on where the audience member is seated relative to the symmetrical axis of the two sound sources.
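For a sense of scale, the 10 dB figure can be converted to physical ratios, assuming the levels refer to sound pressure (amplitude) and sound intensity (power) at the listener:

```latex
\Delta L = 20\log_{10}\frac{p_{\text{delayed}}}{p_{\text{direct}}} = 10\ \text{dB}
\;\Rightarrow\;
\frac{p_{\text{delayed}}}{p_{\text{direct}}} = 10^{10/20} \approx 3.16,
\qquad
\frac{I_{\text{delayed}}}{I_{\text{direct}}} = 10^{10/10} = 10
```

In other words, at the equal-loudness balance point the delayed loudspeaker could deliver roughly three times the sound pressure, or ten times the sound intensity, of the first arrival.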

Haas described this as an "echo suppression effect." Some people have taken this to mean that the delayed sound is masked, but it isn't. Within the precedence-effect fusion interval there is no masking: all of the reflected (delayed) sounds are audible, making their contributions to timbre and loudness, but the early reflections simply are not heard as spatially separate events. They are perceived as coming from the direction of the first sound; this, and only this, is the essence of the "fusion." The widely held belief that there is a "Haas fusion zone," approximately the first 20 ms after the direct sound, within which everything gets innocently combined, is simply untrue.

Haas observed audible effects that had nothing to do with localization. First, the addition of a second sound source increased loudness. There were also changes to sound quality, described as "liveliness" and "body" (Haas, 1972, p. 150), and a "pleasant broadening of the primary sound source" (p. 159). Increased loudness was a benefit to speech reinforcement, and the other effects would be of concern only if they affected intelligibility.

Benade (1985) contributed a thoughtful summary under the title "Generalized Precedence Effect," in which he stated the following:

1. The human auditory system combines the information contained in a set of reduplicated sound sequences and hears them as though they were a single entity, provided (a) that these sequences are reasonably similar in their spectral and temporal patterns and (b) that most of them arrive within a time interval of about 40 ms following the arrival of the first member of the set.

2. The singly perceived composite entity represents the accumulated information about the acoustical features shared by the set of signals (tone color, articulation, etc.). It is heard as though all the later arrivals were piled upon the first one without any delay; that is, the perceived time of arrival of the entire set is the physical instant at which the earliest member arrived.

3. The loudness of the perceived sound is augmented above that of the first arrival by the accumulated contributions from the later arrivals. This is true even in the case when one or more of the later signals is stronger than the first one to arrive; that is, a strong later pulse does not start a new sequence of its own.

4. The apparent position of the source of the composite sound coincides with the position of the source of the first-arriving member of the set, regardless of the physical directions from which the later arrivals may be coming.

5. If there are any arrivals of sounds from the original acceptably similar set which come in after a delay of 100-200 ms, they will not be accepted for processing with their fellows. On the contrary, they will be taken as a source of confusion and will damage the clarity and certainty of the previously established percept. These "middle-delay" signals that dog the footsteps of their betters may or may not be heard as separate events.

If for some reason a reasonably strong member of the original set should come in with a delay of something more than 250-300 ms, it will be distinctly heard as a separate echo. This late reflection will be so heard even if it is superposed on a welter of other (for example, reverberant) sounds.

It is important to notice that these very strongly worded categorical statements all emphasize that there is an accumulation of information from the various members of the sequence. It is quite incorrect to assume that the precedence effect is some sort of masking phenomenon which, by blocking out the later arrivals of the signal, prevents the auditory system from being confused. Quite to the contrary, those arrivals that come in within a reasonable time after the first one actively contribute to our knowledge of the source. Furthermore, members of the set that are delayed somewhat too long actually disrupt and confuse our perceptions even when they may not be consciously recognized. If the arrivals are later yet, they are heard as separate events (echoes) and are treated as a nuisance. In neither case are the late arrivals masked out.
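Benade's delay categories can likewise be expressed as a lookup. This is a minimal sketch: the function name is hypothetical, the boundaries are the approximate values in his statements, and the delay ranges his summary does not address (roughly 40-100 ms and 200-250 ms) are reported as unspecified rather than guessed. It also ignores his conditions on spectral and temporal similarity and on the strength of the late arrival.

```python
def benade_category(delay_ms: float) -> str:
    """Coarse category for a later arrival that resembles the first one,
    using the approximate delay boundaries in Benade's (1985) summary.
    Hypothetical helper for illustration only."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    if delay_ms <= 40:
        return "fused: adds to loudness and tone color, localized at the first arrival"
    if 100 <= delay_ms <= 200:
        return "middle-delay: not fused, degrades clarity, may or may not be heard"
    if 250 <= delay_ms <= 300:
        return "transition: a strong arrival may be heard as a distinct echo"
    if delay_ms > 300:
        return "echo: heard as a distinct, separate event"
    # The gaps between 40 and 100 ms and between 200 and 250 ms are not addressed.
    return "unspecified by Benade's summary"
```

Note that none of the categories describes the late arrivals as masked, which is the point of the paragraph above.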

FIG. 4 An explanation of how an anechoic simulation can imitate a reflection from a real (flat and perfectly reflective) wall. The anechoic setup uses a real loudspeaker to simulate the "mirror image" loudspeaker in the room situation. This is the experimental method that has been used in numerous experiments conducted over the decades. Electrical adjustments of delay, amplitude, and frequency response of the signal sent to the "reflection" loudspeaker allow the simulation of different geometries and reflective surface types.
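The mirror-image construction in FIG. 4 reduces to simple geometry: reflecting the source in the wall gives the position of the image loudspeaker, whose distance from the listener equals the reflected path length. This is a minimal plan-view sketch, assuming a flat side wall along the line x = wall_x, a speed of sound of 343 m/s, and a listener facing the +y direction; the function name and coordinate conventions are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def side_wall_reflection(source, listener, wall_x):
    """Mirror-image geometry for one reflection from a flat wall along x = wall_x.

    source, listener: (x, y) positions in metres (plan view).
    Returns (delay_ms, azimuth_deg): the delay of the reflection relative to
    the direct sound, and the direction of the image loudspeaker seen from
    the listener, measured from the +y axis (positive toward +x).
    """
    sx, sy = source
    lx, ly = listener
    image = (2.0 * wall_x - sx, sy)          # mirror the source in the wall
    direct = math.dist(source, listener)     # direct path length
    reflected = math.dist(image, listener)   # path via the wall equals image distance
    delay_ms = (reflected - direct) / SPEED_OF_SOUND * 1000.0
    azimuth_deg = math.degrees(math.atan2(image[0] - lx, image[1] - ly))
    return delay_ms, azimuth_deg
```

With the listener at the origin, a loudspeaker 3 m straight ahead, and a side wall 2 m to the right, `side_wall_reflection((0.0, 3.0), (0.0, 0.0), 2.0)` places the image loudspeaker 5 m away at about 53°, with the reflection arriving about 5.8 ms after the direct sound. In the anechoic simulation, these are the delay and direction the "reflection" loudspeaker would be set to reproduce.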

FIG. 5 An illustration of the several audible effects that occur when a single lateral reflection is added to a direct sound, in an anechoic simulation similar to that shown in FIG. 4b. All of these curves were determined using speech as a signal. In the experiments, at each of several delays, the sound level of the reflected sound was adjusted to identify those levels at which each of the described perceptions became apparent. The bottom two curves are from Olive and Toole (1989), in which the direct sound was at 0° and the lateral reflection arrived from 65°. Meyer and Schodder (1952) had their reflection arrive from 90°, and listeners reported when the echo was not perceived at all. Lochner and Burger (1958) employed a direct sound arriving from -45° and a delayed sound from +45°, and their listeners reported when the second source was just audible. Adapted from Toole (1990), with additional information from Cremer and Müller, 1982, Figure 1.25.

1.1 Effects of a Single Reflection

This is the "begin at the beginning" experiment, in which the number of variables is minimized. The listening environment is anechoic, the signal is speech, and only a single lateral reflection is examined. These data cannot be applied directly to real-world listening to music or movies, but they establish a scientific baseline for further research.

In FIG. 5, the lowest curve describes the sound level at which listeners reported hearing any change attributable to the presence of the reflection. This is the "absolute threshold"; nothing is perceived for reflections at lower levels. Most listeners described what they heard as a sense of spaciousness (Olive and Toole, 1989). Although the experiment was conducted in an anechoic chamber, a single detectable reflection was sufficient to create the impression of a (rudimentary) three-dimensional space. Throughout, listeners reported all of the sound as originating at the location of the loudspeaker that reproduced the first sound, meaning that the precedence effect was working. As the sound level of the delayed sound was increased, the impression of spaciousness increased.

The next higher curve is the level at which listeners reported hearing a change in size or position of the main sound image, which the precedence effect causes to be localized at the position of the loudspeaker that reproduced the earlier sound. This was called the "image shift" threshold. In general, these changes were subtle; they were noticeable in these controlled A versus B comparisons, but it is doubtful that they would be detected in the context of a multiple-image music or movie soundstage. As the sound level of the delayed sound was further increased, the impression of spaciousness also increased.

With the two curves that portray the third perceptual category, a major transition is reached, because it is at this sound level that listeners report hearing a second sound source or image, simultaneously coexisting with the original one (we have not yet reached the long delays at which an echo is perceived as temporally as well as spatially separate). These two curves are from Lochner and Burger (1958) and Meyer and Schodder (1952). This means that the precedence-effect directional "fusion" has broken down. Although the original source remains the perceptually louder, spatially dominant source, there is a problem because two spatial events are perceived when there should be only one.

The top curve is from the well-known work by Haas (1972) in which he asked his listeners to adjust the relative levels of the spatially separate images associated with the direct and reflected sounds until they appeared to be equally loud. This tells us that in a public address situation, it is possible to raise the level of delayed sound from a laterally positioned loudspeaker by as much as 10 dB above the direct sound before it is perceived as being as loud as the direct sound. It is important information in the context of professional audio, but it is irrelevant in the context of small-room acoustics.

All of the data points are thresholds: the sound levels at which listeners detected a change in their perceptions. As we will see later, some of the perceived changes are beneficial and, up to a point, listeners find that levels well above threshold provide greater pleasure. For example, the perception described at threshold as "image shift or spreading" may seem like a negative attribute, but when it is translated into what is heard in rooms, it becomes "image broadening," or apparent source width (ASW), a widely liked quality. Even "second-image" thresholds can be exceeded with certain kinds of sounds, expanding the size of an orchestra beyond its visible extent in a concert hall or extending the stereo soundstage beyond the spread of the loudspeakers. In reproduced sound, the picture is more confused because some techniques in the recording process can achieve similar perceptions. Because all of these factors are influenced by how the recordings are made as well as how they are reproduced, these comments are observations, not judgments of relative merit. Some evidence suggests that even these small effects might be diminished by experience during listening within a given room (Shinn-Cunningham, 2003), another in the growing list of perceptual phenomena we can adapt to.

FIG. 6 (a) A simplification of FIG. 5 in which only data that are relevant to sound in small rooms are preserved and a shaded area representing the precedence-effect fusion zone for speech is identified. This is the range of amplitudes and delays within which a reflected sound will not be identified as a separately localizable event. From Toole, 2006. (b) The precedence-effect fusion intervals for delayed sounds at three sound levels. The classic experiments much quoted from psychoacoustic literature generally used equal-level direct and delayed sounds; this condition corresponds to the highest large arrow, at 0 dB, showing an interval of about 30 ms. In rooms, delayed sounds are attenuated by propagation loss (typically -6 dB per doubling of distance) and by sound absorption at the reflecting surfaces. As the delayed sound is reduced in level, the fusion zone increases rapidly. The black dots show the delays and amplitudes for the first six reflections in a typical listening room (Devantier, 2002), indicating that in such rooms, the precedence effect is solidly in control of the localization of speech sounds.

1.2 Another View of the Precedence Effect

If we extract from FIG. 5 those things that are relevant to sound reproduction from a single loudspeaker, we end up with FIG. 6. The Haas data have been removed because amplified delayed sounds do not exist in passive acoustics. The "second-image" data (Lochner and Burger, 1958; Meyer and Schodder, 1952) have been combined into a single average curve for simplicity. There is some justification for doing this, as one curve expresses a "just audible" criterion and the other a "just not audible" criterion.

The area under the "second-image" curve has been shaded. This is the real-world precedence-effect fusion zone for speech, within which any delayed sound will not be perceived as a spatially separate localizable event. This perspective is very different from most discussions of the precedence-effect fusion interval.

Normally, only a single number is stated, and that number normally relates to direct and delayed sounds at the same sound levels. This is a correct description of the results of a laboratory experiment but is simply wrong as guidance about what may happen in the real world.

The fusion interval for speech is often quoted as being around 30 ms. This is true for anechoic listening to a single reflection that has the same sound level as the direct sound, as can be seen in FIG. 6b. This is how the classic psychoacoustic experiments were conducted, but these circumstances are far from the acoustical realities in normal rooms. For reflections at realistically lower levels, the fusion interval is much longer. On the evidence so far, then, the precedence effect is undoubtedly the dominant factor in the localization of speech in small rooms.
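Why realistic reflections fall well below the 0 dB condition of the classic experiments can be estimated from path length and surface absorption. This is a minimal sketch, assuming spherical spreading (the inverse square law, which gives the -6 dB per doubling of distance cited for FIG. 6b) and a hypothetical, frequency-independent energy absorption coefficient `absorption`; real surfaces absorb differently at different frequencies and angles.

```python
import math


def reflection_level_db(direct_m, reflected_m, absorption=0.0):
    """Level of a reflection relative to the direct sound, in dB.

    direct_m, reflected_m: path lengths in metres.
    absorption: fraction of incident energy absorbed by the surface (0 = perfect reflector).
    Assumes inverse-square spreading and frequency-independent absorption.
    """
    if not 0.0 <= absorption < 1.0:
        raise ValueError("absorption must be in [0, 1)")
    spreading = 20.0 * math.log10(direct_m / reflected_m)  # -6 dB per doubling of distance
    wall_loss = 10.0 * math.log10(1.0 - absorption)         # energy lost at the surface
    return spreading + wall_loss
```

A reflection that travels twice as far as the direct sound arrives about 6 dB down from a perfectly reflective surface, and about 9 dB down if the surface absorbs half the incident energy. Both cases lie below the equal-level condition on which the often-quoted 30 ms figure is based.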

FIG. 7 The detection thresholds for delayed sounds simulating a wall reflection, a ceiling reflection, and one arriving from the same direction as the direct sound. The test signal was pink noise. Adapted from Olive and Toole, 1989.

1.3 Reflections from Different Directions

FIG. 7 shows more data from Olive and Toole (1989), in which it is seen that the thresholds for the side wall and the ceiling reflections are almost identical. This is counterintuitive because one would expect a lateral reflection to be much more strongly identified by the binaural discrimination mechanism because of the large signal differences at the two ears. For sounds that differ only in elevation, we have only the spectral cues provided by the external ears and the torso (HRTFs). Although the threshold levels might be surprising, intuition is rewarded in that the dominant audible effect of the lateral reflection was spaciousness (the result of interaural differences) and that of the vertical reflection was timbre change (the result of spectral differences). The broadband pink noise used in these tests would be very good at revealing colorations, especially those associated with HRTF differences at high frequencies. On the other hand, continuous noise lacks the strong temporal patterns of some other sounds, like speech.

This makes the findings of Rakerd et al. (2000) especially interesting. These authors examined what happened with sources arranged in a horizontal plane and vertically in the front-back (median sagittal) plane. Using speech as a test sound, they found no significant differences in masked thresholds or echo thresholds between sources in the horizontal and vertical planes. In explanation, they agreed with other referenced researchers that there may be an "echo suppression mechanism mediated by higher auditory centers where binaural and spectral cues to location are combined." This is another example of humans being very well adapted to listening in reflective environments.

Another surprise in FIG. 7 is that delayed sounds that come from the same loudspeaker are more difficult to hear; the threshold here is consistently higher than for sounds that arrive from the side or above, slightly for short delays, and much higher (10+ dB) at long delays. Burgtorf (1961) agrees, finding thresholds for coincident delayed sounds to be 5-10 dB higher than those separated by 40-80°. Seraphim (1961) used a delayed source that was positioned just above the direct-sound source (~5° elevation difference) and found that, with speech, the threshold was elevated by about 5 dB compared to one at a 30° horizontal separation. The relative insensitivity to coincident sounds appears to be real, and the explanation seems to be that it is the result of spectral similarities between the direct sound and the delayed sound. These sounds take on progressively greater timbral differences as they are elevated (or, one assumes, lowered) relative to the direct sound. For those readers who have been wondering about the phenomenon of "comb filtering," which will be specifically addressed in Section 9, it is worth noting that this evidence tells us that the situation of maximum comb filtering, when the direct and delayed sounds emerge from the same loudspeaker, is the one for which we are least sensitive. (Encouraging news!) All this said, it still seems remarkable that a vertically displaced reflection, with no apparent binaural (between the ears) differences, can be detected as well as a reflection that arrives from the side, generating large binaural differences. Not only are the auditory effects at threshold different (timbre versus spaciousness), but the perceptual mechanisms required for their detection are also different.
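The comb filtering mentioned here is the frequency response of a direct sound summed with a delayed copy of itself. This is a minimal sketch, assuming an idealized steady-state sum with a frequency-independent reflection gain; the function names are hypothetical. For a delay τ, the response minima fall at odd multiples of 1/(2τ), and their depth grows as the gain approaches 1. The calculation describes only the physical response, not its audibility, which is what the threshold data address.

```python
import cmath
import math


def comb_magnitude_db(freq_hz, delay_s, gain):
    """Magnitude (dB) of a direct sound plus one delayed copy scaled by
    `gain` (linear): |1 + gain * exp(-j * 2*pi * f * delay)|."""
    h = 1.0 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    # Floor the magnitude so a perfect notch (gain == 1) does not hit log10(0).
    return 20.0 * math.log10(max(abs(h), 1e-12))


def notch_frequencies(delay_s, f_max_hz):
    """Frequencies of the response minima up to f_max_hz:
    odd multiples of 1 / (2 * delay_s)."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= f_max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches
```

For a 1 ms delay, the notches below 3 kHz fall at 500, 1500, and 2500 Hz; with a gain of 0.5 they are about 6 dB deep, and the peaks between them (at multiples of 1 kHz) are about 3.5 dB high.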

============

THE IEC ROOM

The listening room used in these experiments was the prototype room underlying IEC 268-13-1985. It was constructed at the National Research Council, in Ottawa, within an existing laboratory space (which explains the dimensions). There was little real science to guide the choice of dimensions and acoustical treatment, so the resulting room became one of the variables in ongoing experiments. Of course, at that time stereo was the standard reproduction format. The room was 6.7 m × 4.1 m × 2.8 m (22 ft × 13.5 ft × 9.2 ft) with a mid-frequency reverberation time of 0.34 s. More description and measurements are shown in the appendix of Toole (1982). The original concept of the standard was to specify a room that could be duplicated so test results from different laboratories could be compared.

In subsequent editions of the standard, the requirements were relaxed so more rooms could qualify, which is a different and significantly less useful objective but much more popular among users who want to claim IEC compliance.

=============
