[Arnold I. Klayman is a senior scientist at Hughes Aircraft Company in Rancho Santa Margarita, Cal.]

Early in the 1970s, long before I joined my present employer, Hughes Aircraft Corp., I ran a consulting and production firm concerned primarily with acoustics and speaker design. While I was designing speakers as best I could, I recognized certain shortcomings in them and, for that matter, in all stereo systems that I had heard over the years. Stereo systems, no matter how well designed or expensive, presented a two-dimensional image without any real depth. Even worse, sounds were almost always confined to the space between the speakers. If speakers were placed, say, 8 feet apart, then the apparent stage was 8 feet wide and no wider.

Another shortcoming of stereo systems at that time, I felt, was related to a well-known characteristic of human hearing--the so-called precedence effect. So long as I positioned my chair exactly midway between the speakers and at a sufficient distance in front of them, the monophonic portion of any stereo program--that portion involving a vocalist or a featured soloist--remained fixed or centered between the two speakers. However, if I moved closer to one speaker or the other, those sounds that should have remained centered drifted along with me. I began to wonder whether there was some way to correct these flaws.

When I went to work for Hughes in 1986, I was asked to work on a project involving stereo systems for commercial aircraft, the kind you listen to with headphones when flying in an airliner. As anyone who has had that listening experience (or, for that matter, anyone who listens to portable radios or cassette players through headphones) can confirm, no matter how good the quality of the headphones, sounds still seem to be coming from inside your head instead of from up front, where the aircraft's motion picture or TV screen is located. My colleagues and I were assigned to come up with a system that would get the sound out of the listener's head and, if possible, expand the stereo image to a more realistic dimension.

So, I began reading a variety of scientific papers spanning more than 50 years of acoustic and psychoacoustic research. (A bibliography of the papers and books that I used is found at the end of this article.) After a great deal of experimentation, I came up with a system that seemed to solve these problems. I filed for my first patent for the system on November 12, 1986, and it (Patent Number 4,748,669) was granted on May 31, 1988. As issued, the patent, assigned to Hughes Aircraft Co., contained no fewer than 159 granted claims. Additional patents, involving further improvements to the system, have also been granted.

Audio readers may gain a better insight into the system that was subsequently called SRS (Sound Retrieval System) if I quote a few paragraphs from the section of the patent headed "Background of the Invention":

The disclosed invention generally relates to an enhancement system for stereo sound reproduction systems, and is particularly directed to a stereo enhancement system which broadens the stereo sound image, provides for an increased stereo listening area, and provides for perspective correction for the use of speakers or headphones....

.... Another consideration in stereo reproduction is the fact that sound transducers (typically speakers or headphones) are located at predetermined locations, and therefore provide sounds emanating from such predetermined locations.
However, in a live performance the perceived sound may emanate from many directions as a result of the acoustics of the structure where the performance takes place. The human ears and brain cooperate to determine direction on the basis of different phenomena, including relative phase shift for low-frequency sounds, relative intensity for sounds in the voice range, and relative time of arrival for sounds having fast rise times and high-frequency components. As a result of the predetermined locations of speakers or headphones, a listener receives erroneous cues as to the directions from which the reproduced sounds are emanating. For example, for speakers located in front of the listener, sounds that should be heard from the side are heard from the front and therefore are not readily perceived as being sounds emanating from the sides. For headphones or side-mounted speakers, sounds that should emanate from the front emanate from the sides. Thus, as a result of the placement of speakers or headphones, the sound perspective of a recorded performance is incorrect.

There have been numerous attempts to spread and widen the stereo image, with mixed results. For example, it is known that the left and right stereo signals may be mixed to provide a difference signal (such as left minus right) and a sum signal (left plus right) which can be selectively processed and then mixed to provide processed left and right signals. Particularly, it is well known that increasing or boosting the difference signal produces a wider stereo image. However, indiscriminately increasing the difference signal creates problems, since the stronger frequency components of the difference signal tend to be concentrated in the midrange. One problem is that the reproduced sound is very harsh and annoying, since the ear has greater sensitivity to the range of about 1 kHz to 4 kHz within the midrange.... Another problem is that the listener is limited to a position that is equidistant between speakers, since the midrange includes frequencies having wavelengths comparable to the distance between the listener's ears (which have frequencies in the range between about 1 kHz and 2 kHz). As to such frequencies..., a slight shift in the position of the listener's head provides an annoying shift in the stereo image. Moreover, the perceived widening of the stereo image resulting from the indiscriminate boosting of the difference signal is small, and is clearly not worth the attendant problems.

Some known stereo imaging systems require additional amplifiers and speakers. However, with such systems the stereo image is limited by the placement of the speakers. Moreover, placing speakers at different locations does not necessarily provide the correct sound perspective. With other systems, fixed or variable delays are provided. However, such delays interfere with the accuracy of the reproduced sound, since whatever delays existed in the performance that was recorded are already present in the recording. Moreover, delays introduce further complexity and limit the listener's position.
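To make the conventional approach concrete, here is a minimal sketch of the plain sum/difference widening the patent describes and criticizes: the difference signal is boosted uniformly across the whole band before re-matrixing. This is an illustration only; the function and parameter names are invented for this sketch and are not taken from the patent.

```python
import numpy as np

def naive_width_boost(left, right, width=1.5):
    """Conventional sum/difference widening: boost the (L - R) signal
    uniformly at all frequencies, then re-matrix to left and right.

    left, right : 1-D NumPy arrays of audio samples.
    width       : broadband gain on the difference component;
                  1.0 leaves the image unchanged, >1.0 widens it.
    """
    mid = 0.5 * (left + right)     # (L + R)/2: center (sum) content
    side = 0.5 * (left - right)    # (L - R)/2: ambient and side content
    side = width * side            # indiscriminate broadband boost
    return mid + side, mid - side  # new L', R'
```

Because most of the energy in the difference signal falls in the midrange, this indiscriminate boost produces exactly the harshness and position sensitivity the patent describes.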
Much of what happens when SRS is working involves psychoacoustics. Many of the studies and research performed by those whose papers I read revealed that we perceive the direction from which sounds come by at least three different means. We detect the relative phase of sounds in the case of low frequencies (between about 20 and 200 Hz). For midrange sounds (300 to 4,000 Hz), we detect relative intensity: Sounds coming from the side sound louder to the nearer ear and sound softer to the ear on the other side of our head. For higher frequency sounds--those having fast rise-times--we judge direction by the relative time of arrival. Those sounds reach the closer ear sooner than they reach the ear that is farther away from the source of sound.

There is, however, a fourth factor that governs the way we judge the direction from which sounds originate. It is this factor that, until the development of SRS, had been ignored in stereo reproducing systems. It has to do with the way our hearing system's frequency response varies, depending on the angle from which sounds reach our ears. The outer ear, known as the pinna, has an effect on the spectrum of sound reaching the eardrum, while the concha (the section of the ear leading to the ear canal) has an effect on the frequency at which the ear canal resonates. Together, these two parts of our ears control the spectral shape (frequency response) of the sounds reaching the eardrum. In other words, the system functions as a sort of multiple filter, emphasizing some frequencies and attenuating others, while letting some get through without any change at all. Response changes with both azimuth and elevation and, together with our binaural (two-ear) capability, helps us determine whether a sound is coming from above, below, left, right, ahead, or behind. Examples of the response of our hearing system to frontal sounds, and to sounds coming at us from behind, are shown in Fig. 1. Our ears would exhibit still another frequency response for sounds coming at us from either the left or right side.

Much of what I read in connection with my development of SRS was clear about these phenomena. Let me quote from the introduction of the 1977 paper by Robert A. Butler and Krystyna Belediuk, "Spectral Cues Utilized in Localization of Sound in the Median Sagittal Plane":

Spectral cues provided by the pinnae are essential for localization of sound in the median sagittal plane (MSP). Distort the pinnae or occlude their cavities, and the listener is unable to locate at a level of accuracy exceeding that expected by chance (Roffler and Butler, 1968; Gardner and Gardner, 1973). Localization performance in the MSP for sounds recorded via an acoustic manikin and played back through headphones has been shown to be distinctly inferior to that associated with free-field listening (Damaske and Wagener, 1969). Searle and his associates (1975), however, placed insert microphones into the ear canals of five subjects and recorded the output when noise bursts emanated from the MSP. Four of the five participants were able to identify, with reasonable accuracy, the loudspeakers which had originally generated the sound currently being played back via headphone.

The importance of the outer-ear structure and the ear canal in the ability to judge the vertical location of a sound had also been researched over the years, as is evident from the following introduction to a 1974 paper by Jack Hebrank and D. Wright, "Spectral Cues Used in Localization of Sound Sources on the Median Plane."
Auditory perception of elevation involves a complex interaction of several localization subsystems. For sounds located off the median plane (positions having nonzero azimuth angles), auditory localization cues comprise interaural time differences (ITD), interaural amplitude differences (IAD), and directional filtering of the external ears [emphasis mine], as well as changes in all these cues during head motion. Though the IAD and ITD mechanisms have been investigated and explained through azimuthal localization experiments, the generation and processing of external-ear filtering cues are poorly understood.

Finally, in one of the most significant papers on the subject of spatial perception, E. A. G. Shaw, in his 1974 paper "Transformation of Sound Pressure Level from Free Field to the Eardrum in the Horizontal Plane," states in his introduction:

The transformation of progressive waves by the head and external ear is of recurring interest in psychoacoustics and audio engineering. It is intimately connected with the spatial perception of sound, provides the essential link between auditory measurements in the free field and with earphones, is relevant to noise control, and is an underlying factor in architectural acoustics.

And from the abstract of the same paper: "Sheets of data are presented showing transformation to the eardrum, azimuthal dependence, and interaural difference as a function of frequency from 200 Hz to 12 kHz at 45-degree intervals in azimuth. Other sheets show azimuthal dependence and interaural difference as functions of azimuth at 24 discrete frequencies."
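The "relative time of arrival" cue mentioned earlier can be given a rough magnitude with the classic Woodworth spherical-head formula. This is a standard textbook approximation, not anything taken from the SRS patent; the head radius and speed of sound below are simply typical assumed values.

```python
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound_m_s=343.0):
    """Interaural time difference for a distant source, using Woodworth's
    spherical-head approximation: ITD = (r / c) * (theta + sin(theta)),
    where theta is the source azimuth in radians (0 = straight ahead).
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound_m_s) * (theta + math.sin(theta))

# A source 90 degrees to one side arrives roughly 0.65 ms earlier at the
# nearer ear -- the scale of delay the hearing system uses for
# high-frequency, fast-rise-time sounds.
print(f"{itd_seconds(90) * 1e3:.2f} ms")
```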
Fig. 2B--Microphones' polar pickup patterns affect their frequency response as well as their overall sensitivity. Omnidirectional microphones have the most constant response to sounds from all directions. Bidirectional mikes have identical response (but opposite polarities) for front and rear sounds. Minimum sensitivity, and greatest change in frequency response, occur at the sides of a bidirectional mike or to the rear of a cardioid.
During my studies of these and many other scholarly papers, I began to realize why the usual signal chain from microphones to reproducing loudspeakers could never hope to accurately reproduce sound fields as we hear them in real-life situations. Microphones used in making recordings don't behave like human ears. As can be seen in Fig. 2, omnidirectional microphones have relatively flat frequency response for sounds coming from all directions. Cardioid, or directional, microphones have flat response for sounds coming from the sides and from the front, but are "dead" to rear sounds. So, during playback, if sounds that originally came from one side or the other are reproduced by speakers located up front, those sounds are heard with incorrect spectral response, since they are no longer coming from the sides but from in front. The result is a spatial distortion of the sound field. We are prevented from hearing the proper spatial cues of what was originally performed.

The SRS technique helps to correct these problems by processing the electrical signals so that spatial cues are restored. A block diagram of the system, Fig. 3, is derived from the granted patent itself and shows much of what takes place in the SRS circuitry. SRS first combines the left (L) and right (R) channel signals to create a sum signal (L + R). It then subtracts one from the other to get difference signals (L - R) and (R - L). These two types of signals are then subjected to various forms of processing and equalization. Ambience and spatial characteristics are derived from the processed difference signals; dialog, vocalist, and soloist sounds are derived from the processed sum signal. Once the complex and dynamic processing sequence has taken place, the revised (L + R) and (L - R) or (R - L) signals--perhaps we should call them (L + R)', (L - R)', and (R - L)' signals--are matrixed back together in the familiar fashion used in stereo FM and in stereo TV. That is, using simple algebra, (L + R)' added to (L - R)' yields a new L signal (actually 2L, but the "2" is simply an amplitude coefficient and can be ignored), and the new (R - L)' signal is added to the new (L + R)' signal to form a new R signal.

In stereo, sounds coming from up front produce equal-amplitude sounds in both channels and are therefore present in the sum, or (L + R), signal. Ambient, reflected, and side signals produce a complex sound field and are present primarily in the difference signals, (L - R) and (R - L). The Hughes Sound Retrieval System processes the difference signals to bring back the missing spatial cues and directional information. The difference signals are then dynamically increased in amplitude to increase the apparent image width. However, since the ear has increased sensitivity to midrange frequencies, selective emphasis of the difference signals is necessary in order to produce a realistically wider stereo image without introducing any annoying image shifts.

This selective emphasis, or boost, of certain frequencies in the difference signal accomplishes several things. For the quieter difference-signal components, it further enhances the stereo image by restoring the ambience of the live performance--ambience that, in a recorded performance, is ordinarily masked by the louder, direct sounds. It also provides for a much wider listening area. Listeners can walk about the room and still retain a sense of the direction of all the instruments and soloists in an orchestral work. There is no more having to sit in that "X marks the spot" location midway between the two loudspeakers!
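As a very loose illustration of this sum-and-difference processing (a minimal sketch only, with invented function names and hypothetical parameter values; the actual SRS circuitry is dynamic and far more elaborate), the difference signal can be given a frequency-selective boost that holds back the ear's most sensitive 1 to 4 kHz region before the signals are re-matrixed:

```python
import numpy as np

def selective_widen(left, right, fs, side_gain=2.0,
                    sensitive_band=(1000.0, 4000.0), band_gain=0.7):
    """Illustrative (not the patented) sum/difference processing:
    boost the (L - R) difference signal selectively, holding back the
    1-4 kHz region where the ear is most sensitive, then re-matrix.

    left, right : 1-D NumPy arrays of audio samples.
    fs          : sample rate in Hz.
    """
    n = len(left)
    sum_sig = left + right                      # (L + R): dialog, soloists
    diff = left - right                         # (L - R): ambience, side info

    # Frequency-selective emphasis of the difference signal (FFT domain).
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    gain = np.full(freqs.shape, side_gain)
    in_band = (freqs >= sensitive_band[0]) & (freqs <= sensitive_band[1])
    gain[in_band] *= band_gain                  # temper the sensitive midrange
    diff_p = np.fft.irfft(np.fft.rfft(diff) * gain, n)

    # Re-matrix as in stereo FM/TV: (L + R)' + (L - R)' -> new L, and so on.
    return 0.5 * (sum_sig + diff_p), 0.5 * (sum_sig - diff_p)
```

A static filter like this captures only the matrixing idea; in the actual system, both the amount and the spectral shape of the emphasis change with the program material, as described next.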
There's a lot more going on in the SRS circuit than what I have briefly described. There are any number of electronic servo-control systems that detect the content of the music and dynamically adjust both the levels and the spectral content of the sum and difference signals. The selective boosting of the difference signal, for example, is automatically adjusted while music signals are applied, so that the perceived stereo effect is relatively consistent. Without such automatic adjustment, the amount of enhancement provided would have to be manually adjusted for different program material. For instance, to avoid inappropriate, excessive boosting of the artificial reverberation that is sometimes added to stereo recordings, the enhancement circuit of SRS de-emphasizes the frequency range in which excessive reverberation of a center soloist is most likely to occur. This area is then reinforced by appropriate injection, or addition, under servo control, of the (L + R) signal. The perceived effect of this correction is that the amount of artificial reverberation does not change appreciably when the SRS circuit is turned on and off.

Initially, it was the intent of Hughes to simply license the SRS technology to various manufacturers of audio and/or video equipment. It was clear to us that with the emergence of stereo television sound (which was just becoming popular at the time SRS was perfected), TV sets containing a pair of necessarily closely spaced speakers could benefit from SRS. The first manufacturer to obtain a license from Hughes was Sony, which has since incorporated SRS into many of its high-end stereo TV models. A second licensee, whose SRS-equipped TV sets have become available more recently, is Thomson Consumer Electronics; their sets are marketed in America under the RCA brand name.

In the Sony and RCA televisions equipped with SRS, there is no need for user controls other than an SRS on/off switch. That's because the speakers are at a predetermined, known distance apart inside the TV and because stereo TV broadcasts are generally handled in a fairly predictable manner, with dialog channeled to the "center" of the sound field while special effects and background music are generally given the full stereo treatment. Right from the start, Sony wisely elected to provide the on/off option so that owners could easily compare stereo performance with and without SRS activated. From what I have been able to determine, most owners leave SRS on all the time, preferring the more spacious imaging that the system provides. It offers the kind of stereo enhancement that's very easy to get used to. Not only is there a fantastic spread of stereo sound, but when videotapes or videodiscs intended for surround sound reproduction are played, much of the illusion of sound coming at you from all around the room is maintained without having to add any loudspeakers besides the pair in the TV!
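Before turning to the stand-alone SRS processor, here is a very rough sketch of the kind of program-dependent ("servo") adjustment described above. It is only an illustration under simple assumptions, not the Hughes circuit: per block, the difference-signal gain is steered toward a target difference-to-sum energy ratio, so that sparse ambience is lifted while already-wide material is left largely alone. All names and values are hypothetical.

```python
import numpy as np

def servo_side_gain(sum_sig, diff_sig, fs, target_ratio=0.5,
                    block_ms=20.0, max_gain=4.0):
    """Crude program-dependent control of the difference-signal boost:
    per block, pick a gain that steers the difference/sum energy ratio
    toward target_ratio. Returns the gain-adjusted difference signal.
    """
    block = max(1, int(fs * block_ms / 1000.0))
    out = np.empty_like(diff_sig)
    eps = 1e-12                                  # avoid division by zero
    for start in range(0, len(diff_sig), block):
        s = slice(start, start + block)
        sum_rms = np.sqrt(np.mean(sum_sig[s] ** 2)) + eps
        diff_rms = np.sqrt(np.mean(diff_sig[s] ** 2)) + eps
        gain = min(max_gain, target_ratio * sum_rms / diff_rms)
        out[s] = gain * diff_sig[s]
    return out
```

A real control loop would smooth the gain between blocks to avoid audible steps; the point here is only that the enhancement tracks the program rather than remaining fixed.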
Hughes markets a stand-alone SRS signal processor, which has been assigned the model number AK-100. It has several front-panel controls that are necessary because of the variety of program material available from CDs, tapes, videodiscs, and radio and TV broadcast and cable sources. If the given signal source has good-quality stereophonic sound with good separation between left and right channels, a user need only press the SRS and power switches on the front panel of the unit. The AK-100 also includes a stereo synthesizer circuit that I consider unique, a control for subtly reducing reverberation on vocalists and decreasing ambient information in quiet passages, plus a filter to reduce the rumble that can be heard in some programs. (For a full description of the AK-100, see the "Equipment Profile" in last April's issue.)

In addition to SRS, several other companies have introduced stereo-enhancement systems. Most of these are intended for use by recording studios when they create their software in the final mastering or mixdown process. What I feel is unique and different about SRS is that it requires no encoding or processing during software creation. Virtually any conventional stereo program source will benefit when played back on a system in which SRS has been incorporated. I would emphasize, finally, that unlike many other stereo-enhancement systems, SRS neither employs any sort of time delay nor manipulates the phase relationships of the signals being processed. In effect, it recognizes what portion of the program being passed through it requires modification to restore proper spatial cues, and it performs such selective modification on a dynamic and continuing basis.

BIBLIOGRAPHY (Partial)

Angell, J. R. and W. Fite, "The Monaural Localization of Sound," Psychology Review, 1901, Vol. 8, pp. 225-246.

Angell, J. R. and W. Fite, "Further Observations on the Monaural Localization of Sound," Psychology Review, 1901, Vol. 8, pp. 449-458.

Batteau, D. W., "Characteristics of Human Localization of Sound," Final Report, May 1961, prepared under Contract No. N123-(60530) 23545A for U.S. Naval Ordnance Test Station, China Lake, Cal.

Blauert, J., "Sound Localization in the Median Plane," Acustica, 1969, Vol. 22, pp. 205-213.

Bloom, P. J., "Creating Source Elevation Illusions by Spectral Manipulation," presented at the 53rd Convention of the Audio Engineering Society, Zurich (unpublished).

Bothe, S. J. and L. F. Elfner, "Monaural vs. Binaural Auditory Localization for Noise Bursts in the Median Vertical Plane," J. Aud. Res., 1972, Vol. 12, pp. 291-296.

Butler, R. A., "Monaural and Binaural Localization of Noise Bursts Vertically in the Median Sagittal Plane," J. Aud. Res., 1969, Vol. 3, pp. 230-235.

Butler, R. A. and N. Planert, "The Influence of Stimulus Bandwidth on Localization of Sound in Space," Percept. Psychophys., 1976, Vol. 19.

Butler, R. A. and K. Belediuk, "Spectral Cues Utilized in Localization of Sound in the Median Sagittal Plane," J. Acoust. Soc. Am., 1977, Vol. 61, pp. 1264-1269.

Damaske, P. and B. Wagener, "Richtungshörversuche über einen nachgebildeten Kopf," Acustica, 1969, Vol. 21, pp. 30-35.

Flynn, W. E. and D. N. Elliott, "Role of the Pinna in Hearing," J. Acoust. Soc. Am., 1965, Vol. 38, pp. 104-105.

Gardner, M. B. and R. S. Gardner, "Problem of Localization in the Median Plane: Effect of Pinna Cavity Occlusion," J. Acoust. Soc. Am., 1973, Vol. 53, pp. 400-408.

Hartley, R. V. L. and T. C. Fry, "The Binaural Localization of Pure Tones," Phys. Rev., 1921, Second Series, Vol. 18, pp. 431-442.

Hebrank, J. H. and D. Wright, "Are Two Ears Necessary for Localization of Sounds on the Median Plane?" J. Acoust. Soc. Am., 1974, Vol. 56, pp. 935-938.

Hebrank, J. H. and D. Wright, "Spectral Cues Used in the Localization of Sound Sources on the Median Plane," J. Acoust. Soc. Am., 1974, Vol. 56, pp. 1629-1634.

Hirsch, I. J., The Measurement of Hearing, McGraw-Hill, New York, 1952, pp. 105-110.

Jonkees, L. W. B. and J. J. Groen, "On Directional Hearing," J. Laryngol. Otol., 1946, Vol. 61, pp. 494-504.
Groen, "On Directional Hearing," J. Laryngol. Otol., 1946, Vol. 61, pp. 494-504. Mehrgardt, S. and V. Mellert, "Transformation Characteristics of the External Human Ear," J. Acoust. Soc. Am., 1977, Vol. 61, pp. 1567-1576. Pratt, C. C., "The Spatial Character of High and Low Tones," J. Exptl. Psycho!, 1930, Vol. 13, pp. 276-285. Roffler, S. K. and R. A. Butler, "Factors That Influence the Localization of Sound in the Vertical Plane," J. Acoust. Soc. Am., 1968, Vol. 43, pp. 1255-1259. Searle, C. L., L. D. Braida, D. R. Cuddy, and M. F. Davis, "Binaural Pinna Disparity: Another Auditory Localization Cue," J. Acoust. Soc. Am., 1975, Vol. 57, pp. 448 455. Shaw, E. A. G. and R. Teranishi, "Sound Pressure Generated in an External Ear Replica and Real Human Ears by a Near by Point Source," J. Acoust. Soc. Am., 1968, Vol. 44, pp. 240-249. Shaw, E. A. G., "Acoustic Response of External Ear with Progressive Wave Source," J. Acoust. Soc. Am., 1972, Vol. 51, p. 150. Shaw, E. A. G., "Acoustic Response of External Ear Replica at Various Angles of Incidence," J. Acoust. Soc. Am., 1974, Vol. 55, p. 432. Shaw, E. A. G., "Transformation of Sound Pressure Level from Free Field to the Ear drum in the Horizontal Plane," J. Acoust. Soc. Am., 1974, Vol. 56, pp. 1848-1861. Shaw, E. A. G., "Physical Models of the External Ear," 8th Intl. Cong. Acoust., London, 1974. Shaw, E. A. G., "The External Ear," in Hand book of Sensory Physiology, edited by W. D. Keidel and W. D. Neff, Springer Veilag, Berlin, 1974, Vol. V, Chap. 14. Teranishi, R. and E. A. G. Shaw, "External Ear Acoustic Models with Simple Geometry," J. Acoust. Soc. Am., 1968, Vol. 44, pp. 143-146. Thurlow, W. R. and P. S. Runge, "Effect of Induced Head Movement on the Localization of the Directional Sounds," J. Acoust. Soc. Am., 1967, Vol. 42, pp. 480 488. Trimble, 0. S., "Localization of Sound in the Anterior, Posterior and Vertical Dimensions of Auditory Space," Brit. J. Psycho!., 1934, Vol. 24, pp. 320-324. Wallach, H., "The Role of Head Movements and Vestibular and Visual Cues in Sound Localization," J. Exptl. Psycho!., 1940, Vol. 27, pp. 339-368. Wiener, F. M., "On the Diffraction of a Progressive Sound Wave by the Human Head," J. Acoust. Soc. Am., 1947, Vol. 19. pp. 257-263. Wiener, F. M. and D. A. Ross, "The Pressure Distribution in the Auditory Canal in a Progressive Sound Field," J. Acoust. Soc. Am., 1946, Vol. 18, pp. 401-405.
(adapted from Audio magazine, Aug. 1992)