Psychology of Sound Reproduction (June 1972)

by Harry F. Olson [RCA Laboratories, Princeton, N.J.]

THE PSYCHOLOGY OF SOUND REPRODUCTION involves the psychology of music, speech, and emotional responsiveness. The psychology of music provides the law and order in the structure and operation of the human hearing mechanism and the musical mind. The psychology of speech involves the factors which determine the communication of information through the human hearing mechanism to the brain. The psychology of emotional responsiveness to music and speech may be expressed as the degree and kind of perception and sensation which is produced. The sensation and perception of music involve imagery, memory, and mentality.

These attributes are related to the musical mind in finding music and speech attracting or repelling, pleasing or antagonizing, and tolerable or intolerable.

The purpose of this paper is to present an exposition on the psychology of sound reproduction involving the psychology of music, speech, hearing, distortion, noise, and spatial phenomena.

PSYCHOLOGY OF MUSIC

The psychology of music is the science of musical experience and behavior. Psychology provides a working knowledge of the performance of the musical mind. Accordingly, the fundamentals for the classification of events in musical experience and behavior can be established. A scientific musical terminology can be developed from these fundamentals. Employing the definitions from the terminology, research in the psychology of music has shown that the musical listener must have four capacities for apprehending and appreciating all music, namely, the sense of pitch, the sense of loudness, the sense of time, and the sense of timbre.

The foregoing simplifies the understanding of the capacity of the musical mind in that each of these four basic functions appears in such complex musical forms as harmony, melody, dynamics, rhythm, volume, and tone quality. These musical forms are influenced by other important musical properties involved in listening to live or reproduced music, such as localization and perspective of the sound sources, acoustic ambience, reverberation, and other spatial phenomena.

The preceding brief introductory exposition shows that the psychology of music relates to the reproduction of sound in a very important and significant manner.

The purpose of this section is to describe the major psychological phenomena involved in the reproduction of music.


Fig. 1--Physical properties of a musical tone.

Properties of a Tone

The psychological properties of sound, namely, pitch, loudness, time, and timbre, depend upon the physical characteristics of the sound wave, namely, frequency, amplitude, duration, and waveform. The four physical characteristics of a sound wave completely describe every type of sound wave whether original or reproduced. The musical mind must be capable of apprehending and appreciating, to a greater or lesser degree, the four characteristics of a sound wave representing a musical selection.

The four physical characteristics can be broken down into seven physical characteristics which are more specific and individualistic in describing a musical tone, namely: frequency; intensity; growth, steady state, and decay (duration); portamento; timbre; vibrato; and deviations. These characteristics of a tone are depicted in Fig. 1.

The following definitions of tone properties are descriptive rather than absolutely rigorous presentations of the formal and somewhat abstruse language of the standards.

Frequency of a sound wave is the number of cycles occurring per unit of time, measured in Hertz. The subjective counterpart of frequency is pitch.

Intensity of a sound wave is the energy transmitted per unit of time through a unit area. The intensity is usually expressed in decibels above the threshold of hearing at 1000 Hz, for which 0 dB = 10^-16 W/cm^2.

The subjective counterpart of intensity is loudness.
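The decibel scale for intensity described above can be illustrated with a short sketch. The function names are hypothetical, and the only datum taken from the text is the 0 dB reference of 10^-16 W/cm^2.

```python
import math

# 0 dB reference quoted in the text (threshold of hearing at 1000 Hz).
REFERENCE_INTENSITY_W_PER_CM2 = 1e-16


def intensity_level_db(intensity_w_per_cm2: float) -> float:
    """Return the intensity level in dB above the 1e-16 W/cm^2 reference."""
    if intensity_w_per_cm2 <= 0:
        raise ValueError("intensity must be positive")
    return 10.0 * math.log10(intensity_w_per_cm2 / REFERENCE_INTENSITY_W_PER_CM2)


def intensity_from_level(level_db: float) -> float:
    """Inverse conversion: intensity in W/cm^2 for a level given in dB."""
    return REFERENCE_INTENSITY_W_PER_CM2 * 10.0 ** (level_db / 10.0)


if __name__ == "__main__":
    # Each factor of ten in intensity adds 10 dB to the level.
    for exponent in (-16, -12, -8):
        print(exponent, round(intensity_level_db(10.0 ** exponent), 1))
```

Because the scale is logarithmic, a hundredfold increase in intensity raises the level by only 20 dB.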

Growth is the time required for a sound to build up to some fraction of the ultimate value.

Steady state of a sound is the length of time in which there is no change in the intensity.

Decay is the time required for sound to fall to some fraction of the original value. Note that growth, steady state, and decay lumped together become the envelope of a tone.

Duration is the length of time that a sound persists without interruption or discontinuity in the output.

Portamento is a uniform glide in frequency from a sound of one frequency to a sound of another frequency. Portamento is also termed a frequency glide.

A complex sound wave is made up of the fundamental tone and overtones. The timbre or spectrum of a tone is expressed in the number, intensity, and phase relations of the components; that is, the fundamental and overtones or partials.

Vibrato is a low-frequency modulation of a musical tone.

This may result from either frequency modulation or amplitude modulation or a combination of both. In general, the modulation frequency is of the order of 7 Hz. Tremolo, a special case of vibrato, is created by amplitude modulation only.
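As a rough illustration of the distinction drawn above, the following sketch generates a sine tone with frequency modulation, amplitude modulation, or both, at a modulation rate of 7 Hz. The function name, the sample rate, and the depth parameters are assumptions made for the example; only the 7 Hz figure comes from the text.

```python
import math


def modulated_tone(freq_hz, duration_s, sample_rate=8000,
                   mod_hz=7.0, fm_depth_hz=0.0, am_depth=0.0):
    """Sine tone with optional frequency and/or amplitude modulation.

    fm_depth_hz is the peak frequency deviation in Hz.
    am_depth is the fraction (0..1) by which the amplitude dips.
    """
    samples = []
    phase = 0.0
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        lfo = math.sin(2.0 * math.pi * mod_hz * t)  # low-frequency modulator
        # Amplitude swings between 1 - am_depth and 1.
        amplitude = 1.0 - am_depth * 0.5 * (1.0 + lfo)
        samples.append(amplitude * math.sin(phase))
        # Advance the phase by the instantaneous frequency.
        inst_freq = freq_hz + fm_depth_hz * lfo
        phase += 2.0 * math.pi * inst_freq / sample_rate
    return samples


# Vibrato by frequency modulation, and tremolo by amplitude modulation only.
vibrato = modulated_tone(440.0, 1.0, fm_depth_hz=5.0)
tremolo = modulated_tone(440.0, 1.0, am_depth=0.5)
```

Setting both depths to zero returns the unmodulated tone, so the same routine covers all three cases.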

Deviation is a departure from the regular and is one of the beautiful and artistic characteristics of some types of music.

With reference to the preceding descriptions, many of the properties of a tone are interdependent. For example, timbre is influenced by the attack, decay, portamento, vibrato, etc.

When the properties of a tone as just defined and depicted in Fig. 1 are specified, the tone can be completely described.

Furthermore, the tone can be produced from these specifications by providing electronic means for generating its characteristic properties.
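The idea of producing a tone from its specification can be sketched in a few lines. The example below is a simplified illustration, not the method of any particular synthesizer: it assumes a straight-line growth and decay, harmonic partials with zero phase, and hypothetical function names and parameter values.

```python
import math


def envelope(t, growth_s, steady_s, decay_s):
    """Piecewise-linear envelope: growth, steady state, then decay."""
    if t < 0.0:
        return 0.0
    if t < growth_s:
        return t / growth_s
    if t < growth_s + steady_s:
        return 1.0
    if t < growth_s + steady_s + decay_s:
        return 1.0 - (t - growth_s - steady_s) / decay_s
    return 0.0


def synthesize(freq_hz, partial_amps, growth_s, steady_s, decay_s,
               sample_rate=8000):
    """Build a tone from a fundamental frequency, the amplitudes of its
    partials (fundamental first), and the three envelope times."""
    total_s = growth_s + steady_s + decay_s
    out = []
    for n in range(int(round(total_s * sample_rate))):
        t = n / sample_rate
        wave = sum(a * math.sin(2.0 * math.pi * freq_hz * (k + 1) * t)
                   for k, a in enumerate(partial_amps))
        out.append(envelope(t, growth_s, steady_s, decay_s) * wave)
    return out


# A 220 Hz tone with two overtones, 0.1 s growth, 0.3 s steady, 0.2 s decay.
tone = synthesize(220.0, [1.0, 0.5, 0.25], 0.1, 0.3, 0.2)
```

Portamento, vibrato, and deviations are omitted here to keep the sketch short; each would be a further modification of the frequency or amplitude inside the loop.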


Fig. 2--The loudness index contours for octave bands.


Fig. 3--The tolerable or preferred top sound level as a function of the volume of the room. The bar graphs show the tolerable top sound level and the threshold sound level due to the ambient noise for a concert hall and a room in the home.


Fig. 4--The frequency ranges required for the reproduction of speech, musical instruments, and noises without any noticeable frequency distortion or discrimination.


Fig. 5--The effect of frequency range upon the quality of orchestral music. HP is high pass filter, that is, all frequencies below the frequency given by the abscissa removed. LP is low pass filter, that is, all frequencies above the frequency given by the abscissa removed. The data is for a quadraphonic sound reproducing system.


Fig. 6--The effect of nonlinear distortion upon reproduced speech and music depicting O-objectionable, T-tolerable and P-perceptible nonlinear distortion for various high frequency cutoffs. The spectrums showing four values of nonlinear distortion are typical.


Fig. 7--The transient response to a tone burst of frequency fR. A--A sound reproducing element with uniform response. B--A sound reproducing element with a peak in the response frequency fR. C--A sound reproducing element with a dip […] at frequency fR.

Loudness

Loudness of a sound is the magnitude of the auditory sensation produced by the sound. The units on the scale of loudness should agree with the common experience in the estimates made upon the sensation magnitude. A true loudness scale must be constructed so that when the units are doubled the sensation will be doubled, when the units are trebled the sensation will be trebled, etc. The sone is the unit of loudness. By definition, a pure tone of 1000 Hertz, 40 dB above the listener's threshold, produces a loudness of 1 sone. The loudness of another sound judged by the listener to be n times the loudness of 1 sone is n sones. The loudness level of a sound, in phons, is numerically equal to the sound pressure level, in dB relative to the threshold of 0.0002 microbar, of a free progressive plane sound wave of 1000 Hertz which is judged to be equally loud.

The relation between loudness and loudness level is given by

S = 2^{(P - 40)/10}     (1)

Where

S = loudness, in sones and

P = loudness level, in phons.
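The relation between sones and phons can be checked numerically with a brief sketch. It assumes only the definition given above, namely that a loudness level of 40 phons corresponds to 1 sone and that each additional 10 phons doubles the loudness; the function names are hypothetical.

```python
import math


def phons_to_sones(loudness_level_phons: float) -> float:
    """Loudness in sones for a loudness level in phons."""
    return 2.0 ** ((loudness_level_phons - 40.0) / 10.0)


def sones_to_phons(loudness_sones: float) -> float:
    """Loudness level in phons for a loudness in sones (inverse relation)."""
    if loudness_sones <= 0:
        raise ValueError("loudness must be positive")
    return 40.0 + 10.0 * math.log2(loudness_sones)


if __name__ == "__main__":
    for phons in (40, 50, 60, 80, 100):
        print(phons, "phons =", phons_to_sones(phons), "sones")
```

On this scale the 80-phon and 100-phon levels correspond to 16 and 64 sones, so a 20-phon rise is a fourfold rise in loudness.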

To establish the loudness of a complex sound, such as music, at least three specifications must be available as follows:

1. A scale of loudness, termed the sone scale; the relation between sones and phons is given by equation 1.

2. The equal loudness contours for discrete frequency bands of the complex sound. The octave is a convenient frequency band.

3. The rule by which loudness adds as the discrete frequency bands of the sound are added.

Specifically, after the sound pressure level in each octave band has been determined, the next step is to convert each level to a loudness index and to sum the indices properly. The relation between the total loudness and the loudness indices is given by

S_T = 0.7 S_M + 0.3 \sum S     (2)

where S_T = total loudness of the complex sound, in sones,

S = loudness index in each octave band, and

S_M = greatest of the loudness indices.

The loudness index in each octave band is obtained from the graph of Fig. 2. The use of equation 2 and Fig. 2 makes it possible to determine the loudness of a complex sound.

There is a very definite relationship between the preferred or tolerable top level of sound reproduction and the volume of the room. The results of subjective tests of this relationship are shown in Fig. 3. The preferred or tolerable top sound level in the home is 80 dB. For the concert hall, the top level is 100 dB. The threshold due to the ambient noise is 30 dB for the average home and 35 dB for the concert hall. The bar graphs of Fig. 3 depict the threshold levels and the preferred or tolerable levels for the home and the concert hall. They show that the amplitude ranges in the home and the concert hall are 50 dB and 65 dB respectively.

Quality

The quality of the reproduced music is a subjective property describing the degree of resemblance of the reproduced to the original music.

To obtain resemblance of the reproduced music with the original music requires a high order of fidelity of performance, particularly from the standpoints of frequency range, nonlinear distortion, transient response, and noise.

The individual tones of musical instruments are composed of the fundamental and the partials or overtones. The relationships between the fundamental and the partials, and among the various partials, must be maintained in order to preserve the quality of the reproduced music.

The frequency ranges required to reproduce speech, musical instruments, and some noises without any noticeable frequency discrimination are shown in Fig. 4.

The effect of frequency discrimination upon the quality of orchestral music is depicted in Fig. 5. To reproduce orchestral music with no discernible frequency discrimination requires a frequency range of 30 to 15,000 Hertz. (Many experts would extend the upper limit to 18,000 or 20,000 Hz. -Ed.)

A sound reproducing system which introduces nonlinear distortion generates new partials and modifies the relative amplitudes of the original partials. The effects of nonlinear distortion upon the reproduction of music have been determined.

Typical spectrums of the nonlinear distortion for four values of nonlinear distortion are shown in Fig. 6. There are three subjective levels of nonlinear distortion, namely, perceptible, tolerable, and objectionable. Perceptible is the amount of distortion required to be just discernible. Tolerable and objectionable are not as definite terms and are a matter of opinion. By tolerable distortion is meant the amount of distortion that could be allowed in medium-grade consumer sound reproducing systems as exemplified by the phonograph, magnetic tape, radio, and television. By objectionable distortion is meant the amount of distortion that would be definitely unsatisfactory for the reproduction of sound in consumer sound reproducing systems. Referring to Fig. 6 it will be seen that the amount of perceptible, tolerable, and objectionable nonlinear distortion decreases as the high frequency cutoff increases. To reproduce music over the entire audio frequency range with imperceptible distortion requires a system with less than one percent nonlinear distortion.
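The way in which a nonlinear element generates new partials can be demonstrated with a small numerical sketch. The square-law nonlinearity and its coefficient of 0.02 are hypothetical choices made so that the distortion comes out at one percent; they are not taken from Fig. 6. The harmonic amplitudes are measured by correlating the signal with sine and cosine waves over a whole number of cycles.

```python
import math


def harmonic_amplitude(signal, harmonic, cycles):
    """Amplitude of one harmonic of a signal that holds exactly
    `cycles` whole periods of the fundamental."""
    n = len(signal)
    k = harmonic * cycles
    re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def total_harmonic_distortion(signal, cycles, max_harmonic=5):
    """Root-sum-square of harmonics 2..max_harmonic relative to the fundamental."""
    fundamental = harmonic_amplitude(signal, 1, cycles)
    overtones = [harmonic_amplitude(signal, h, cycles)
                 for h in range(2, max_harmonic + 1)]
    return math.sqrt(sum(a * a for a in overtones)) / fundamental


N, CYCLES = 1024, 8
pure = [math.sin(2.0 * math.pi * CYCLES * i / N) for i in range(N)]
# Hypothetical square-law nonlinearity: adds a second harmonic of amplitude 0.01.
distorted = [x + 0.02 * x * x for x in pure]

thd_pure = total_harmonic_distortion(pure, CYCLES)
thd_distorted = total_harmonic_distortion(distorted, CYCLES)
```

The pure tone shows essentially zero distortion, while the square-law element yields a second harmonic one hundredth the size of the fundamental, that is, the one percent figure cited above as the limit for imperceptible distortion.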


Fig. 8--An enclosure containing three sound sources, S1, S2 and S3, and a listener. D1, D2 and D3 represent the direct pencils of sound. R11, R12 and R13 represent reflected pencils of sound with one, two, and three reflections. The direct sound supplies the auditory perspective and the reflected sound supplies the acoustic ambience or reverberation envelope.


Fig. 9--The effect of frequency range upon the syllable articulation of speech. HP-High pass filter, that is, all frequencies below the frequency given by the abscissa removed. LP--Low pass filter, that is, all frequencies above the frequency given by the abscissa removed.

All speech, voice and music are of a transient character. Therefore, the transient response of a sound reproducing system is an important performance characteristic. Poor transient response alters the envelope of the sound, that is, the growth, duration, and decay. The response of a sound reproducing system to a tone burst depicts the transient response.

The transient response of sound reproducing elements with various frequency response characteristics is shown in Fig. 7.

In general, if the response is uniform as shown in Fig. 7A, the transient response will be good. If there is a peak in the frequency response as shown in Fig. 7B, there will be a lag in the growth. Following the end of the tone burst input, there is a hangover in the response which decays with time. If there is a dip in the response as shown in Fig. 7C, there will be an immediate full response followed by a decrease to a lower steady level. Following the end of the tone burst input, there is a sudden rise in output followed by a decay.

The data of Fig. 7 show that poor transient response changes the envelope of the musical tone. Poor transient response destroys the clarity and incisiveness of a musical tone.
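The behavior of an element with a peak in its response can be imitated with a simple digital model. The sketch below assumes a two-pole resonator tuned to the tone burst frequency; the pole radius, sample rate, and burst length are hypothetical values chosen for illustration and do not correspond to the measurements of Fig. 7.

```python
import math


def resonator(signal, freq_hz, sample_rate, r=0.99):
    """Two-pole resonator. A pole radius r close to 1 gives a sharp
    peak in the frequency response at freq_hz."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    a1, a2 = 2.0 * r * math.cos(w0), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def peak(segment):
    """Largest absolute value in a stretch of samples."""
    return max(abs(v) for v in segment)


SAMPLE_RATE, F_R, BURST_LEN = 8000, 1000, 800
# Tone burst at the resonant frequency, followed by silence.
burst = [math.sin(2.0 * math.pi * F_R * n / SAMPLE_RATE) for n in range(BURST_LEN)]
burst += [0.0] * 400
output = resonator(burst, F_R, SAMPLE_RATE)

early = peak(output[:20])          # just after the burst begins
late = peak(output[700:800])       # near the end of the burst
hangover = peak(output[800:840])   # just after the input has stopped
tail = peak(output[1160:1200])     # well after the input has stopped
```

The output is still small shortly after the burst begins (the lag in growth), and it continues at nearly full amplitude after the input has ceased before dying away (the hangover).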

Noise

Noise is any undesired audio signal in a sound reproducing system. In general, noise is an erratic, intermittent or statistically random oscillation. Some disc record reproduction exhibits "ticks" of noise due to an imperfection in the record groove. Some magnetic tape reproduction exhibits ticks of noise due to "dropouts," that is, imperfections of the tape coating. Most noise is, however, of a statistically random type covering the entire audio range and is manifested as "hiss." Noise is present to some degree in all sound reproducing systems. The objective is a noise level that is imperceptible.

This is indeed difficult to achieve at the present stage of the art. Most high quality systems can be designed so that the annoyance is negligible. To achieve this order of performance requires a signal-to-noise ratio of at least 60 dB. Under these conditions the noise of the sound reproducing system will be lost in the ambient noise of the room.
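The arithmetic behind the 60 dB requirement can be laid out in a short sketch using the levels of Fig. 3 quoted earlier. The function names are hypothetical, and the comparison assumes, as a simplification, that system noise is unnoticed whenever its level falls at or below the threshold set by the ambient noise of the room.

```python
def amplitude_range_db(top_level_db, ambient_threshold_db):
    """Usable range between the ambient threshold and the top level."""
    return top_level_db - ambient_threshold_db


def system_noise_level_db(top_level_db, signal_to_noise_db):
    """Level of the system noise when the top level is being reproduced."""
    return top_level_db - signal_to_noise_db


def noise_below_ambient(top_level_db, ambient_threshold_db,
                        signal_to_noise_db=60.0):
    """True if the system noise lies at or below the ambient threshold."""
    noise = system_noise_level_db(top_level_db, signal_to_noise_db)
    return noise <= ambient_threshold_db


# Levels quoted in the text: home 80 dB top, 30 dB ambient threshold;
# concert hall 100 dB top, 35 dB ambient threshold.
home_range = amplitude_range_db(80.0, 30.0)       # 50 dB
hall_range = amplitude_range_db(100.0, 35.0)      # 65 dB
home_noise = system_noise_level_db(80.0, 60.0)    # 20 dB
home_ok = noise_below_ambient(80.0, 30.0)         # True
```

With a top level of 80 dB in the home, a 60 dB signal-to-noise ratio places the system noise at 20 dB, which is 10 dB under the 30 dB ambient threshold.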

Perspective and Ambience

The normal human hearing mechanism combined with a […] mind can apprehend all the tonal characteristics of the […] that enter the ear. In general, the human hearing mechanism operates in a field of sound, that is, the sound source and the ears are separated in space. In addition, for the most part, the sound source and the ears are located in an enclosure.

The human hearing mechanism is binaural and thereby attributes a directional sense to the sound that is received so that the source of sound can be localized. That is, the binaural hearing mechanism provides the means for placing the sound sources in perspective.

When a sound source operates in a room, the acoustic ambience or reverberation envelope is comprised of sound that has encountered one, two, three, etc. reflections from the boundaries before impinging upon the listener's ear.

The field effects of sound waves, namely, perspective and acoustic ambience or reverberation envelope, will be described as follows: Three sound sources S1, S2, and S3 are located in an enclosure as depicted in Fig. 8. The listener determines the angular location of the three sound sources from the direct pencils of sound D1, D2 and D3. This angular localization is obtained by means of the human binaural hearing mechanism. The angular discrimination is a matter of only a few degrees in a part of the audio frequency range. The angular localization provides the listener with auditory perspective of the sound sources.

The sounds emanating from the sound sources are reflected by the walls, ceiling and floor. The reflections in two dimensions are depicted in Fig. 8. Only a few of the pencils of sound that have been reflected once, twice, and three times are shown in Fig. 8. The intensity decreases with each reflection due to absorption of sound by the boundaries. The decrease in the level of each reflection continues and the amplitude of the reflected sound is ultimately lost in the ambient noise level. The reflected sounds provide the acoustic ambience or reverberation envelope.
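The progressive loss of level with each reflection can be estimated with a rough sketch. It assumes that every boundary absorbs the same fraction of the incident energy and ignores the additional loss due to the spreading of the wave; the absorption coefficient of 0.3 and the 80 dB and 30 dB levels are hypothetical values for illustration.

```python
import math


def reflection_level_db(direct_level_db, absorption, reflections):
    """Level of a pencil of sound after a number of reflections, each of
    which removes the fraction `absorption` of the incident energy."""
    if not 0.0 < absorption < 1.0:
        raise ValueError("absorption must lie between 0 and 1")
    return direct_level_db + 10.0 * reflections * math.log10(1.0 - absorption)


def reflections_until_lost(direct_level_db, ambient_db, absorption):
    """Number of reflections needed for the level to reach the ambient noise."""
    count = 0
    while reflection_level_db(direct_level_db, absorption, count) > ambient_db:
        count += 1
    return count


if __name__ == "__main__":
    # Each reflection at absorption 0.3 lowers the level by about 1.55 dB.
    for n in (1, 2, 3):
        print(n, round(reflection_level_db(80.0, 0.3, n), 2))
    print(reflections_until_lost(80.0, 30.0, 0.3))
```

Under these assumptions an 80 dB sound requires 33 reflections before it falls to a 30 dB ambient level, which suggests why the reverberation envelope persists well beyond the few reflections drawn in Fig. 8.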

Perspective and ambience are important psychological factors in apprehending and appreciating original and reproduced sound in an enclosure. The optimum acoustic ambience which is determined by the acoustical performance of the enclosure depends for the most part upon the reverberation time of the enclosure.

The auditory perspective of the original sound is reproduced in stereophonic and quadraphonic sound reproducing systems.

The auditory perspective and acoustic ambience are reproduced in the quadraphonic sound reproducing system.

PSYCHOLOGY OF SPEECH

The psychology of speech is the science of the transfer of intelligence by means of the sounds of speech. Psychology provides a workable insight into the nature of perception of speech sounds by the human hearing mechanism and the mind.

There are two fundamental quantities involved in the transmission of speech, namely, intelligence and resemblance. The transmission of the intelligence of speech is determined by the articulation. The transmission of resemblance is determined by the quality.

Articulation

The recognition or intelligibility of speech is an important aspect of a sound reproducing system involving the transmission of information. In measuring speech recognition through a transmission system the speaker reads aloud sounds, syllables or words to a listener who writes down what he thinks he hears. A comparison of sounds, syllables, or words recorded by the listener with those uttered by the speaker provides the fraction of what is interpreted correctly. Depending upon the test material, this fraction is termed the sound articulation, the syllable articulation, or the word intelligibility.

There are three types of recognition measurements involving sounds, syllables, and words. Sound articulation refers to the use of speech sounds such as "p," "a," "t," etc. Syllable articulation refers to the use of syllables such as "pat," "run," "eat," etc. Word intelligibility refers to the use of the complete word.
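The scoring of such a test amounts to a simple comparison of two lists, as the following sketch suggests. The function name and the sample lists are hypothetical; the same routine would serve for sounds, syllables, or words.

```python
def articulation(spoken, heard):
    """Fraction of test items the listener recorded correctly.

    `spoken` is the list read by the speaker and `heard` is the list
    written down by the listener, item for item.
    """
    if len(spoken) != len(heard):
        raise ValueError("both lists must have the same number of items")
    if not spoken:
        raise ValueError("the test list is empty")
    correct = sum(1 for s, h in zip(spoken, heard)
                  if s.strip().lower() == h.strip().lower())
    return correct / len(spoken)


# One of four syllables is misheard, so the syllable articulation is 0.75.
score = articulation(["pat", "run", "eat", "dog"],
                     ["pat", "gun", "eat", "dog"])
```

In practice the lists are long and the items are chosen so that the listener cannot guess them from context.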

The effect of reducing the high and low frequency ranges upon syllable articulation of speech at a normal conversational level is shown in Fig. 9. A consideration of Fig. 9 shows that a relatively high articulation can be obtained with a very narrow frequency range. However, the quality of the reproduced speech is very much impaired by transmission over a narrow frequency band. From the standpoint of articulation, a limited frequency range may be actually superior to a wider frequency range because of the introduction of additional noises and distortions in a wider band, unless particular precautions are observed. In the case of speeches, plays, and songs a limited frequency range impairs the quality and artistic value of the reproduced sound.

Quality

The quality of reproduced speech is a subjective property describing the degree of resemblance of the reproduced speech to the original speech.

To obtain resemblance of the reproduced speech with the original speech requires a high order of performance particularly from the standpoints of frequency range, nonlinear distortion, transient response and noise.

The effect of the frequency range upon the quality of speech is shown in Fig. 10. These data show that to reproduce speech without any deterioration of the quality requires a frequency range of 70 to 15,000 Hertz.

A sound reproducing system which introduces nonlinear distortion generates new partials and modifies the original partials. As in the case of music the introduction of nonlinear distortion in the reproduction of speech deteriorates the quality of speech. Typical spectrums of nonlinear distortion for four values of nonlinear distortion are shown in Fig. 6.

As in the case of music, there are three levels of nonlinear distortion, namely, perceptible, tolerable, and objectionable.

To reproduce speech over the entire audio frequency range with imperceptible distortion requires a system with less than one percent nonlinear distortion.

The subject of transient response was considered in a preceding section. Poor transient response destroys the brightness of speech.

The exposition on noise relative to the reproduction of music also applies to the reproduction of speech.


Fig. 10--The effect of frequency range upon the quality of speech. HP--High pass filter, that is, all frequencies below the frequency given by the abscissa removed. LP--Low pass filter, that is, all frequencies above the frequency given by the abscissa removed.

PSYCHOLOGY OF SOUND REPRODUCTION

Perfect Transfer Characteristics

Modern sound reproduction involves two transfer characteristics, namely, the perfect and the ideal transfer characteristic. In the perfect transfer characteristic there is a constant relationship between the output and input parameters that define the signal. The perfect transfer characteristic provides the means for achieving realism in sound reproduction. To achieve realism in a sound reproducing system four conditions must be satisfied as follows:

1. The frequency range must be such as to include without frequency discrimination all the audible components of the various sounds to be reproduced.

2. The volume range must be such as to permit noiseless and distortion-less reproduction of the entire range of intensities associated with the sounds.

3. The spatial sound pattern (auditory perspective) of the original sound should be preserved in the reproduced sound.

4. The reverberation or acoustic ambience of the original sound should be approximated in the reproduced sound.

The requirements for satisfying the above conditions have been discussed in the preceding sections.

A high order of realism can be achieved if the four conditions outlined above are satisfied. However, with the electronic means available today the emotional responsiveness can be extended beyond that of simulating the realism of the original recorded music.

Ideal Transfer Characteristic

In the ideal transfer characteristic the relationship between the output and input parameters defining the signal is modified as dictated by subjective aspects involving realism and emotionalism. In general, the modifications that elevate the subjective aspects of sound reproduction and lead to the ideal transfer characteristic must start from a perfect transfer characteristic.

In 95 percent of the records produced today some sort of modification is used to heighten the artistic and emotional impact and thereby lead towards an ideal transfer characteristic. Delayers, frequency and timbre modifiers, vibrato and tremolo generators, reverberators, and nonlinear and fuzz producers are some of the electronic devices employed to modify the original recorded music to produce the final product. In these modifications there may be changes in the spatial sound pattern from the original or conventional. For example, there is the possibility of sound sources in rapid motion, which is impossible in original sound. The reverberation or acoustic ambience can be varied rapidly and from instrument to instrument. There are almost limitless possibilities in the modifications involving the subjective aspects leading to an ideal transfer characteristic.
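One of the simplest of these modifications, a delayer with feedback, can be sketched as follows. This is an illustrative model only, with hypothetical parameter values, and is not a description of any particular studio device: each output sample is the input plus an attenuated copy of the output from a fixed number of samples earlier.

```python
def feedback_delay(signal, delay_samples, feedback=0.5):
    """Add to each sample an attenuated copy of the output that occurred
    `delay_samples` earlier, producing a train of decaying echoes."""
    if delay_samples < 1:
        raise ValueError("delay must be at least one sample")
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0, 1) for the echoes to die away")
    out = []
    for n, x in enumerate(signal):
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * delayed)
    return out


# A single impulse followed by silence returns echoes every three samples,
# each half the amplitude of the one before.
impulse = [1.0] + [0.0] * 9
echoes = feedback_delay(impulse, delay_samples=3, feedback=0.5)
```

With a short delay and a feedback factor near unity the echoes crowd together into a crude artificial reverberation; with a long delay they are heard as distinct repetitions.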

REFERENCES

C. E. Seashore, "Psychology of Music," McGraw Hill Co., New York, N.Y., 1938. This book provides a basic exposition on the psychology of music and therefore supplies the foundation for the psychology of sound reproduction.

H. F. Olson, "Acoustical Engineering," Van Nostrand Reinhold Co., New York, N.Y., 1957.

H. F. Olson, "Music, Physics and Engineering," Dover Publications Inc., New York, N.Y., 1967.

H. F. Olson, "Electronic Music Synthesis for Recording," IEEE Spectrum, Vol. 8, No. 3, p. 18, 1971.

H. Fletcher, "Speech and Hearing in Communication," Van Nostrand Reinhold Co., New York, N.Y., 1953.

R. Miyagawa, T. Nakayama and T. Miura, "Design of Reproduced Sound by the ESP Method," Reports of the 6th International Congress on Acoustics, Tokyo, 1968.

W. B. Snow, "Audible Frequency Ranges of Music, Speech and Noise," Jour. Acous. Soc. Amer., Vol. 3, No. 1, Part I, p. 155, 1931.

H. F. Olson, "The Measurement of Loudness," Audio, Vol. 56, No. 2, p. 1F February, 1972.

(Audio magazine, Jun. 1972)
