6. Subjective testing

Subjective testing can only be carried out by placing the device under test in series with an existing sound-reproduction system. Unless the DUT is itself a loudspeaker, the testing will only be as stringent as the loudspeakers in the system allow. Unfortunately the great majority of loudspeakers don’t reach the standard required for meaningful subjective testing of units placed in series and consequently the majority of such tests are of questionable value. If useful subjective testing is to be carried out, it’s necessary to use the most accurate loudspeakers available and to test the loudspeakers themselves before using them as any kind of reference. Whilst simple tests such as on-axis frequency response give an idea of the performance of a loudspeaker, the majority produce so much distortion and modulation noise that the figures are not even published.

Digital audio systems potentially have high signal resolution, but subjective testing of high-performance convertors is very difficult because of loudspeaker limitations. Consequently it’s important to find listening tests which will meaningfully assess loudspeakers, especially for linearity and resolution, whilst eliminating other variables as much as possible. Linearity and resolution are essential to allow superimposition of an indefinite number of sounds in a stereo image. Non-linearity in stereo has the effect of creating intermodulated sound objects which are in a different place in the image from the genuine sounds. Consequently the requirements for stereo are more stringent than for mono. This can be used for speaker testing.

One stringent test is to listen to a high-quality stereo recording in which multi-tracking has been used to superimpose a number of takes of a musician or vocalist playing/singing the same material. The use of multi-tracking reduces the effect of intermodulation at the microphone and ADC as these handle only one source at a time. The use of a panpot eliminates any effects due to inadequate directivity in a stereo microphone. It should be possible to hear how many simultaneous sources are present, i.e. whether the recording is double, triple or quadruple tracked, and it should be possible to concentrate on each source to the exclusion of the others. It should also be possible to pan each version of a multi-tracked recording to a slightly different place in the stereo image and individually identify each source even when the spacing is very small. Poor loudspeakers smear the width of an image because of diffraction and fail this test.

In another intermodulation test it’s necessary to find a good-quality recording in which a vocalist sings solo at some point, and at another point is accompanied by a powerful low-frequency instrument such as a pipe organ or a bass guitar. There should be no change in the imaging or timbre of the vocal whether or not the LF is present.

Another stringent test of linearity is to listen to a recording made on a coincident stereo microphone of a spatially complex source such as a choir. It should be possible to identify the location of each chorister and to concentrate on the voice of each. The music of Tallis is highly suitable. Massed strings are another useful test, with the end of Barber's Adagio for Strings being particularly revealing. Coincident stereo recordings should also be able to reproduce depth.
It should be possible to resolve two instruments or vocalists one directly behind the other at different distances from the microphone. Loudspeakers which cannot pass these tests are not suitable for subjective quality testing.

When a pair of reference-grade loudspeakers has been found which will demonstrate all the above effects, it will be possible to make meaningful comparisons between devices such as microphones, consoles, analog recorders, ADCs and DACs. Quality variations between the analog outputs of different CD players or DAT machines will be readily apparent. Those which pay the most attention to convertor clock jitter are generally found to be preferable. Very expensive high-end CD players are often disappointing because these units concentrate on one aspect of performance and neglect others.

One myth which has taken a long time to be dispelled is the belief that a low-grade loudspeaker should be used in the production process so that an indication of how the mix will sound on mediocre consumer equipment will be obtained. If an average loudspeaker could be obtained this would be possible. Unfortunately the main defect of a poor loudspeaker is that it stamps its own characteristic footprint on the audio. These footprints vary so much that there is no such thing as an average poor loudspeaker, and people who make decisions on cheap loudspeakers are taking serious risks. It’s a simple fact that an audio production can never be better than the monitor loudspeakers used, and the author's extensive collection of defective CDs indicates that good monitoring is rare.

7. Digital audio quality

In theory the quality of a digital audio system comprising an ideal ADC followed by an ideal DAC is determined at the ADC. This will be true if the digital signal path is sufficiently well engineered that no numerical errors occur, which is the case with most reasonably maintained equipment. The ADC parameters such as the sampling rate, the wordlength and any noise shaping used put limits on the quality which can be achieved. Conversely, the DAC itself may be transparent, because it only converts data whose quality is already determined back to the analog domain. In other words, the ideal ADC determines the system quality and the ideal DAC does not make things any worse.

In practice both ADCs and DACs can fall short of the ideal, but with modern convertor components and attention to detail the theoretical limits can be approached very closely and at reasonable cost. Shortcomings may be the result of an inadequacy in an individual component such as a convertor chip, or due to incorporating a high-quality component into a poorly thought-out system. Poor system design or implementation can destroy the performance of a convertor. Whilst oversampling is a powerful technique for realizing high-quality convertors, its use depends on digital interpolators and decimators whose quality affects the overall conversion quality. Interpolators and decimators with erroneous arithmetic or inadequate filtering performance have been known.

ADCs and DACs have the same transfer function, since they are only distinguished by the direction of operation, and therefore the same terminology can be used to classify the possible shortcomings of both. FIG. 11 shows the transfer functions resulting from the main types of convertor error. These graphs hold for ADCs and DACs, and the axes are interchangeable. If one is chosen to be analog, the other will be digital.
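As an illustration of how such errors arise in hardware, the following minimal sketch (Python, purely for illustration; the 8-bit resolution and the slightly low MSB current source are assumptions, not taken from any real device) models a binary-weighted DAC and reproduces the major-carry non-monotonicity which FIG. 12, discussed below, illustrates.

    import numpy as np

    # Hypothetical 8-bit binary-weighted DAC: ideal current sources would be
    # I, 2I, 4I, ... 128I.  Here the MSB source is assumed to be slightly low
    # (126.9 I instead of 128 I) to illustrate the major-carry error at the
    # 127 -> 128 transition described in the text.
    I = 1.0
    weights = np.array([1, 2, 4, 8, 16, 32, 64, 126.9]) * I   # bit 0 .. bit 7

    def dac_output(code: int) -> float:
        """Sum the current sources switched on by the bits of 'code'."""
        bits = [(code >> b) & 1 for b in range(8)]
        return float(np.dot(bits, weights))

    outputs = np.array([dac_output(c) for c in range(256)])

    # Non-monotonic steps: the output falls although the input code rises.
    steps = np.diff(outputs)
    bad = np.where(steps < 0)[0]
    print("output for 127:", outputs[127])   # all seven low-order sources on
    print("output for 128:", outputs[128])   # only the (low) MSB source on
    print("non-monotonic at codes:", bad)    # expect the 127 -> 128 transition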
FIG. 11(a) shows offset error. A constant appears to have been added to the digital signal. This has no effect on sound quality, unless the offset is gross, when the symptom would be premature clipping. DAC offset is of little consequence, but ADC offset is undesirable since it can cause an audible thump if an edit is made between two signals having different offsets. Offset error is sometimes cancelled by digitally averaging the convertor output and feeding it back to the analog input as a small control voltage. Alternatively, a digital high-pass filter can be used.

FIG. 11(b) shows gain error. The slope of the transfer function is incorrect. Since convertors are referred to one end of the range, gain error causes an offset error. The gain stability is probably the least important factor in a digital audio convertor, since ears, meters and gain controls are logarithmic.

FIG. 11(c) shows integral linearity. This is the deviation of the dithered transfer function from a straight line. It has exactly the same significance and consequences as linearity in analog circuits, since if it’s inadequate, distortion will be caused.

Differential non-linearity is the amount by which adjacent quantizing intervals differ in size. This is usually expressed as a fraction of a quantizing interval. In audio applications the differential non-linearity requirement is quite stringent. This is because with properly employed dither, an ideal system can remain linear under low-level signal conditions. When low levels are present, only a few quantizing intervals are in use. If these change in size, clearly waveform distortion will take place despite the dither. Enhancing the subjective quality of convertors using noise shaping will only serve to reveal such shortcomings.

FIG. 12 shows that monotonicity is a special case of differential non-linearity. Non-monotonicity means that the output does not increase for an increase in input. FIG. 12(a) shows that in a DAC with a convertor input code of 01111111 (127 decimal), the seven low-order current sources of the convertor will be on. The next code is 10000000 (128 decimal), shown in FIG. 12(b), where only the eighth current source is operating. If the current it supplies is in error on the low side, the analog output for 128 may be less than that for 127 as shown in FIG. 12(c). On a major overflow such as this, from 127 to 128, one current source (128I) must be precisely I greater than the sum of all the lower-order sources; if 128I is too small, the non-monotonic result shown in FIG. 12(c) will occur. In an ADC non-monotonicity can result in missing codes. This means that certain binary combinations within the range cannot be generated by any analog voltage. If a device has better than 1/2Q linearity it must be monotonic. It’s difficult for a one-bit convertor to be non-monotonic.

Absolute accuracy is the difference between actual and ideal output for a given input. For audio it’s rather less important than linearity. For example, if all the current sources in a convertor have good thermal tracking, linearity will be maintained, even though the absolute accuracy drifts.

Clocks which are free of jitter are a critical requirement in convertors as was shown in Section 4. The effects of clock jitter are proportional to the slewing rate of the audio signal rather than depending on the sampling rate, and as a result oversampling convertors are no more prone to jitter than conventional convertors. Clock jitter is a form of frequency modulation with a small modulation index.
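The slew-rate dependence just described can be seen in a minimal simulation. The sketch below (Python; the 48 kHz rate, the 2 ns of sinusoidal jitter at 1 kHz and the FFT length are all illustrative assumptions) samples a sine wave on a jittered clock and shows that the resulting sideband level rises with the audio frequency, which is the basis of the spectrum-analyzer test described next.

    import numpy as np

    # Minimal sketch of clock jitter as frequency modulation of the sampling
    # instants.  The numbers are illustrative assumptions: 48 kHz sampling,
    # 2 ns peak sinusoidal jitter at 1 kHz, and full-scale input tones.
    fs = 48_000.0
    n = np.arange(1 << 16)
    t_ideal = n / fs
    jitter_pk = 2e-9                                        # seconds, assumed
    t_jittered = t_ideal + jitter_pk * np.sin(2 * np.pi * 1_000.0 * t_ideal)

    def spectrum_dbc(x):
        """Windowed magnitude spectrum in dB relative to the largest component."""
        w = np.hanning(len(x))
        mag = np.abs(np.fft.rfft(x * w))
        return 20 * np.log10(mag / mag.max() + 1e-30)

    freqs = np.fft.rfftfreq(len(n), 1 / fs)
    for f_audio in (100.0, 20_000.0):                       # low and high audio frequency
        x = np.sin(2 * np.pi * f_audio * t_jittered)        # sampled on the jittered clock
        db = spectrum_dbc(x)
        skirt = np.abs(freqs - f_audio) < 300.0             # exclude the tone and its skirt
        print(f"{f_audio/1000:5.1f} kHz tone: worst jitter artifact {db[~skirt].max():7.1f} dBc")

With these assumed figures the artifact sits roughly 46 dB higher for the 20 kHz tone than for the 100 Hz tone, reflecting the ratio of the two slew rates.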
Sinusoidal jitter produces sidebands which may be audible. Random jitter raises the noise floor, which is more benign but still undesirable. As clock jitter produces artifacts proportional to the audio slew rate, it’s quite easy to detect. A spectrum analyzer is connected to the convertor output and a low audio frequency signal is input. The test is then repeated with a high audio frequency. If the noise floor changes, there is clock jitter. If the noise floor rises but remains substantially flat, the jitter is random. If there are discrete frequencies in the spectrum, the jitter is periodic. The spacing of the discrete frequencies from the input frequency will reveal the frequencies in the jitter.

Aliasing of audio frequencies is not generally a problem, especially if oversampling is used. However, the nature of aliasing is such that it works in the frequency domain only and translates frequencies to new values without changing amplitudes. Aliasing can occur for any frequency above one half the sampling rate. The frequency to which it aliases will be the difference frequency between the input and the nearest sampling rate multiple. Thus in a non-oversampling convertor, all frequencies above half the sampling rate alias into the audio band. This includes radio frequencies which have entered via audio or power wiring or directly. RF can leap-frog an analog anti-aliasing filter capacitively. Thus good RF screening is necessary around ADCs, and the manner of entry of cables to equipment must be such that RF energy on them is directed to earth. Recent legislation regarding the sensitivity of equipment to electromagnetic interference can only be beneficial in this respect.

Oversampling convertors respond to RF on the input in a different manner. Although all frequencies above half the sampling rate are folded into the baseband, only those which fold into the audio band will be audible. Thus an unscreened oversampling convertor will be sensitive to RF energy on the input at frequencies within ±20 kHz of integer multiples of the sampling rate. Fortunately interference from the digital circuitry at exactly the sampling rate will alias to DC and be inaudible.

Convertors are also sensitive to unwanted signals superimposed on the references. In fact the multiplicative nature of a convertor means that reference noise amplitude modulates the audio to create sidebands. Power supply ripple on the reference due to inadequate regulation or decoupling causes sidebands 50, 60, 100 or 120 Hz away from the audio frequencies, yet does not raise the noise floor when the input is quiescent. The multiplicative effect reveals how to test for it. Once more a spectrum analyzer is connected to the convertor output. An audio frequency tone is input, and the level is changed. If the noise floor changes with the input signal level, there is reference noise.

RF interference on a convertor reference is more insidious, particularly in the case of noise-shaped devices. Noise-shaped convertors operate with signals which must contain a great deal of high-frequency noise just beyond the audio band. RF on the reference amplitude modulates this noise and the sidebands can enter the audio band, raising the noise floor or causing discrete tones depending on the nature of the pickup. Noise-shaped convertors are particularly sensitive to a signal of half the sampling rate on the reference. When a small DC offset is present on the input, the bit density at the quantizer must change slightly from 50 percent.
This results in idle patterns whose spectrum may contain discrete frequencies. Ordinarily these are designed to occur near half the sampling rate so that they are beyond the audio band. In the presence of half sampling-rate interference on the reference, these tones may be demodulated into the audio band.

Although the faithful reproduction of the audio band is the goal, the nature of sampling is such that convertor design must respect EMC and RF engineering principles if quality is not to be lost. Clean references, analog inputs, outputs and clocks are all required, despite the potential radiation from digital circuitry within the equipment and uncontrolled electromagnetic interference outside. Unwanted signals may be induced directly by ground currents, or indirectly by capacitive or magnetic coupling. It’s essential practice to separate grounds for analog and digital circuitry, connecting them in one place only.

Capacitive coupling uses stray capacitance between the signal source and the point where the interference is picked up. Increasing the distance or conductive screening helps. Coupling is proportional to frequency and the impedance of the receiving point. Lowering the impedance at the interfering frequency will reduce the pickup. If this is done with capacitors to ground, it need not reduce the impedance at the frequency of wanted signals.

Magnetic or inductive coupling relies upon a magnetic field due to the source current flow inducing voltages in a loop. Reduction in inductive coupling requires the size of any loops to be minimized. Digital circuitry should always have ground planes in which return currents for the logic signals can flow. At high frequency, return currents flow in the ground plane directly below the signal tracks and this minimizes the area of the transmitting loop. Similarly, ground planes in the analog circuitry minimize the receiving loop whilst having no effect on baseband audio. A further weapon against inductive coupling is to use ground fill between all traces on the circuit board. Ground fill will act like a shorted turn to alternating magnetic fields. Ferrous screening material will also reduce inductive coupling as well as capacitive coupling. The reference of a convertor should be decoupled to ground as near to the integrated circuit as possible. This does not prevent inductive coupling to the lead frame and the wire to the chip itself. In the future convertors with on-chip references may be developed to overcome this problem.

In summary, spectral analysis of convertors gives a useful insight into design weaknesses. If the noise floor is affected by the signal level, reference noise is a possibility. If the noise floor is affected by signal frequency, clock jitter is likely. Should the noise floor be unaffected by both, the noise may be inherent in the signal or in analog circuit stages.

One interesting technique which has been developed recently for ADC testing is a statistical analysis of the frequency of occurrence of the various code values in data. If, for example, a full-scale sine wave whose frequency is asynchronous to the sampling rate is input to an ADC, the probability of a particular code occurring in the output of an ideal convertor is a function only of the slew rate of the signal. At the peaks of the sine wave the slew rate is small and the codes there are more frequent. Near the zero crossing the slew rate is high and the probability is lower; because the slew rate there is nearly constant, the codes around the centre of the range occur with nearly equal probability.
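The following sketch implements this histogram test for an ideal convertor (Python; the 8-bit wordlength, the TPDF dither, the 997 Hz test frequency and the two million samples are assumptions chosen purely so that the histogram fills quickly). It shows the expected distribution: codes near the peaks of the sine wave are by far the most frequent, while those around the centre of the range occur with nearly equal probability.

    import numpy as np

    # Sketch of the statistical code-occurrence test: a full-scale sine wave,
    # asynchronous to the sampling rate, is quantized and the frequency of
    # occurrence of each code is collected.  An ideal 8-bit quantizer with
    # TPDF dither is assumed purely so that the histogram fills quickly.
    rng = np.random.default_rng(0)
    fs = 48_000.0
    f_in = 997.0                        # deliberately asynchronous to fs
    n = np.arange(2_000_000)
    x = 0.99 * np.sin(2 * np.pi * f_in * n / fs)

    q = 2.0 / 256                       # quantizing interval of an 8-bit range
    dither = (rng.random(len(x)) - rng.random(len(x))) * q   # TPDF, +/- 1 LSB
    codes = np.clip(np.round((x + dither) / q), -128, 127).astype(int)

    hist = np.bincount(codes + 128, minlength=256)
    prob = hist / hist.sum()
    print("probability near positive peak:", prob[250:255])
    print("probability near centre       :", prob[126:131])
    # Peaks are far more probable than the centre, and the central codes are
    # nearly equally probable: the expected arcsine-like distribution.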
However, if one quantizing interval is slightly larger than its neighbors, the signal will take longer to cross it and the probability of that code appearing will rise. Conversely, if the interval is smaller the probability will fall. By collecting a large quantity of data from a test and displaying the statistics it’s possible to measure differential non-linearity to phenomenal accuracy. This technique has been used to show that oversampled noise-shaped convertors are virtually free of differential non-linearity because of the averaging in the decimation process.

In practice the signals used are not restricted to high-level sine waves. A low-level sine wave will only exercise a small number of codes near the audiologically sensitive centre of the quantizing range. However, it may be better to use a combination of three sine waves which exercises the whole range. As the test method reveals differences in probability of occurrence of individual codes, it can be used with program material. In this case the exact distribution of code probabilities is not important. Instead it’s important that the probability distribution should be smooth. As FIG. 13 shows, spikes in the distribution indicate an unusually high or low probability for certain codes.

In an analysis of code probability on a number of commercially available CDs, a disturbing number of those tested had surprising characteristics such as missing codes, particularly in older recordings. Single missing codes can be due to an imperfect ADC, but in some cases there were a large number of missing codes spaced evenly apart. This could only be due to primitive gain controls applied in the digital domain without proper redithering. This may have been required with undermodulated master tapes which would be digitally amplified prior to the cutting process in order to play back at a reasonable level. Statistical code analysis is quite useful to the professional audio engineer as it can be applied using actual program material at any point in the production and mastering process. Following an ADC it will reveal convertor non-linearities, but used later, it will reveal DSP shortcomings. It’s highly likely that a correlation will be found between subjectively perceived resolution and the results of tests of this kind.

8. Use of high sampling rates

From time to time there have been proposals to raise the sampling rates used in digital audio to, for example, 96 kHz and even 192 kHz. These are invariably backed with the results of experiments and demonstrations 'proving' that the sampling rate makes a difference. The reality is different because careful study of these experiments shows them to be flawed.

The most famous bandwidth myth rests on the fact that it’s possible to hear the difference between a 10 kHz sine wave and a 10 kHz squarewave, even though the difference between the two starts with the third harmonic at 30 kHz. If we could only hear 20 kHz it wouldn't be audible, but it is. The reason is non-linearity in practical equipment. Even if the signal system, speakers and air were perfectly linear, so we could inject a 10 kHz acoustic squarewave into the ear, we would still hear the difference because the ear itself isn't linear. The ossicles in the ear are a mechanical lever system and have limitations. Consequently hearing a difference between a 10 kHz sine wave and a squarewave doesn't prove anything about the bandwidth of human hearing.

Another classic myth is the experiment shown in FIG. 14.
This takes a 96 kHz source and allows monitoring of the source directly or through a decimation to 48 kHz followed by an interpolation back to 96 kHz. This is supposed to test whether the difference between 48 kHz and 96 kHz is audible. Actually all it proves is that the more stages a signal goes through, the worse it gets. The decimation and interpolation processes will cause degradation of the signal within the 20 kHz band, so it's no wonder that the subjects prefer the 96 kHz path. What the experiment should have done was to replicate the degradation of the decimate/interpolate path. In other words the elevated noise floor due to two arithmetic roundoffs in series and the ripple and phase response of the filters should also have been present in the 96 kHz path. Tests should have been made to ensure that both paths were identical in all respects up to 20 kHz. Unfortunately they weren't and the conclusions are meaningless because the experiment was not properly designed so that the only difference between the two stimuli was the bandwidth.
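To see the size of the confound, the sketch below passes a tone well inside the audio band through a 96 kHz to 48 kHz to 96 kHz round trip (Python, with scipy's polyphase resampler standing in for the experiment's real decimator and interpolator, which is purely an assumption) and measures the in-band error that the direct 96 kHz path in FIG. 14 never sees. Real equipment would add arithmetic round-off noise on top of this filter error.

    import numpy as np
    from scipy.signal import resample_poly

    # Sketch of the FIG. 14 confound: a 96 kHz -> 48 kHz -> 96 kHz round trip
    # is not transparent even well inside the 20 kHz band.  scipy's default
    # polyphase filters stand in for the experiment's own decimator/interpolator.
    fs = 96_000.0
    n = np.arange(1 << 16)
    x = np.sin(2 * np.pi * 10_000.0 * n / fs)          # 10 kHz tone, far below 20 kHz

    y = resample_poly(resample_poly(x, 1, 2), 2, 1)    # decimate then interpolate
    y = y[: len(x)]

    sl = slice(len(x) // 4, 3 * len(x) // 4)           # ignore filter start-up transients
    err = y[sl] - x[sl]
    err_db = 10 * np.log10(np.mean(err ** 2) / np.mean(x[sl] ** 2))
    print(f"in-band error of the decimate/interpolate path: {err_db:6.1f} dB re the tone")
    # Whatever this figure is for the real filters, the same degradation must be
    # replicated in the 96 kHz path if the comparison is to isolate bandwidth alone.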
When a properly designed experiment is performed, in which 96 kHz source material is or is not bandwidth limited to 20 kHz by a psychoacoustically adequate low-pass filter, it’s impossible to hear any difference.

Some ADC manufacturers have demonstrated better sound quality from convertors running at 96 kHz. However, this does not prove that 96 kHz is necessary. FIG. 15 shows that if an oversampling convertor has suboptimal decimating filters it will suffer from a modulation noise floor which damages resolution. If the sampling rate is doubled, the noise will be spread over twice the bandwidth so the level will be reduced. This is why the high sampling rate convertor sounds better. However, the same sound quality could be obtained by improving the design of the 48 kHz convertor.

9. Digital audio interface quality

There are three parameters of interest when conveying audio down a digital interface such as AES/EBU or SPDIF, and these have quite different importance depending on the application. The parameters are:

(a) The jitter tolerance of the serial FM data separator.
(b) The jitter tolerance of the audio samples at the point of conversion back to analog.
(c) The timing accuracy of the serial signal with respect to other signals.

A digital interface is designed to convey discrete numerical values from one place to another. If those samples are correctly received with no numerical change, the interface is perfect. The serial interface carries clocking information, in the form of the transitions of the FM channel code and the sync patterns, and this information is designed to enable the data separator to determine the correct data values in the presence of jitter. It was shown in Section 8 that the jitter window of the FM code is half a data bit period in the absence of noise. This becomes a quarter of a data bit when the eye opening has reached the minimum allowable in the professional specification as can be seen from Figure 8.2. If jitter is within this limit, which corresponds to about 80 nanoseconds pk-pk, the serial digital interface perfectly reproduces the sample data, irrespective of the intended use of the data.

The data separator of an AES/EBU receiver requires a phase-locked loop in order to decode the serial message. This phase-locked loop will have jitter of its own, particularly if it’s a digital phase-locked loop where the phase steps are of finite size. Digital phase-locked loops are easier to implement along with other logic in integrated circuits. There is no point in making the jitter of the phase-locked loop vanishingly small as the jitter tolerance of the channel code will absorb it. In fact the digital phase-locked loop is simpler to implement and locks up quicker if it has larger phase steps and therefore more jitter. This has no effect on the ability of the interface to convey discrete values, and if the data transfer is simply an input to a digital recorder no other parameter is of consequence as the data values will be faithfully recorded.

However, it’s a further requirement in some applications that a sampling clock for a convertor is derived from a serial interface signal. It was shown in Section 4 that the jitter tolerance of convertor clocks is measured in picoseconds. Thus a phase-locked loop in the FM data separator of a serial receiver chip is quite unable to drive a convertor directly as the jitter it contains will be as much as a thousand times too great. Nevertheless this is exactly how a great many consumer outboard DACs are built, regardless of price.
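The mismatch between the two jitter budgets is easy to check. The short calculation below (Python) assumes the AES/EBU frame structure of two 32-slot subframes, giving 64 data bits per frame at a 48 kHz frame rate, and compares the quarter-bit eye-pattern window with a nominal 100 ps convertor clock budget taken as representative of the picosecond figures of Section 4.

    # Arithmetic behind the jitter figures quoted above.  The 64 time slots per
    # AES/EBU frame (two 32-slot subframes) and the 48 kHz frame rate come from
    # the interface format; the 100 ps clock budget is a representative assumption.
    fs = 48_000                                         # frames per second
    slots_per_frame = 64                                # two subframes of 32 time slots
    data_bit_period = 1.0 / (fs * slots_per_frame)      # one FM-coded data bit

    half_bit = data_bit_period / 2      # jitter window with a clean eye
    quarter_bit = data_bit_period / 4   # window at the minimum professional eye opening

    print(f"data bit period    : {data_bit_period * 1e9:6.1f} ns")
    print(f"half-bit window    : {half_bit * 1e9:6.1f} ns")
    print(f"quarter-bit window : {quarter_bit * 1e9:6.1f} ns (approx. 80 ns pk-pk)")
    print(f"ratio to a 100 ps convertor clock budget: {quarter_bit / 100e-12:6.0f} x")

The quarter-bit window works out at roughly 81 ns, and the ratio to a 100 ps clock budget at roughly 800:1, consistent with the "thousand times too great" figure above.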
The consequence of this poor engineering is that the serial interface is no longer truly digital. Analog variations in the interface waveform cause variations in the convertor clock jitter and thus variations in the reproduced sound quality. Different types of digital cable 'sound' different and journalists claim that digital optical interfaces are 'sonically superior' to electrical interfaces. The digital outputs of some CD players 'sound' better than others and so on. In fact source and cable substitution is an excellent test of outboard convertor quality. A properly engineered outboard convertor will sound the same despite changes in CD player, cable type and length and despite changing from electrical to optical input, because it accepts only data from the serial signal and regenerates its own clock. Audible differences simply mean the convertor is of poor design and should be rejected.

FIG. 16 shows how a convertor should be configured. The serial data separator has its own phase-locked loop which is less jittery than the serial waveform and so recovers the audio data. The serial data are presented to a shift register which is read in parallel to a latch by a clock edge from the data separator when an entire sample is present. The data separator has done its job of correctly returning a sample value to parallel format. A quite separate phase-locked loop with extremely high damping and low jitter is used to regenerate the sampling clock. This may use a crystal oscillator or it may be a number of loops in series to increase the order of the jitter filtering. In the professional channel status, bit 5 of byte 0 indicates whether the source is locked or unlocked. This bit can be used to change the damping factor of the phase-locked loop or to switch from a crystal to a varicap oscillator. When the source is unlocked, perhaps because a recorder is in varispeed, the capture range of the phase-locked loop can be widened and the increased jitter is accepted. When the source is locked, the capture range is reduced and the jitter is rejected.

The third timing criterion is only relevant when more than one signal is involved as it affects the ability of, for example, a mixer to combine two inputs. In order to decide which criterion is most important, the following may be helpful. A single signal which is involved in a data transfer to a recording medium is concerned only with eye pattern jitter as this affects the data reliability. A signal which is to be converted to analog is concerned primarily with the jitter at the convertor clock. Signals which are to be mixed are concerned with the eye pattern jitter and the relative timing. If the mix is to be monitored, all three parameters become important.

A better way of ensuring low-jitter conversion to analog in digital audio reproducers is to generate a master clock from a crystal adjacent to the convertor, and then to slave the transport to produce data at the same rate. This approach is shown in FIG. 17. Memory buffering between transport and convertor then ensures that the transport jitter is eliminated. Whilst this can also be done with a remote convertor, it does then require a reference clock to be sent to the transport as in FIG. 18 so that data can be sent at the correct rate. Unfortunately most consumer CD and DAT players have no reference input and this approach cannot be used. Consumer remote DACs must then regenerate a clock from the player and seldom do it accurately enough.
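The benefit of the separate, heavily damped loop in FIG. 16 can be sketched numerically. Below, the clock-regeneration PLL is modelled crudely (in Python) as a one-pole low-pass filter acting on the timing error of the incoming interface clock; the loop corners and the 5 ns rms of incoming jitter are illustrative assumptions, not measurements. The narrower the loop, the closer the regenerated clock gets to the picosecond class required for conversion; the price, as noted above, is a narrower capture range and slower lock-up, which is why the channel-status lock flag is used to widen the loop when the source is in varispeed.

    import numpy as np

    # Sketch of why the sampling clock is regenerated by a separate, heavily
    # damped phase-locked loop rather than taken from the data separator.
    # The loop is modelled simply as a one-pole low-pass filter acting on the
    # timing error of the incoming interface clock; all figures are assumed.
    fs = 48_000.0                                     # timing error observed per sample
    rng = np.random.default_rng(1)
    jitter_in = 5e-9 * rng.standard_normal(1 << 16)   # seconds of timing error

    def one_pole_lowpass(x, corner_hz, fs):
        """First-order low-pass, a crude stand-in for the PLL's closed loop."""
        a = np.exp(-2.0 * np.pi * corner_hz / fs)
        y = np.empty_like(x)
        acc = 0.0
        for i, v in enumerate(x):
            acc = a * acc + (1.0 - a) * v
            y[i] = acc
        return y

    print(f"incoming interface jitter: {jitter_in.std() * 1e9:5.2f} ns rms")
    for corner in (10_000.0, 100.0, 1.0):             # data-separator-like to heavily damped
        out = one_pole_lowpass(jitter_in, corner, fs)
        print(f"loop corner {corner:8.1f} Hz -> clock jitter {out.std() * 1e12:8.1f} ps rms")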
In fact it’s a myth that outboard convertors are necessary for high quality. For the same production cost, a properly engineered inboard convertor adhering to the quality criteria of Section 4 can sound better than a two-box system. The real benefit of an outboard convertor is that in theory it allows several digital sources to be replayed for the cost of one convertor. In practice few consumer devices are available with only a digital output, and the convertors are duplicated in each device.

10. Compression in stereo

The human hearing mechanism has an ability to concentrate on one of many simultaneous sound sources based on direction. The brain appears to be able to insert a controllable time delay in the nerve signals from one ear with respect to the other so that when sound arrives from a given direction the nerve signals from both ears are coherent, causing the binaural threshold of hearing to be 3-6 dB better than monaural at around 4 kHz. Sounds arriving from other directions are incoherent and are heard less well. This is known as attentional selectivity.

Human hearing can also locate a number of different sound sources simultaneously presented by constantly comparing excitation patterns from the two ears with different delays. Strong correlation will be found where the delay corresponds to the interaural delay for a given source. This delay-varying mechanism takes time and the ear is slow to react to changes in source direction. Oscillating sources can only be tracked up to 2-3 Hz and the ability to locate bursts of noise improves with burst duration up to about 700 ms. Location accuracy is finite. Stereophonic and surround systems should allow attentional selectivity to function such that the listener can concentrate on specific sound sources in a reproduced image with the same facility as in the original sound.

We live in a reverberant world which is filled with sound reflections. If we could separately distinguish every different reflection in a reverberant room we would hear a confusing cacophony. In practice we hear very well in reverberant surroundings, far better than microphones can, because of the transform nature of the ear and the way in which the brain processes nerve signals. Because the ear has finite frequency discrimination ability in the form of critical bands, it must also have finite temporal discrimination. This is good news for the loudspeaker designer because the ear has finite accuracy in frequency, time and spatial domains. This means that a blameless loudspeaker is not just a concept; it could be made real by the application of sufficient rigor.

When two or more versions of a sound arrive at the ear, provided they fall within a time span of about 30 ms, they won’t be treated as separate sounds, but will be fused into one sound. Only when the time separation reaches 50-60 ms do the delayed sounds appear as echoes from different directions. As we have evolved to function in reverberant surroundings, most reflections don’t impair our ability to locate the source of a sound. Clearly the first version of a transient sound to reach the ears must be the one which has travelled by the shortest path and this must be the direct sound rather than a reflection. Consequently the ear has evolved to attribute source direction from the time of arrival difference at the two ears of the first version of a transient.
Versions which may arrive from elsewhere simply add to the perceived loudness but don’t change the perceived location of the source, unless they arrive within the inter-aural delay of about 700 μs, when the precedence effect breaks down and the perceived direction can be pulled away from that of the first-arriving source by an increase in level. This area is known as the time-intensity trading region. Once the maximum inter-aural delay is exceeded, the hearing mechanism knows that the time difference must be due to reverberation and the trading ceases to change with level.

Unfortunately reflections with delays of the order of 700 μs are exactly what are provided by the legacy rectangular loudspeaker with sharp corners. These reflections are due to acoustic impedance changes and if we could see sound we would double up with mirth at how ineptly the sound is being radiated. Effectively the spatial information in the audio signals is being convolved with the spatial footprint of the speaker. This has the effect of defocusing the image. Now the effect can be measured.

Intensity stereo, the type obtained with coincident mikes or panpots, works purely by amplitude differences at the two loudspeakers. The two signals should be exactly in phase. As both ears hear both speakers, the result is that the space between the speakers and the ears turns the intensity differences into time of arrival differences. These give the illusion of virtual sound sources.
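This conversion of level differences into arrival-time differences can be demonstrated with a simple model. The sketch below (Python) assumes a free-field geometry of speakers at ±30 degrees and 2 m distance, ears 0.2 m apart with no head shadowing or room reflections, and a 200-800 Hz noise band as programme; it cross-correlates the two ear signals and shows the apparent inter-aural time difference moving towards the louder speaker as the pan law changes, which is how the virtual sound source arises.

    import numpy as np

    # Sketch of how intensity stereo becomes an inter-aural time difference.
    # Assumptions (not from the text): speakers at +/-30 degrees, 2 m away,
    # ears 0.2 m apart, free field, 200-800 Hz noise as programme material.
    fs = 192_000.0
    c = 343.0                                         # speed of sound, m/s
    rng = np.random.default_rng(0)

    N = 1 << 17
    noise = rng.standard_normal(N)
    spec = np.fft.rfft(noise)
    f = np.fft.rfftfreq(N, 1 / fs)
    spec[(f < 200) | (f > 800)] = 0.0                 # band-limit the source
    src = np.fft.irfft(spec, N)
    t = np.arange(N) / fs

    def delayed(x, tau):
        """Delay a band-limited signal by tau seconds (linear interpolation)."""
        return np.interp(t - tau, t, x, left=0.0, right=0.0)

    ears = {"L": np.array([-0.1, 0.0]), "R": np.array([+0.1, 0.0])}
    spks = {"L": 2.0 * np.array([-np.sin(np.pi / 6), np.cos(np.pi / 6)]),
            "R": 2.0 * np.array([+np.sin(np.pi / 6), np.cos(np.pi / 6)])}

    def ear_signal(ear, gain_l, gain_r):
        dl = np.linalg.norm(spks["L"] - ears[ear])
        dr = np.linalg.norm(spks["R"] - ears[ear])
        return gain_l * delayed(src, dl / c) + gain_r * delayed(src, dr / c)

    def apparent_itd(gain_l, gain_r):
        left = ear_signal("L", gain_l, gain_r)
        right = ear_signal("R", gain_l, gain_r)
        lags = np.arange(-200, 201)                   # about +/- 1 ms search window
        corr = [np.dot(left, np.roll(right, k)) for k in lags]
        return lags[int(np.argmax(corr))] / fs

    # Positive ITD means the right ear leads, i.e. the image is pulled
    # towards the louder right-hand speaker.
    for gl, gr in [(1.0, 1.0), (0.7, 1.0), (0.3, 1.0), (0.0, 1.0)]:
        print(f"speaker gains L={gl:.1f} R={gr:.1f} -> "
              f"apparent ITD {apparent_itd(gl, gr) * 1e6:7.1f} us")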
FIG. 19 (a) On ideal speakers a panpotted dry mix appears as a row of virtual point sources. (b) Reverberation and ambience appear between the point sources. (c) Most legacy loudspeakers cause smearing which widens the sound sources and masks the ambience between.

A virtual sound source from a panpot has zero width and on diffraction-free speakers would appear as a virtual point source. FIG. 19(a) shows how a panpotted dry mix should appear spatially on ideal speakers, whereas (b) shows what happens when stereo reverb is added. In fact (b) is also what is obtained with real sources using a coincident pair of mikes. In this case the sources are the real sources and the sound between is reverb/ambience. FIG. 19(c) is the result obtained with traditional square box speakers. Note that the point sources have spread so that there are almost no gaps between them, effectively masking the ambience. This represents a lack of spatial fidelity, so we can say that rectangular box loudspeakers cannot accurately reproduce a stereo image, nor can they be used for assessing the amount of reverberation added to a 'dry' recording. Such speakers cannot meaningfully be used to assess compression codecs.

A compressor works by raising the level of 'noise' in parts of the signal where it’s believed to be masked. If this belief is correct, the compression will be inaudible. However, if the codec is tested using a signal path in which there is another masking effect taking place, the results of the test are meaningless. Theoretical analysis and practical measurement show that legacy loudspeakers have exactly such a masking process, both temporally and spatially.

If a stereophonic system comprising a variable bit rate codec in series with a pair of speakers is considered to be a communication channel, then it will have a finite information rate in frequency, temporal and spatial domains. If this information rate exceeds the capacity of the human hearing mechanism, it will be deemed transparent. However, in the system mentioned, either the codec or the speakers could be the limiting factor and ordinarily there would be no way to separate the effects. If a variable bit-rate codec is available, some conclusions can be drawn. FIG. 20(a) shows what happens as the bit rate is increased with an ideal speaker. The sound quality increases up to the point where the capacity of the ear is reached, after which raising the bit rate appears to have no effect. However, if suboptimal speakers are used, the situation of FIG. 20(b) arises. Now, as the bit rate is increased, the quality levels off prematurely where the information capacity of the loudspeaker has been reached. As a result, simply by varying the bit rate of a coder, it becomes possible to measure the effective bit rate of a pair of loudspeakers.

In subjective compression tests, the configuration of FIG. 20(c) is used. The listener switches between the uncompressed and compressed versions to see if a difference can be detected. If the speaker of FIG. 20(b) is used, the experimenter is misled, because it would appear that there is no difference between direct and compressed listening at an artificially low bit rate, whereas in fact the limiting factor is the speaker.
At the point shown in FIG. 20(b) the masking due to the speaker is equal to the level of artifacts from the coder. At any lower bit rate, compression artifacts will become audible over the footprint of the speaker. The lower the information capacity of the speaker, the lower the bit rate at which the artifacts are audible.

Non-ideal loudspeakers act like bit-rate compressors in that they conceal or mask information in the audio signal. If a real compressor is tested with non-ideal loudspeakers certain deficiencies of the compressor won’t be heard and it may erroneously be assumed that the compressor is transparent when in fact it’s not. Compression artifacts which are inaudible in mono may be audible in stereo, and the spatial compression of non-ideal stereo loudspeakers conceals real spatial compression artifacts. Precision monitor speakers should be free of reflections in the sub-700 μs trading region so that the imaging actually reveals what is going on spatially.

When such speakers are used to assess audio compressors, even at high bit rates corresponding to the smallest amount of compression, it’s obvious that there is a difference between the original and the compressed result. FIG. 21 shows graphically what is found. The dominant sound sources are reproduced fairly accurately, but the ambience and reverb between are virtually absent, making the decoded sound much drier than the original. The effect will be apparent to the same extent with, for example, both MPEG Layer II and Dolby AC-3 coders even though their internal workings are quite different. This is not surprising because both are probably based on the same psychoacoustic masking model. MPEG Layer III fares even worse because the bit rate is lower. Transient material has a peculiar effect whereby the ambience will come and go according to the entropy of the dominant source. A percussive note will narrow the sound stage and appear dry but afterwards the reverb level will come back up. All these effects largely disappear when the signals to the speakers are added to make mono, removing the ear's ability to discriminate spatially.

These effects are not subtle and don’t require golden ears. The author has successfully demonstrated them to various audiences of up to 60 in number in a variety of untreated rooms. Whilst compression may be adequate to deliver post-produced audio to a consumer with mediocre loudspeakers, these results underline that it has no place in a quality production environment. When assessing codecs, loudspeakers having poor diffraction design will conceal artifacts. When mixing for a compressed delivery system, it will be necessary to include the codec in the monitor feeds so that the results can be compensated. Where high-quality stereo is required, either full bit rate PCM or lossless (packing) techniques must be used.