Currents by John Eargle (Apr. 1991)





DATA DIET


Little thought was given to the subject of audio data reduction when the CD and R-DAT were being developed in European and Japanese laboratories during the late 1970s and early '80s. The engineers' primary goal was to develop recording and playback systems that could accommodate a sufficient digital bit rate to encode any kind of signal, music or otherwise, with archival accuracy. That goal was accomplished, and both the CD and R-DAT are truly archival media in that they contain digital information identical to that on the digital master recording from which they were derived.

In areas such as international communications and broadcasting, the number of satellite links is limited and information channels are already reaching capacity. The most direct way around these problems is to somehow compress the audio signal so that it occupies less information "real estate." This way, more program channels can be accommodated. The designer's real challenge is to do this so well that the listener cannot tell that anything deleterious has happened to the audio signal.

Consider these observations regarding speech and music signals and our psycho-acoustical perception of them:

1. Few musical fundamental pitches exist above about 4 kHz, and the ear cannot detect definite pitch of individual tones above that frequency.

2. The dynamic range that the ear perceives varies over the frequency range, and the threshold of hearing is much higher at very low and high frequencies than it is in the midrange.

3. Music and speech signals are normally highly redundant in that the waveform of one small "slice of time" will probably look very much like the following or preceding slice.

4. Substantial program energy in a given frequency band tends to mask low-level signals in higher bands.
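Observation 2 can be put in numbers. The sketch below uses Terhardt's widely cited approximation to the threshold of hearing in quiet; the formula is a standard curve fit used in perceptual-coding work, not an exact description of any listener, and its appearance here is my illustration rather than anything from the systems discussed in this column.

```python
import math

def ath_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL
    (Terhardt's curve fit, widely used in perceptual coders)."""
    f = f_hz / 1000.0  # work in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The threshold rises sharply at the frequency extremes:
for f in (50, 1000, 4000, 15000):
    print(f"{f:>6} Hz: {ath_db(f):6.1f} dB SPL")
```

Running this shows a threshold near 40 dB SPL at 50 Hz and over 50 dB SPL at 15 kHz, versus only a few dB in the midrange, which is exactly the non-uniform dynamic range a perceptual coder exploits.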

Current linear digital encoding methods have been designed with other goals in mind and have not necessarily considered these perceptual aspects.

Note that in linear encoding systems, fully one-half of the information that is recorded deals solely with the range between 10 and 20 kHz. In that range, the ear basically perceives harmonics of lower fundamentals and the general sensation of brightness.
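The half-the-information figure is simple proportionality: linear PCM spends its bits uniformly across the audio band, so a band's share of the data equals its share of the bandwidth. A one-line check (assuming a nominal 20-kHz audio band):

```python
# Linear PCM allocates information uniformly with bandwidth,
# so the 10-20 kHz octave's share of the data is its share
# of the (nominal) 20 kHz audio band.
total_band = 20_000          # Hz
top_octave = 20_000 - 10_000 # Hz
share = top_octave / total_band
print(share)  # -> 0.5, i.e. half the recorded information
```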

The above observations have become the starting point for new digital encoding methods aimed at reducing the information rate for already crowded information channels. Recently, I had the pleasure of listening to an Apt-X 100 encode/decode system that employed only four bits per sample (!) to encode music and speech program.

The unit I heard was developed in England for broadcast applications and had been modified for wider bandwidth operation, so it was operating at a 44.1-kHz sampling frequency. Try as I did with a variety of program material, I could not hear any difference between the input signal and the processed signal in an A/B/X test setup.

The Apt-X system uses three basic algorithms. First, it divides the audio range into four sub-bands, encoding the range up to about 4 kHz with full 16-bit accuracy but encoding the three higher bands with progressively lower resolution, in accordance with the ear's reduced sensitivity there. Second, in the upper bands the quantization step size is varied during encoding, based on the masking effect of what has gone before, and a complementary adjustment is made in decoding. Finally, linear prediction considers the "unmasking" effect of spectrally pure input signals and adjusts encoding parameters for greater noise masking over extended program segments with repetitive signals.
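The step-size adaptation in the second algorithm can be sketched in a few lines. This toy Python example is my illustration only (not apt-X's actual algorithm; the adaptation constants, the omission of sub-band filtering and linear prediction, and the test signal are all assumptions): the quantizer step grows after a full-scale code and shrinks slowly otherwise, so the coder tracks the signal's recent level just as the column describes.

```python
import math

def adaptive_quantize(samples, bits):
    """Toy backward-adaptive quantizer: the step size expands
    when recent codes hit full scale and contracts otherwise.
    Illustrative only -- not the apt-X algorithm itself."""
    step, codes, recon = 0.01, [], []
    levels = 2 ** (bits - 1)  # 4 bits -> codes in [-8, 7]
    for x in samples:
        q = max(-levels, min(levels - 1, round(x / step)))
        codes.append(q)
        recon.append(q * step)
        # adapt: expand the step after a full-scale code,
        # contract it slowly otherwise
        step = max(step * (1.5 if abs(q) >= levels - 1 else 0.98), 1e-6)
    return codes, recon

# A 4-bit quantizer tracking a full-scale 440-Hz sine at 44.1 kHz
sig = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(500)]
codes, recon = adaptive_quantize(sig, bits=4)
err = max(abs(a - b) for a, b in zip(sig, recon))
print(f"worst-case error after adaptation: {err:.3f}")
```

Because the decoder repeats the same step-size bookkeeping from the codes it receives, no side information need be transmitted, which is the point of the "complementary adjustment" in decoding.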

One consequence is that the CD, for example, could be re-encoded via the Apt-X 100 system to give us eight channels of output instead of two! This would require no fundamental re-engineering of the CD medium, only the addition of encode and decode black boxes. This is not likely to happen.
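The eight-channel figure is straightforward arithmetic, assuming the CD's standard parameters and four coded bits per sample:

```python
# CD linear PCM bit budget vs. a 4-bit-per-sample re-encoding
fs, pcm_bits, cd_channels = 44_100, 16, 2
cd_rate = fs * pcm_bits * cd_channels  # 1,411,200 bits/s on disc
coded_rate = fs * 4                    # 176,400 bits/s per coded channel
print(cd_rate // coded_rate)           # -> 8 channels fit in the same budget
```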

Data reduction has a more important role to play in normal two-channel audio applications, including the recently announced Philips Digital Compact Cassette (DCC). This consumer format is due to be released in 1992 as a digital follow-up to the immensely successful analog Compact Cassette. It is a stationary-head digital format operating at the normal cassette tape speed of 1 7/8 ips. Its thin-film head will have nine tracks for digital record and play (eight data tracks plus one control track) and two tracks for analog play, and will rotate to play the tape's second side. Philips claims that the perceptual encoding system for DCC, which uses 32 sub-bands, yields up to 18-bit dynamic range.
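For a sense of scale, if those 32 sub-bands are of equal width and span the Nyquist range (an assumption on my part; the text does not specify the filter bank) at a 44.1-kHz sampling rate, each band covers under 700 Hz:

```python
# Width of each of 32 equal sub-bands spanning the Nyquist
# band at a 44.1-kHz sampling rate (assumed equal-width split)
nyquist = 44_100 / 2        # 22,050 Hz
width = nyquist / 32
print(round(width, 1))      # -> 689.1 Hz per sub-band
```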

How can such complex signal analysis be performed while the cost of reconstructing the signal in playback units is kept within acceptable bounds? In analog noise reduction (a close kin to the processes we are discussing here), the complexity of the encoding and decoding circuitry is virtually the same. In the digital domain, however, the input signal analysis is far more complex than the decoding circuitry required to sort it all out. For example, during encoding it might be necessary to perform a fast Fourier transform (FFT) on a succession of short waveform samples. The output of the FFT is a spectrum analysis of the signal, and this might be used in the encoding algorithm to weigh the audibility of masking effects and hence to determine quantization step size and signal-to-noise demands in a given frequency band. The hardware needed to perform the FFT may be relatively complex, but the information needed to decode the signal adjustment it calls for may be no more than a simple instruction embedded in the playback signal.
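The encode/decode asymmetry can be sketched as follows. This is a toy example of my own devising, not any actual coder's psychoacoustic model: the plain DFT, the four-band grouping, and the energy-to-step rule are all assumptions chosen for brevity. The encoder does all the spectral analysis; a decoder would need only the handful of step sizes that come out of it.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Plain DFT magnitude spectrum of one short frame
    (O(n^2), fine for illustration; a real coder uses an FFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2)]

def band_steps(frame, n_bands=4, base_step=1.0):
    """Toy bit-allocation decision: a loud band masks more noise,
    so it tolerates a coarser quantizer step. Only the resulting
    steps need reach the decoder, not the analysis itself."""
    mags = dft_magnitudes(frame)
    per = len(mags) // n_bands
    steps = []
    for b in range(n_bands):
        energy = sum(m * m for m in mags[b * per:(b + 1) * per])
        # more energy in a band -> larger permissible step
        steps.append(base_step * math.sqrt(energy + 1e-12) / len(frame))
    return steps

# A frame dominated by a low-frequency tone gets a coarse step
# in band 0 and very fine steps in the quiet upper bands.
frame = [math.sin(2 * math.pi * 2 * i / 64) for i in range(64)]
steps = band_steps(frame)
print([round(s, 4) for s in steps])
```

The expensive part (the transform and the energy bookkeeping) lives entirely on the encode side; the four numbers in `steps` are the "simple instruction embedded in the playback signal."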

Still, the playback processing required for such reduced-data recording is complex enough to require new integrated circuitry dedicated to the standard at hand. In general, however, this playback processing may be no more complex than in linear digital recording, and the benefit of the new process will show up mainly in the reduced demands on the basic recording medium or transmission channel.

Another area where audio data reduction may have an important role is in digital sound for motion pictures.

The Optical Radiation Corporation Cinema Digital Sound (CDS) system was introduced in 1990 and may embody some degree of data reduction. Dolby Laboratories has been working on a digital system for film, and word is that they intend to demonstrate it early this year. Dolby Laboratories is the acknowledged expert in analog data reduction techniques, having developed them for nearly 25 years, and their introduction of a digital system would command considerable attention.

It is safe to say that any digital data reduction system seriously put forth now as a standard must have passed the "ear test." The remaining points of merit that will be used to sort out the various systems will have to do with economy, reliability, and how the various systems perform under duress.

(adapted from Audio magazine, Apr. 1991)
