Ultimate Guide to Digital Audio: Why digital? (part 1)

1. Introduction

The applications of audio technology are numerous, but generally the goal is to reproduce sound at a later time, at another place or both. The consumer needs reasonably affordable equipment which will reproduce recordings or receive transmissions, whereas the record company or broadcaster needs equipment which can manipulate audio signals to produce recordings or programs. In the professional case, flexibility and speed of operation are more important than first cost.

In one sense provided the sound is reproduced to an acceptable standard, the user doesn't care how it’s done. The point to be stressed is that it’s the service that is needed, not the technology. People don't want technology; instead they want the services it provides. As a result when a new technology comes along it may be adopted if the service is better in some way or if the same service is possible at lower cost or with smaller equipment. Digital audio did just that, irrevocably transforming the face of audio in a very short time for both consumer and professional alike. This is not a history book, but readers interested in the history of digital audio are referred to Section 8 of Magnetic Recording: The first 100 years.

The first techniques to be used for sound recording, transmission and processing were understandably analog. Some mechanical, electrical or magnetic parameter was caused to vary in the same way that the sound to be recorded had varied the air pressure. The voltage coming from a microphone is an analog of the air pressure (or sometimes velocity), but both vary in the same timescale; the magnetism on a tape or the deflection of a disk groove is an analog of the electrical input signal, but in recorders there is a further analog between time in the input signal and distance along the medium.

In an analog system, information is conveyed by some infinite variation of a continuous parameter such as the voltage on a wire or the strength of flux on a tape. In a recorder, distance along the medium is a further, continuous, analog of time. It does not matter at what point a recording is examined along its length, a value will be found for the recorded signal. That value can itself change with infinite resolution within the physical limits of the system.

Those characteristics are the main weakness of analog signals. Within the allowable bandwidth, any waveform is valid. If the speed of the medium is not constant, one valid waveform is changed into another valid waveform; a timebase error cannot be detected in an analog system.

In addition, a voltage error simply changes one valid voltage into another; noise cannot be detected in an analog system. We might suspect noise, but how is one to know what proportion of the received voltage is noise and what is the original? If the transfer function of a system is not linear, distortion results, but the distorted waveforms are still valid; an analog system cannot detect distortion. Again we might suspect distortion, but how are we to know how much of the third harmonic energy received is due to the distortion and how much was actually present in the original signal? It’s a characteristic of analog systems that degradations cannot be separated from the original signal, so nothing can be done about them. At the end of a system a signal carries the sum of all degradations introduced in the stages through which it passed. This sets a limit to the number of stages through which a signal can be passed before it’s useless. Alternatively, if many stages are envisaged, each piece of equipment must be far better than necessary so that the signal is still acceptable at the end. The equipment will naturally be more expensive.

When setting out to design any audio equipment, it’s important to appreciate that the final arbiter is the human hearing system. If the audio signal is reproduced less accurately than our senses, these shortcomings will be audible, whereas if the system is more accurate than our senses, it will appear perfect even though it’s not. Making the system better still is then a waste of resources. This topic will be explored in more detail in Sections 2 and 13.

2. What is digital audio?

One of the vital concepts to grasp is that digital audio is simply an alternative means of carrying audio information. An ideal digital audio recorder has the same characteristics as an ideal analog recorder: both of them are totally transparent and reproduce the original applied waveform without error. One need only compare high-quality analog and digital equipment side by side with the same signals to realize how transparent modern equipment can be. Needless to say, in the real world ideal conditions seldom prevail, so analog and digital equipment both fall short of the ideal. Digital audio simply falls short of the ideal by a smaller distance than does analog and at lower cost, or, if the designer chooses, can have the same performance as analog at much lower cost.

Although there are a number of ways in which audio can be represented digitally, there is one system, known as pulse code modulation (PCM), which is in virtually universal use. FIG. 1 shows how PCM works. Instead of being continuous, the time axis is represented in a discrete, or stepwise manner. The waveform is not carried by continuous representation, but by measurement at regular intervals. This process is called sampling and the frequency with which samples are taken is called the sampling rate or sampling frequency Fs.

FIG. 1 In pulse code modulation (PCM) the analog waveform is measured periodically at the sampling rate. The voltage (represented here by the height) of each sample is then described by a whole number. The whole numbers are stored or transmitted rather than the waveform itself.
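The measurement process of FIG. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1 kHz sine wave, the 48 kHz sampling rate and the sixteen-bit number range are chosen for the example and are not taken from the text, and pcm_encode is a hypothetical helper name.

```python
import math

def pcm_encode(waveform, fs, n_samples, wordlength=16):
    """Measure waveform(t) every 1/fs seconds and describe each
    measurement by a whole number (illustrative sketch only)."""
    full_scale = 2 ** (wordlength - 1) - 1    # largest positive code
    codes = []
    for n in range(n_samples):
        t = n / fs                            # sampling instant: discrete time axis
        voltage = waveform(t)                 # assumed to lie within -1.0 .. +1.0
        codes.append(round(voltage * full_scale))  # whole number: discrete voltage axis
    return codes

# Assumed test signal: a 1 kHz sine wave sampled at an assumed 48 kHz.
def tone(t):
    return math.sin(2 * math.pi * 1000 * t)

codes = pcm_encode(tone, fs=48000, n_samples=48)
print(codes[:4])
```

Only the list of whole numbers would be stored or transmitted; the sampling rate must be known at the destination so that the numbers can be placed back on the time axis.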

The sampling rate is generally fixed and is thus independent of any frequency in the signal. If every effort is made to rid the sampling clock of jitter, or time instability, every sample will be made at an exactly even time step. Clearly if there is any subsequent timebase error, the instants at which samples arrive will be changed and the effect can be detected. If samples arrive at some destination with an irregular timebase, the effect can be eliminated by storing the samples temporarily in a memory and reading them out using a stable, locally generated clock. This process is called timebase correction and all properly engineered digital audio systems must use it. Clearly timebase error is not reduced; it’s totally eliminated. As a result there is little point measuring the wow and flutter of a digital recorder; it doesn't have any. What happens is that the crystal clock in the timebase corrector measures the stability of the flutter meter.
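Timebase correction as described above can be modelled as a memory which is written whenever a sample happens to arrive and read by a stable clock. The following is a minimal sketch, not a description of any real product: the function name, the unit sampling period, the jitter range of 0.3 of a period and the starting delay of half a period are all assumptions made for the example.

```python
import random
from collections import deque

def timebase_correct(arrivals, period, start_delay):
    """arrivals: (arrival_time, sample) pairs with irregular timing.
    Returns (output_time, sample) pairs read out by a stable local clock."""
    memory = deque()                  # temporary store for arrived samples
    pending = deque(arrivals)         # samples still on their way
    output = []
    tick = 0
    while pending or memory:
        now = start_delay + tick * period          # stable, locally generated clock
        while pending and pending[0][0] <= now:    # write side: irregular timing
            memory.append(pending.popleft()[1])
        if not memory:
            raise RuntimeError("memory ran empty: start_delay is too short")
        output.append((now, memory.popleft()))     # read side: regular timing
        tick += 1
    return output

rng = random.Random(1)
period = 1.0
samples = list(range(100, 120))
# Each sample arrives up to 0.3 of a period early or late (assumed jitter).
arrivals = [(n * period + rng.uniform(-0.3, 0.3), s) for n, s in enumerate(samples)]
corrected = timebase_correct(arrivals, period, start_delay=0.5)
```

The sample values leave the memory unchanged and in the original order, but their spacing is now set entirely by the local clock. The price is a small fixed delay, which must be at least as large as the worst timing error expected.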

It should be stressed that sampling is an analog process. Each sample still varies infinitely as the original waveform did. Sampled analog devices are well known in audio. These are generally implemented with charge coupled registers and are used for chorus effects in keyboards and for delay in public address systems.

Those who are not familiar with digital audio often worry that sampling takes away something from a signal because it’s not taking notice of what happened between the samples. This would be true in a system having infinite bandwidth, but no analog audio signal can have infinite bandwidth. All analog signal sources such as microphones, tape decks, pickup cartridges and so on have a frequency response limit, as indeed do our ears. When a signal has finite bandwidth, the rate at which it can change is limited, and the way in which it changes becomes predictable. When a waveform can only change between samples in one way, the original waveform can be reconstructed from them. A more detailed treatment of the principle will be given in Section 4.
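The claim that a band-limited waveform can be recovered between the samples can be tried numerically. The sketch below uses the ideal sinc interpolation of sampling theory, which this section does not name and which is brought in here purely as an assumption; the 1 kHz tone and the 8 kHz sampling rate are likewise example values. Because only a finite run of samples is summed, the result is a close approximation, not an exact value.

```python
import math

def sinc(x):
    """Normalised sinc: 1 at x = 0, zero at every other whole number."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Estimate the waveform at time t from samples taken at rate fs."""
    return sum(s * sinc(t * fs - n) for n, s in enumerate(samples))

fs = 8000          # assumed sampling rate
f = 1000           # assumed tone frequency, well below fs / 2
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(2001)]

t = 1000.5 / fs    # an instant exactly half-way between two samples
estimate = reconstruct(samples, fs, t)
actual = math.sin(2 * math.pi * f * t)
print(estimate, actual)
```

Nothing was measured at the chosen instant, yet the estimate agrees closely with the true waveform, which is the sense in which sampling a band-limited signal loses nothing.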

FIG. 1 also shows that each sample is also discrete, or represented in a stepwise manner. The length of the sample, which will be proportional to the voltage of the audio waveform, is represented by a whole number. This process is known as quantizing and results in an approximation, but the size of the error can be controlled until it’s negligible. If, for example, we were to measure the height of humans to the nearest meter, virtually all adults would register two meters high and obvious difficulties would result. These are generally overcome by measuring height to the nearest centimeter. Clearly there is no advantage in going further and expressing our height in a whole number of millimeters or even micrometers, although no doubt some Hi-Fi enthusiasts will be able to advance reasons for doing so. The point is that an appropriate resolution can also be found for audio, and a higher figure is not beneficial. The link between audio quality and sample resolution is explored in Section 4.
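How the quantizing error is controlled can be shown with a short calculation. In this sketch the signal range of -1 to +1 and the wordlengths of four, eight and sixteen bits are assumptions chosen for the example, and quantize is a hypothetical helper which does not model clipping at the ends of the range.

```python
def quantize(value, wordlength):
    """Describe value (within -1.0 .. +1.0) by the nearest whole number
    of steps; return that number and the error of the approximation."""
    step = 2.0 / 2 ** wordlength      # the range is divided into 2**wordlength steps
    code = round(value / step)        # nearest whole number of steps
    error = value - code * step       # what the approximation leaves out
    return code, error

test_values = [n / 1000 - 1 for n in range(2001)]   # sweep from -1 to +1
for wordlength in (4, 8, 16):
    step = 2.0 / 2 ** wordlength
    worst = max(abs(quantize(v, wordlength)[1]) for v in test_values)
    print(wordlength, "bits: step", step, "worst error", worst)
```

The error never exceeds half a step, and every extra bit halves the step, so the designer can choose the wordlength at which the error becomes negligible for the purpose.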

The advantage of using whole numbers is that they are not prone to drift. If a whole number can be carried from one place to another without numerical error, it has not changed at all. By describing audio waveforms numerically, the original information has been expressed in a way which is better able to resist unwanted changes.

Essentially, digital audio carries the original waveform numerically.

The number of the sample is an analog of time, and the magnitude of the sample is an analog of the pressure at the microphone. In fact the succession of samples in a digital system is actually an analog of the original waveform. This sounds like a contradiction and as a result some authorities prefer the term 'numerical audio' to 'digital audio' and in fact the French word is numérique. The term 'digital' is so well established that it’s unlikely to change.

As both axes of the digitally represented waveform are discrete, the waveform can accurately be restored from numbers as if it were being drawn on graph paper. If we require greater accuracy, we simply choose paper with smaller squares. Clearly more numbers are then required and each one could change over a larger range.

In simple terms, the audio waveform is conveyed in a digital recorder as if the voltage had been measured at regular intervals with a digital meter and the readings had been written down on a roll of paper. The rate at which the measurements were taken and the accuracy of the meter are the only factors which determine the quality, because once a parameter is expressed as a discrete number, a series of such numbers can be conveyed unchanged. Clearly in this example the handwriting used and the grade of paper have no effect on the information. The quality is determined only by the accuracy of conversion and is independent of the quality of the signal path.

3. Why binary?

Humans insist on using numbers expressed to the base of ten, having evolved with that number of digits. Other number bases exist; most people are familiar with the duodecimal system which uses the dozen and the gross. The most minimal system is binary, which has only two digits, 0 and 1. BInary digiTS are universally contracted to bits. These are readily conveyed in switching circuits by an 'on' state and an 'off' state.

With only two states, there is little chance of error.

In decimal systems, the digits in a number (counting from the right, or least significant end) represent ones, tens, hundreds and thousands etc.

FIG. 2 shows that in binary, the bits represent one, two, four, eight, sixteen etc. A multi-digit binary number is commonly called a word, and the number of bits in the word is called the wordlength. The right-hand bit is called the least significant bit (LSB) whereas the bit on the left-hand end of the word is called the most significant bit (MSB). Clearly more digits are required in binary than in decimal, but they are more easily handled. A word of eight bits is called a byte.

The capacity of memories and storage media is measured in bytes, but to avoid large numbers, kilobytes, megabytes and gigabytes are often used. As memory addresses are themselves binary numbers, the wordlength limits the address range. The range is found by raising two to the power of the wordlength. Thus a four-bit word has sixteen combinations, and could address a memory having sixteen locations. A ten-bit word has 1024 combinations, which is close to one thousand. In digital terminology, 1K = 1024, so a kilobyte of memory contains 1024 bytes. A megabyte (1MB) contains 1024 kilobytes and a gigabyte contains 1024 megabytes.
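The place values of a binary word and the address range set by a wordlength can be checked directly. The helper names below are hypothetical and the four-bit word 1011 is an arbitrary example.

```python
def place_values(wordlength):
    """Weights of the bits in a word, MSB first: ..., 8, 4, 2, 1."""
    return [2 ** i for i in range(wordlength - 1, -1, -1)]

def address_range(wordlength):
    """Number of memory locations a word of this length can address."""
    return 2 ** wordlength

word = "1011"                                   # MSB on the left, LSB on the right
value = sum(int(bit) * weight
            for bit, weight in zip(word, place_values(len(word))))
print(value)                                    # 8 + 0 + 2 + 1

for wordlength in (4, 10, 16):
    print(wordlength, "bits address", address_range(wordlength), "locations")
```

The same function gives the figures quoted above: sixteen locations for a four-bit address and 1024 for a ten-bit address.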

In a digital audio system, the whole number representing the length of the sample is expressed in binary. The signals sent have two states, and change at predetermined times according to some stable clock. FIG. 3 shows the consequences of this form of transmission. If the binary signal is degraded by noise, this will be rejected by the receiver, which judges the signal solely by whether it’s above or below the half-way threshold, a process known as slicing. The signal will be carried in a channel with finite bandwidth, and this limits the slew rate of the signal; an ideally upright edge is made to slope. Noise added to a sloping signal can change the time at which the slicer judges that the level passed through the threshold. This effect is also eliminated when the output of the slicer is reclocked. However many stages the binary signal passes through, the information is unchanged except for a delay.
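The effect of slicing can be demonstrated by passing the same bits through many noisy stages with and without a slicer at each stage. This is a simplified sketch: the signal levels of 0 and 1, the noise amplitude of 0.4, the fifty stages and the function names are all assumptions, and one value per bit period stands in for the reclocked signal, so the jitter mechanism is not modelled.

```python
import random

def add_noise(signal, amplitude, rng):
    """Model one stage of a channel by adding bounded random noise."""
    return [v + rng.uniform(-amplitude, amplitude) for v in signal]

def slice_bits(signal, threshold=0.5):
    """Judge each value solely by whether it lies above the half-way threshold."""
    return [1 if v > threshold else 0 for v in signal]

rng = random.Random(42)
bits = [rng.randint(0, 1) for _ in range(64)]

digital = list(bits)
analog = [float(b) for b in bits]
for stage in range(50):
    digital = slice_bits(add_noise(digital, 0.4, rng))   # noise rejected at every stage
    analog = add_noise(analog, 0.4, rng)                 # noise accumulates

errors = sum(a != b for a, b in zip(digital, bits))
worst_analog = max(abs(a - b) for a, b in zip(analog, bits))
print("bit errors after 50 stages:", errors)
print("worst accumulated analog error:", worst_analog)
```

As long as the noise in any one stage stays below half the distance between the two states, the sliced signal emerges from every stage exactly as it went in, whereas the unsliced version carries the sum of all the noise added along the way.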

FIG. 2 In a binary number, the digits represent increasing powers of two from the LSB. Also defined here are MSB and wordlength. When the wordlength is eight bits, the word is a byte. Binary numbers are used as memory addresses, and the range is defined by the address wordlength. Some examples are shown here.

FIG. 3 (a) A binary signal is compared with a threshold and reclocked on receipt, thus the meaning will be unchanged. (b) Jitter on a signal can appear as noise with respect to fixed timing. (c) Noise on a signal can appear as jitter when compared with a fixed threshold.

FIG. 4 When a signal is carried in numerical form, either parallel or serial, the mechanisms of FIG. 3 ensure that the only degradation is in the conversion processes.

Audio samples which are represented by whole numbers can reliably be carried from one place to another by such a scheme, and if the number is correctly received, there has been no loss of information en route.

There are two ways in which binary signals can be used to carry audio samples and these are shown in FIG. 4. When each digit of the binary number is carried on a separate wire this is called parallel transmission.

The state of the wires changes at the sampling rate. Using multiple wires is cumbersome, particularly where a long wordlength is in use, and a single wire can be used where successive digits from each sample are sent serially. This is the definition of pulse code modulation. Clearly the clock frequency must now be higher than the sampling rate. Whilst the transmission of audio by such a scheme is advantageous in that noise and timebase error have been eliminated, there is a penalty that a single high quality audio channel requires around one million bits per second. Digital audio came into wide use as soon as such a data rate could be handled economically. Further applications become possible when means to reduce or compress the data rate become economic. Section 5 considers audio compression.
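The clock frequency needed for serial transmission follows from simple arithmetic: one bit period is needed for every bit of every sample. In the sketch below the sixteen-bit wordlength and the three sampling rates are assumed example values, and the function name is hypothetical.

```python
def serial_bit_rate(wordlength, sampling_rate, channels=1):
    """Bits per second needed to send every bit of every sample serially."""
    return wordlength * sampling_rate * channels

# Assumed example sampling rates in Hz, with an assumed 16-bit wordlength.
for fs in (32_000, 44_100, 48_000):
    print(fs, "Hz ->", serial_bit_rate(16, fs), "bits per second")
```

With these assumed figures a single channel needs between roughly half a million and three-quarters of a million bits per second before any synchronizing or error-protection overhead is added, which is of the same order as the figure quoted above.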

4. Why digital?

There are two main answers to this question, and it’s not possible to say which is the most important, as it will depend on one's standpoint.

(a) The quality of reproduction of a well-engineered digital audio system is independent of the medium and depends only on the quality of the conversion processes. If compression is used this can also affect the quality.

(b) The conversion of audio to the digital domain allows tremendous opportunities which were denied to analog signals.

Someone who is only interested in sound quality will judge the former the most relevant. If good-quality converters can be obtained, all the shortcomings of analog recording can be eliminated to great advantage.

One's greatest effort is expended in the design of converters, whereas those parts of the system which handle data need only be workmanlike.

Wow, flutter, particulate noise, print-through, dropouts, modulation noise, HF squashing, azimuth error, and interchannel phase errors are all eliminated.

When a digital recording is copied, the same numbers appear on the copy: it’s not a dub, it’s a clone. If the copy is indistinguishable from the original, there has been no generation loss. Digital recordings can be copied indefinitely without loss of quality. If you happen to be a sound engineer, this is heaven. If you are a record company executive you take another pill for blood pressure and phone your lawyer to see if you can have it stopped.

In the real world everything has a cost, and one of the greatest strengths of digital technology is low cost. If copying causes no quality loss, recorders don’t need to be far better than necessary in order to withstand generation loss. They need only be adequate on the first generation whose quality is then maintained. There is no need for the great size and extravagant tape consumption of professional analog recorders. When the information to be recorded is discrete numbers, they can be packed densely on the medium without quality loss. Should some bits be in error because of noise or dropout, error correction can restore the original value. Digital recordings take up less space than analog recordings for the same or better quality. Tape costs are far less and storage costs are reduced.

Digital circuitry costs less to manufacture. Switching circuitry which handles binary can be integrated more densely than analog circuitry.

More functionality can be put in the same chip. Analog circuits are built from a host of different component types which have a variety of shapes and sizes and are costly to assemble and adjust. Digital circuitry uses standardized component outlines and is easier to assemble on automated equipment. Little if any adjustment is needed.

Once audio is in the digital domain, it becomes data, and as such is indistinguishable from any other type of data. Systems and techniques developed in other industries for other purposes can be used for audio.

Computer equipment is available at low cost because the volume of production is far greater than that of professional audio equipment. Disk drives and memories developed for computers can be put to use in audio products. A word processor adapted to handle audio samples becomes a workstation. There seems to be little point in waiting for a tape to wind when a disk head can access data in milliseconds. The difficulty of locating the edit point and the irrevocable nature of tape-cut editing are immediately seen as outmoded when the edit point can be located by viewing the audio waveform on a screen or by listening at any speed to audio from a memory. The edit can be simulated or previewed and trimmed before it’s made permanent.

The merging of digital audio and computation is two-sided. Whilst audio may borrow RAM and hard disk technology from the computer industry, Compact Disc and DAT were borrowed back to create CD-ROM and DDS (digital data storage).

Communications networks developed to handle data can happily carry digital audio over indefinite distances without quality loss. Digital audio broadcasting (DAB) makes use of these techniques to eliminate the interference, fading and multipath reception problems of analog broadcasting. At the same time, more efficient use is made of available bandwidth. In one sense DAB is just conventional radio done with digital transmission. The listener still has to accept what the broadcaster chooses to transmit. In contrast, if the listener uses a data communication channel such as the Internet, any audio program material can in principle be accessed at any time over any distance.

Digital equipment can have self-diagnosis programs built in. The machine points out its own failures. The days of chasing a signal with an oscilloscope are over. Even if a faulty component in a digital circuit could be located with such a primitive tool, it may be impossible to replace a chip having 60 pins soldered through a six-layer circuit board. The cost of finding the fault may be more than the board is worth. Routine, mind-numbing adjustment of analog circuits to counteract drift is no longer needed. The cost of maintenance falls. A small operation may not need maintenance staff at all; a service contract is sufficient. A larger organization will still need maintenance staff, but they will be fewer in number and their skills will be oriented more to systems than to devices.

As a result of the above, the cost of ownership of digital equipment has for some time now been less than that of analog. Debates about quality are academic; in recording and transmission, analog equipment can no longer compete economically, and it’s going out of service as surely as the transistor once replaced the vacuum-tube in electronics and the turbine replaced the piston engine in commercial aviation.

5. Some digital audio processes outlined

Whilst digital audio is a large subject, it’s not necessarily a difficult one.

Every process can be broken down into smaller steps, each of which is relatively easy to follow. The main difficulty with study is to appreciate where the small steps fit in the overall picture. Subsequent sections of this guide will describe the key processes found in digital technology in some detail, whereas this section illustrates why these processes are necessary and shows how they are combined in various ways in real equipment. Once the general structure of digital devices is appreciated, the following sections can be put in perspective.

FIG. 5(a) shows a minimal digital audio system. This is no more than a point-to-point link which conveys analog audio from one place to another. It consists of a pair of converters and hardware to serialize and deserialize the samples. There is a need for standardization in serial transmission so that various devices can be connected together. These standards for digital audio interfaces are described in Section 7.

Analog audio entering the system is converted in the analog-to-digital converter (ADC) to samples which are expressed as binary numbers. A typical sample would have a wordlength of sixteen bits. The sample is loaded in parallel into a shift register which is then shifted with a clock running at sixteen times the sampling rate. The data are sent serially to the other end of the line where a slicer rejects noise picked up on the signal. Sliced data are then shifted into a receiving shift register with a bit clock. Once every sixteen bits, the shift register contains a whole sample, and this is read out by the sampling rate clock, or word clock, and sent to the digital-to-analog converter (DAC), which converts the sample back to an analog voltage.
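The action of the two shift registers can be mimicked in software. This sketch assumes that samples are unsigned sixteen-bit codes and that the most significant bit is sent first; neither detail is specified above, and the function names are hypothetical.

```python
WORDLENGTH = 16

def serialize(samples):
    """Load each sample in parallel, then shift it out one bit at a time
    (MSB first, an assumption made for this example)."""
    for sample in samples:
        for i in range(WORDLENGTH - 1, -1, -1):
            yield (sample >> i) & 1

def deserialize(bits):
    """Shift received bits into a register; once every WORDLENGTH bits
    the register holds a whole sample, which is read out."""
    register = 0
    count = 0
    for bit in bits:
        register = ((register << 1) | bit) & (2 ** WORDLENGTH - 1)
        count += 1
        if count == WORDLENGTH:       # word clock: one tick per sixteen bit clocks
            yield register
            count = 0

samples = [0, 1, 32768, 43690, 65535]
line = list(serialize(samples))       # what travels along the single wire
received = list(deserialize(line))
print(len(line), received)
```

Sixteen bits travel along the line for every sample, which is why the bit clock must run at sixteen times the sampling rate, and the receiver needs to know where each word begins in order to read the register at the right moment.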

Following a casual study one might conclude that if the converters were of transparent quality, the system would be ideal. Unfortunately this is incorrect. As FIG. 3 showed, noise can change the timing of a sliced signal. Whilst this system rejects noise which threatens to change the numerical value of the samples, it’s powerless to prevent noise from causing jitter in the receipt of the word clock. Noise on the word clock means that samples are not converted with a regular timebase and the impairment caused can be audible. Stated another way, analog characteristics of the interconnect are not prevented from affecting the reproduced waveform and so the system is not truly digital.

The jitter problem is overcome in FIG. 5(b) by the inclusion of a phase-locked loop, an oscillator which synchronizes itself to the average frequency of the word clock but filters out the instantaneous jitter. The operation of a phase-locked loop is analogous to the function of the flywheel on a piston engine. The samples are then fed to the converter with a regular spacing and the impairment is no longer audible. Section 4 shows why the effect occurs and deduces the remarkable clock accuracy needed for accurate conversion.

FIG. 5 In (a) two converters are joined by a serial link. Although simple, this system is deficient because it has no means to prevent noise on the clock lines causing jitter at the receiver. In (b) a phase-locked loop is incorporated, which filters jitter from the clock.
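The flywheel action of a phase-locked loop can be illustrated with a much-simplified software model. This is a first-order loop which is assumed to know the nominal clock period and corrects only a small fraction of each timing error; the gain of 0.05, the jitter range and the function name are assumptions, and a practical loop would be more elaborate.

```python
import random
import statistics

def phase_locked_clock(edge_times, nominal_period, gain=0.05):
    """Regenerate clock edges which follow the average timing of
    edge_times whilst smoothing out the instantaneous jitter."""
    output = []
    local_edge = edge_times[0]            # start in step with the first edge
    for incoming in edge_times:
        output.append(local_edge)
        phase_error = incoming - local_edge
        # Take only a small fraction of each error, like a flywheel
        # which responds slowly to individual pushes.
        local_edge += nominal_period + gain * phase_error
    return output

rng = random.Random(0)
period = 1.0
ideal = [n * period for n in range(2000)]
jittery = [t + rng.uniform(-0.1, 0.1) for t in ideal]   # received word clock
cleaned = phase_locked_clock(jittery, period)

jitter_in = statistics.pstdev([a - b for a, b in zip(jittery, ideal)])
jitter_out = statistics.pstdev([a - b for a, b in zip(cleaned, ideal)])
print(jitter_in, jitter_out)
```

The regenerated clock stays locked to the incoming one on average, but its edges wander far less from the ideal positions. A smaller gain gives more smoothing at the cost of a slower response to genuine changes in the incoming frequency.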

Whilst this effect is reasonably obvious, it does not guarantee that all converters take steps to deal with it. Many outboard DACs sold on the consumer market have no phase-locked loop, and one should not be surprised that they can sound worse than the inboard converter they are supposed to replace. In the absence of timebase correction, the sound quality of an outboard converter can be affected by such factors as the type of data cable used and the power supply noise of the digital source.

Clearly if the sound of a given DAC is affected by cable or source, it’s simply not well engineered and should be rejected. Almost by definition a good remote DAC rejects noise and jitter on the digital inputs and its sound is not affected by the digital source or the analog characteristics of the cable.

6. The sampler

The system of FIG. 5 is extended in FIG. 6 by the addition of some random access memory (RAM). The operation of RAM is described in Section 3. What the device does is determined by the way in which the RAM address is controlled. If the RAM address increases by one every time a sample from the ADC is stored in the RAM, a recording can be made for a short period until the RAM is full. The recording can be played back by repeating the address sequence at the same clock rate but reading the memory into the DAC. The result is generally called a sampler. By running the replay clock at various rates, the pitch and duration of the reproduced sound can be altered. At a rate of one million bits per second, a megabyte of memory gives only eight seconds' worth of recording, so clearly samplers will be restricted to a fairly short playing time.

FIG. 6 In the digital sampler, the recording medium is a random access memory (RAM). Recording time available is short compared with other media, but access to the recording is immediate and flexible as it’s controlled by addressing the RAM.
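The behaviour of the sampler follows from the way the RAM address is stepped, and can be sketched as below. The class and function names are hypothetical, a Python list stands in for the RAM, and the 48 kHz and 96 kHz clock rates in the last lines are assumed example values.

```python
class Sampler:
    """A RAM whose address increases by one for every sample stored."""

    def __init__(self, ram_size):
        self.ram = [0] * ram_size
        self.address = 0

    def record(self, sample):
        """Store one sample; return False once the RAM is full."""
        if self.address >= len(self.ram):
            return False
        self.ram[self.address] = sample
        self.address += 1
        return True

    def replay(self):
        """Repeat the address sequence, reading instead of writing."""
        return [self.ram[a] for a in range(self.address)]

def playing_time(memory_bytes, bits_per_second):
    """Seconds of recording which fit in the memory at a given data rate."""
    return memory_bytes * 8 / bits_per_second

def replay_duration(n_samples, replay_clock):
    """Seconds taken to replay n_samples at a given replay clock rate."""
    return n_samples / replay_clock

sampler = Sampler(ram_size=4)
accepted = [sampler.record(s) for s in (10, 20, 30, 40, 50)]
print(accepted, sampler.replay())

print(playing_time(1024 * 1024, 1_000_000))     # a megabyte at a million bits per second
print(replay_duration(48_000, 48_000), replay_duration(48_000, 96_000))
```

A megabyte holds a little over eight seconds at one million bits per second, in line with the figure in the text. Replaying the same samples with a clock of twice the frequency halves the duration and raises the pitch by an octave.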

Using compression, the playing time of a RAM-based recorder can be extended. Some telephone answering machines take messages in RAM and eliminate the cassette tape. For pre-determined messages, read-only memory (ROM) can be used instead as it’s non-volatile. Announcements in aircraft, trains and elevators are one application of such devices. RAM-based recorders are now available which can download suitably compressed audio data over the Internet. Having no moving parts, these are highly portable.

7. The programmable delay

If the RAM is used in a different way, it can be written and read at the same time. The device then becomes an audio delay. Controlling the relationship between the addresses then changes the delay. The addresses are generated by counters which overflow to zero after they have reached a maximum count. As a result the memory space appears to be circular as shown in FIG. 7. The read and write addresses are driven by a common clock and chase one another around the circle. If the read address follows close behind the write address, the delay is short. If it just stays ahead of the write address, the maximum delay is reached. Programmable delays are useful in TV studios where they allow audio to be aligned with video which has been delayed in various processes. They can also be used in auditoria to align the sound from various loudspeakers.

One of the earliest digital audio products was a delay unit of the type shown here which was used to delay the signal leading to a vinyl disk cutter. The cutter control system could use the input to the delay to obtain advance warning of a loud passage and increase the groove pitch accordingly.

FIG. 7 If the memory address is arranged to come from a counter which overflows, the memory can be made to appear circular. The write address then rotates endlessly, overwriting previous data once per revolution. The read address can follow the write address by a variable distance (not exceeding one revolution) and so a variable delay takes place between reading and writing.
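The circular memory of FIG. 7 can be modelled with two addresses which wrap to zero at the end of the memory. In this sketch the class name is hypothetical, the delay is expressed as a whole number of samples, and the memory is assumed to start filled with zeros.

```python
class DelayLine:
    """Programmable delay built on a memory with wrap-around addressing."""

    def __init__(self, size, delay):
        if not 0 <= delay < size:
            raise ValueError("delay must be less than one revolution")
        self.ram = [0] * size
        self.size = size
        self.delay = delay            # distance of read address behind write address
        self.write_address = 0

    def process(self, sample):
        """Write one sample and read the one stored `delay` samples earlier."""
        self.ram[self.write_address] = sample
        read_address = (self.write_address - self.delay) % self.size
        delayed = self.ram[read_address]
        # The counter overflows to zero, so the memory appears circular.
        self.write_address = (self.write_address + 1) % self.size
        return delayed

line = DelayLine(size=8, delay=3)
output = [line.process(s) for s in range(1, 11)]
print(output)
```

Changing delay moves the read address relative to the write address. The largest value allowed here is one less than the memory size, which corresponds to the read address staying just ahead of the write address.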

FIG. 8 In time compression, the unbroken real-time stream of samples from an ADC is broken up into discrete blocks. This is accomplished by the configuration shown here. Samples are written into one RAM at the sampling rate by the write clock. When the first RAM is full, the switches change over, and writing continues into the second RAM whilst the first is read using a higher-frequency clock. The RAM is read faster than it was written and so all the data will be output before the other RAM is full. This opens spaces in the data flow which are used as described in the text.

(cont to part 2)


Updated: Thursday, 2017-10-12 18:55 PST