<< cont. from part 1
8 Time compression
When samples are converted, the ADC must run at a constant clock rate and it outputs an unbroken stream of samples. Time compression allows the sample stream to be broken into blocks for convenient handling.
FIG. 8 shows an ADC feeding a pair of RAMs. When one is being written by the ADC, the other can be read, and vice versa. As soon as the first RAM is full, the ADC output is switched to the input of the other RAM so that there is no loss of samples. The first RAM can then be read at a higher clock rate than the sampling rate. As a result the RAM is read in less time than it took to write it, and the output from the system then pauses until the second RAM is full. The samples are now time compressed. Instead of being an unbroken stream which is difficult to handle, the samples are now arranged in blocks with convenient pauses in between them. In these pauses numerous processes can take place. A rotary head recorder might switch heads; a hard disk might move to another track. On a tape recording, the time compression of the audio samples allows time for synchronizing patterns, subcode and error correction words to be recorded.
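The two-RAM arrangement can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular machine: the function name `time_compress`, the block size and the speed-up factor are assumptions chosen for the example, and time is measured in sample periods.

```python
def time_compress(samples, block_size, speedup):
    """Model a pair of RAMs written at the sampling rate and read faster.

    Returns a list of (start, end, block) tuples, where start and end are
    the times, in sample periods, at which each block leaves the system.
    """
    bursts = []
    for k in range(len(samples) // block_size):
        block = samples[k * block_size:(k + 1) * block_size]
        start = (k + 1) * block_size          # this RAM has just been filled
        end = start + block_size / speedup    # read out at the higher clock rate
        bursts.append((start, end, block))
    return bursts


bursts = time_compress(list(range(12)), block_size=4, speedup=2)
for start, end, block in bursts:
    print(f"output from t={start} to t={end}: {block}")

# The pause between one burst ending and the next one starting.
pause = bursts[1][0] - bursts[0][1]
print("pause between blocks:", pause)
```

With a speed-up of two, each block of four samples is delivered in two sample periods, leaving a pause of two sample periods before the next block. The average rate is unchanged because no samples are lost.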
In digital audio recorders based on video cassette recorders (VCRs) time compression allows the continuous audio samples to be placed in blocks in the unblanked parts of the video waveform, separated by synchronizing pulses.
Subsequently, any time compression can be reversed by time expansion. Samples are written into a RAM at the incoming clock rate, but read out at the standard sampling rate. Unless there is a design fault, time compression is totally inaudible. In a recorder, the time expansion stage can be combined with the timebase correction stage so that speed variations in the medium can be eliminated at the same time. The use of time compression is universal in digital audio recording. In general the instantaneous data rate at the medium is not the same as the rate at the convertors, although clearly the average rate must be the same.
Another application of time compression is to allow more than one channel of audio to be carried in a single channel. If, for example, audio samples are time compressed by a factor of two, it is possible to carry samples from a stereo source in one cable. In digital video recorders both audio and video data are time compressed so that they can share the same heads and tape tracks.
9 Synchronization
Transfer of samples between digital audio devices in real time is only possible if both use a common sampling rate and they are synchronized.
A digital audio recorder must be able to synchronize to the sampling rate of a digital input in order to record the samples. It is frequently necessary for such a recorder to be able to play back locked to an external sampling rate reference so that it can be connected to, for example, a digital mixer.
The process is already common in video systems but now extends to digital audio. Section 8 describes a digital audio reference signal (DARS).
FIG. 9 shows how the external reference locking process works. The timebase expansion is controlled by the external reference which becomes the read clock for the RAM and so determines the rate at which the RAM address changes. In the case of a digital tape deck, the write clock for the RAM would be proportional to the tape speed. If the tape is going too fast, the write address will catch up with the read address in the memory, whereas if the tape is going too slow the read address will catch up with the write address. The tape speed is controlled by the address difference, which is obtained by subtracting the read address from the write address. Thus if the tape speed is too high, the memory will fill faster than it is being emptied, and the address difference will grow larger than normal. This slows down the tape.
Thus in a digital recorder the speed of the medium is constantly changing to keep the data rate correct. Clearly this is inaudible as properly engineered timebase correction totally isolates any instabilities on the medium from the data fed to the convertor.
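The action of the address difference on the tape speed can be illustrated with a simple simulation. The sketch below assumes a proportional control law and invented figures (a transport that would run 2 per cent fast if left alone, a target address difference of 512 samples); none of these values comes from a real machine, and `simulate_speed_servo` is a hypothetical name.

```python
def simulate_speed_servo(steps, drift=0.02, gain=0.001,
                         target_fill=512.0, samples_per_step=100):
    """Control tape speed from the write/read address difference.

    The read clock is the external reference (speed 1.0). The tape would
    run at 1.0 + drift if uncorrected. The address difference (fill) in
    excess of the target is used to slow the tape down.
    """
    fill = target_fill
    history = []
    for _ in range(steps):
        error = fill - target_fill            # address difference minus normal value
        speed = 1.0 + drift - gain * error    # too much data in memory slows the tape
        fill += (speed - 1.0) * samples_per_step  # written minus read in this step
        history.append((speed, fill))
    return history


history = simulate_speed_servo(200)
final_speed, final_fill = history[-1]
print(f"final relative tape speed: {final_speed:.6f}")
print(f"final address difference:  {final_fill:.1f} samples")
```

In this model the tape settles at exactly the reference rate, with the address difference resting slightly above its nominal value; that residual offset is what holds back the transport's tendency to run fast.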
In multitrack recorders, the various tracks can be synchronized to sample accuracy so that no timing errors can exist between the tracks.
Extra transports can be slaved to the first to the same degree of accuracy if more tracks are required. In stereo recorders image shift due to phase errors is eliminated.
In order to replay without a reference, perhaps to provide an analog output, a digital recorder generates a sampling clock locally by means of a crystal oscillator. Provision will be made on professional machines to switch between internal and external references.
10 Error correction and concealment
As anyone familiar with analog recording will know, magnetic tape is an imperfect medium. It suffers from noise and dropouts, which in analog recording are audible. In a digital recording of binary data, a bit is either correct or wrong, with no intermediate stage. Small amounts of noise are rejected, but inevitably, infrequent noise impulses cause some individual bits to be in error. Dropouts cause a larger number of bits in one place to be in error. An error of this kind is called a burst error. Whatever the medium and whatever the nature of the mechanism responsible, data are either recovered correctly, or suffer some combination of bit errors and burst errors. In Compact Disc and DVD, random errors can be caused by imperfections in the molding process, whereas burst errors are due to contamination or scratching of the disc surface.
The audibility of a bit error depends upon which bit of the sample is involved. If the LSB of one sample was in error in a loud passage of music, the effect would be totally masked and no one could detect it.
Conversely, if the MSB of one sample was in error in a quiet passage, no one could fail to notice the resulting loud transient. Clearly a means is needed to render errors from the medium inaudible. This is the purpose of error correction.
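The difference in size between an LSB error and an MSB error is easy to demonstrate. The sketch below assumes 16-bit two's complement samples, which is an assumption made for the example rather than something stated above; `flip_bit` is a hypothetical helper.

```python
def flip_bit(sample, bit, width=16):
    """Invert one bit of a two's complement sample and return the new value."""
    raw = sample & ((1 << width) - 1)   # view the sample as an unsigned word
    raw ^= 1 << bit                     # the bit error
    if raw >= 1 << (width - 1):         # convert back to a signed value
        raw -= 1 << width
    return raw


loud = 20000    # a sample in a loud passage
quiet = 10      # a sample in a quiet passage

lsb_error_in_loud = flip_bit(loud, 0)
msb_error_in_quiet = flip_bit(quiet, 15)

print("LSB error in loud passage: ", loud, "->", lsb_error_in_loud)
print("MSB error in quiet passage:", quiet, "->", msb_error_in_quiet)
```

The LSB error changes the loud sample by one quantizing interval, whereas the MSB error moves the quiet sample by half of the entire coding range.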
In binary, a bit has only two states. If it is wrong, it is only necessary to reverse the state and it must be right. Thus the correction process is trivial and perfect. The main difficulty is in reliably identifying the bits which are in error. This is done by coding the data by adding redundant bits.
Adding redundancy is not confined to digital technology; airliners have several engines and cars have twin braking systems. Clearly the more failures which have to be handled, the more redundancy is needed. If a four-engine airliner is designed to fly normally with one engine failed, three of the engines have enough power to reach cruise speed, and the fourth one is redundant. The amount of redundancy is equal to the amount of failure which can be handled. In the case of the failure of two engines, the plane can still fly, but it must slow down; this is graceful degradation. Clearly the chances of a two-engine failure on the same flight are remote.
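The idea that redundant bits locate the error, after which correction is a simple inversion, can be shown with the classic Hamming (7,4) code. This code is substituted here purely because it is small enough to follow by hand; it is far weaker than the codes used in practical recorders and is not presented as the method of any format discussed in this section.

```python
def hamming74_encode(data_bits):
    """Add three redundant bits to four data bits (codeword positions 1 to 7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code):
    """Locate a single bit error and reverse it.

    The syndrome is the XOR of the positions of all set bits. It is zero
    for a valid codeword; otherwise it is the position of the bad bit.
    """
    syndrome = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= position
    fixed = list(code)
    if syndrome:
        fixed[syndrome - 1] ^= 1   # correction is just an inversion
    return fixed, syndrome


codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1                     # a single bit error at position 5
repaired, where = hamming74_correct(damaged)
print("error found at position", where, "- repaired:", repaired == codeword)
```

Three redundant bits are enough to point to any one of the seven positions, or to indicate that no error is present. Two errors in the same codeword would exceed the redundancy and be miscorrected, which is why practical systems use much more powerful codes.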
In digital audio, the amount of error which can be corrected is proportional to the amount of redundancy, and it will be shown in Section 7 that within this limit, the samples are returned to exactly their original value. Consequently corrected samples are audibly indistinguishable from the originals. If the amount of error exceeds the amount of redundancy, correction is not possible, and, in order to allow graceful degradation, concealment will be used. Concealment is a process where the value of a missing sample is estimated from those nearby. The estimated sample value is not necessarily exactly the same as the original, and so under some circumstances concealment can be audible, especially if it’s frequent. However, in a well-designed system, concealments occur with negligible frequency unless there is an actual fault or problem.
Concealment is made possible by rearranging or shuffling the sample sequence prior to recording. This is shown in FIG. 10 where odd numbered samples are separated from even-numbered samples prior to recording. The odd and even sets of samples may be recorded in different places, so that an uncorrectable burst error only affects one set.
On replay, the samples are recombined into their natural sequence, and the error is now split up so that it results in every other sample being lost. The waveform is now described half as often, but can still be reproduced with some loss of accuracy. This is better than not being reproduced at all even if it’s not perfect. Almost all digital recorders use such an odd/even shuffle for concealment. Clearly if any errors are fully correctable, the shuffle is a waste of time; it’s only needed if correction is not possible.
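A minimal sketch of the odd/even shuffle follows. It assumes the simplest possible estimate, namely that a missing sample is the average of its two neighbours; real machines may use other interpolators. Samples are indexed from zero, so the "even" set here is the one containing the first sample, and the function names are hypothetical.

```python
def split_odd_even(samples):
    """Separate the samples into two sets to be recorded in different places."""
    return samples[0::2], samples[1::2]


def conceal_missing_odds(evens, total_length):
    """Rebuild the full sequence when the odd set has been lost entirely.

    Each missing sample is estimated as the mean of the surviving samples
    on either side; the final one, having no right-hand neighbour, simply
    repeats the previous value.
    """
    out = []
    for i, value in enumerate(evens):
        out.append(value)
        if len(out) < total_length:
            nxt = evens[i + 1] if i + 1 < len(evens) else value
            out.append((value + nxt) / 2)
    return out


original = [0, 10, 20, 30, 40, 30, 20, 10]
evens, odds = split_odd_even(original)

# Suppose an uncorrectable burst destroys the odd set completely.
restored = conceal_missing_odds(evens, len(original))
print("original:", original)
print("restored:", restored)
```

Most of the estimated values happen to match this smooth test waveform, but the last one does not, which illustrates why concealment is an approximation and not a correction.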
In high-density recorders, more data are lost in a given sized dropout.
Adding redundancy equal to the size of a dropout to every code is inefficient. FIG. 11 shows that the efficiency of the system can be raised using interleaving. Sequential samples from the ADC are assembled into codes, but these are not recorded in their natural sequence. A number of sequential codes are assembled along rows in a memory.
When the memory is full, it’s copied to the medium by reading down columns. On replay, the samples need to be de-interleaved to return them to their natural sequence. This is done by writing samples from tape into a memory in columns, and when it’s full, the memory is read in rows.
Samples read from the memory are now in their original sequence so there is no effect on the recording. However, if a burst error occurs on the medium, it will damage sequential samples in a vertical direction in the de-interleave memory. When the memory is read, a single large error is broken down into a number of small errors whose size is exactly equal to the correcting power of the codes and the correction is performed with maximum efficiency.
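The row and column memory described above can be modelled directly. In the sketch below the array dimensions (four codes of six symbols each) and the burst position are arbitrary choices for illustration, and the function names are hypothetical.

```python
def interleave(symbols, rows, cols):
    """Write codes along rows of a memory, then read down the columns."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Write replayed symbols into columns, then read along the rows."""
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = symbols[i]
            i += 1
    return out


rows, cols = 4, 6                       # four codes of six symbols each
codes = list(range(rows * cols))
recorded = interleave(codes, rows, cols)

# A burst error on the medium destroys four consecutive recorded symbols.
damaged = list(recorded)
for i in range(8, 12):
    damaged[i] = "X"

replayed = deinterleave(damaged, rows, cols)
errors_per_code = [replayed[r * cols:(r + 1) * cols].count("X")
                   for r in range(rows)]
print("errors in each code after de-interleave:", errors_per_code)
```

The burst of four symbols on the medium becomes one damaged symbol in each of four codes, so each code needs only enough redundancy to correct a single symbol.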
The interleave, de-interleave, time compression and timebase correction processes cause delay and this is evident in the time taken before audio emerges after starting a digital machine. Confidence replay takes place later than the distance between record and replay heads would indicate. In stationary head recorders, confidence replay may be about one tenth of a second behind the input. Synchronous recording requires new techniques to overcome the effect of the delays.
The presence of an error-correction system means that the audio quality is independent of the tape/head quality within limits. There is no point in trying to assess the health of a machine by listening to it, as this won’t reveal whether the error rate is normal or within a whisker of failure. The only useful procedure is to monitor the frequency with which errors are being corrected, and to compare it with normal figures.
Professional digital audio equipment should have an error rate display.
Some people claim to be able to hear error correction and misguidedly conclude that the above theory is flawed. Not all digital audio machines are properly engineered, however, and if the DAC shares a common power supply with the error-correction logic, a burst of errors will raise the current taken by the logic, which in turn loads the power supply and interferes with the operation of the DAC. The effect is harder to eliminate in small battery-powered machines where space for screening and decoupling components is difficult to find, but it’s only a matter of good engineering; there is no flaw in the theory.
11 Channel coding
In most recorders used for storing digital information, the medium carries a track which reproduces a single waveform. The audio samples have to be recorded serially, one bit at a time. Some media, such as CD, only have one track, so it must be totally self-contained. Other media, such as digital compact cassette (DCC) have many parallel tracks. At high recording densities, physical tolerances cause phase shifts, or timing errors, between parallel tracks and so it’s not possible to read them in parallel. Each track must still be self-contained until the replayed signal has been timebase corrected.
Recording data serially is not as simple as connecting the serial output of a shift register to the head. In digital audio, a common sample value is all zeros, as this corresponds to silence. If a shift register is loaded with all zeros and shifted out serially, the output stays at a constant low level, and no events are recorded on the track. On replay there is nothing to indicate how many zeros were present, or even how fast to move the medium.
Clearly serialized raw data cannot be recorded directly; it has to be modulated into a waveform which contains an embedded clock irrespective of the values of the bits in the samples. On replay a circuit called a data separator can lock to the embedded clock and use it to count and separate strings of identical bits.
The process of modulating serial data to make it self-clocking is called channel coding. Channel coding also shapes the spectrum of the serialized waveform to make it more efficient. With a good channel code, more data can be stored on a given medium. Spectrum shaping is used in optical disks to prevent the data from interfering with the focus and tracking servos, and in DAT to allow rerecording without erase heads.
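A very simple self-clocking code serves to illustrate the principle. The sketch below uses FM (also known as biphase mark) coding, in which there is a transition at every bit cell boundary and an additional transition in the centre of the cell for a one. It is chosen only because it is easy to follow and is not claimed to be the channel code of any recorder named above; the function names are hypothetical, and each bit cell is represented as two half-cell levels.

```python
def fm_encode(bits, level=0):
    """Encode bits as a two-level waveform, two half-cells per data bit."""
    out = []
    for bit in bits:
        level ^= 1          # transition at the start of every cell (the clock)
        out.append(level)
        if bit:
            level ^= 1      # extra mid-cell transition signals a one
        out.append(level)
    return out


def fm_decode(levels):
    """Recover the data: a one is a cell whose two halves differ."""
    return [int(levels[i] != levels[i + 1]) for i in range(0, len(levels), 2)]


silence = [0, 0, 0, 0]
waveform = fm_encode(silence)
print("all-zeros data as recorded:", waveform)

data = [1, 0, 1, 1, 0, 0, 0, 1]
print("round trip ok:", fm_decode(fm_encode(data)) == data)
```

Even when the data are all zeros the recorded waveform changes state once per bit cell, so the replay circuitry always has transitions to lock to. The price is that the waveform needs more bandwidth than the raw data, which is why more efficient channel codes are used in practice.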
Channel coding is also needed to broadcast digital information where shaping of the spectrum is an obvious requirement to avoid interference with other services. NICAM TV sound, digital video broadcasting (DVB) and digital audio broadcasting (DAB) rely on it.
All the techniques of channel coding are covered in detail in Section 6 and digital broadcasting is considered in Section 9.
12 Compression
The human hearing system comprises not only the physical organs but also processes taking place within the brain. One of the purposes of the subconscious processing is to limit the amount of information presented to the conscious mind, to prevent stress and to make everyday life safer and easier. Section 2 shows how auditory masking selects only the most important frequencies from the spectrum applied to the ear. Compression takes advantage of this process to reduce the amount of data needed to carry sound of a given subjective quality. The data-reduction process mimics the operation of the hearing mechanism as there is little point in recording information only for the ear to discard it. Compression is explained in detail in Section 5.
Compression is essential for services such as DAB and DVB where the bandwidth needed to broadcast regular PCM would be excessive. It can be used to reduce consumption of the medium in consumer recorders such as DCC and MiniDisc. Reduction to around one quarter or one fifth of the PCM data rate with small loss of quality is possible with high quality compression systems. Greater compression factors inevitably result in quality loss which may be acceptable for certain applications such as communications but not for quality music reproduction.
The output of a compressor is called an elementary stream. This is still binary data, but it’s no longer regular PCM, so it cannot be fed to a normal DAC without passing through a matching decoder which provides a conventional PCM output. Compressed data are more sensitive to bit errors than PCM data and concealment is more complex to implement.
There are numerous proprietary compression algorithms, and each needs the appropriate decoder to return to PCM. The combination of a compressor and a decoder is called a codec. The performance of a codec is tested on a single pass, as it would be for use in DAB or in a single generation recording. The same performance is not necessarily obtained if codecs are cascaded, particularly if they are of different types. If an equalization step is performed on audio which has been through a codec, artifacts may be raised above the masking threshold. As a result, compression may not be suitable for the recording of original material prior to post-production.
13 Hard disk recorders
The hard disk stores data on concentric tracks which it accesses by moving the head radially. Clearly while the head is moving it cannot transfer data. Using time compression, a hard disk drive can be made into an audio recorder with the addition of a certain amount of memory.
FIG. 12 shows the principle. The instantaneous data rate of the disk drive is far in excess of the sampling rate at the convertor, and so a large time-compression factor can be used. The disk drive can read a block of data from disk, and place it in the timebase corrector in a fraction of the real time it represents in the audio waveform. As the timebase corrector read address steadily advances through the memory, the disk drive has time to move the heads to another track before the memory runs out of data. When there is sufficient space in the memory for another block, the drive is commanded to read, and fills up the space. Although the data transfer at the medium is highly discontinuous, the buffer memory provides an unbroken stream of samples to the DAC and so continuous audio is obtained.
Recording is performed by using the memory to assemble samples until the contents of one disk block are available. This is then transferred to disk at high data rate. The drive can then reposition the head before the next block is available in memory.
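The replay process can be modelled as a memory which is emptied steadily by the DAC and refilled in bursts by the drive. The following sketch uses invented figures for the block size, memory capacity, seek time and transfer time, all measured in sample periods; `play_from_disk` is a hypothetical name and the model is deliberately crude.

```python
def play_from_disk(total_ticks, block=1000, capacity=3000,
                   seek_ticks=400, transfer_ticks=100):
    """Count DAC underruns when the memory is refilled in bursts from disk.

    One tick is one sample period. A read is commanded whenever there is
    room for a whole block; the block arrives after the head has moved
    (seek_ticks) and the data have been transferred (transfer_ticks).
    """
    fill = capacity          # memory is pre-loaded before playback starts
    busy = 0                 # ticks remaining until the pending block arrives
    underruns = 0
    reads = 0
    for _ in range(total_ticks):
        if busy == 0 and capacity - fill >= block:
            busy = seek_ticks + transfer_ticks   # command the drive to read
            reads += 1
        if busy > 0:
            busy -= 1
            if busy == 0:
                fill += block                    # block lands in the memory
        if fill > 0:
            fill -= 1                            # DAC takes one sample
        else:
            underruns += 1                       # memory ran out of data
    return underruns, reads


print("normal drive:   ", play_from_disk(20000))
print("very slow seeks:", play_from_disk(20000, seek_ticks=2500))
```

With the default figures the block always arrives well before the memory empties, so the DAC never starves. If the seek takes longer than the audio held in the memory can cover, underruns appear, which shows why the memory size must be chosen to suit the access time of the drive.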
An advantage of hard disks is that access to the audio is much quicker than with tape, as all the data are available within the time taken to move the head. This speeds up editing considerably. As hard disks offer so much to digital audio, the entirety of Section 10 is devoted to them.
The use of compression allows the recording time of a disk to be extended considerably. This technique is often used in personal computers or organizers to allow them to function as a recorder.
14 The PCM adaptor
The PCM adaptor was an early solution to recording the wide bandwidth of PCM audio before high density digital recording developed. The video recorder offered sufficient bandwidth at moderate tape consumption. Whilst they were a breakthrough at the time of their introduction, by modern standards PCM adaptors are crude and obsolescent. FIG. 13 shows the essential components of a digital audio recorder using this technique. Input analog audio is converted to digital and time compressed to fit into the parts of the video waveform which are not blanked. Time-compressed samples are then odd-even shuffled to allow concealment. Next, redundancy is added and the data are interleaved for recording. The data are serialized and set on the active line of the video signal as black and white levels shown in FIG. 14. The video is sent to the recorder, where the analog FM modulator switches between two frequencies representing the black and white levels, a system called frequency shift keying (FSK). This takes the place of the channel coder in a conventional digital recorder.
On replay the FM demodulator of the video recorder acts to return the FSK recording to the black/white video waveform which is sent to the PCM adaptor. The PCM adaptor extracts a clock from the video sync pulses and uses it to separate the serially recorded bits. Error correction is performed after de-interleaving, unless the errors are too great, in which case concealment is used after the de-shuffle. The samples are then returned to the standard sampling rate by the timebase expansion process, which also eliminates any speed variations from the recorder.
They can then be converted back to the analog domain.
In order to synchronize playback to a reference and to simplify the circuitry, a whole number of samples is recorded on each unblanked line.
The common sampling rate of 44.1 kHz is obtained by recording three samples per line on 245 active lines at 60 Hz. The sampling rate is thus locked to the video sync frequencies and the tape is made to move at the correct speed by sending the video recorder syncs which are generated in the PCM adaptor.
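The arithmetic behind this figure can be checked in a few lines. The variable names are chosen for this example, and the 60 Hz figure is taken to be the rate at which the 245 active lines are repeated.

```python
samples_per_line = 3
active_lines = 245
repetition_rate_hz = 60     # rate at which the 245 active lines recur

sampling_rate_hz = samples_per_line * active_lines * repetition_rate_hz
print(sampling_rate_hz, "Hz")
```

Because all three factors are whole numbers, the sampling rate is an exact multiple of the video repetition rate, which is what allows it to be locked to the video syncs.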
15 An open-reel digital recorder
FIG. 15 shows the block diagram of a machine of this type. Analog inputs are converted to the digital domain by convertors. Clearly there will be one convertor for every audio channel to be recorded. Unlike an analog machine, there is not necessarily one tape track per audio channel.
In stereo machines the two channels of audio samples may be distributed over a number of tracks each in order to reduce the tape speed and extend the playing time.
The samples from the convertor will be separated into odd and even for concealment purposes, and usually one set of samples will be delayed with respect to the other before recording. The continuous stream of samples from the convertor will be broken into blocks by time compression prior to recording. Time compression allows the insertion of edit gaps, addresses and redundancy into the data stream. An interleaving process is also necessary to reorder the samples prior to recording. As explained above, the subsequent de-interleaving breaks up the effects of burst errors on replay.
The result of the processes so far is still raw data, and these will need to be channel coded before they can be recorded on the medium. On replay a data separator reverses the channel coding to give the original raw data with the addition of some errors. Following de-interleave, the errors are reduced in size and are more readily correctable. The memory required for de-interleave may double as the timebase correction memory, so that variations in the speed of the tape are rendered undetectable. Any errors which are beyond the power of the correction system will be concealed after the odd-even shuffle is reversed. Following conversion in the DAC an analog output emerges.
On replay a digital recorder works rather differently from an analog recorder, which simply drives the tape at constant speed. In contrast, a digital recorder drives the tape at constant sampling rate. The timebase corrector works by reading samples out to the convertor at constant frequency. This reference frequency comes typically from a crystal oscillator. If the tape goes too fast, the memory will be written faster than it is being read, and will eventually overflow. Conversely, if the tape goes too slow, the memory will become exhausted of data. In order to avoid these problems, the speed of the tape is controlled by the quantity of data in the memory. If the memory is filling up, the tape slows down; if the memory is becoming empty, the tape speeds up. As a result, the tape will be driven at whatever speed is necessary to obtain the correct sampling rate.
16 Rotary head digital recorders
The rotary head recorder borrows technology from video recorders.
Rotary heads have a number of advantages which will be detailed in Section 9. One of these is extremely high packing density: the number of data bits which can be recorded in a given space. In a digital audio recorder packing density directly translates into the playing time available for a given size of the medium.
In a rotary head recorder, the heads are mounted in a revolving drum and the tape is wrapped around the surface of the drum in a helix as can be seen in FIG. 16. The helical tape path results in the heads traversing the tape in a series of diagonal or slanting tracks. The space between the tracks is controlled not by head design but by the speed of the tape and in modern recorders this space is reduced to zero with corresponding improvement in packing density.
The added complexity of the rotating heads and the circuitry necessary to control them is offset by the improvement in density. These techniques are detailed in Section 8. The discontinuous tracks of the rotary head recorder are naturally compatible with time compressed data. As FIG. 16 illustrates, the audio samples are time compressed into blocks each of which can be contained in one slant track.
In a machine such as DAT (rotary-head digital audio tape) there are two heads mounted on opposite sides of the drum. One rotation of the drum lays down two tracks. Effective concealment can be had by recording odd-numbered samples on one track of the pair and even numbered samples on the other.
As can be seen from the block diagram shown in FIG. 17, a rotary head recorder contains the same basic steps as any digital audio recorder.
The record side needs ADCs, time compression, the addition of redundancy for error correction, and channel coding. On replay the channel coding is reversed by the data separator, errors are broken up by the de-interleave process and corrected or concealed, and the time compression and any fluctuations from the transport are removed by timebase correction. The corrected, time stable, samples are then fed to the DAC.
17 Digital Compact Cassette
Digital Compact Cassette (DCC) is a consumer digital audio recorder using compression. Although the convertors at either end of the machine work with PCM data, these data are not directly recorded, but are reduced to one quarter of their normal rate by processing. This allows a reasonable tape consumption similar to that achieved by a rotary head recorder. In a sense the complexity of the rotary head transport has been exchanged for the electronic complexity of the compression and expansion circuitry.
FIG. 18 shows that DCC uses stationary heads in a conventional tape transport which can also play analog cassettes. Data are distributed over eight parallel tracks which occupy half the width of the tape. At the end of the tape the head rotates and plays the other eight tracks in reverse. The advantage of the conventional approach with linear tracks is that tape duplication can be carried out at high speed. This makes DCC attractive to record companies.
Owing to the low frequencies recorded, DCC has to use active heads which actually measure the flux on the tape. These magneto-resistive heads are more complex than conventional inductive heads, and have only recently become economic as manufacturing techniques have been developed. DCC is treated in detail in Section 9.
As was introduced in section 1.12, compression relies on the phenomenon of auditory masking and this may effectively restrict DCC to being a consumer format. It will be seen from FIG. 19 that the compression unit adjacent to the input is complemented by the expansion unit or decoder prior to the DAC.
18 Digital audio broadcasting
Digital audio broadcasting operates by modulating the transmitter with audio data instead of an analog waveform. Analog modulation works reasonably well for fixed reception sites where a decent directional antenna can be erected at a selected location, but has serious shortcomings for mobile reception where there is no control over the location and a large directional antenna is out of the question. The greatest drawback of broadcasting is multipath reception, where the direct signal is received along with delayed echoes from large reflecting bodies such as high-rise buildings. At certain wavelengths the reflection is received antiphase to the direct signal, and cancellation takes place which causes a notch in the received spectrum. In an analog system loss of the signal is inevitable.
In DAB, several digital audio broadcasts are merged into one transmission which is wider than the multipath notches. The data from the different signals are distributed uniformly within the channel so that a notch removes a small part of each channel instead of all of one.
Sufficient data are received to allow error correction to re-create the missing values.
A DAB receiver actually receives the entire transmission and the process of 'tuning in' the desired channel is now performed by selecting the appropriate data channel for conversion to analog, making a DAB receiver easier to operate.
DAB resists multipath reception to permit mobile reception and the improvement to reception in car radios is dramatic. The data rate of PCM audio is too great to allow it to be economic for DAB. Compression is essential and this is detailed in Section 5.
(FIG. 19 caption, in part: This allows a slow linear tape speed which can only be read with an MR head. The compression unit is mirrored by the decoder on replay.)
19 Audio in PCs
Whilst the quality digital audio permits is undeniable, the potential of digital audio may turn out to be more important in the long term. Once audio becomes data, there is tremendous freedom to store and process it in computer-related equipment. The restrictions of analog technology are no longer applicable, yet we often needlessly build restrictions into equipment by making a digital replica of an analog system. The analog system evolved to operate within the restrictions imposed by the technology. To take the same system and merely digitize it is to miss the point.
A good example of missing the point was the development of the stereo quarter-inch digital audio tape recorder with open reels. Open-reel tape is sub-optimal for high-density digital recording because it is unprotected from contamination. The recorded wavelengths must be kept reasonably long or the reliability will be poor. Thus the tape consumption of these machines was excessive and more efficient cassette technologies such as DAT proved to have lower purchase cost and running costs as well as being a fraction of the size and weight. The speed and flexibility with which editing could be carried out by hard disk systems took away any remaining advantage. Quarter-inch digital tape found itself trapped between DAT and hard disks and passed into history because it was the wrong approach.
Part of the problem of missed opportunity is that traditionally, professional audio equipment manufacturers have specialized in one area leaving users to assemble systems from several suppliers. Mixer manufacturers may have no expertise in recording. Tape recorder manufacturers may have no knowledge of disk drives.
In contrast, computer companies have always taken a systems view and configure disks, tapes, RAM, processors and communications links as necessary to meet a given requirement. Now that audio is another form of data, this approach is being used to solve audio problems.
Small notebook computers are increasingly available with microphones and audio convertors so that they can act as dictating machines. A personal computer with high-quality audio convertors, compression algorithms and sufficient disk storage becomes an audio recorder. The recording levels and the timer are displayed on screen and soft keys become the rewind, record, etc. controls for the virtual recorder. The recordings can be edited to sample accuracy on disk, with displays of the waveforms in the area of the in and out points on screen. Once edited, the audio data can be sent anywhere in the world using telephone modems and data networks. The PC can be programmed to dial the destination itself at a selected time. At the same time as sending the audio, text files can be sent, along with images from a CCD camera. Without digital technology such a device would be unthinkable.
The market for such devices may well be captured by those with digital backgrounds, but not necessarily in audio. Computer, calculator and other consumer electronics manufacturers have the wider view of the potential of digital techniques.
Digital also blurs the distinction between consumer and professional equipment. In the traditional analog audio world, professional equipment sounded better but cost a lot more than consumer equipment. Now that digital technology is here, the sound quality is determined by the convertors. Once converted, the audio is data. If a bit can only convey whether it is one or zero, how does it know if it is a professional bit or a consumer bit? What is a professional disk drive? The cost of a digital product is a function not of its complexity, but of the volume to be sold.
Professional equipment may be forced to use chip sets and transports designed for the volume market because the cost of designing an alternative is prohibitive. A professional machine may be a consumer machine in a stronger box with XLRs instead of phono sockets and PPM level meters. It may be that there will be little room for traditional professional audio manufacturers in the long term.
The conventional analog routing structure used in professional installations was simply replicated in the digital domain by the AES/EBU digital audio interface. However, using computer data approaches digital audio routing can also be achieved using networks, interconnecting a number of file servers which store the audio data with workstations from which the recordings can be manipulated. No dedicated audio routing hardware is required. Section 8 considers how data networks operate.