Digital Audio: Conversion: Choice of sampling rate, Sample and hold, Sampling clock jitter

Choice of sampling rate

Sampling theory is only the beginning of the process which must be followed to arrive at a suitable sampling rate. The finite slope of realizable filters will compel designers to raise the sampling rate. For consumer products, the lower the sampling rate, the better, since the cost of the medium is directly proportional to the sampling rate: thus sampling rates near to twice 20 kHz are to be expected. For professional products, there is a need to operate at variable speed for pitch correction.

When the speed of a digital recorder is reduced, the off-tape sampling rate falls, and FGR. 10 shows that with a minimal sampling rate the first image frequency can become low enough to pass the reconstruction filter.

If the sampling frequency is raised without changing the response of the filters, the speed can be reduced without this problem. It follows that variable-speed recorders, generally those with stationary heads, must use a higher sampling rate.
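The relationship between sampling rate and usable speed range can be sketched numerically. The following is only an illustration under simplifying assumptions that are not in the text: the recorded audio fills the whole 20 kHz band, the reconstruction filter is an ideal brick wall at 20 kHz, and slowing the machine scales every off-tape frequency by the same factor. The function names are hypothetical.

```python
def lowest_image_hz(sampling_rate_hz, audio_bandwidth_hz, speed_factor):
    """Lowest frequency of the first image at a given replay speed.

    At normal speed the first image starts at (sampling rate - audio bandwidth).
    Slowing the machine by speed_factor scales that frequency down with it.
    """
    return speed_factor * (sampling_rate_hz - audio_bandwidth_hz)


def slowest_safe_speed(sampling_rate_hz, audio_bandwidth_hz, filter_cutoff_hz):
    """Smallest speed factor at which the first image stays above the
    (assumed ideal, fixed) reconstruction filter cutoff."""
    return filter_cutoff_hz / (sampling_rate_hz - audio_bandwidth_hz)


# Compare a minimal rate with a higher one, same filter in both cases.
for rate_hz in (44_100, 48_000):
    limit = slowest_safe_speed(rate_hz, 20_000, 20_000)
    print(f"{rate_hz} Hz: image enters the passband below {limit:.3f} x normal speed")
```

With these assumed figures the higher sampling rate tolerates a larger speed reduction before the image reaches the filter passband, which is the argument made above. Real filters have a finite slope, so practical limits will differ.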

In the early days of digital audio research, the necessary data rate of about 1 megabit per second per audio channel was difficult to record. Disk drives had the bandwidth but not the capacity for long recording time, so attention turned to video recorders. In Section 9 it will be seen that these were adapted to store audio samples by creating a pseudo-video waveform which could convey binary as black and white levels. The sampling rate of such a system is constrained to relate simply to the field rate and field structure of the television standard used, so that an integer number of samples can be stored on each usable TV line in the field. Such a recording can be made on a monochrome recorder, and these recordings are made in two standards, 525 lines at 60 Hz and 625 lines at 50 Hz. Thus it is possible to find a frequency which is a common multiple of the two and also suitable for use as a sampling rate.

The allowable sampling rates in a pseudo-video system can be deduced by multiplying the field rate by the number of active lines in a field (blanked lines cannot be used) and again by the number of samples in a line. By careful choice of parameters it’s possible to use either 525/60 or 625/50 video with a sampling rate of 44.1 kHz.

In 60 Hz video, there are 35 blanked lines, leaving 490 lines per frame, or 245 lines per field for samples. If three samples are stored per line, the sampling rate becomes:

60 × 245 × 3 = 44 100 Hz = 44.1 kHz

In 50 Hz video, there are 37 lines of blanking, leaving 588 active lines per frame, or 294 per field, so the same sampling rate is given by 50 × 294 × 3 = 44 100 Hz = 44.1 kHz.
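The arithmetic for both television standards can be checked with a few lines of Python. This is a minimal sketch using only the figures quoted above; the function and parameter names are illustrative, and it assumes two interlaced fields per frame.

```python
def pseudo_video_sampling_rate(field_rate_hz, total_lines, blanked_lines,
                               samples_per_line):
    """Sampling rate = field rate x active lines per field x samples per line."""
    active_lines_per_frame = total_lines - blanked_lines
    # Assumes two interlaced fields per frame, so the active lines must split evenly.
    active_lines_per_field, remainder = divmod(active_lines_per_frame, 2)
    if remainder:
        raise ValueError("active lines do not divide evenly into two fields")
    return field_rate_hz * active_lines_per_field * samples_per_line


rate_525_60 = pseudo_video_sampling_rate(60, 525, 35, 3)
rate_625_50 = pseudo_video_sampling_rate(50, 625, 37, 3)
print(rate_525_60, rate_625_50)
```

Both calls return the same number of samples per second, which is why one sampling rate can serve either line standard.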

The sampling rate of 44.1 kHz came to be that of the Compact Disc. Even though CD has no video circuitry, the equipment originally used to make CD masters was video based and determines the sampling rate.

For landlines to FM stereo broadcast transmitters having a 15 kHz audio bandwidth, the sampling rate of 32 kHz is more than adequate, and has been in use for some time in the United Kingdom and Japan. This frequency is also in use in the NICAM 728 stereo TV sound system and in DAB. It’s also used for the Sony NT format mini-cassette. The professional sampling rate of 48 kHz was proposed as having a simple relationship to 32 kHz, being far enough above 40 kHz for variable-speed operation.

Although in a perfect world the adoption of a single sampling rate might have had virtues, for practical and economic reasons digital audio now has essentially three rates to support: 32 kHz for broadcast, 44.1 kHz for CD and its mastering equipment, and 48 kHz for 'professional' use.

In fact the use of 48 kHz is not as common as its title would indicate. The runaway success of CD has meant that much equipment is run at 44.1 kHz to suit CD. With the advent of digital filters, which can track the sampling rate, a higher sampling rate is no longer necessary for pitch changing. 48 kHz is extensively used in television where it can be synchronized to both line standards relatively easily. The currently available DVTR formats offer only 48 kHz audio sampling. A number of formats can operate at more than one sampling rate. Both DAT and DASH formats are specified for all three rates, although not all available hardware implements every possibility. Most hard disk recorders will operate at a range of rates.

Recently there have been proposals calling for dramatically increased audio sampling rates. These are misguided and won’t be considered further here. The subject will, however, be treated in Section 13.

===

FGR. 11 (a) The simple track-hold circuit shown has poor frequency response as the resistance of the FET causes a rolloff in conjunction with the capacitor. In (b) the resistance of the FET is now inside a feedback loop and will be eliminated, provided the left-hand op-amp never runs out of gain or swing.

===

FGR. 12 Characteristics of the feedback track-hold circuit of FGR. 11(b) showing major sources of error.

===

Sample and hold

In practice many analog-to-digital convertors require a finite time to operate, and instantaneous samples must be extended by a device called a sample-and-hold or, more accurately, a track-hold circuit.

The simplest possible track-hold circuit is shown in FGR. 11(a).

When the switch is closed, the output will follow the input. When the switch is opened, the capacitor holds the signal voltage which existed at the instant of opening. This simple arrangement has a number of shortcomings, particularly the time constant of the on-resistance of the switch with the capacitor, which extends the settling time. The effect can be alleviated by putting the switch in a feedback loop as shown in FGR. 11(b). The buffer amplifiers must meet a stringent specification, because they need bandwidth well in excess of audio frequencies to ensure that operation is always feedback controlled between holding periods. When the switch is opened, the slightest change in input voltage causes the input buffer to saturate, and it must be able to rapidly recover from this condition when the switch next closes. The feedback minimizes the effect of the on-resistance of the switch, but the off-resistance must be high to prevent the input signal affecting the held voltage. The leakage current of the integrator must be low to prevent droop which is the term given to an unwanted slow change in the held voltage.
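Two of the error sources mentioned above lend themselves to a rough estimate. The sketch below treats the simple circuit of FGR. 11(a) as a single-pole RC network and the hold capacitor as being discharged by a constant leakage current. Both are simplifying assumptions, and the component values are hypothetical, chosen only to show the orders of magnitude involved.

```python
import math


def settling_time_s(on_resistance_ohm, hold_capacitance_f, bits):
    """Time for a single-pole RC to settle a full-scale step to within 1/2 LSB.

    The residual error decays as exp(-t / RC), so it falls below
    1 / 2**(bits + 1) of full scale after RC * ln(2**(bits + 1)).
    """
    time_constant_s = on_resistance_ohm * hold_capacitance_f
    return time_constant_s * math.log(2 ** (bits + 1))


def droop_rate_v_per_s(leakage_current_a, hold_capacitance_f):
    """Rate of change of the held voltage for a constant leakage current (I / C)."""
    return leakage_current_a / hold_capacitance_f


# Hypothetical values: 100 ohm switch on-resistance, 1 nF hold capacitor, 1 nA leakage.
print(f"16-bit settling: {settling_time_s(100.0, 1e-9, 16) * 1e6:.2f} us")
print(f"droop: {droop_rate_v_per_s(1e-9, 1e-9):.1f} V/s")
```

Under these assumptions a sixteen-bit system needs nearly twelve time constants of tracking before the switch can be opened. A larger capacitor reduces droop but lengthens settling in proportion, which gives some feel for why the design is difficult.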

FGR. 12 shows the various events during a track-hold sequence and catalogs the various potential sources of inaccuracy. A further phenomenon which is not shown in FGR. 12 is that of dielectric relaxation.

When a capacitor is discharged rapidly by connecting a low resistance path across its terminals, not all the charge is removed. After the discharge circuit is disconnected, the capacitor voltage may rise again slightly as charge which was trapped in the high-resistivity dielectric slowly leaks back to the electrodes. In track-hold circuits dielectric relaxation can cause the value of one sample to be affected by the previous one. Some dielectrics display less relaxation than others. Mica capacitors, traditionally regarded as being of high quality, actually display substantially worse relaxation characteristics than many other types. Polypropylene and Teflon are significantly better.

The track-hold circuit is extremely difficult to design because of the accuracy demanded by audio applications. In particular it’s very difficult to meet the droop specification for much more than sixteen-bit applications. Greater accuracy has been reported by modeling the effect of dielectric relaxation and applying an inverse correction signal.

When a performance limitation such as the track-hold stage is found, it’s better to find an alternative approach. It will be seen later in this section that more advanced conversion techniques allow the track-hold circuit and its shortcomings to be eliminated.

===

FGR. 13 The effect of sampling timing jitter on noise, and calculation of the required accuracy for a sixteen-bit system. (a) Ramp sampled with jitter has error proportional to slope. (b) When jitter is removed by later circuits, error appears as noise added to samples. For a sixteen-bit system there are $2^{16}$ Q, and the maximum slope at 20 kHz will be $20\,000 \times \pi \times 2^{16}$ Q per second. If jitter is to be neglected, the noise must be less than ½Q, thus timing accuracy t multiplied by maximum slope = ½Q, or $20\,000 \times \pi \times 2^{16}\,Q\,t = \tfrac{1}{2}Q$.

===

FGR. 14 Effects of sample clock jitter on signal-to-noise ratio at different frequencies, compared with theoretical noise floors of systems with different resolutions.

===

Sampling clock jitter

The instants at which samples are taken in an ADC and the instants at which DACs make conversions must be evenly spaced, otherwise unwanted signals can be added to the audio. FGR. 13 shows the effect of sampling clock jitter on a sloping waveform. Samples are taken at the wrong times. When these samples have passed through a system, the timebase correction stage prior to the DAC will remove the jitter, and the result is shown at (b). The magnitude of the unwanted signal is proportional to the slope of the audio waveform and so the amount of jitter which can be tolerated falls at 6 dB per octave. As the resolution of the system is increased by the use of longer sample wordlength, tolerance to jitter is further reduced. The nature of the unwanted signal depends on the spectrum of the jitter. If the jitter is random, the effect is noise-like and relatively benign unless the amplitude is excessive. FGR. 14 shows the effect of differing amounts of random jitter with respect to the noise floor of various wordlengths. Note that even small amounts of jitter can degrade a twenty-bit convertor to the performance of a good sixteen-bit unit. There is thus no point in upgrading to higher-resolution convertors if the clock stability of the system is insufficient to allow their performance to be realized.
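The slope argument can be turned into a figure for the allowable timing error. The sketch below assumes a full-scale sine wave whose peak-to-peak amplitude spans all 2^n quantizing intervals, so that its maximum slope is π × f × 2^n Q per second, and it takes an error of ½Q as the limit. The function name is illustrative only.

```python
import math


def max_jitter_s(bits, signal_hz):
    """Timing error that produces a 1/2 Q amplitude error at the steepest
    point of a full-scale sine wave of the given frequency."""
    # Peak-to-peak amplitude is 2**bits Q, so peak slope is pi * f * 2**bits Q/s.
    max_slope_q_per_s = math.pi * signal_hz * 2 ** bits
    return 0.5 / max_slope_q_per_s


for bits in (16, 20):
    picoseconds = max_jitter_s(bits, 20_000) * 1e12
    print(f"{bits}-bit, 20 kHz: about {picoseconds:.1f} ps")
```

Halving the signal frequency doubles the tolerable jitter, which corresponds to the 6 dB per octave behaviour described above, and each extra bit of wordlength halves it.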

Clock jitter is not necessarily random. FGR. 15 shows that one source of clock jitter is crosstalk or interference on the clock signal. A balanced clock line will be more immune to such crosstalk, but the consumer electrical digital audio interface is unbalanced and prone to external interference. The unwanted additional signal changes the time at which the sloping clock signal appears to cross the threshold voltage of the clock receiver. This is simply the same phenomenon as that of FGR. 13 but in reverse. The threshold itself may be changed by ripple on the clock receiver power supply. There is no reason why these effects should be random; they may be periodic and potentially audible.

===

FGR. 15 Crosstalk in transmission can result in unwanted signals being added to the clock waveform. It can be seen here that a low-frequency interference signal affects the slicing of the clock and causes a periodic jitter.

===

The allowable jitter is measured in picoseconds, as shown in FGR. 13, and clearly steps must be taken to eliminate it by design. Convertor clocks must be generated from clean power supplies which are well decoupled from the power used by the logic, because a convertor clock must have a signal-to-noise ratio of the same order as that of the audio. Otherwise noise on the clock causes jitter, which in turn causes noise in the audio.

Power supply ripple from conventional 50/60 Hz transformer rectifiers is difficult to eliminate, but these supplies are giving way to switched-mode power supplies on grounds of cost and efficiency. If the switched-mode power supply is locked to the sampling clock, the power supply ripple is sampled at its own frequency and appears to be DC. Clock jitter is thus avoided and samples are taken in between switching transients.
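The claim that ripple sampled at its own frequency appears to be DC can be illustrated with a short simulation. This is a sketch only: the ripple is modelled as a pure sine wave of unit amplitude, and the 48 kHz and 47.9 kHz figures and the starting phase are hypothetical values chosen for the demonstration.

```python
import math


def sample_ripple(ripple_hz, sampling_rate_hz, n_samples, phase_rad=0.3):
    """Value of a unit-amplitude sinusoidal ripple at each sampling instant."""
    samples = []
    for n in range(n_samples):
        cycles = ripple_hz * n / sampling_rate_hz  # ripple cycles elapsed at sample n
        samples.append(math.sin(2.0 * math.pi * cycles + phase_rad))
    return samples


def spread(values):
    """Peak-to-peak variation across the samples."""
    return max(values) - min(values)


locked = sample_ripple(48_000, 48_000, 8)        # supply locked to the sampling clock
free_running = sample_ripple(47_900, 48_000, 8)  # supply slightly off frequency
print(f"locked spread:       {spread(locked):.2e}")
print(f"free-running spread: {spread(free_running):.2e}")
```

In the locked case every sample meets the ripple at the same point in its cycle, so the contribution is a constant offset. In the free-running case the ripple drifts through the samples as a low-frequency beat.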

This approach is used in some digital multi-track recorders where the amount of logic and power required is considerable. In variable-speed operation the power supply switching speed varies along with the capstan speed and the sampling rate.

If an external clock source is used, it cannot be used directly, but must be fed through a well-designed, well-damped phase-locked loop which will filter out the jitter. The operation of a phase-locked loop was described in Section 2. The phase-locked loop must be built to a higher accuracy standard than in most applications. Noise reaching the frequency control element will cause the very jitter the device is meant to eliminate. Some designs use a crystal oscillator whose natural frequency can be shifted slightly by a varicap diode. The high Q of the crystal produces a cleaner clock. Unfortunately this high Q also means that the frequency swing which can be achieved is quite small. It’s sufficient for locking to a single standard sampling rate reference, but not for locking to a range of sampling rates or for variable-speed operation. In this case a conventional varicap VCO is required. Some machines can switch between a crystal VCO and a wideband VCO depending on the sampling rate accuracy. As will be seen in Section 8, the AES/EBU interface has provision for conveying sampling rate accuracy in the channel status data and this could be used to select the appropriate oscillator. Some machines which need to operate at variable speed but with the highest quality use a double-phase-locked loop arrangement where the residual jitter in the first loop is further reduced by the second. The external clock signal is sometimes fed into the clean circuitry using an optical coupler to improve isolation.

Although it has been documented for many years, attention to control of clock jitter is not as great in actual hardware as it might be. It accounts for many of the slight audible differences between convertors reproducing the same data. A well-engineered convertor should substantially reject jitter on an external clock and should sound the same when reproducing the same data irrespective of the source of the data. A remote convertor which sounds different when reproducing, for example, the same Compact Disc via the digital outputs of a variety of CD players is simply not well engineered and should be rejected. Similarly, if the effect of changing the type of digital cable feeding the convertor can be heard, the unit is a dud. Unfortunately many consumer external DACs fall into this category, as the steps outlined above have not been taken. Some consumer external DACs, however, have RAM timebase correction with a correction range large enough that the convertor can run from a local fixed-frequency crystal. The incoming clock does no more than control the memory write cycles. Any incoming jitter is rejected totally.

Many portable digital machines have compromised jitter performance because their small size and weight constraints make the provision of adequate screening, decoupling and phase-locked loop circuits difficult.

===

FGR. 16 Frequency response with 100 percent aperture has nulls at multiples of sampling rate. Area of interest is up to half sampling rate.

===
