Choice of sampling rate
Sampling theory is only the beginning of the process which must be followed
to arrive at a suitable sampling rate. The finite slope of realizable filters
will compel designers to raise the sampling rate. For consumer products, the
lower the sampling rate, the better, since the cost of the medium is directly
proportional to the sampling rate: thus sampling rates near to twice 20 kHz
are to be expected. For professional products, there is a need to operate at
variable speed for pitch correction.
When the speed of a digital recorder is reduced, the off-tape sampling rate
falls, and FGR. 10 shows that with a minimal sampling rate the first image
frequency can become low enough to pass the reconstruction filter.
If the sampling frequency is raised without changing the response of the filters,
the speed can be reduced without this problem. It follows that variable-speed
recorders, generally those with stationary heads, must use a higher sampling
rate.
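The variable-speed argument can be checked with a little arithmetic. The sketch below is an idealisation and not a description of any particular machine: it assumes a 20 kHz audio band and a reconstruction filter whose passband edge is fixed at 20 kHz, and the function names are purely illustrative.

```python
AUDIO_BAND_HZ = 20_000.0    # assumed audio bandwidth at normal speed
FILTER_EDGE_HZ = 20_000.0   # assumed fixed passband edge of the reconstruction filter


def first_image_lower_edge(sampling_rate_hz, speed_ratio):
    """Lowest frequency of the first image when replay speed is scaled.

    Slowing the recorder scales the off-tape sampling rate and the recorded
    audio band by the same ratio, so the lower edge of the first image,
    Fs - Fb, scales with them.
    """
    return (sampling_rate_hz - AUDIO_BAND_HZ) * speed_ratio


def lowest_safe_speed(sampling_rate_hz):
    """Speed ratio below which the first image enters the filter passband."""
    return FILTER_EDGE_HZ / (sampling_rate_hz - AUDIO_BAND_HZ)


for fs in (44_100.0, 48_000.0):
    print(f"{fs / 1000:.1f} kHz: image reaches the passband below "
          f"{lowest_safe_speed(fs) * 100:.1f}% of normal speed")
```

Under these assumptions a 44.1 kHz machine can be slowed only to about 83 percent of normal speed before the image reaches the passband, whereas a 48 kHz machine can be slowed to about 71 percent.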
In the early days of digital audio research, the necessary bandwidth of about
1 megabit per second per audio channel was difficult to store. Disk drives
had the bandwidth but not the capacity for long recording time, so attention
turned to video recorders. In Section 9 it will be seen that these were adapted
to store audio samples by creating a pseudo-video waveform which could convey
binary as black and white levels. The sampling rate of such a system is constrained
to relate simply to the field rate and field structure of the television standard
used, so that an integer number of samples can be stored on each usable TV
line in the field. Such a recording can be made on a monochrome recorder, and
these recordings are made in two standards, 525 lines at 60Hz and 625 lines
at 50Hz. Thus it’s possible to find a frequency which is a common multiple
of the two and also suitable for use as a sampling rate.
The allowable sampling rates in a pseudo-video system can be deduced by multiplying
the field rate by the number of active lines in a field (blanked lines cannot
be used) and again by the number of samples in a line. By careful choice of
parameters it’s possible to use either 525/60 or 625/50 video with a sampling
rate of 44.1 kHz.
In 60Hz video, there are 35 blanked lines, leaving 490 lines per frame, or
245 lines per field for samples. If three samples are stored per line, the
sampling rate becomes:
60 × 245 × 3 = 44 100 Hz = 44.1 kHz
In 50Hz video, there are 37 lines of blanking, leaving 588 active lines per
frame, or 294 per field, so the same sampling rate is given by:
50 × 294 × 3 = 44 100 Hz = 44.1 kHz
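The rule of multiplying field rate by active lines per field and by samples per line is easily checked for both television standards. The following sketch uses only the figures quoted in the text; the function name and its parameters are illustrative.

```python
def pseudo_video_sampling_rate(field_rate_hz, total_lines, blanked_lines,
                               samples_per_line):
    """Sampling rate of a pseudo-video audio recording, in Hz."""
    active_per_frame = total_lines - blanked_lines
    active_per_field = active_per_frame // 2   # two interlaced fields per frame
    return field_rate_hz * active_per_field * samples_per_line


print(pseudo_video_sampling_rate(60, 525, 35, 3))   # 525/60 standard
print(pseudo_video_sampling_rate(50, 625, 37, 3))   # 625/50 standard
```

Both calls return 44 100, confirming that three samples per line give the same rate in either standard.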
The sampling rate of 44.1 kHz came to be that of the Compact Disc. Even though
CD has no video circuitry, the equipment originally used to make CD masters
was video based and determines the sampling rate.
For landlines to FM stereo broadcast transmitters having a 15 kHz audio bandwidth,
the sampling rate of 32 kHz is more than adequate, and has been in use for
some time in the United Kingdom and Japan. This frequency is also in use in
the NICAM 728 stereo TV sound system and in DAB. It’s also used for the Sony
NT format mini-cassette. The professional sampling rate of 48 kHz was proposed
as having a simple relationship to 32 kHz, being far enough above 40 kHz for
variable-speed operation.
Although in a perfect world the adoption of a single sampling rate might have
had virtues, for practical and economic reasons digital audio now has essentially
three rates to support: 32 kHz for broadcast, 44.1 kHz for CD and its mastering
equipment, and 48 kHz for 'professional' use.
In fact the use of 48 kHz is not as common as its title would indicate. The
runaway success of CD has meant that much equipment is run at 44.1 kHz to suit
CD. With the advent of digital filters, which can track the sampling rate,
a higher sampling rate is no longer necessary for pitch changing. 48 kHz is
extensively used in television where it can be synchronized to both line standards
relatively easily. The currently available DVTR formats offer only 48 kHz audio
sampling. A number of formats can operate at more than one sampling rate. Both
DAT and DASH formats are specified for all three rates, although not all available
hardware implements every possibility. Most hard disk recorders will operate
at a range of rates.
Recently there have been proposals calling for dramatically increased audio
sampling rates. These are misguided and won’t be considered further here. The
subject will, however, be treated in Section 13.
===
FGR. 11 (a) The simple track-hold circuit shown has poor frequency response
as the resistance of the FET causes a rolloff in conjunction with the capacitor.
In (b) the resistance of the FET is now inside a feedback loop and will be
eliminated, provided the left-hand op-amp never runs out of gain or swing.
===
FGR. 12 Characteristics of the feedback track-hold circuit of FGR. 11(b) showing
major sources of error.
===
Sample and hold
In practice many analog-to-digital convertors require a finite time to operate,
and instantaneous samples must be extended by a device called a sample-and-hold
or, more accurately, a track-hold circuit.
The simplest possible track-hold circuit is shown in FGR. 11(a).
When the switch is closed, the output will follow the input. When the switch
is opened, the capacitor holds the signal voltage which existed at the instant
of opening. This simple arrangement has a number of shortcomings, particularly
the time constant of the on-resistance of the switch with the capacitor, which
extends the settling time. The effect can be alleviated by putting the switch
in a feedback loop as shown in FGR. 11(b). The buffer amplifiers must meet
a stringent specification, because they need bandwidth well in excess of audio
frequencies to ensure that operation is always feedback controlled between
holding periods. When the switch is opened, the slightest change in input voltage
causes the input buffer to saturate, and it must be able to rapidly recover
from this condition when the switch next closes. The feedback minimizes the
effect of the on-resistance of the switch, but the off-resistance must be high
to prevent the input signal affecting the held voltage. The leakage current
of the integrator must be low to prevent droop, which is the term given to an
unwanted slow change in the held voltage.
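The severity of the settling and droop requirements can be estimated with first-order models. The sketch below treats the simple circuit as a single RC stage and the droop as a constant leakage current discharging the hold capacitor. The component values are hypothetical and chosen only for illustration, and the half-LSB criterion is an assumption rather than a figure from the text.

```python
import math


def settling_time_s(r_on_ohm, c_hold_f, bits):
    """Time for an RC stage to settle within 1/2 LSB of a full-scale step."""
    # exp(-t / RC) < 2 ** -(bits + 1)  =>  t > RC * (bits + 1) * ln 2
    return r_on_ohm * c_hold_f * (bits + 1) * math.log(2)


def max_leakage_a(c_hold_f, full_scale_v, bits, hold_time_s):
    """Largest leakage current that keeps droop below 1/2 LSB while holding."""
    half_lsb_v = full_scale_v / 2 ** (bits + 1)
    # Droop rate is dV/dt = I / C, so I = C * dV / dt
    return c_hold_f * half_lsb_v / hold_time_s


# Hypothetical values: 100 ohm switch, 1 nF hold capacitor,
# 2 V full-scale range, 10 microsecond hold period.
for bits in (16, 18, 20):
    t_settle = settling_time_s(100.0, 1e-9, bits)
    i_leak = max_leakage_a(1e-9, 2.0, bits, 10e-6)
    print(f"{bits} bits: settle in {t_settle * 1e6:.2f} us, "
          f"leakage below {i_leak * 1e9:.3f} nA")
```

Each additional bit halves the permissible leakage current, which is consistent with the difficulty of meeting the droop specification at longer wordlengths.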
FGR. 12 shows the various events during a track-hold sequence and catalogs
the various potential sources of inaccuracy. A further phenomenon which is
not shown in FGR. 12 is that of dielectric relaxation.
When a capacitor is discharged rapidly by connecting a low resistance path
across its terminals, not all the charge is removed. After the discharge circuit
is disconnected, the capacitor voltage may rise again slightly as charge which
was trapped in the high-resistivity dielectric slowly leaks back to the electrodes.
In track-hold circuits dielectric relaxation can cause the value of one sample
to be affected by the previous one. Some dielectrics display less relaxation
than others. Mica capacitors, traditionally regarded as being of high quality,
actually display substantially worse relaxation characteristics than many other
types. Polypropylene and Teflon are significantly better.
The track-hold circuit is extremely difficult to design because of the accuracy
demanded by audio applications. In particular it’s very difficult to meet the
droop specification for much more than sixteen-bit applications. Greater accuracy
has been reported by modeling the effect of dielectric relaxation and applying
an inverse correction signal.
When a performance limitation such as the track-hold stage is found, it’s
better to find an alternative approach. It will be seen later in this section
that more advanced conversion techniques allow the track-hold circuit and its
shortcomings to be eliminated.
===
FGR. 13 The effect of sampling timing jitter on noise, and calculation of
the required accuracy for a sixteen-bit system. (a) Ramp sampled with jitter
has error proportional to slope. (b) When jitter is removed by later circuits,
error appears as noise added to samples. For a sixteen-bit system there are
$2^{16}$ Q, and the maximum slope at 20 kHz will be $20\,000\pi \times 2^{16}$ Q
per second. If jitter is to be neglected, the noise must be less than
$\tfrac{1}{2}$Q, thus timing accuracy $t$ multiplied by maximum slope
$= \tfrac{1}{2}$Q, or $20\,000\pi \times 2^{16}\,Q\,t = \tfrac{1}{2}Q$, so that
$t = 1/(2 \times 20\,000\pi \times 2^{16}) \approx 121$ ps.
===
FGR. 14 Effects of sample clock jitter on signal-to-noise ratio at different
frequencies, compared with theoretical noise floors of systems with different
resolutions.
===
Sampling clock jitter
The instants at which samples are taken in an ADC and the instants at which
DACs make conversions must be evenly spaced, otherwise unwanted signals can
be added to the audio. FGR. 13 shows the effect of sampling clock jitter on
a sloping waveform. Samples are taken at the wrong times. When these samples
have passed through a system, the timebase correction stage prior to the DAC
will remove the jitter, and the result is shown at (b). The magnitude of the
unwanted signal is proportional to the slope of the audio waveform and so the
amount of jitter which can be tolerated falls at 6 dB per octave. As the resolution
of the system is increased by the use of longer sample wordlength, tolerance
to jitter is further reduced. The nature of the unwanted signal depends on
the spectrum of the jitter. If the jitter is random, the effect is noise-like
and relatively benign unless the amplitude is excessive. FGR. 14 shows the
effect of differing amounts of random jitter with respect to the noise floor
of various wordlengths. Note that even small amounts of jitter can degrade
a twenty-bit convertor to the performance of a good sixteen-bit unit. There
is thus no point in upgrading to higher-resolution convertors if the clock
stability of the system is insufficient to allow their performance to be realized.
Clock jitter is not necessarily random. FGR. 15 shows that one source of clock
jitter is crosstalk or interference on the clock signal. A balanced clock line
will be more immune to such crosstalk, but the consumer electrical digital
audio interface is unbalanced and prone to external interference. The unwanted
additional signal changes the time at which the sloping clock signal appears
to cross the threshold voltage of the clock receiver. This is simply the same
phenomenon as that of FGR. 13 but in reverse. The threshold itself may be changed
by ripple on the clock receiver power supply. There is no reason why these
effects should be random; they may be periodic and potentially audible.
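For the random-jitter case discussed above, the comparison with the noise floor of different wordlengths can be sketched numerically. The block below uses two widely quoted approximations which are assumptions here and are not taken from the text: the jitter-limited signal-to-noise ratio of a full-scale sine wave, -20 log10(2πfσ) with σ the r.m.s. jitter, and the ideal quantizing signal-to-noise ratio of 6.02n + 1.76 dB. The jitter values are hypothetical.

```python
import math


def jitter_limited_snr_db(signal_hz, rms_jitter_s):
    """Approximate SNR of a full-scale sine sampled with small random jitter."""
    return -20.0 * math.log10(2.0 * math.pi * signal_hz * rms_jitter_s)


def ideal_snr_db(bits):
    """Theoretical SNR of an ideal convertor of the given wordlength."""
    return 6.02 * bits + 1.76


for bits in (16, 18, 20):
    print(f"{bits}-bit ideal noise floor: {ideal_snr_db(bits):.1f} dB")

for jitter_ps in (10.0, 100.0, 1000.0):   # hypothetical r.m.s. jitter values
    snr = jitter_limited_snr_db(20_000.0, jitter_ps * 1e-12)
    print(f"{jitter_ps:.0f} ps r.m.s. at 20 kHz: {snr:.1f} dB")
```

With these approximations, 100 ps of r.m.s. jitter limits a 20 kHz full-scale signal to roughly 98 dB, which is about the ideal sixteen-bit figure whatever the wordlength of the convertor.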
===
FGR. 15 Crosstalk in transmission can result in unwanted signals being added
to the clock waveform. It can be seen here that a low-frequency interference
signal affects the slicing of the clock and causes a periodic jitter.
===
The allowable jitter is measured in picoseconds, as shown in FGR. 13, and
clearly steps must be taken to eliminate it by design. Convertor clocks must
be generated from clean power supplies which are well decoupled from the power
used by the logic because a convertor clock must have a signal-to-noise ratio
of the same order as that of the audio.
Otherwise noise on the clock causes jitter which in turn causes noise in the
audio.
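The picosecond figure follows from requiring that a timing error, multiplied by the steepest slope of a full-scale sine wave, should not exceed half a quantizing interval. A minimal sketch of that calculation is given below; the function name is illustrative and the wordlengths other than sixteen bits are included only for comparison.

```python
import math


def max_jitter_s(bits, signal_hz):
    """Timing error which moves a full-scale sine by 1/2 Q at its steepest point."""
    # Peak-to-peak range is 2**bits Q, so the amplitude is 2**(bits - 1) Q
    # and the maximum slope is 2 * pi * f * amplitude = pi * f * 2**bits Q/s.
    max_slope_q_per_s = math.pi * signal_hz * 2 ** bits
    return 0.5 / max_slope_q_per_s


for bits in (16, 18, 20):
    print(f"{bits} bits at 20 kHz: {max_jitter_s(bits, 20_000.0) * 1e12:.1f} ps")
```

For sixteen bits at 20 kHz the result is about 121 ps, and each further bit of resolution halves the allowable timing error.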
Power supply ripple from conventional 50/60Hz transformer rectifiers is difficult
to eliminate, but these supplies are giving way to switched mode power supplies
on grounds of cost and efficiency. If the switched mode power supply is locked
to the sampling clock, the power supply ripple is sampled at its own frequency
and appears to be DC. Clock jitter is thus avoided and samples are taken in
between switching transients.
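The statement that locked ripple appears to be DC can be illustrated by evaluating a ripple waveform at the sampling instants. In the sketch below the ripple is modelled, purely as an assumption, as a unit sinusoid of arbitrary phase, and the 50 kHz unlocked supply is a hypothetical example.

```python
import math


def sampled_ripple(ripple_hz, sampling_hz, count, phase_rad=0.3):
    """Values of a unit sinusoidal ripple seen at successive sampling instants."""
    return [math.sin(2.0 * math.pi * ripple_hz * n / sampling_hz + phase_rad)
            for n in range(count)]


def spread(values):
    """Peak-to-peak variation of the ripple as seen by the sampling clock."""
    return max(values) - min(values)


locked = sampled_ripple(48_000.0, 48_000.0, 1000)     # supply locked to the clock
unlocked = sampled_ripple(50_000.0, 48_000.0, 1000)   # hypothetical free-running supply

print(f"locked supply:   variation {spread(locked):.2e}")
print(f"unlocked supply: variation {spread(unlocked):.2e}")
```

The locked case shows no variation beyond floating-point rounding, so the ripple contributes only a fixed offset, whereas the unlocked ripple is seen by the sampling clock as a beat at the difference frequency.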
This approach is used in some digital multi-track recorders where the amount
of logic and power required is considerable. In variable-speed operation the
power supply switching speed varies along with the capstan speed and the sampling
rate.
If an external clock source is used, it cannot be used directly, but must
be fed through a well-designed, well-damped phase-locked loop which will filter
out the jitter. The operation of a phase-locked loop was described in Section
2. The phase-locked loop must be built to a higher accuracy standard than in
most applications. Noise reaching the frequency control element will cause
the very jitter the device is meant to eliminate. Some designs use a crystal
oscillator whose natural frequency can be shifted slightly by a varicap diode.
The high Q of the crystal produces a cleaner clock. Unfortunately this high
Q also means that the frequency swing which can be achieved is quite small.
It’s sufficient for locking to a single standard sampling rate reference, but
not for locking to a range of sampling rates or for variable-speed operation.
In this case a conventional varicap VCO is required. Some machines can switch
between a crystal VCO and a wideband VCO depending on the sampling rate accuracy.
As will be seen in Section 8, the AES/EBU interface has provision for conveying
sampling rate accuracy in the channel status data and this could be used to
select the appropriate oscillator. Some machines which need to operate at variable
speed but with the highest quality use a double-phase-locked loop arrangement
where the residual jitter in the first loop is further reduced by the second.
The external clock signal is sometimes fed into the clean circuitry using an
optical coupler to improve isolation.
Although it has been documented for many years, attention to control of clock
jitter is not as great in actual hardware as it might be. Jitter accounts for many
of the slight audible differences between convertors reproducing the same data.
A well-engineered convertor should substantially reject jitter on an external
clock and should sound the same when reproducing the same data irrespective
of the source of the data. A remote convertor which sounds different when reproducing,
for example, the same Compact Disc via the digital outputs of a variety of
CD players is simply not well engineered and should be rejected. Similarly
if the effect of changing the type of digital cable feeding the convertor can
be heard, the unit is a dud. Unfortunately many consumer external DACs fall
into this category, as the steps outlined above have not been taken. Some consumer
external DACs, however, have RAM timebase correction which has a large enough
correction range that the convertor can run from a local fixed frequency crystal.
The incoming clock does no more than control the memory write cycles. Any incoming
jitter is rejected totally.
Many portable digital machines have compromised jitter performance because
their small size and weight constraints make the provision of adequate screening,
decoupling and phase-locked loop circuits difficult.
===
FGR. 16 Frequency response with 100 percent aperture has nulls at multiples
of sampling rate. Area of interest is up to half sampling rate.
===