Digital Audio--Compact Disc (CD) (part 3)


CD-i

The Compact Disc Interactive (CD-i) standard was devised as a product-specific application of the CD-ROM format.

CD-i permits storage of a simultaneous combination of audio, video, graphics, and text, and defines specific data formats for these. In addition, titles can function with real-time interactivity. For example, a CD-i dictionary might contain a word and its definitions, as well as spoken pronunciation, pictures, and translations into foreign languages. The CD-i standard, codified in the Green Book (issued in 1986), defines how each type of information is encoded as well as the logical layout of files on the disc. It also specifies how hardware reads discs and decodes information.

The CD-i data format is derived from the CD-ROM Mode 2 format. CD-i data is arranged in 2352-byte blocks, as in the CD-ROM/XA format. The CD-i format accepts either PCM or ADPCM (adaptive differential pulse-code modulation) data. The full-motion video (FMV) extension allows storage of 74 minutes of full-motion digital video and stereo audio. The MPEG-1 coding standard is used to reduce the video bit rate to 1.15 Mbps and the audio rate to 0.22 Mbps; lower rates can also be used. CD-i players can also play Video CDs coded with MPEG-1. MPEG-1 audio is described in Section 11 and MPEG-1 video in Section 16.

To ensure universal compatibility, dedicated hardware and interfaces are defined. The CD-Bridge format adds information to a CD-ROM/XA disc so it can be played on a CD-i player. Bridge tracks use Mode 2 data, tracks are listed in the TOC as a CD-ROM/XA track, and block layout is identical to CD-i and CD-ROM/XA. The Photo CD is an example of a Bridge disc. The CD-i format did not enjoy success among its targeted consumers.

Photo CD

The Photo CD is used to professionally store, manipulate, and display photographic images. Photographs can be viewed, or reproduced as high-quality prints using a color printer. The 35-mm version of the Photo CD provides three to four times the resolution required in any high-definition television (HDTV) standard. Conventional photographic images can be scanned to the Photo CD, with 2048 scan lines across the short dimension of a 35-mm frame, with 3072 pixels on each line to yield a 3:2 aspect ratio. Data compression and decomposition are used to increase storage efficiency. During authoring, high-resolution image files are subjected to a 4:1 data reduction.
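The scan dimensions quoted above can be checked with a little arithmetic. The following is a rough sketch, not part of the Photo CD specification: the 3-bytes-per-pixel raw size and the 1920 × 1080 HDTV frame used for comparison are assumptions introduced here for illustration.

```python
# Photo CD scan dimensions quoted in the text.
LINES = 2048             # scan lines across the short dimension
PIXELS_PER_LINE = 3072   # pixels on each line

# Assumptions (not from the text): 3 bytes per pixel before data reduction,
# and a 1920 x 1080 frame as the HDTV reference.
BYTES_PER_PIXEL = 3
HDTV_PIXELS = 1920 * 1080

pixels = LINES * PIXELS_PER_LINE
aspect_ratio = PIXELS_PER_LINE / LINES
raw_mbytes = pixels * BYTES_PER_PIXEL / 1e6
reduced_mbytes = raw_mbytes / 4        # the 4:1 data reduction applied in authoring
hdtv_ratio = pixels / HDTV_PIXELS

print(f"pixels per image       : {pixels:,}")
print(f"aspect ratio           : {aspect_ratio:.2f} (3:2)")
print(f"raw size (assumed)     : {raw_mbytes:.1f} Mbytes")
print(f"after 4:1 reduction    : {reduced_mbytes:.1f} Mbytes")
print(f"pixel count versus HDTV: {hdtv_ratio:.2f}x")
```

Under these assumptions the pixel count comes out at roughly three times that of the HDTV frame, which is consistent with the "three to four times" figure given in the text.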

In addition, file sizes can be reduced without significant visual loss by using chroma sub-sampling to take advantage of limitations in human visual perception. The Photo CD was developed by Kodak and is defined in the Beige Book.

Photo CD discs conform to the Orange Book Part II standard and are physically identical to CD-R audio discs; however, different data headers make them incompatible.

Data blocks are written according to the CD-ROM/XA, Mode 2, Form 1 standard. Because discs use the CD Bridge format, they are playable on CD-ROM/XA players.

Because the Orange Book Part II permits additional multisession recording to a disc, images can be added over time. Image Pacs initially recorded on a disc are structured as a file using the ISO 9660 structure. Subsequently recorded Pacs use a CD-R Volume and File Structures format, using the multisession method. All Pacs are addressed through the block-addressing method used by CD-ROM discs and defined by the ISO/IEC 10149 standard. Because the Photo CD adheres to the CD-ROM/XA format, audio and video data can be interleaved; in this way, a soundtrack can accompany visuals. The Picture CD consumer format similarly stores photographic files on a CD-R disc; it provides 1024 × 1536 resolution using JPEG compression. The disc also contains software used to view and edit the photographs.

CD + G and CD + MIDI

The CD + G and CD + MIDI formats were devised to encode graphics or MIDI software on CDs, in addition to regular audio data. Special hardware or software is required to access this data. Eight subcode channels are accumulated over 98 frames; thus, each 98-bit subcode word is output at a 75-Hz rate. Subcode synchronization occupies the first two frames; thus, a subcode block contains eight channels with 96 data bits each. This data block is called a packet, and each quarter of a packet is called a pack. A pack is generated every 3.3 ms. Only P and Q are reserved for audio control information. Over the length of a CD, the remaining channels, R to W, provide about 25 Mbytes of 8-bit data. Utilization of that capacity has been promoted as CD + G or CD + Graphics, and CD + MIDI, sometimes known as CD + G/M. The player decodes the graphics or MIDI data separately from the audio data. In CD + G discs, data is collected over thousands of CD frames to form video images or other data fields. For example, a CD + G audio disc can contain video images, liner notes, librettos, or other information. Because video images require a large amount of data for storage, CD + G images provide limited resolution.
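The subcode figures above follow from the frame counts and the 75-Hz block rate. As a minimal sketch, assuming a 74-minute disc (the disc length is an assumption made here, not stated in this paragraph), the pack period and the R-to-W capacity can be estimated as:

```python
FRAMES_PER_BLOCK = 98    # frames accumulated per subcode block
SYNC_FRAMES = 2          # first two frames carry subcode synchronization
BLOCK_RATE_HZ = 75       # subcode blocks (packets) per second
PACKS_PER_PACKET = 4     # each quarter of a packet is a pack
RW_CHANNELS = 6          # R through W; P and Q are reserved

# Assumption: a 74-minute disc.
DISC_SECONDS = 74 * 60

data_bits_per_channel = FRAMES_PER_BLOCK - SYNC_FRAMES
pack_period_ms = 1000 / BLOCK_RATE_HZ / PACKS_PER_PACKET
rw_bytes_per_second = RW_CHANNELS * data_bits_per_channel * BLOCK_RATE_HZ // 8
rw_total_mbytes = rw_bytes_per_second * DISC_SECONDS / 1e6

print(f"data bits per channel per block: {data_bits_per_channel}")
print(f"pack period                    : {pack_period_ms:.2f} ms")
print(f"R-W data rate                  : {rw_bytes_per_second} bytes/s")
print(f"R-W capacity over the disc     : {rw_total_mbytes:.1f} Mbytes")
```

With these inputs the estimate lands near 24 Mbytes, in line with the approximate 25-Mbyte figure quoted above; the exact value depends on the playing time assumed.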

In the CD + MIDI application, MIDI (Musical Instrument Digital Interface) information is stored in the subcode field, and output synchronously with the audio playback. External MIDI instruments can synchronize to the melody or other musical parameters of an encoded disc. The subcode capacity is sufficient to store up to 16 channels of MIDI information. MIDI information can be supplemented with graphics information; for example, music notation could be supplied. Another variation can encode music notation in the subcode area to allow print out of sheet music. CD + G/M discs are compatible with any CD player, but only players equipped with CD + G/M output ports can retrieve the information from the disc. Alternatively, an external decoder can be connected to any CD player with a digital output port, provided that the full subcode data is available from the port. CD + G is sometimes used for karaoke applications.

CD-3

In addition to regular 120-mm-diameter CD discs, the CD-3 format describes 80-mm-diameter discs. The name derives from the approximately 3-in diameter. This small size promotes greater portability and the format is useful for short audio programs. A CD-3 disc holds a maximum of 20 minutes of music. Because a CD data track begins at the innermost radius, CD-3 discs are compatible with regular discs and players. Some players have concentric rings in their disc drawers to center discs of either diameter over the spindle. The CD-3 format is also used to hold over 200 Mbytes of CD-ROM data, and for CD-R and CD-RW discs.

Video CD

The Video CD format is an outgrowth of the CD-i standard; full-motion video was added to the original CD-i standard and that feature was subsequently revised in 1992 to form the Video CD standard. The Video CD uses the MPEG-1 coding standard for audio and video. The audio signal is coded with the Layer II standard at 44.1 kHz. A disc stores about 74 minutes of full-motion digital video and audio; a feature film is placed on two discs. The video decoder chip permits full-motion video (FMV) to be shown at either 29.97 (NTSC) or 25 (PAL/SECAM) frames per second at 352 pixels by 240 lines and 352 pixels by 288 lines, respectively, one-fourth the resolution of DVD's normal mode. The Video CD may be shown as a quarter-screen image. The video bit rate is 1.15 Mbps and the audio bit rate is 0.22 Mbps. The Video CD format is a CD-ROM/XA Bridge disc, Mode 2, Form 2; this allows a Video CD to play on a CD-ROM drive. A Video CD disc will not play on a CD-Audio player, but will play in many DVD players.
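The bit rates and frame sizes above can be cross-checked numerically. This is an illustrative sketch only; the DVD frame sizes of 720 × 480 (NTSC) and 720 × 576 (PAL) are assumptions brought in for the comparison and are not given in this section.

```python
VIDEO_MBPS = 1.15
AUDIO_MBPS = 0.22
PLAY_MINUTES = 74

# Assumption (not from the text): DVD normal-mode frame sizes.
DVD_NTSC_PIXELS = 720 * 480
DVD_PAL_PIXELS = 720 * 576

total_mbps = VIDEO_MBPS + AUDIO_MBPS
# Multiplexed audio/video payload over the whole disc, in Mbytes.
payload_mbytes = total_mbps * PLAY_MINUTES * 60 / 8

ntsc_ratio = (352 * 240) / DVD_NTSC_PIXELS
pal_ratio = (352 * 288) / DVD_PAL_PIXELS

print(f"combined bit rate : {total_mbps:.2f} Mbps")
print(f"74-minute payload : {payload_mbytes:.0f} Mbytes")
print(f"NTSC pixels vs DVD: {ntsc_ratio:.3f}")
print(f"PAL pixels vs DVD : {pal_ratio:.3f}")
```

Both ratios come out just under 0.25, matching the "one-fourth the resolution" description.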

Video CD is described in the White Book; version 1.0 of this specification was originally developed in 1992 for karaoke discs and in 1995 it was extended to version 2.0, which supported interactive video. The Video CD is different from the CD-Video format, now abandoned. The MPEG-1 video algorithm is discussed in Section 16. MPEG-1 audio is discussed in Section 11.

The Super Video CD (SVCD) is an enhanced version of the Video CD designed primarily for higher-quality movie playback. SVCD uses MPEG-2 coding for video compression to store about 70 minutes on a disc. The NTSC resolution is 480 × 480, and PAL resolution is 480 × 576, about three-fourths that of DVD's normal mode. Dual mono, stereo or 5.1-channel soundtracks can be used at bit rates ranging from 32 kbps to 384 kbps using MPEG-1 Layer II or MPEG-2 multichannel codecs. Uncompressed audio cannot be stored. The maximum data rate is 2.2 Mbps by virtue of a 2× drive. However, at the higher data rate, playback time is halved to about 35 minutes; a movie might occupy three discs. Copy Generation Management System (CGMS) copy protection can be enabled. SVCD's development was sponsored by the Chinese government as a low-cost alternative to DVD. Other technical aspects were derived from the Video CD format and the China Video CD (CVD). The SVCD specification was ratified by the China National Committee of Recording Standards in September 1998. SVCD is also standardized in the IEC 62107 document. A similar specification, the Chao-Ji ("Super") VCD standard, was developed to support both China Video CD and SVCD; many SVCD players and changers support the Chao-Ji standard and most discs use the SVCD format. The DSVCD (Double SVCD) format uses a smaller track pitch to permit longer high-quality playing times of about 60 minutes.

Super Audio CD

When the Compact Disc was launched in 1982, it was rightly heralded as a data carrier of immense storage capacity. However, over time the CD seemed increasingly small. Moreover, some audiophiles argued that its specifications constrained audio fidelity. In particular, the CD was insufficient for the large file sizes and high bit rates required by surround sound and high sampling frequency audio. In 1999, Philips and Sony introduced the high-density Super Audio CD standard, known as SACD. The SACD format supports discrete-channel (two-channel and multichannel) audio recordings, using the proprietary one-bit Direct Stream Digital (DSD) coding method. DSD uses a high sampling frequency and achieves a flat frequency response to 100 kHz and a dynamic range of 120 dB in the 0- to 20-kHz band.

SACD players can play both SACD and CD discs. SACD is not compatible with the DVD or Blu-ray formats.

The mechanical and optical properties of an SACD disc are similar to those of a DVD-5 disc; however, the logical layout of content, the data format, and the copy protection measures are different. DSD data is not playable in standard DVD or Blu-ray drives, but a CD layer, if present on an SACD disc, is playable. Some players may include decoders to accommodate multiple disc formats. Other data such as text and graphics (but not video) can be included on an SACD disc; this content follows the Blue Book "Enhanced CD" standard. The SACD standard is sometimes known as the Scarlet Book, published in March 1999.

FIG. 31 A hybrid SACD disc contains two data layers (high-density and CD). The two layers are bonded together to form a disc with a thickness of 1.2 mm. (Verbakel et al., 1998)

FIG. 32 Both the high-density layer and CD layer in a hybrid SACD are read from one side by a laser. The high-density layer is semi-reflective, while the CD layer is fully reflective.

Disc Design

SACD discs use the same dimensions as a CD: 12-cm diameter and 1.2-mm thickness. The laser wavelength is 650 nm, the lens NA is 0.60, the minimum pit/land length is 0.40 µm, and the track pitch is 0.74 µm. (The pertinent CD figures are 780 nm, 0.45, 0.83 µm, and 1.6 µm.) Software providers may choose from three disc types specified in the SACD format: single-layer, dual-layer, and hybrid disc construction. The single-layer disc contains one layer of high-density DSD content (4.7 Gbytes); for two-channel stereo, this provides about 110 minutes of playing time.
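The single-layer playing time can be estimated from the layer capacity and the one-bit DSD rate of 2.8224 MHz per channel given in the DSD Modulation discussion. The sketch below assumes 1 Gbyte = 10^9 bytes and ignores TOC and formatting overhead as well as lossless coding; both simplifications are assumptions made here.

```python
DSD_RATE_HZ = 2_822_400   # one-bit samples per second per channel (64 x 44.1 kHz)
CHANNELS = 2              # two-channel stereo
LAYER_BYTES = 4.7e9       # assumption: 1 Gbyte = 10**9 bytes, no overhead

# One bit per sample, so bytes per second is the bit rate divided by 8.
bytes_per_second = DSD_RATE_HZ * CHANNELS / 8
playing_minutes = LAYER_BYTES / bytes_per_second / 60

print(f"stereo DSD data rate: {bytes_per_second:,.0f} bytes/s")
print(f"playing time        : {playing_minutes:.0f} minutes")
```

The result is close to the "about 110 minutes" quoted above.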

The dual-layer disc contains two layers of high-density content (8.5 Gbytes total). The hybrid disc is a dual-layer disc that contains one layer of high-density DSD content (4.7 Gbytes) and one layer of Red Book compatible stereo content (680 Mbytes), as shown in FIG. 31. The semi-reflective high-density layer must be reflective (readable) at the 650-nm wavelength of SACD, and transparent at the 780-nm wavelength used by conventional CD players; in other words, it acts as a color filter. The high-density layer is 0.6 mm from the readout surface and the CD layer is 1.2 mm from the surface. An SACD player can read both layers, and a CD player can read the CD layer.

In dual-layer discs, two 0.6-mm substrates are bonded together. In all implementations, there is only one data side.

A semi-reflective layer (20 to 40% reflective and approximately 0.05 µm in thickness) is used on the embedded inner data layer; in some cases, a silicon-based dielectric film is used. A fully reflective top metal layer (at least 70% reflective and approximately 0.05 µm in thickness) is used on the outer data surface. This surface is protected by an acrylic layer (approximately 10 µm in thickness) and a printed label. Care must be taken to seal a hybrid disc to limit water absorption and evaporation from the substrate; unequal absorption between the two disc sides could cause disc warpage. The back side is inherently protected by a metal layer and a lacquer layer while the front side is nominally unprotected; thus, a front-side transparent silicon-based coating (10 nm to 15 nm) is needed. A hybrid disc in which a dual pickup (650 nm and 780 nm) is used to read both SACD and CD data is shown in FIG. 32.

FIG. 33 The SACD high-density data layer is designed to carry both two-channel and multichannel audio data.

The data on an SACD disc is grouped into sectors of 2064 bytes. This comprises: Identification Data (ID) of 4 bytes, ID Error Detection (IED) of 2 bytes, Reserved of 6 bytes, Main Data of 2048 bytes, and Error Detection Code (EDC) of 4 bytes. During encoding, following scrambling, 16 sectors form an error-correction code block, which is processed with a scheme using a Reed-Solomon Product Code. Rows of ECC blocks are interleaved and grouped into recording frames. Frames undergo EFMPlus modulation. Data is then placed in Physical Sectors and recorded to disc.
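The sector layout described above can be written down as a small table and checked for consistency. This is only a bookkeeping sketch; the dictionary and variable names are illustrative and not taken from the SACD specification.

```python
# Field sizes in bytes, in the order listed in the text.
SECTOR_FIELDS = {
    "Identification Data (ID)": 4,
    "ID Error Detection (IED)": 2,
    "Reserved": 6,
    "Main Data": 2048,
    "Error Detection Code (EDC)": 4,
}
SECTORS_PER_ECC_BLOCK = 16

sector_bytes = sum(SECTOR_FIELDS.values())
main_data_per_ecc_block = SECTOR_FIELDS["Main Data"] * SECTORS_PER_ECC_BLOCK
header_overhead = 1 - SECTOR_FIELDS["Main Data"] / sector_bytes

print(f"bytes per sector           : {sector_bytes}")
print(f"main data per ECC block    : {main_data_per_ecc_block} bytes")
print(f"ID/IED/reserved/EDC portion: {header_overhead:.2%}")
```

The overhead shown here covers only the sector header and EDC fields; the Reed-Solomon parity and EFMPlus modulation add further overhead that is not modeled.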

The radius of the high-density layer is segmented for different kinds of data, as shown in FIG. 33. The innermost radius contains the disc lead-in area, followed by the data area. It is divided into several areas including a Master Table of Contents (Master TOC) containing information on tracks and timing, as well as text data on the title and artist. The Master TOC is stored in three places (sectors 510, 520, and 530) to ensure readability. The next two radial areas are given to two-channel and multichannel recordings (up to six channels). The two-channel and multichannel areas use the same basic structure. The Area TOC for each audio area is placed at the beginning and end of each area. They contain track, sampling frequency, timing, and text information about the tracks included in that section. The SACD standard permits up to 255 tracks.

Audio tracks contain two types of streams: audio elementary stream and supplementary data elementary stream; they are multiplexed. In addition, there are sequences of audio frames each with a timecode, and supplementary data frames for pictures, text, and graphics; each frame represents 1/75 second. Following the audio tracks, there is an area for optional data such as text, graphics, and video. This data can only be accessed by a file system; its format is not specified in the SACD specification. The outermost radius holds the disc lead-out.

SACD discs can be read using a hierarchical TOC, or by optionally using a UDF or ISO 9660 file system.

FIG. 34 In principle, DSD coding is based on a one-bit quantization method. A. A one-bit quantizer produces a square wave output. B. The output square wave from a one-bit quantizer yields a large difference signal.

All SACD discs incorporate an invisible watermark that is physically embedded in the substrate of the disc. Virtually impossible to copy, the watermark is used to conduct mutual authentication of the player and the disc. SACD players read the watermark and will reject any discs that do not bear an authentic watermark. Visible watermarks on the signal side of the disc in the form of faint images or letters may also be employed. A process called Pit Signal Processing (PSP) uses a controlled array of pit widths to create both invisible and visible watermarks; user data stored as pit/land lengths is unaffected by this watermarking.

DSD Modulation

Whereas all CD discs carry PCM data, all SACD discs carry Direct Stream Digital (DSD) data, in which audio signals are coded in one-bit pulse density form using sigma-delta modulation. Most conventional analog-to-digital (A/D) converters use sigma-delta techniques in which the input signal is upsampled to a high sampling frequency. The signal is passed through a decimation filter and also quantized for output as a PCM signal at a nominal sampling frequency of 44.1 kHz (for CD) and up to 192 kHz (for DVD-Audio or Blu-ray). Likewise, many D/A converters use oversampling to increase the sampling frequency of the output signal, to move the image spectra from the audio band. As in PCM systems, DSD begins with a high sampling frequency, but unlike PCM systems, DSD does not require decimation filtering and PCM quantization in the recording process; instead, the original sampling frequency of 2.8224 MHz is retained. One-bit data is recorded directly on the disc. Unlike PCM, DSD does not employ interpolation (oversampling) filtering in the playback process. In other words, the basic DSD specification is based on the direct output of a typical sigma-delta A/D converter at 64 × 44.1 kHz.

FIG. 35 DSD coding uses a sigma-delta coding technique. A. A sigma-delta modulator uses negative feedback to subtract a compensation signal from the input. B. The output signal from a sigma-delta modulator is a pulse-density waveform.

DSD uses sigma-delta modulation and noise shaping. A simple one-bit quantizer is shown in FIG. 34A, and the output waveform resulting from a sine-wave input is shown in FIG. 34B. The shaded portion shows the difference error between the input waveform and the quantized output waveform. An example of a simple sigma-delta encoder is shown in FIG. 35A. The one-bit output signal is also used as an error signal and delayed by one sample and subtracted from the input analog signal. If the input waveform, accumulated over one sampling period, rises above the value accumulated in the negative feedback loop during previous samples, the converter outputs a 1 value.

Similarly, if the waveform falls relative to the accumulated value, a 0 value is output. Fully positive waveforms will generate all 1 values and fully negative waveforms will generate all 0 values. This method of returning output error data to the input signal to be subtracted as compensation data is called negative feedback.
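The feedback loop described above can be illustrated with a first-order sigma-delta modulator in software. This is a minimal sketch under stated assumptions, not the converter used in any SACD product: inputs are assumed to be normalized to the range -1 to +1, the function names are hypothetical, and practical DSD encoders use higher-order loops.

```python
def sigma_delta_encode(samples):
    """Encode samples in [-1, 1] as a list of 0/1 values (first-order loop)."""
    integrator = 0.0   # the "sigma": running sum of the difference signal
    feedback = 0.0     # previous output, mapped to -1.0 or +1.0
    bits = []
    for x in samples:
        # The "delta": subtract the delayed one-bit output from the input.
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pulse_density(bits):
    """Fraction of 1 values in the one-bit stream."""
    return sum(bits) / len(bits)


# Constant inputs make the pulse-density behavior easy to see.
for level in (1.0, 0.5, 0.0, -1.0):
    stream = sigma_delta_encode([level] * 1000)
    print(f"input {level:+.1f} -> density of 1s = {pulse_density(stream):.3f}")
```

A fully positive input yields all 1 values and a fully negative input yields all 0 values, as the text describes; intermediate levels yield a proportional density of 1 values.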

FIG. 36 Noise-shaping algorithms are designed to reduce the low-frequency (in-band) quantization error, but also increase high-frequency (out-of-band) content.

FIG. 35B shows an input sine wave applied to a sigma-delta encoder and the resulting output signal. The pulses of the output signal reflect the magnitude of the input signal; this is a pulse density modulation representation in which a 0 value has no pulse output while a 1 value does.

The shaded portion shows the difference error; analysis shows that the total amount of error is the same as in a simple quantizer; however, because the integrator (sigma) in the sigma-delta encoder acts as a lowpass filter, the amount of low-frequency error is reduced while the amount of high-frequency error is increased, as shown in FIG. 36. The system's designers note that the ear is sensitive to very high-frequency signals only if they are correlated to lower in-band signals. At frequencies higher than 20 kHz, they state that signal-to-noise ratios become less important.

Thus, they argue that the uncorrelated high-frequency shaped noise is perceptually unimportant. This noise shaping property can be developed with higher-order (perhaps 5th order) noise shaping feedback filters to further decrease error in the audible range of frequencies. In principle, a lowpass filter can decode sigma-delta signals.

Such a low-pass filter would also remove high-frequency noise resulting from noise shaping. The principles of sigma-delta modulation and noise shaping are discussed more fully in Section 18.
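The statement that a lowpass filter can decode a sigma-delta signal can be tried numerically. The sketch below is illustrative only: it assumes a first-order modulator, a sine input of amplitude 0.5 with a period of 2048 samples, and a simple 64-sample moving average standing in for the lowpass filter. None of these choices come from the SACD standard, and all function names are hypothetical.

```python
import math


def sigma_delta_encode(samples):
    """First-order one-bit modulator for inputs in [-1, 1]."""
    integrator, feedback, bits = 0.0, 0.0, []
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def moving_average_decode(bits, window):
    """Crude lowpass filter: mean of the last `window` pulses (0 -> -1, 1 -> +1)."""
    pulses = [1.0 if b else -1.0 for b in bits]
    out, running = [], 0.0
    for i, p in enumerate(pulses):
        running += p
        if i >= window:
            running -= pulses[i - window]
        out.append(running / window)
    return out


PERIOD = 2048      # samples per sine cycle (heavily oversampled)
WINDOW = 64        # moving-average length
AMPLITUDE = 0.5
N = 4 * PERIOD

signal = [AMPLITUDE * math.sin(2 * math.pi * n / PERIOD) for n in range(N)]
decoded = moving_average_decode(sigma_delta_encode(signal), WINDOW)

# The moving average delays the signal by about half the window length.
delay = WINDOW // 2
max_error = max(abs(decoded[n] - signal[n - delay]) for n in range(WINDOW, N))
print(f"largest reconstruction error: {max_error:.3f}")
```

The filtered pulse stream follows the input sine to within a small residual error. A longer window or a higher-order loop would reduce the residual further, at the cost of bandwidth or complexity.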

FIG. 37 DSD coding used in the SACD format requires significant noise shaping to reduce low-frequency noise. However, this significantly increases high-frequency noise above 20 kHz.

The DSD modulation used in the SACD format uses a sampling frequency of 2.8224 MHz. In other words, the analog signal is sampled at a 2.8224-MHz rate and each sample is quantized as a one-bit word. Overall, the bit rate per channel is thus four times higher than on a CD. In principle, the Nyquist frequency is thus 1.4112 MHz. However, in practice, to remove high-frequency noise introduced by high-order noise shaping, the high-frequency response is limited to 100 kHz or less by analog filters. As shown in FIG. 37, a significant noise-shaping component is present in the 100-kHz band, as anticipated by the SACD standard.

The SACD standard specifies that noise power in the 100-kHz band should be 20 dB below the standard reference level. When a 100-kHz lowpass filter is used, at a volume level that achieves a 100-watt output, this noise component is thus 1 watt or less. However, at higher volume levels, the SACD standard recommends that SACD players incorporate a lowpass filter with a corner frequency of 50 kHz and a minimum 30-dB/octave slope for use with most conventional power amplifiers and speakers.
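The rate and noise figures in the preceding two paragraphs follow from simple arithmetic. As a quick check, using only numbers quoted in the text plus the CD's 16-bit, 44.1-kHz PCM format:

```python
DSD_RATE_HZ = 64 * 44_100          # one-bit samples per second per channel
CD_BITS_PER_SECOND = 44_100 * 16   # 16-bit PCM per channel on a CD

bit_rate_ratio = DSD_RATE_HZ / CD_BITS_PER_SECOND
nyquist_hz = DSD_RATE_HZ / 2

# Noise power specified as 20 dB below a reference taken here as 100 watts.
REFERENCE_WATTS = 100.0
NOISE_DB = -20.0
noise_watts = REFERENCE_WATTS * 10 ** (NOISE_DB / 10)

print(f"DSD sampling frequency: {DSD_RATE_HZ / 1e6:.4f} MHz")
print(f"bit rate versus CD    : {bit_rate_ratio:.0f}x per channel")
print(f"Nyquist frequency     : {nyquist_hz / 1e6:.4f} MHz")
print(f"noise at -20 dB       : {noise_watts:.1f} W")
```

A power ratio of -20 dB is a factor of 1/100, which is why a 100-watt reference corresponds to a noise component of 1 watt.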

When making audio measurements of the SACD, a 20-kHz lowpass filter (such as the 3344A filter by NF Electronic Instruments with 60 dB of attenuation above 24.1 kHz) is recommended to avoid the effects of the shaped components in the higher frequency range.

The 2.8224 MHz (64 × 44.1 kHz) sampling frequency of the one-bit DSD signal can be converted to a variety of standard PCM sampling frequencies with integer computation. Division by 64 and 32 yields 44.1 and 88.2 kHz. Following multiplication by 5, division by 441, 294, and 147 yields 32, 48, and 96 kHz, respectively. Also, an extended sampling frequency of 128 × 44.1 kHz is possible.
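The integer relationships listed above can be verified directly. The sketch below simply applies each multiply-and-divide pair to the 2.8224-MHz rate and confirms that no remainder is left; the table layout is illustrative.

```python
DSD_RATE_HZ = 64 * 44_100   # 2,822,400 Hz

# Target PCM rate in Hz -> (integer multiplier, integer divisor), as listed in the text.
CONVERSIONS = {
    44_100: (1, 64),
    88_200: (1, 32),
    32_000: (5, 441),
    48_000: (5, 294),
    96_000: (5, 147),
}

converted = {}
for target_hz, (multiplier, divisor) in CONVERSIONS.items():
    quotient, remainder = divmod(DSD_RATE_HZ * multiplier, divisor)
    converted[target_hz] = (quotient, remainder)
    print(f"x{multiplier} / {divisor:>3} -> {quotient:>6} Hz (remainder {remainder})")
```

Every conversion yields the target rate with a zero remainder, which is what makes rate conversion with integer ratios possible.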

DST Lossless Coding

A lossless coding algorithm known as Direct Stream Transfer (DST) is employed in the SACD format to more than double effective disc capacity. Without compression, eight DSD channels (six multichannel plus a stereo mix) on a 4.7-Gbyte data layer would be limited to a playing time of 27 minutes, 45 seconds.

With DST, a 74-minute playing time is accommodated, effectively increasing storage capacity to about 12 Gbytes.

As with other lossless compression methods, the compression achieved by DST depends on the audio signal itself. In one survey, DST yielded a coding gain of 2.4 to 2.5 for pop music, and 2.6 to 2.7 for classical music.
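The playing times and the effective capacity quoted for DST can be reproduced from the DSD rate. This is a back-of-envelope sketch that assumes 1 Gbyte = 10^9 bytes and ignores TOC and formatting overhead; those simplifications are assumptions made here.

```python
DSD_RATE_HZ = 2_822_400   # one-bit samples per second per channel
CHANNELS = 8              # six multichannel plus a stereo mix
LAYER_BYTES = 4.7e9       # assumption: 1 Gbyte = 10**9 bytes, no overhead
TARGET_MINUTES = 74

bytes_per_second = DSD_RATE_HZ * CHANNELS / 8

# Playing time without lossless coding.
uncompressed_seconds = LAYER_BYTES / bytes_per_second
minutes, seconds = divmod(round(uncompressed_seconds), 60)

# Data volume of a 74-minute program, and the coding gain needed to fit it.
effective_gbytes = TARGET_MINUTES * 60 * bytes_per_second / 1e9
required_gain = effective_gbytes / (LAYER_BYTES / 1e9)

print(f"uncompressed playing time : {minutes} min {seconds} s")
print(f"74-minute program size    : {effective_gbytes:.1f} Gbytes")
print(f"coding gain needed to fit : {required_gain:.2f}")
```

Under these assumptions the required gain is a little under 2.7, which sits at the upper end of the survey figures quoted above.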

FIG. 38 Direct Stream Transfer (DST) can be used for lossless coding of DSD data using an adaptive prediction filter and entropy (arithmetic) coding. A. DST encoder. B. DST decoder.

The DST encoder and decoder are shown in FIG. 38.

DST uses data framing, an adaptive prediction filter, and entropy coding. The use of lossless coding can be decided on a frame-by-frame basis; the flag information for the decoder is contained in each frame header. An area without any DST frames can be marked accordingly in the Area TOC. DST coding yields variably sized frames; a buffer model is used to output a fixed bit rate. The theory of lossless coding is discussed in Section 10.

Player Design

SACD players play back both SACD and CD discs. Their design is similar to that of CD players. Dual laser pickups are required to operate at both the SACD 650-nm wavelength and the CD 780-nm wavelength. In some player designs, a single processor accepts the amplified RF signal from the dual pickup and performs clock signal extraction and synchronization, as well as demodulation and error correction for both CD and SACD signals. A servo chip controls the pickup and motor systems. CD data is passed along to the digital filter. SACD data is applied to the DSD decoder; this circuit first reads the invisible watermark, then intermittent data is rearranged and ordered in a buffer memory according to a master clock. This chip also reads subcode data, including TOC information such as track number, time, and text data.

DSD data is output as a one-bit signal at a frequency of 2.8224 MHz and applied to a pulse-density modulation processor in which the data signal is converted to a complementary signal in which each 1 value creates a wide pulse and each 0 value creates a narrow pulse. A current pulse D/A converter converts the voltage pulse output into a current pulse. This current pulse signal is passed through an analog lowpass filter to create the analog audio waveform. In some designs, this filter's response measures -3 dB at 50 kHz.
