Digital Audio Recording Systems: MiniDisc


MiniDisc is the first real consumer application where the medium is not a tape but a rewritable disc, with all its benefits. When Compact Disc came on the market, the conversion from analog audio to digital data was performed in a fairly straightforward way: the data are a direct representation of the analog audio. This conversion method needs a huge amount of data if it is to represent the input audio accurately.

For MD, there was a need for a smaller disc to make it even more convenient. However, a smaller disc automatically limits the amount of data that can be stored on it. The only solution then is to compress the audio data. Computer data had already been compressed for several years, but the main difference, and in fact the big problem, is that audio is an extremely complex type of data; if audio were compressed in the same way as ordinary computer data (usually text and drawings), the result would be totally inadequate.
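A rough calculation shows why compression is unavoidable. The sketch below estimates playing time as recorded area divided by track pitch and scanning velocity, using figures from this section (1.6 µm track pitch, 1.2 m s^-1). The CD program-area radii (25 to 58 mm) are the commonly quoted values; the MD radii (16 to 30 mm) are hypothetical round numbers chosen only for illustration, not values from the specification.

```python
import math

TRACK_PITCH_M = 1.6e-6    # same track pitch for CD and MD (from the text)
SCAN_VELOCITY_M_S = 1.2   # lower end of the 1.2-1.4 m/s scanning velocity range

def playing_time_min(r_inner_m: float, r_outer_m: float) -> float:
    """Estimate playing time: spiral length (area / pitch) divided by scanning velocity."""
    area_m2 = math.pi * (r_outer_m ** 2 - r_inner_m ** 2)
    track_length_m = area_m2 / TRACK_PITCH_M
    return track_length_m / SCAN_VELOCITY_M_S / 60.0

cd_min = playing_time_min(0.025, 0.058)  # commonly quoted CD program-area radii
md_min = playing_time_min(0.016, 0.030)  # hypothetical MD radii, illustration only

print(f"CD, uncompressed PCM: {cd_min:.1f} min")
print(f"MD area, uncompressed PCM: {md_min:.1f} min")
print(f"reduction needed for equal playing time: about {cd_min / md_min:.1f}:1")
```

With these assumed radii the smaller disc holds only about a quarter of the CD's playing time, which is of the same order as the compression ratio of about 1:5 mentioned later.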

As a result, the compression needed special attention: psychoacoustic principles are the key factor in the compression (and decompression) methods.

We should keep in mind that the data put on a MiniDisc no longer have a straightforward relation with the analog equivalent. The data on the disc now have a highly complex algorithmic structure, from which the original input can be recalculated and reconstructed. One major benefit of this method is that the algorithms can be changed and enhanced without changing the music itself, and without creating compatibility problems.

The following were target items when designing MiniDisc:

• Digital disc format.

• Smaller than Compact Disc, but with the same time-wise amount of music.

• High quality, up to Compact Disc level.

• Recordable.

• Enabling quick and random access, similar to Compact Disc.

• High shock-proof capability.

• Lightweight and inexpensive.

• Durable.

The rainbow book

The basic specification book for MiniDisc is called the 'rainbow book'. This name was chosen rather sensibly, as MiniDisc is based upon several other basic specification books, each nicknamed after a different colour:

• The red book, basic specifications for Compact Disc.

• The yellow book, basic specifications for CD-ROM.

• The orange book, basic specifications for MO drives and WO drives.

All of these books are co-developments of Sony and Philips; they have been the basis for CD, CD-ROM, CD-MO/WO and also for CD-I (whose specification book, the green book, is also a Sony/Philips co-production). They outline the basic parameters and specifications to be used for all the above-mentioned technologies. The aim, of course, is a standardized format for both hardware and software. Such standards have enabled the growth of global markets in which all manufacturers make hardware and software that is truly compatible.

FIG. 1 shows the basic relationship between the specification books. The yellow book evolved from the red book; from both specifications the rainbow book 'borrowed' the following items:

• the basic data structure;

• the CIRC system (Cross-Interleave Reed-Solomon Correction), but extended and improved (hence the name ACIRC or Advanced CIRC);

• the basic disc geometry, even if the size of the MD is physically smaller.

From the orange book, mainly the optical parameters were used, and other parts used in magneto-optical discs were a source of inspiration. An example is the ATIP system from MO, which ensures that the laser correctly follows the data track while recording and reading out; it is closely related to the ADIP used in MD.

New, important features:

• the use of a cartridge, similar to a floppy disc;

• ATRAC, the psychoacoustic compression system;

• the use of a shock-resistant memory.

FIG. 1 MD basics.

Of course, all the above-mentioned points will be explained. The basic specifications are the following:

• Play/recording time is the same as on a CD, even though the MiniDisc (64 mm in diameter) is much smaller than the CD, which has a diameter of 12 cm.

• Similar to a CD, read-out and recording on MD are performed starting from the inner side, close to the center.

• The track pitch--i.e. the distance between tracks--is the same as for CD: 1.6 µm or 0.0000016 m.

• Scanning velocity is also similar to CD, between 1.2 and 1.4 m s^-1; this is the linear speed at which the track passes under the laser spot, not the rotation speed of the disc. A Constant Linear Velocity (CLV) system is used: the scanning velocity is kept constant, so the data are read out at a constant rate.

However, as these data are written on a spiral track running from the inside to the outside of the disc, the radius of the circle on which the system is working changes constantly during reading (or writing), and therefore the rotation speed of the disc needs to be adapted continuously: faster near the center, slower near the edge.

• The sampling frequency and modulation system are the same as for CD.

• The optical parameters combine those used in CD and in MO; in particular, the higher laser power is typical of systems that can make recordings.
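Because the scanning velocity is held constant, the spindle speed must vary inversely with the radius being scanned. A minimal sketch, assuming 1.2 m s^-1 and a few hypothetical radii across an MD's recording area (illustrative values, not taken from the specification):

```python
import math

def spindle_rpm(velocity_m_s: float, radius_m: float) -> float:
    """Rotation speed that gives the requested linear (scanning) velocity at a radius."""
    circumference_m = 2 * math.pi * radius_m
    return velocity_m_s / circumference_m * 60.0

# Hypothetical radii from the inner to the outer part of the recording area.
for radius_mm in (16, 20, 25, 30):
    print(f"r = {radius_mm} mm: {spindle_rpm(1.2, radius_mm / 1000):.0f} rpm")
```

Halving the radius doubles the required rotation speed, which is why the spindle servo has to follow the position of the laser continuously.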

Table 1 MD basics

Block diagram

The block diagram of a MiniDisc can be compared with the block diagram of a Compact Disc; it is therefore obvious that knowledge of Compact Disc systems makes the introduction to MiniDisc much easier. As CD technology has already been explained in previous sections, we can mostly concentrate on the differences at this point.

The differences between CD and MD are mainly due to:

• The recording capability of MiniDisc, which needs an A/D converter, an EFM/ACIRC encoder, a head drive and a magnetic recording head.

• The use of compression in the ATRAC encoder/decoder.

• The shock-resistant memory controller (a similar system is nowadays also included in some CD Discman models, but only for playback).

• The address decoder, which is specific to the read-out of the recordable MiniDisc type.

Most of the other blocks can be related back to the Compact Disc format.

In order to have a quick understanding of the operation of MiniDisc, we will now trace the most important operation paths, and give a brief explanation; in the next sections most of these items will be repeated in more detail.

REC/PB path

• Recording of any disc starts of course with audio input; this can be an analog input or a digital input (typically S/PDIF format).

• The analog input will be A/D-converted and fed to the ATRAC encoder. The digital input is directly fed to the ATRAC encoder.

• In the ATRAC encoder, the digital audio data will be compressed according to several psychoacoustic rules. The compression ratio is about 1:5.

• The compressed data are sent to the EFM/ACIRC encoder through the shock-proof memory; shocks during recording which might disable proper recording will be absorbed here.

• The EFM/ACIRC encoder will handle the digital audio data similarly to the Compact Disc encoding; error encoding and interleaving are performed, subdata are added and all this is EFM-modulated. ACIRC and EFM circuits have to process the audio data such that they are in a correct state to be recorded.

• Recording is then performed through the magnetic recording head while the laser unit is used for the spot heating of the disc.

• Recording is not possible on a pre-mastered disc; it can only be read out. The read-out at the start of a pre-mastered disc also differs somewhat from that of a recordable disc.
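The data rates in this path can be checked with simple arithmetic. The sketch below assumes 16-bit stereo linear PCM at the CD sampling frequency as the ATRAC input and applies the nominal ratio of about 1:5 quoted above; the actual ATRAC output rate is fixed by the format, so treat the result as an estimate only.

```python
SAMPLE_RATE_HZ = 44_100   # same sampling frequency as CD (from the text)
BITS_PER_SAMPLE = 16      # assumed: CD-style linear PCM word length
CHANNELS = 2              # stereo
COMPRESSION_RATIO = 5     # "about 1:5" (nominal, from the text)

pcm_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
compressed_bit_rate = pcm_bit_rate / COMPRESSION_RATIO

print(f"PCM into the ATRAC encoder: {pcm_bit_rate / 1000:.1f} kbit/s")
print(f"ATRAC output (estimate): {compressed_bit_rate / 1000:.1f} kbit/s")
```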

FIG. 2 REC/PB path.

Playback

• Read-out will in both cases (pre-mastered/recordable) be performed by the optical pick-up unit; the magnetic recording head is not used during read-out.

• The optical detecting unit (in the pick-up) is able to detect the pit signal as well as the magnetic signal; both will be sent to the RF amplifier.

• In the case of a pre-mastered disc, the focus and tracking servo signals, similar to the Compact Disc system, will also be retrieved at this point.

• In the case of a recordable disc, not only the already known focus/tracking servo signals are seen, but also the pre-groove information is read out.

• From the RF amplifier onwards, there is no further difference in handling between pre-mastered and recordable read-out RF (radio frequency signal, or high-frequency signal).

• This RF signal is EFM-demodulated, and error-decoded in the EFM/ACIRC decoder.

• It is then sent to the shock-resistant memory controller, where it is buffered before being input to the ATRAC decoder.

When the set endures shocks while reading out and track jumps occur due to these shocks, the buffer memory will enable continuous music output while the system recovers from the shocks.

• The demodulated data are then ATRAC-decoded; in fact, the original music is reconstructed in a transform system, which is the exact reverse of the ATRAC encoding format.

• From this point on, a digital audio output is possible, and the D/A converter provides the analog audio output.
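The buffering described above can be illustrated with a toy simulation. It assumes, for the sake of the sketch, that compressed data are read from the disc at a CD-like rate of about 1.41 Mbit/s while the ATRAC decoder consumes only about one fifth of that, and that the buffer holds a hypothetical 4 Mbit; the function names and all parameters are illustrative, not taken from a real player.

```python
def simulate_shock(shock_start_s, shock_len_s, total_s=30.0, dt_s=0.01,
                   buffer_bits=4 * 1024 * 1024,   # hypothetical 4 Mbit buffer
                   read_bps=1_411_200,            # assumed disc read rate
                   decode_bps=282_240):           # assumed ATRAC consumption (~1:5)
    """Count the time steps in which the ATRAC decoder finds the buffer empty."""
    level = 0.0
    dropouts = 0
    for step in range(int(round(total_s / dt_s))):
        t = step * dt_s
        pickup_lost = shock_start_s <= t < shock_start_s + shock_len_s
        if not pickup_lost:
            # The pick-up refills the buffer faster than the decoder empties it.
            level = min(buffer_bits, level + read_bps * dt_s)
        needed = decode_bps * dt_s
        if level >= needed:
            level -= needed
        else:
            dropouts += 1   # audible interruption
            level = 0.0
    return dropouts

def buffer_seconds(buffer_bits=4 * 1024 * 1024, decode_bps=282_240):
    """Seconds of audio that a full buffer can bridge."""
    return buffer_bits / decode_bps

print(simulate_shock(10.0, 5.0), simulate_shock(10.0, 20.0), round(buffer_seconds(), 1))
```

In this model a shock shorter than the buffered playing time causes no interruption at all, while a longer one empties the buffer. The sketch ignores the time a real player needs to find the correct address again after the shock.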

Servo path

The servo operation is also comparable to the Compact Disc servo. However, as there is now more than one type of disc to handle, and as the recordable disc carries a pre-groove, the servo is more complex.

• For both pre-mastered and recordable disc, the focusing, tracking and sled servo circuits are similar.

• Note, however, that the reflectivity of a pre-mastered disc is much higher than that of the recordable disc. Due to this fact, correct Automatic Gain Control (AGC) handling is very important, especially in the case of the recordable disc. The lower return of a recordable disc has to be boosted correctly to a level where it can be read and interpreted correctly. This is performed in the RF amplifier.

FIG. 3 Servo path.

• The CLV servo is not the same for a pre-mastered and a recordable disc. In the case of the pre-mastered disc, the spindle motor is driven through the RF signal input in the EFM/ACIRC IC, and through the servo control. This is again the same as in a Compact Disc. Synchronization with the incoming data will determine adaptations to the disc rotation speed.

• The recordable disc, however, is more complex. Here the pre-groove is also used. The pre-groove signal is detected by the main-spot detector of the optical pick-up and retrieved in the RF amplifier. A dedicated ADIP (address in pre-groove) decoder decodes this pre-groove signal; the decoded address data are then output from the RF IC, together with clock signals also retrieved from the pre-groove. The EFM/ACIRC and servo side use this output to control the spindle operation.
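The role of the pre-groove in CLV control can be sketched as a simple feedback loop. The sketch assumes that the wobble reads at exactly 22.05 kHz when the disc turns at the nominal scanning velocity, so any deviation of the measured wobble frequency is a speed error. The proportional controller, its gain and the function name `settle_spindle` are hypothetical and far simpler than a real servo IC.

```python
import math

NOMINAL_WOBBLE_HZ = 22_050.0  # ADIP carrier frequency (from the text)

def settle_spindle(radius_m, v_nominal_m_s=1.2, rpm=300.0, gain=0.5, steps=60):
    """Adjust spindle speed until the wobble read from the pre-groove is at 22.05 kHz."""
    # The wobble period on the disc is fixed when the groove is stamped, so its
    # physical length corresponds to the nominal scanning velocity.
    wobble_length_m = v_nominal_m_s / NOMINAL_WOBBLE_HZ
    for _ in range(steps):
        v = rpm / 60.0 * 2 * math.pi * radius_m     # actual scanning velocity
        measured_hz = v / wobble_length_m           # wobble frequency at the pick-up
        error = (NOMINAL_WOBBLE_HZ - measured_hz) / NOMINAL_WOBBLE_HZ
        rpm *= 1 + gain * error                     # proportional correction
    return rpm

print(f"{settle_spindle(0.02):.1f} rpm at r = 20 mm")
```

The loop settles where the measured wobble equals 22.05 kHz, which is exactly the speed that gives the nominal scanning velocity at that radius; no recorded data are needed for this.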

System control

• It is obvious that such complex actions need correct system control. For this reason, a microprocessor (also referred to as a syscon) is used; this can be a general-purpose processor driven by dedicated software, or a totally purpose-built processor. The syscon reads and writes from and to the main ICs through a dataline system, but also through many dedicated lines.

• User key input as well as internal detection switches are fed to the syscon. In this way, and also helped by the information on the datalines, the processor can determine the actual state of the system and the actual wishes of the user, and of course act accordingly.

• Based upon all the inputs, the syscon can drive each IC as well as the mechanics within the proper timing, and in a correct, co-ordinated sequence.

• Output from the syscon is also given to the display section as a feedback to the user.

CD-MD comparison

As a conclusion to this part, it is worthwhile comparing the main data handling between a CD and MiniDisc. Note that this is a simplified approach for the sole purpose of pointing out major differences.

• The highlighted part of CD processing is only performed on the disc manufacturer's side.

• This comparison clearly shows that, even though a MiniDisc player is very similar to a Compact Disc player, it is much more complex and has to perform significantly more operations and processing steps.

• Note also that the table starts with analog audio input and ends with analog audio output. It is of course also possible to start from and end with digital audio, basically in the S/PDIF format (Sony/Philips Digital Interface Format).

Table 2 CD-MD comparison

Physical format

Disc types

Here, immediately one of the special features of MiniDisc stands out: there is not one type of MiniDisc, but in fact there are two types.

• The first type is the non-recordable MiniDisc; this is a pre-mastered disc using CD technology.

• The second type is the user-recordable disc, using CD-MO technology.

When explaining the physical format of the MiniDisc, we should remember that, although the dimensions are always the same, we do have these two different disc types.

In appearance, a MiniDisc is similar to a floppy disc: the pre-mastered or recordable medium is a 2.5-inch (64 mm) disc housed in a cartridge closed by a shutter.

FIG. 4 Photograph of discs, one recordable (left) and one pre-mastered (right).

The main dimensions are the following:

Cartridge size 72 × 68 × 5 mm

Weight 30 g (including disc)

Centre hole in cartridge 18 mm diameter

Disc diameter 64 mm

Clamping area 16.4 mm

Disc thickness 1.2 mm

FIG. 5 Cartridge exploded view.

FIG. 6 Cartridge layout.

FIG. 7 MD clamping plate.

There is a distinct difference in physical appearance between the pre-mastered and recordable disc types.

• In the first case, only read-out is needed; therefore, only one side of the disc needs to be opened. In this case, the shutter is only on one side, leaving room on the other side for a label (showing graphics, information, etc.).

• In the case of a recordable type, there is a need for opening both the top and the bottom sides; therefore, the shutter opens on both sides.

As is the case with most disc and cassette media, some holes and/or switches are used for information on the disc itself. On the MiniDisc cartridge, there are two important information detection holes, one to distinguish between high/low reflectivity and another for write protect/record enable.

The pre-mastered disc is of the high-reflectivity type, similar to a Compact Disc, and it cannot be recorded. It therefore has no low-reflectivity hole and a fixed (open) write protect hole.

The recordable disc is of the low-reflectivity type and can be recorded. It therefore has a low-reflectivity hole, and next to this there is a record inhibit switch. The positions of several more holes have been defined, but they are not used at present; they are reserved for possible future use.

Note: the reflectivity of a disc quantifies the amount of incident light it reflects; this depends of course on the material used. It is simply expressed as a percentage of the incident light, with no specific unit. In the case of a CD, 70% of the incident light has to be reflected; for MO drives (low reflectivity), it ranges between 15% and 30%; for the recordable MiniDisc, reflectivity is between 15% and 25%.
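The gap between 70% and 15 to 25% reflectivity is why the RF amplifier needs Automatic Gain Control, as noted in the servo path. A minimal sketch of a generic feedback AGC, assuming the RF level is simply proportional to the disc's reflectivity at constant read power; the function `agc`, the target level and the loop constant are hypothetical.

```python
def agc(levels, target=1.0, loop_gain=0.05, gain=1.0):
    """Generic feedback AGC: scale each input level and nudge the gain towards the target."""
    outputs = []
    for level in levels:
        out = level * gain
        outputs.append(out)
        gain += loop_gain * (target - out)   # raise gain if output too low, lower if too high
    return outputs, gain

# RF level assumed proportional to reflectivity (laser read power unchanged).
for name, reflectivity in (("pre-mastered", 0.70), ("recordable", 0.20)):
    outputs, settled_gain = agc([reflectivity] * 2000)
    print(f"{name}: settled gain {settled_gain:.2f}, output {outputs[-1]:.3f}")
```

Both disc types end up at the same output level, with the recordable disc needing roughly 3.5 times the gain of the pre-mastered disc in this model.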

Clamping

Compared with the cross-section of a Compact Disc, the clamping area of a MiniDisc is somewhat different.

Clamping means holding the disc while it is played in the player; a disc that needs to be rotated must obviously be held correctly, without wobbling or eccentricity. For this purpose, discs are clamped. In a Compact Disc player, the disc is squeezed between the disc motor (for rotation) and a top clamping plate, sometimes containing a magnet.

For MiniDisc, clamping is performed through a magnetic clamping plate mounted in the disc itself and one single counter-plate on top of the disc motor. In this way, correct centering is ensured, which is very important.

Clamping as it is done in MiniDisc enables one to catch and stabilize the disc from the bottom side only, which is more efficient and avoids the need for another hole in the cartridge.

FIG. 8 CD cross sections.

Cross-sections

The pre-mastered disc is totally comparable with the Compact Disc. When looking at the cross-section, reflection of the laser beam is also ensured by the pits/bumps.

The laser return level of such a disc is considered to be of the high-reflectivity level. The recordable disc has a different cross-section.

On a substrate, an MO layer (TbFeCo) is sandwiched between dielectric layers, which act as protection layers. On top of this there is a reflective layer, but its reflectivity is much lower than that of a Compact Disc. For this reason, the recordable disc is also referred to as the low-reflectivity type, and the signals retrieved from it are handled likewise. On top of the reflective layer there is a protective layer, and over this last protective layer there is a lubricant, silicone grease.

Note that this lubricant is not strictly necessary; it has been included to make the contact between the magnetic head and the disc smoother.

Address in pre-groove (ADIP)

Another important physical feature on the recordable disc is the wobble or pre-groove, containing address information; hence the name Address In Pre-groove. This is totally different from the Compact Disc, but it is similar to the CD-MO format, where there is also a wobbling pre-groove containing time information (ATIP = absolute time in pre-groove).

Note that this pre-groove is only used on the recordable disc, as the need for additional addressing (as explained hereafter) does not occur in the pre-mastered disc.

FIG. 9 MD cross sections.

FIG. 10 Pre-groove.

FIG. 11 ADIP modulation.

A sine-wave wobble is physically formed in the disc; this wobbling groove follows the data track along its spiral path.

It is a U-shaped depression, physically stamped on the disc. This groove is the physical representation of an electronic signal, comparable to the 'old' analog phonogram recording.

The size of the laser beam when hitting the disc is somewhat larger than the pre-groove, but it will be modulated by the wobble of the pre-groove.

This wobble signal is a 22.05 kHz frequency signal which is FM-modulated with address information. This address information is exactly the same on each disc. It enables the system to control the CLV motor (the disc motor) and to know the exact location of its laser unit.

It would be incorrect to say that a recordable disc carries pre-recorded data. A better expression would be that the pre-groove modulation enables the detection of address information.

Why this ADIP, and why only on the recordable disc? All reading out of data, regardless of the medium, depends heavily on correct addressing. In other words, we can read out as much data as we want, but if we cannot determine the logical place and order of these data, they are useless.

Any pre-recorded medium contains address data, which are of course included during encoding and manufacturing. Therefore, from the first read-out, any system is able to detect the addresses and correctly follow the data track. The situation on a recordable disc is different. Initially, such a disc contains no information, so during recording, how is the recorder to know whether it is following the correct track? Or, if the system suffers a vibration while a recording is under way, how can it recover if there is nothing at all on the disc? For these reasons, a fixed addressing has to be physically included on the disc (non-erasable) before any recording takes place. Another benefit: because recordable discs have a low reflectivity, and because the addresses in the pre-groove coincide with the addresses in the data stream, this double address read-out helps the system overcome read errors.

ADIP encoding is of course performed prior to making the disc. The ADIP data (i.e., the address data) are bi-phase-modulated with a 6300 Hz clock. This clock is derived directly from a 44.1 kHz source; a 22.05 kHz carrier is also derived from this source.

The bi-phase signal is then FM-modulated onto the 22.05 kHz carrier; the resultant signal is used for the pre-groove. Bi-phase modulation means that the carrier will only be modulated by two possible modulation inputs, a digital '1' and a digital '0'. The modulated bit rate is 3150 bit s^-1. The frequency shift due to the modulation input is 1 kHz. In other words, depending on the input modulation data (logic '1' and '0'), the 22.05 kHz carrier frequency will shift 1 kHz up or down.
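The clock arithmetic and the modulation chain can be sketched as follows. The divisions of 44.1 kHz into 22.05 kHz and 6300 Hz, the 3150 bit/s rate and the 1 kHz deviation come from the text (the bit rate is consistent with two bi-phase cells per data bit). The choice of biphase-mark coding (one common bi-phase variant), the mapping of a high cell to +1 kHz, the sample rate and the function names are assumptions for illustration.

```python
import math

MASTER_CLOCK_HZ = 44_100
CARRIER_HZ = MASTER_CLOCK_HZ / 2        # 22.05 kHz wobble carrier
CHANNEL_CLOCK_HZ = MASTER_CLOCK_HZ / 7  # 6300 Hz bi-phase clock
BIT_RATE = CHANNEL_CLOCK_HZ / 2         # 3150 bit/s: two bi-phase cells per data bit
DEVIATION_HZ = 1_000.0                  # carrier shifts 1 kHz up or down

def biphase_mark(bits, level=0):
    """Biphase-mark coding (assumed variant): the level toggles at the start of
    every bit, and toggles again in the middle of the bit for a '1'."""
    cells = []
    for bit in bits:
        level ^= 1
        cells.append(level)
        if bit:
            level ^= 1
        cells.append(level)
    return cells

def fm_wobble(cells, sample_rate_hz=352_800):
    """Phase-continuous FM: carrier +1 kHz for a high cell, -1 kHz for a low cell (assumed mapping)."""
    samples_per_cell = round(sample_rate_hz / CHANNEL_CLOCK_HZ)  # 56 at this rate
    phase = 0.0
    samples = []
    for cell in cells:
        freq = CARRIER_HZ + (DEVIATION_HZ if cell else -DEVIATION_HZ)
        for _ in range(samples_per_cell):
            phase += 2 * math.pi * freq / sample_rate_hz
            samples.append(math.sin(phase))
    return samples

wobble = fm_wobble(biphase_mark([1, 0, 1, 1, 0, 0, 1, 0]))
print(CARRIER_HZ, CHANNEL_CLOCK_HZ, BIT_RATE, len(wobble))
```

Keeping the phase continuous between cells means the groove shape never jumps; only the local wobble frequency changes, which is what the FM demodulator in the player follows.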

The MiniDisc will use the carrier frequency as well as the address data for CLV operation and track addressing. The contents of ADIP data will be explained later.

Physical track layout

The pre-mastered disc can again be compared with a Compact Disc. The information area consists of a lead-in area (table of contents), a program area and a lead-out area. All these areas are pre-recorded; none of them can be changed.

The recordable disc can be divided into a non-recordable area and a recordable area. The non-recordable area is similar to the lead-in area on the pre-mastered disc. It contains information on locations on the disc (for example, the start and end addresses of the User Table Of Contents, UTOC), but also information such as laser power level, disc type, etc. The lead-in area of a recordable disc carries a pit signal, as on a pre-mastered disc, but in this case the reflectivity is much lower than on a CD-type disc.

The MiniDisc set will adapt to this lower reflectivity. The recordable area starts with the UTOC. The UTOC is similar to a TOC, but on a recordable disc the allocation of addresses is not fixed; it depends on the user. The UTOC is handled very much like the way a computer allocates addresses on a floppy disc.

FIG. 12 Track layout.

One important note must be made here: as with a computer floppy disc, writing, erasing, dividing, editing, separating and all other data manipulations can only be considered complete when the UTOC area has been updated. For example, when we record, the audio data are of course written in the data area, but when the recording ends, the start and end addresses must be written in the UTOC. If, for one reason or another, this UTOC write is not performed, the system cannot treat the recorded data as valid.

Another example: when we erase a music track (data track), we do not erase the audio data at all; only its start/end addresses in the UTOC area are erased. The area taken by the music data then becomes available and will be overwritten during a subsequent recording session. The same applies to all other recording operations. The UTOC area also includes other information such as disc info, copy protect codes, recording time and date, start and end addresses of tracks, etc. After the UTOC area, the recordable user area takes up most of the space on the disc, followed at the end by a lead-out area, which in this case is recordable and also contains the pre-groove.
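The floppy-like behaviour of the UTOC can be shown with a small model. Everything here is hypothetical: addresses are abstract units, each track is a list of (start, end) fragments, and the class `Disc` is not based on the actual UTOC layout. It only illustrates that a recording becomes valid once its UTOC entry is written, and that erasing removes only the UTOC entry.

```python
def to_fragments(addresses):
    """Group sorted addresses into (start, end) runs, like a fragmented floppy file."""
    fragments = []
    start = prev = addresses[0]
    for a in addresses[1:]:
        if a != prev + 1:
            fragments.append((start, prev))
            start = a
        prev = a
    fragments.append((start, prev))
    return fragments

class Disc:
    """Toy recordable disc: an audio area plus a UTOC listing each track's fragments."""

    def __init__(self, size):
        self.data = [None] * size   # audio area (hypothetical addressable units)
        self.utoc = []              # one list of (start, end) fragments per track

    def _free_addresses(self):
        used = {a for track in self.utoc for s, e in track for a in range(s, e + 1)}
        return [a for a in range(len(self.data)) if a not in used]

    def record(self, chunks, update_utoc=True):
        if not chunks:
            return
        free = self._free_addresses()
        if len(chunks) > len(free):
            raise ValueError("not enough free space")
        addresses = free[:len(chunks)]
        for a, chunk in zip(addresses, chunks):
            self.data[a] = chunk            # the audio is written first...
        if update_utoc:                     # ...but only the UTOC entry makes it a track
            self.utoc.append(to_fragments(addresses))

    def erase(self, track_no):
        del self.utoc[track_no]             # audio stays on disc; its space becomes free

    def play(self, track_no):
        return [self.data[a] for s, e in self.utoc[track_no] for a in range(s, e + 1)]

disc = Disc(10)
disc.record(["a1", "a2", "a3"])
disc.record(["b1", "b2"])
disc.erase(0)
disc.record(["c1", "c2", "c3", "c4"])
print(disc.utoc)
```

The final print shows `[[(3, 4)], [(0, 2), (5, 5)]]`: the erased first track's space was reused, and the new track is split into two fragments around the second track, just as a file can be fragmented on a floppy disc.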

FIG. 13 Track layout.

FIG. 14 Disc recording.

Recording on MD

• The pre-mastered disc is manufactured similarly to the Compact Disc; it is pre-recorded, and the user can only perform read-out.

• The recordable disc only contains address information in the ADIP pre-groove; on this disc it will be possible to record user audio.

• On these recordable disc types it is possible to re-record a nearly unlimited number of times.

Recording on a MiniDisc is based upon well-known physical laws.

Many magnetic metals and alloys, once heated to a certain temperature called the Curie point, lose their magnetization and, as they cool down again, take on the orientation imposed by an external magnetic field. In other words, by heating and applying an external magnetic field we can change the magnetic orientation of the material. The substance used for this purpose is a terbium-iron-cobalt (TbFeCo) alloy. Many other alloys could of course be used, but this one is particularly suitable as it has some features that were needed for MiniDisc:

• A fairly low Curie point (about 185°C), enabling quick heating with little power.

• A low coercivity of about 80 Oersted (6.4 kA m^-1), enabling stable polarity reversal with a relatively weak field, which in turn needs little power. (The higher the coercivity, the more power is needed to impose an external magnetic field on the material.)

Here, the critical reader might object that magnetic recordings have been made for decades already, and that in most of those cases only a magnetic field was necessary, not the combination of heat and a magnetic field. This is a correct observation, but this simple way of magnetic recording has one tremendous drawback: it is not safe! Any magnet or magnetic field brought (accidentally or not) near such a simple magnetic recording can erase it. The amount of valuable information lost in this way over so many years is impossible to quantify. Combining a magnetic field with a specific temperature, well outside the range of normal environmental temperatures, results in a recording that can only be erased or overwritten when required, which is of course much safer. Another fact is that recordings made in this way are extremely stable; they will not change significantly over a long period of time.

The recording uses Magnetic Field Modulation. Both actions take place at the same time: on one side of the disc the laser heats the magnetic layer to the correct temperature, and on the other side the magnetic head imposes the correct polarity. At this point, remember that the information written on the disc is digital; in other words, only two states are needed, logic 1 or 0. Translated into magnetic information, this means north or south magnetization.

Recording and re-recording are performed by overwriting; any previous data are not erased first. A special magnetic head was developed that enables extremely quick flux reversals of approx. 100 ns. This is important to ensure precise recording, and it also means that recording is really performed in a single pass. The layers surrounding the magnetic layer are adapted for high-temperature handling.
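The thermomagnetic overwrite principle can be reduced to one rule: a spot heated to the Curie point takes the polarity of the head field, while any other spot keeps its old polarity. An idealised sketch (the 185 °C threshold is the approximate value quoted above; the function `write_pass` and its inputs are hypothetical):

```python
CURIE_POINT_C = 185.0  # approximate Curie point quoted in the text

def write_pass(domains, peak_temps_c, head_polarity):
    """One idealised magnetic-field-modulation pass.

    domains:       existing magnetization per spot, +1 (north) or -1 (south)
    peak_temps_c:  peak temperature each spot reaches under the laser
    head_polarity: polarity the head imposes at each spot, +1 or -1
    """
    return [
        new if temp >= CURIE_POINT_C else old  # only heated spots follow the head field
        for old, temp, new in zip(domains, peak_temps_c, head_polarity)
    ]

old_track = [+1, -1, +1, -1]
# Recording: every spot is heated, so the old data are simply overwritten.
print(write_pass(old_track, [200.0] * 4, [-1, -1, +1, +1]))
# A stray magnet at room temperature leaves the recording untouched.
print(write_pass(old_track, [25.0] * 4, [-1, -1, -1, -1]))
```

The first call prints `[-1, -1, 1, 1]` regardless of the old contents, showing why no separate erase pass is needed; the second prints the unchanged `[1, -1, 1, -1]`.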

Read-out of the disc

Pre-mastered disc

The pre-mastered disc is obviously read out similarly to a compact disc. The laser light level reflected from the compact disc or pre-mastered disc depends on the pits stamped in the disc.

The laser objective shown here is of course very simplistic; this will be explained later. Note that the double detector blocks (PD1 and PD2) are needed for both pre-mastered and recordable disc.

FIG. 15 Pit read-out.

FIG. 16 Recordable MiniDisc read-out.

Read-out of recordable disc

On the recordable part of the recordable disc, another physical law is involved: the laser light sent to the disc hits the disc surface, passes through the magnetic layer and hits the reflective layer. In this way, the laser light is reflected similarly to the Compact Disc or pre-mastered MiniDisc. However, as there are no pits, the reflected laser light level is constant. When the light passes through the magnetic layer, the so-called Kerr effect takes place: the plane of polarization of the light is rotated slightly by the magnetized medium, in one direction or the other depending on the direction of magnetization.

As the magnetic layer was modulated magnetically during recording, it is obvious that the modulated contents will now pass on to the laser light.

The magnetic recording included only two possible states (north/south). The read-out polarization will also be predominantly either one of two main polarizations.

Note the following:

• During read-out, the magnetic head is not used; this part will only be used during recording.

• The recordable disc is a low-reflectivity disc, but the reflected level will be fairly constant, contrary to the pre-mastered disc.

• The same laser is used for recording as well as for read-out; for recording, the laser power will be significantly higher than when reading out.

FIG. 17 MO read-out.

The MiniDisc optical block unit

The optical block unit used in MiniDisc resembles the one used in Compact Disc and was explained in previous sections.

The main similarities are the concept used, the basic design of the laser unit, the laser type and the wavelength. Although read-out of the recordable disc relies on a different physical principle from that of the pre-mastered disc, as already explained, this does not really affect the way the optical block operates during read-out; the differences appear mostly with respect to recording.

For recording, the laser power used is much higher: up to 7 mW. For playback, however, the laser power is about the same as for a CD read-out: 0.5 mW.

One optical block assembly is used to read out both signals (from pre-mastered as well as from recordable disc). It is obvious that, in order to obtain this dual read-out possibility, as well as the higher power handling, new technologies need to be used. These will now be explained.

A note must be made concerning the laser power: when discussing laser power we must distinguish between total laser power and the power of the main beam. When using total laser power, we refer to the original three-beam system, as mostly used in Compact Disc, and we add the power of main beam and side beams. This is logical, as all beams are emitted from one laser diode. In the case of the MiniDisc, the total power is about 7 mW.

When using the main beam power only, we have of course a lower power (in MiniDisc, about 5 mW), as we only take a part of the total power into account. This distinction should be remembered, as there may exist some confusion when comparing publications.

FIG. 18 MD optical block.

When following the laser beam from laser unit to detector unit, we see initially the same path as in the compact disc OPU.

• The beam is emitted from the laser and passes through a diffraction grating, where the side beams (E and F beams) are created.

• As from this point we have the three-beam laser light. The collimator lens is used to create a parallel beam.

• The beams pass through a beam splitter, which has no real function on this outgoing path.

• A 45° mirror will point the laser correctly to the disc.

• An objective lens is used for focusing purposes.

As explained in the disc section, on the disc there are two possibilities, either the pit signal or the magnetic signal (Kerr effect).

However, in both cases the laser light is reflected, and this reflection carries the information we need. For the recordable disc, the return also carries a component from the pre-groove (also referred to as the 'wobble').

• The return laser light goes through the objective lens and the 45° mirror.

• In the beam splitter it is sent to the photodetector side.

• After the beam splitter, the return laser light follows a different path from the emitted (from laser diode) laser light.

• The beams are sent through a Wollaston prism; this special type of prism is used to extract from the main laser beam the polarization (north/south or X/Y) components as set by the magnetic layer on the disc.

• After the Wollaston prism, the beams are sent through a multi lens, which is a combination of a concave and a cylindrical lens, resulting in a correct spot configuration on the detector.

• The detector unit is similar to the compact disc detector unit, but two more detectors are included, the I and J detectors.

These two detectors will catch the side beams, extracted through the Wollaston prism and containing the data.

Note also the automatic power control (APC) photodiode, located at the left side of the beam splitter. It monitors the power emitted by the laser so that this power can be controlled correctly, as such lasers are easily damaged if they emit too high a power level. In the CD optical block, the APC diode was located at the back of the laser emitter. In the laser unit now used in MiniDisc, a rear APC photodiode would not give a correct feedback signal; the front APC diode does (i.e., its output is directly related to the emitted laser power).
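The APC loop can be sketched as a simple regulator. The linear laser model (threshold current and slope), the loop gain and the current limit are all hypothetical values; only the target powers (about 0.5 mW for playback and up to 7 mW for recording) come from this section.

```python
def laser_power_mw(current_ma, threshold_ma=30.0, slope_mw_per_ma=0.5):
    """Hypothetical laser diode: no output below threshold, linear above it."""
    return max(0.0, (current_ma - threshold_ma) * slope_mw_per_ma)

def apc(target_mw, current_ma=0.0, gain_ma_per_mw=1.0, max_current_ma=60.0, steps=200):
    """Regulate drive current from the monitored power (what the APC diode reports)."""
    for _ in range(steps):
        monitored_mw = laser_power_mw(current_ma)
        current_ma += gain_ma_per_mw * (target_mw - monitored_mw)
        current_ma = min(max(current_ma, 0.0), max_current_ma)  # never overdrive the diode
    return current_ma, laser_power_mw(current_ma)

for target in (0.5, 7.0):
    current, power = apc(target)
    print(f"target {target} mW -> {current:.2f} mA, {power:.3f} mW")
```

The current limit stands in for the protection aspect mentioned above: even if a too-high target were requested, the drive current, and hence the emitted power, stays bounded.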

The Wollaston principle

Before describing how the Wollaston prism operates, recall which kinds of signal we can expect here.

We have two types of disc, pre-mastered and recordable.

• The pre-mastered disc gives a main beam related to the pit structure, and the E and F side beams related to the tracking.

• In the lead-in area, the recordable disc gives a pit signal as well as E and F signals, but at a reduced level.

• After the lead-in area, the main beam polarization has been influenced by the magnetic layer on the disc (Kerr effect).

• The main beam has also been modulated by the (wobble) pre-groove.

• Besides the main beam, the E and F side beams are still present.

The Wollaston prism is a combination of two rock crystals bonded together at a precise angle of 45°.

Any incident beam will result in four outputs:

• Two main outputs, each containing a main polarization part (north/south or X/Y in the drawings, according to an orthogonal structure); these two beams are so close to each other that they can be considered and treated as an unchanged main beam, as they will practically recombine.

• The two other beams are highly important. The Wollaston prism separates the north- and south-oriented components of the incident beam and emits these components as side beams at a 45° angle.

• These side beams are called the I and J beams.

Note:

• Each beam originally has north as well as south orientation.

The Kerr effect will make one of the two more dominant.

• The E and F beams will each create side beams due to the Wollaston effect, but these side beams are not used as they are insignificant.

FIG. 19 Wollaston prism.

The I and J outputs are both derived from the main input (the P vector in previous figures). The right part of FIG. 20 shows the polarization angles caused by the Kerr effect on the disc. If the laser light has been polarized by the disc (either north or south), the P vector is rotated slightly toward one of the two sides (I or J). The angle is fairly small (below 0.5°), but this is enough to detect a difference between the I and J outputs. θK denotes the Kerr angle in either direction.

FIG. 20 Wollaston prism operation.

The operation should now be clear. The magnetic surface on the disc polarizes the laser light, and through the Wollaston prism this polarization appears as a rotation of the original P vector toward either the I or the J side. One of the two side beams therefore becomes larger, which can be detected. The north/south differences on the disc can thus be detected as differences between I and J.

The I and J beams now become the most important beams. For the recordable disc the use is obvious, as the beam reflected from the disc contains a north/south polarization related to the data.

Because the Wollaston prism extracts the north/south components, the data can be read out correctly.

If, for example, the laser beam has been momentarily north polarized by the Kerr effect on the disc, the north-polarized component extracted by the Wollaston prism will be significantly larger than the south component. In that case, the I beam will be detected with a higher level than the J beam. If, on the other hand, the south polarization is dominant, the J beam will become bigger. The subtraction I - J therefore enables the MiniDisc system to interpret the magneto-optical signal and convert it to a correct data stream.

FIG. 21 Wollaston prism operation.

FIG. 22 CD detector.

The pre-mastered disc system, however, also uses the I and J side beams, unlike the Compact Disc system. It is important to remember that the Wollaston prism produces the I and J components at a 45° angle. Whatever the polarization of the incoming beam, the sum of the I and J side beams always indicates the level that is actually wanted for the pit signal. For a laser signal that has not been influenced by the Kerr effect, the north and south components are equal, so summing I and J gives a perfect reconstruction of the original signal.
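The sum/difference argument can be checked with a small numerical model. The sketch below treats the Wollaston prism as an ideal, lossless polarization splitter whose two output axes lie at +45° and -45° to the nominal polarization of the returning beam, and it applies Malus's law to each output. The function name `wollaston_outputs`, the unit beam power and the lossless model are illustrative assumptions, not details of a real MiniDisc pickup. For a Kerr rotation θ, the model gives I = P cos²(45° - θ) and J = P cos²(45° + θ). It follows that I + J = P for any θ (the pit signal) and I - J = P sin 2θ (the magneto-optical signal).

```python
import math

def wollaston_outputs(power, kerr_angle_deg):
    """Ideal Wollaston prism with output axes at +/-45 degrees (assumed, lossless).

    Malus's law: each output carries power * cos^2(angle between the beam's
    polarization and that output's axis).
    """
    theta = math.radians(kerr_angle_deg)
    axis = math.radians(45.0)
    i_beam = power * math.cos(axis - theta) ** 2
    j_beam = power * math.cos(axis + theta) ** 2
    return i_beam, j_beam

# Kerr rotations below 0.5 degrees, as stated in the text.
for angle in (+0.3, 0.0, -0.3):
    i_beam, j_beam = wollaston_outputs(1.0, angle)
    print(f"theta_K={angle:+.1f} deg  I+J={i_beam + j_beam:.6f}  I-J={i_beam - j_beam:+.6f}")
```

In this model, I + J stays at the full beam power for every angle, while I - J changes sign with the direction of the Kerr rotation. This matches the roles the text assigns to the two combinations.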

FIG. 23 MD detector.

FIG. 24 Main spot on MD pre-groove.

The detector block

Again, the detector unit is similar to the one used in a Compact Disc, but the I and J detectors have been added.

In a Compact Disc optical block, the detector layout is as shown in FIG. 22.

• The main spot will be detected by a block of four segments, named detectors A, B, C and D.

• This quadruple set-up enables one to detect correct focusing of the main beam. For the main (audio data) signal, the signals from each segment are added.

• For focusing, the combination (A + C) - (B + D) is calculated.

Whether this result is zero, smaller than zero or larger than zero reveals the focusing state of the beam (correct, focusing lens too close or lens too far). If the beam is correctly focused, it appears on the ABCD detectors as a circle, with the light equally distributed over the detectors. If focusing is not correct, the special lenses on the return path turn the return beam into an ellipse instead of a circle, and the result is then higher or lower than zero.

• The detectors E and F are used for tracking. These beams are directed partly on and partly off the data track, one on each side. The calculation E - F then shows whether the laser is correctly centered (E - F = 0) or lies to the left or right of the track (E - F > 0 or E - F < 0).

As mentioned before, the E and F beams also create side beams in the Wollaston prism, but these will not be detected as there are no detectors at the place where they will arrive. We now have ABCD, E, F, I and J detectors. The use of these detectors is not totally the same as in a Compact Disc:

• I and J will be used to detect the magneto-optical (I - J) as well as the pit (I + J) signal.

• E and F will still be used for tracking.

• ABCD will be used for several purposes.

• Automatic Gain Control (A + B + C + D) will give the MiniDisc player an indication of the relative strength of the return, in order to set the gains of following amplifiers.

• Focus (A + C) - (B + D) is performed as in the Compact Disc.

Pre-groove (wobble) detection, (A + D) - (C + B), can be explained in the same way as the focus operation. Comparing the relative position of pre-groove and laser beam with this formula shows that the calculation effectively checks the difference between the left and right edges of the returned beam. These edges have been modulated slightly by the wobbling pattern of the pre-groove, so the calculation enables detection of the ADIP sine wave.

Each signal is detected (i.e., light input is converted to electric current) and amplified. Next, the detected signals are added or subtracted to form all the required outputs, which are fed to the MiniDisc player for further processing.
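The add/subtract operations described in this section can be collected in one place. The sketch below is a hypothetical software model: the `DetectorCurrents` class, the `matrix_outputs` function and the example current values are invented for illustration, whereas a real player performs this matrixing in its analog electronics. Each formula is taken directly from the text above.

```python
from dataclasses import dataclass

@dataclass
class DetectorCurrents:
    """Photocurrents from the MD pickup detectors (arbitrary units, hypothetical)."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float
    i: float
    j: float

def matrix_outputs(det: DetectorCurrents) -> dict:
    """Combine detector signals as described in the text."""
    return {
        "mo_rf": det.i - det.j,                            # magneto-optical data
        "pit_rf": det.i + det.j,                           # pit data
        "tracking_error": det.e - det.f,                   # 0 when centered on the track
        "focus_error": (det.a + det.c) - (det.b + det.d),  # 0 when the spot is a circle
        "agc_level": det.a + det.b + det.c + det.d,        # overall return strength
        "wobble": (det.a + det.d) - (det.c + det.b),       # ADIP pre-groove signal
    }

# Made-up values: spot in focus and on track, north-dominant Kerr rotation.
sample = DetectorCurrents(a=0.25, b=0.25, c=0.25, d=0.25, e=0.1, f=0.1, i=0.52, j=0.48)
print(matrix_outputs(sample))
```

With a focused, centered spot, the focus and tracking errors are zero. The sign of `mo_rf` follows the dominant polarization.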

FIG. 25 Human hearing process.

Psychoacoustics

The MiniDisc specifications show that a small disc can store the same playing time (74 minutes) of audio as a CD. Some data compression is clearly needed to attain this goal, and it has to be performed while maintaining full audio quality. This is the job of the ATRAC compression system. Before explaining ATRAC, however, some notes on psychoacoustics are needed, as the compression is based upon psychoacoustics.

Psychoacoustics concerns the way we perceive sound, in other words, the relation between a physical stimulus and the subjective sensation that this stimulus provokes. The complexity is immediately clear: perception differs from one individual to another, and it also depends strongly on a multitude of surrounding factors. Most measurements in this field have been performed on groups of test subjects over several years, and the results can be considered good averages.

It is beyond the scope of this book to cover all of the human hearing process, but some items should be remembered:

• sound propagates through air as pressure waves;

• these pressure waves hit our ear drum;

• the combination of outer ear and ear canal, up to the ear drum, partly acts as a horn, and also partly as a resonator (open pipe);

• after the ear drum the pressure will be transposed to the inner ear, which consists of fluid-filled structures;

• in these structures, there will be a transition from sound as pressure to sound as electrochemical signals, and at the same time the sound is already analyzed into its constituent frequencies;

• the electrochemical signals are transported to the auditory systems in our brains, where they will be processed; in this way our brains perceive sound;

• processing will of course be based upon cross-relating input from our two ears.

The outer hearing system (outer ear and ear canal, up to and including the drum) is built similarly to, and reacts as, an analog sound system. In this respect, it is obvious why a significant part of our hearing system was named the 'drum'.

However, from the moment that sound is analyzed and transposed to electrochemical signals, and in the processing that follows, this system no longer reacts like a known analog sound system. It should therefore not come as a surprise that digital systems such as ATRAC appear to operate much like the way our hearing system processes sound.

The hearing range of a human being starts at about 20 Hz and can go up to just below 20 000 Hz (= 20 kHz). Here again, it should be stated that this is a purely theoretical approach, as large variations are possible depending on:

• the individual;

• ear to ear (left/right);

• age;

• health;

• fitness;

• environmental aspects.

Psychoacoustics related to MiniDisc

Several key psychoacoustic facts were exploited in designing the MiniDisc compression system.

FIG. 26 Frequency spectrum and psychoacoustic effect.

FIG. 27 Frequency components and levels extracted by ATRAC.

Thresholds

The sensitivity of our hearing system is not equal or linear over the total frequency range. A sound at a given frequency must have a minimum level in order to be perceived. This minimum level is called the threshold of hearing.

FIG. 26 shows the audio frequency spectrum and the threshold level for each frequency. Any sound above the threshold level will be perceived, and any sound below it will not.

This phenomenon can be used to our benefit. Suppose we are able to analyze sound into its constituent frequencies with their respective levels. When comparing this with the threshold curve it is possible to delete all frequencies whose levels fall below the threshold curve. In most of the cases these frequencies are noise frequencies anyway.

In this way, it will be possible to lower the amount of sound information that has to be recorded, without touching the audio quality itself.
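The sketch below illustrates threshold-based removal. The source does not give the numeric curve of FIG. 26, so the code assumes Terhardt's widely used analytical approximation of the threshold in quiet, in dB SPL. This is not ATRAC's own threshold table. The function names, the spectrum values and the assumption that levels are already calibrated in dB SPL are all illustrative.

```python
import math

def threshold_in_quiet_db(freq_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL.

    A stand-in for the curve of FIG. 26 (assumption); freq_hz must be > 0.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def drop_inaudible(components):
    """Keep only (frequency_hz, level_db) pairs lying above the threshold curve."""
    return [(f, level) for f, level in components if level > threshold_in_quiet_db(f)]

# Hypothetical analyzed components: (frequency in Hz, level in dB SPL).
spectrum = [(50, 30.0), (1000, 10.0), (3300, -2.0), (15000, 40.0), (18000, 60.0)]
print(drop_inaudible(spectrum))
```

The approximated curve reaches its minimum (slightly below 0 dB) around 3 to 4 kHz. A very quiet component there survives, while much louder components at 50 Hz or 15 to 18 kHz can be discarded.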

FIG. 28 Critical bands.

Masking

Masking means that one frequency, with a higher level, can make another frequency with a lower level inaudible. This is easy to explain: suppose two persons are having a conversation, and a train passes by. The higher level sound of the train will mask the lower level voice sound.

This example is typical of simultaneous masking. Strange as it may seem, however, forward and backward masking are also possible: a higher-level sound can mask a lower-level sound that precedes or follows it in time.

Sound as we perceive it is usually the composition of a multitude of single frequencies, each with its own specific level.

Suppose now that we are able to analyze sound such that we can distinguish each frequency at a given time with its level. Here, the comparison between all these levels will enable us to delete those frequencies which are masked.
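Simultaneous masking can be illustrated with a deliberately simple model. In the sketch below, every component is assumed to mask anything within its reach that lies below a triangular threshold. That threshold starts a fixed offset below the masker's level and falls off linearly per octave of frequency distance. The offset, the slope, the symmetric shape and the function names are hypothetical toy choices, not ATRAC's psychoacoustic model (real masking curves are asymmetric and level dependent).

```python
import math

# Hypothetical toy parameters, not taken from ATRAC.
OFFSET_DB = 10.0             # masked threshold sits this far below the masker level
SLOPE_DB_PER_OCTAVE = 25.0   # masking falls off this fast with frequency distance

def masked_threshold_db(masker, freq_hz):
    """Level below which a component at freq_hz is masked by `masker` (freq, level)."""
    m_freq, m_level = masker
    octaves = abs(math.log2(freq_hz / m_freq))
    return m_level - OFFSET_DB - SLOPE_DB_PER_OCTAVE * octaves

def drop_masked(components):
    """Remove (freq_hz, level_db) components masked by any other component.

    Simplification: every component, even a masked one, may act as a masker.
    """
    kept = []
    for idx, (freq, level) in enumerate(components):
        masked = any(
            level < masked_threshold_db(other, freq)
            for j, other in enumerate(components) if j != idx
        )
        if not masked:
            kept.append((freq, level))
    return kept

# A loud 1 kHz tone, a quieter tone close by and a quieter tone two octaves away.
spectrum = [(1000, 80.0), (1100, 55.0), (4000, 40.0)]
print(drop_masked(spectrum))
```

The 1.1 kHz tone lies close to the loud 1 kHz masker, so it is removed. The 4 kHz tone is far enough away in frequency to remain audible in this toy model.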

Critical bands

This is also related to frequency resolution, which is the ability of our hearing system to distinguish and separate two simultaneously present signals.

Critical bands are parts of the audio spectrum within which two simultaneous signals are not perceived as separate. Within a critical band, the sensitivity of our ears in the frequency domain is about equal, but it differs from one critical band to another.

Over the whole hearing range, about 24 critical bands have been distinguished. Their width is not constant: it ranges from about 100 Hz at low frequencies up to several kilohertz at high frequencies. In other words, our hearing resolves frequency more finely in some regions than in others. If we can break sound down into its constituent frequencies, the critical-band approach can also be applied: the precision of processing can be adapted to each band, which reduces the amount of data.
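To make the band widths concrete, the sketch below uses Zwicker and Terhardt's analytical approximations for the critical-band rate (in Bark) and the critical bandwidth. These formulas come from the psychoacoustics literature and are not stated in this text. They are also not ATRAC's band edges, which the source does not list. The function names are illustrative.

```python
import math

def hz_to_bark(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation of the critical-band rate, in Bark."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def critical_bandwidth_hz(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation of the critical bandwidth around freq_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69

for f in (100, 1000, 4000, 10000, 20000):
    print(f"{f:>6} Hz: band {int(hz_to_bark(f)):2d}, width ~{critical_bandwidth_hz(f):.0f} Hz")
```

Counting whole Bark units up to 20 kHz gives roughly 24 to 25 bands. With these formulas, the width grows from about 100 Hz near 100 Hz to more than 2 kHz near 10 kHz, in line with the figures quoted above.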

Conclusion

• Provided that sound can be analyzed correctly, psychoacoustic facts can be applied to it, and if this is done correctly, the amount of audio data can be reduced.

• This data reduction can be performed without noticeable changes to the original sound.

ATRAC

FIG. 29 CD-MD size comparison.

ATRAC is an acronym for Adaptive TRansform Acoustic Coding.

When reading this carefully, the most important actions of the ATRAC system become apparent.

• Audio input is encoded according to a transform method which is adaptive. This means that the transform method can adapt to the input signal; this is in accordance with acoustic phenomena.

• The main target of the ATRAC encoding system is to decrease the information density and in that way increase the recordable time on a small disc, all without degrading the sound quality.

• The decoder, on the other hand, will restore the original audio data based upon the compressed data input. The word compression is often used for this kind of action, although the meaning in this application is not totally the same as in a computer environment.

In a normal CD system, the basic bit stream is 16 bits, two (L/R) channels and 44.1 kHz sampling rate:

$16 \times 2 \times 44\,100 = 1\,411\,200\ \text{bit s}^{-1} \approx 1.4\ \text{Mbit s}^{-1}$

After further processing (EFM, CIRC, etc.), this becomes about 4 Mbit s^-1. With this bit rate, we can put 74 minutes of music on the 12-cm disc format. If we want to put the same amount of music time on a disc of nearly half that size, it is obvious that the bit stream has to be reduced drastically. There are several ways to do this, but not many of these give an acceptable result.

In most cases (example: reducing the amount of sampled bits or the sampling rate) the result will be a degraded sound.

The ATRAC system is highly sophisticated, as it is based upon complex mathematical and psychoacoustic methods. Several complex operations are performed simultaneously and in connection with one another, so it is very difficult to give a simple and clear overview of this system.

FIG. 30 Fourier analysis.

FIG. 31 ATRAC block diagram.

Basics on Fourier analysis

Some basic notes on Fourier analysis are necessary for a good understanding of digital sound processing:

• Sound can be analyzed and described as a mathematical function (sine and cosine waves).

• One single frequency, seen mathematically, is one single sine function, with an amplitude ($A$), angular frequency ($\omega$) and phase ($\varphi$):

$f(t) = A \sin(\omega t + \varphi)$, where $t$ is time.

• This basic formula also shows that sound is a function of both frequency and time. Sound can be described either as a function of frequency or as a function of time; accordingly, we speak of the frequency domain and the time domain. A 'transform' is a mathematical description of the transition from one domain to the other.

• Starting from the formula of one basic frequency, we can define harmonics. These are derived from the basic frequency, but their angular frequency is an integer multiple of the basic one:

$f(t) = A \sin(\omega t + \varphi)$ (basic frequency); $f(t) = A \sin(2\omega t + \varphi)$ (first harmonic, at twice the basic frequency)

• 'Sound' is a composition of basic sine and/or cosine waves, each basic wave also having a series of harmonics. Any sound can be analyzed and written in terms of its mathematical components by using Fourier or other (usually derived) methods.

Differences in sound can be seen as differences in composition of sine/cosine waves and harmonics.

• Normally, Fourier analysis handles infinite series, but as the spectrum that we handle is limited (up to 20 kHz), we will use finite series.

FIG. 30 is a very simple example of a waveform (a) and its constituent elements (b + c + d), consisting of one basic wave and two harmonics. A Fourier analysis performed on the waveform should, if performed correctly, reveal these constituent frequencies with their respective amplitudes.
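A discrete Fourier transform (DFT) shows this recovery in practice. The sketch builds a waveform in the spirit of FIG. 30 from one basic wave and two harmonics, then computes a naive DFT. The amplitudes, phases and 64-sample length are made-up values, and the O(N²) DFT is chosen for clarity, not speed. For a real sine that falls exactly on frequency bin k, the single-sided amplitude is 2|X[k]|/N.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform, O(N^2); fine for a short illustration."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
            for k in range(n)]

N = 64  # samples covering exactly one period of the basic wave
# (amplitude, multiple of the basic frequency, phase): hypothetical values
components = [(1.0, 1, 0.0), (0.5, 2, 0.3), (0.25, 3, 1.0)]
wave = [sum(a * math.sin(2 * math.pi * m * t / N + phi) for a, m, phi in components)
        for t in range(N)]

spectrum = dft(wave)
for k in range(1, 6):
    amplitude = 2 * abs(spectrum[k]) / N   # single-sided amplitude of a real sine
    print(f"{k} x basic frequency: amplitude {amplitude:.3f}")
```

The analysis returns the three amplitudes that were used to build the wave and essentially zero for the higher multiples, whatever phases were chosen.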

ATRAC input versus output

• The input to the ATRAC encoder is still the 16-bit, two channel, 44.1 kHz sample rate bit stream, exactly the same as in a CD system.

• The output, however, will not be the 1.4 Mbit s^-1 data stream, but a 292 kbit s^-1 data stream, about one-fifth of the CD rate.

• The MiniDisc compression ratio is therefore about 1:5.
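The ratio follows directly from the two bit rates given above, taking 292 kbit s^-1 as a rounded 292 000 bit s^-1:

$$\frac{16 \times 2 \times 44\,100\ \text{bit s}^{-1}}{292\,000\ \text{bit s}^{-1}} = \frac{1\,411\,200}{292\,000} \approx 4.83$$

"About 1:5" is therefore a rounded value of roughly 1:4.8.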

ATRAC operation

We already mentioned the basic psychoacoustic phenomena that are taken into account and some basics on Fourier analysis; now both will meet.

First, the incoming (digital) sound is split into frequency bands. The full audio spectrum is divided into three bands: high, middle and low. This simplifies the processing and allows more detailed and accurate handling of each part. The splitting is performed by so-called Quadrature Mirror Filters (QMF filters).

• The high-frequency (above 11 kHz) band is separated first.

• Next, the remaining signal is split into the low-frequency (0-5.5 kHz) and middle-frequency (5.5-11 kHz) bands.

• While the middle/low-band analysis takes place, the high frequency band is delayed to keep correct timing.
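The two-stage band split can be sketched with the simplest possible QMF pair, the 2-tap Haar filters, applied as a tree. The first split yields the high band (11.025 to 22.05 kHz at a 44.1 kHz sampling rate). A second split of the lower half gives the low (0 to about 5.5 kHz) and middle (about 5.5 to 11 kHz) bands. The source does not give ATRAC's filter coefficients. Real codecs use much longer filters with far better band separation, and these longer filters introduce the delay the text mentions for the high band; the Haar pair needs no such compensation. All function names here are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def qmf_split(x):
    """Two-band split with the 2-tap Haar QMF pair, decimated by 2 (len(x) even)."""
    low = [(x[2 * n] + x[2 * n + 1]) / SQRT2 for n in range(len(x) // 2)]
    high = [(x[2 * n] - x[2 * n + 1]) / SQRT2 for n in range(len(x) // 2)]
    return low, high

def qmf_merge(low, high):
    """Inverse of qmf_split (perfect reconstruction for the Haar pair)."""
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) / SQRT2)
        out.append((lo - hi) / SQRT2)
    return out

def three_band_split(x):
    """Tree split: high band first, then the lower half into low and middle."""
    low_mid, high = qmf_split(x)     # 0-11.025 kHz | 11.025-22.05 kHz
    low, mid = qmf_split(low_mid)    # 0-5.5 kHz    | 5.5-11.025 kHz
    return low, mid, high

def three_band_merge(low, mid, high):
    return qmf_merge(qmf_merge(low, mid), high)

signal = [math.sin(0.7 * t) + 0.1 * t for t in range(16)]   # length divisible by 4
low, mid, high = three_band_split(signal)
rebuilt = three_band_merge(low, mid, high)
print(len(low), len(mid), len(high), max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

The total number of samples is unchanged by the split (4 + 4 + 8 = 16 here), and the synthesis filters give back the input. The compression itself only happens in the later stages.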

The three bands will now pass the Modified Discrete Cosine Transform (MDCT) blocks separately. This is a highly complex mathematical transform method, comparable to a Fourier analysis. The input sound which is still in the time domain will be analyzed, and the constituent frequencies and their levels over a certain time slot will be extracted. At this time, a transition is made from time domain to frequency domain.
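The MDCT can be written out directly from its textbook definition. The sketch below uses a sine window and 50% overlapping frames of 2N samples. Each frame yields only N coefficients, so the transform does not add data, and the time-domain aliasing left by the inverse transform cancels when neighboring frames are overlap-added. This is a generic MDCT, not ATRAC's actual window or block lengths, which the source does not give. The block size of 8, the helper names and the zero padding at both ends are assumptions made for the example.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block, window):
    """2N time samples -> N frequency coefficients (direct formula, windowed)."""
    two_n = len(block)
    n = two_n // 2
    n0 = 0.5 + n / 2
    return [sum(window[i] * block[i] * math.cos(math.pi / n * (i + n0) * (k + 0.5))
                for i in range(two_n))
            for k in range(n)]

def imdct(coeffs, window):
    """N coefficients -> 2N windowed time samples (still time-aliased)."""
    n = len(coeffs)
    n0 = 0.5 + n / 2
    return [window[i] * (2.0 / n)
            * sum(coeffs[k] * math.cos(math.pi / n * (i + n0) * (k + 0.5)) for k in range(n))
            for i in range(2 * n)]

def analyze(signal, n):
    """Frames of 2n samples with hop n; len(signal) must be a multiple of n."""
    padded = [0.0] * n + list(signal) + [0.0] * n
    window = sine_window(2 * n)
    return [mdct(padded[s:s + 2 * n], window) for s in range(0, len(padded) - n, n)]

def synthesize(frames, n, length):
    """Overlap-add the IMDCT outputs; aliasing cancels between neighboring frames."""
    window = sine_window(2 * n)
    out = [0.0] * ((len(frames) + 1) * n)
    for f, coeffs in enumerate(frames):
        for i, value in enumerate(imdct(coeffs, window)):
            out[f * n + i] += value
    return out[n:n + length]

signal = [math.sin(0.3 * t) + 0.5 for t in range(32)]
frames = analyze(signal, 8)
rebuilt = synthesize(frames, 8, len(signal))
print(len(frames), len(frames[0]), max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Without quantization, the round trip reproduces the input to within floating-point error. The coefficients in each frame are the frequency-domain data on which the threshold and masking checks can then operate.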

Before MDCT processing is performed, a decision needs to be taken about the block size. For this purpose, the output of each band will be monitored and evaluated. This block size relates to the time/frequency blocks that will be allocated in the MDCT processor. Simply put, it relates to the amount of processing power, processing time and resolution that will be used.

Depending on the type of input sound, a decision can be taken to allocate more or less time to it, and then analyze it on a higher or lower level. This is also a way to decrease the amount of data bits and enhance the overall efficiency.

A simple example can clarify this. When we encode music for a Compact Disc, it makes no difference whether there is sound or not: even silent passages will be sampled, processed and stored. We can therefore expect a large number of samples (each taking 16 bits) that only mean 'no sound', which wastes processing time and data as well as storage space. If we could detect these 'useless' samples and reduce the number of bits allocated to them (for example, using only a couple of bits instead of 16 for each sample that has no real content), we would save much processing time and storage space. The block decision taken in ATRAC is similar to this example, but goes a lot further.

FIG. 32 Pulse transient (a) and sine-wave constant (b).

Analysis and transform are performed on blocks of data, i.e., a certain number of samples are taken and processed as one block.

When we vary the block size, we consequently also vary the number of samples, which means the time and space allocated to the processing of that particular block.

The maximum time slot for analysis as one block is 11.6 ms.

There are eight possible other timings, the smallest being 11.6/8 = 1.45 ms.

This separation in time blocks is called Non-Uniform Time Splitting.

The idea behind this block decision is the following:

• There are several types of sound. The human hearing system recognizes this and adapts to it.

• Take, for instance, a short-impulse type sound; on the time axis this is a very short burst, therefore we do not need to allocate a big time block to analyze it. Such a short burst needs no detailed analysis, as the sound level changes are very sharp, and only the changes or the change ratio need to be known. This is a typical effect: when the differences in levels between consecutive sounds are large, our hearing system will concentrate more on the differences, not on the details.

• The second example is the opposite: take a slowly varying, sine-type sound as input. On the time axis it lasts a long time, so we need to allocate a long time slot to analyze it, and this also makes it possible to analyze the sound in great detail. This is necessary because our hearing system does the same with such sounds: in these cases we become very sensitive to details.

FIG. 33 Non-uniform splitting.

Apart from the explained non-uniform time splitting, we also use Non-Uniform Frequency Splitting.

This is based upon the critical bands concept, as explained in the section on psychoacoustics. There is, however, a difference: Non-Uniform Frequency Splitting as used in ATRAC uses more bands, with 52 bands specified.

• For the lower frequency band, 20 critical bands are used.

• The middle- and high-frequency bands each use 16 critical bands.

Using a larger number of critical bands allows very high precision. The number of bands is also highest in the area where the human ear is most sensitive (up to approx. 4 kHz).

FIG. 34 Splitting decision algorithm.

The decision on time splitting is of course very important. It is basically performed upon comparison of a number of adjacent blocks, where the peak values are measured and compared.

Based upon this comparison, the decision is taken about the splitting mode to be used.
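A minimal sketch of such a peak-comparison decision is shown below. It assumes one frame of 512 samples (11.6 ms at 44.1 kHz), divided into eight sub-blocks of 1.45 ms. It switches to short blocks when the peak of one sub-block exceeds that of the previous sub-block by more than a fixed ratio. The text only says that peak values of adjacent blocks are measured and compared. The ratio of 8 (about 18 dB), the single decision per frame (ATRAC decides per band) and the function name are therefore hypothetical.

```python
import math

# Hypothetical parameters; only the peak-comparison idea comes from the text.
SUB_BLOCKS = 8        # 11.6 ms / 8 = 1.45 ms, the shortest block in the text
ATTACK_RATIO = 8.0    # peak jump (about 18 dB) treated as a transient

def choose_block_mode(frame):
    """Return 'short' if a sharp attack appears between adjacent sub-blocks, else 'long'."""
    size = len(frame) // SUB_BLOCKS
    peaks = [max(abs(s) for s in frame[b * size:(b + 1) * size]) for b in range(SUB_BLOCKS)]
    for previous, current in zip(peaks, peaks[1:]):
        if current > ATTACK_RATIO * max(previous, 1e-9):   # guard against silence
            return "short"
    return "long"

steady = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(512)]  # steady 1 kHz tone
click = [0.0] * 300 + [1.0] + [0.0] * 211                                 # sudden impulse
print(choose_block_mode(steady), choose_block_mode(click))
```

The steady tone keeps the long 11.6 ms block, which gives fine frequency resolution. The click triggers short blocks, so its energy is not smeared over the whole frame.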

What is the practical use and benefit of this time/frequency splitting process?

• We can use a processing method which is similar to the processing as performed in a human hearing system.

• For each frequency band, we can define critical bands, adapted to our sensitivity in that area, and we can adapt the processing accordingly.

• For each frequency band, and adapted to the input, we can define the amount of time that will be used for processing.

• Within each frequency band, and according to the time decision, a number of so-called spectra will be calculated.

This is in fact the processing of input sound and it should be obvious that the more spectra calculated, the higher the resolution will be.

• Time/frequency splitting can in this way really adapt processing to the input sound and at the same time keep the output bit rate constant at 292 kbit s^-1.

It all comes down to making a correct choice between allocated time and processing resolution. For those parts of the sound where precision is needed, we can allocate more processing power (typically lower frequencies); for those parts of the sound where speed is of more importance (typically higher frequencies), we can allocate smaller time slots with smaller resolution, and so on.

The output of the MDCT blocks is frequency domain information, and as mentioned before, it can be seen as a frequency/level analysis of the input over a certain time block.

On this output the mentioned psychoacoustic phenomena can now be used to reduce the bit rate.

• threshold checking can be performed;

• masking checking can be performed.

The last stage is then the bit allocation (see FIG. 35). The significant information is no longer transmitted as plain audio samples, as it is in the CD format. Each frequency/time data block is now known, along with its level. It is represented not by a fixed number of bits but by a variable number of bits, from 0 to 15, depending on the dynamic range of that part. The scale factor gives the relative level of the signal; the remaining bits can be discarded as quantization noise.

The output of the ATRAC encoding system cannot be compared with the data used in a CD system. In the MiniDisc system, the data recorded on the disc describe word lengths, scale factors and spectrum data (the data in each block), comparable with the block floating-point data representation used in computers.
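The scale-factor/word-length idea is a form of block floating-point coding, sketched below for one band of spectral values. Assumptions: the scale factor is simply the band's peak magnitude (real ATRAC picks it from a fixed table), the word length per band is given instead of being derived by a bit-allocation algorithm, and the quantizer is a signed mid-tread quantizer, so 0 and 1 bit both mean the band is not transmitted. The function names and coefficient values are hypothetical.

```python
def max_code(word_length: int) -> int:
    """Largest positive code of a signed mid-tread quantizer; 0 means the band is dropped."""
    return 2 ** (word_length - 1) - 1 if word_length >= 2 else 0

def encode_band(values, word_length):
    """Return (scale_factor, integer codes) for one band of spectral values."""
    scale = max((abs(v) for v in values), default=0.0)
    steps = max_code(word_length)
    if steps == 0 or scale == 0.0:
        return scale, [0] * len(values)
    return scale, [round(v / scale * steps) for v in values]

def decode_band(scale, codes, word_length):
    """Rebuild approximate spectral values from the scale factor and the codes."""
    steps = max_code(word_length)
    if steps == 0:
        return [0.0] * len(codes)
    return [c / steps * scale for c in codes]

band = [0.02, -0.5, 0.31, 0.12]   # made-up MDCT coefficients for one band
for bits in (0, 4, 8):
    scale, codes = encode_band(band, bits)
    decoded = decode_band(scale, codes, bits)
    error = max(abs(a - b) for a, b in zip(band, decoded))
    print(f"{bits} bits: codes {codes}, max error {error:.4f}")
```

More bits per value lower the quantization error (at most half a step, i.e. scale / (2 × max_code)). Zero bits removes the band completely, which is how inaudible or masked bands cost almost nothing.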

The encoded output, at 292 kbit s^-1, is further processed (EFM, ACIRC, etc.) and recorded on the disc.

ATRAC decoding

FIG. 35 Bit allocation.

FIG. 36 ATRAC overview.

During read-out, the reverse processing is performed; after EFM and ACIRC decoding, the data are again sent to ATRAC for decoding.

The input bit stream is demultiplexed and checked for errors. Based upon the same psychoacoustic principles as used for encoding, the decoder reconstructs and synthesizes the original data. The data of each frequency band are sent to an inverse MDCT block, after which they are combined with the other frequency bands to obtain the final (still digital) audio output.

In this way, the original signal will be reconstructed, minus the deleted parts, but these parts were not necessary in the first place, so the difference between original sound and ATRAC reproduced sound is minimal.

Another benefit of transferring the audio in this way is the possibility to change some of the parameters (in order to improve it even more) in the future without creating compatibility problems.

The system design is indeed such that algorithms can be changed, without creating compatibility problems, as long as basic formats are used.

ATRAC versions

The first MiniDisc sets used ATRAC version 1; since then, we have witnessed versions 2, 3, 4, 4.5 and ATRAC-R.

Each new version was obviously an improvement on the previous one, step by step improving the sonic qualities of the encoding and compression algorithms.

Where ATRAC version 1 was justifiably perceived as not being within full Hi-Fi standards (this version was only used in the first sets), version 2 already showed noticeable improvement, and with version 3 the sound of a MiniDisc came very close to CD standards. Current ATRAC versions are of such remarkable quality that it takes extremely good ears or specific test equipment to distinguish between CD and MiniDisc.

FIG. 37 MD decoding (a) and ATRAC decoding (b).

Table 3 Recording modes

The ways in which ATRAC was improved from version to version can only be described in rough terms, as they concern proprietary technology whose design is ongoing. Algorithms such as ATRAC (or MPEG, as another example) allow different manufacturers to develop their own versions, which can be very different from one another. This raises the question of whether incompatibilities arise when discs encoded with different manufacturers' algorithms are recorded and played on sets from different brands. The answer is no, as long as the basic code language is used and some basic data specifications are upheld. A comparison can be made with the PC world: using the same basic software code and some basic standards, various software companies can produce programs that run on the same PC without creating too many problems.

Most improvements between ATRAC versions are related to the following facts:

• Numbers and sizes of critical bands were reviewed and refined.

• Time and frequency splitting parameters were reviewed and made flexible.

• The number of scale factors, and the number of bits used to describe them, increased.

• Psychoacoustic parameters were refined and adapted more according to user needs.

• The speed and processing power of the processors used for ATRAC encoding and decoding were, as in the PC world, raised drastically, allowing faster and more complex calculations.

• The know-how and technological progress mentioned in the previous points now make it possible to process the audio input a second time in order to make the most of it. This two-stage processing allows very accurate encoding. During the first stage, the audio is ATRAC-processed to set its main parameters. It then becomes clear at which points (frequency, scale, time) the encoder has to spend the most effort to obtain the best result, and also at which points less effort is needed, in other words where some capacity is available. The second stage then uses the information gained during the first to reprocess the audio data for a better result. This two-stage process can be compared with a sorting exercise: if you needed to sort a number of items in a short time according to a combination of factors (size, colour, smell, etc.), you would get a better result if, within the same time, you could first do a rough sort and then a second one, starting from that result and the experience gained during the first.

ATRAC3/MDLP

Apart from the previously mentioned ATRAC versions 1, 2, 3, etc. a separate ATRAC3 (not to be confused with ATRAC version 3) algorithm has also been developed and used to address the need for long-play recording and playback in MiniDisc, this being abbreviated as MDLP (MiniDisc Long-Play).

The recording modes shown in Table 3 are possible.

ATRAC3-encoded discs can be played back on any MiniDisc player produced after the introduction of ATRAC3 (ca. 1999). Backward compatibility with earlier players was hard to achieve, so it was not provided. The 160-min mono mode was foreseen from the start, although not all early sets were programmed for mono playback or recording.

Tracks encoded with different ATRAC modes can be mixed on one disc, because the recording mode is stored for each track individually.

The LP2 and LP4 modes start from the same algorithms as the full-rate ATRAC modes. Higher compression is obtained through stronger data reduction, which can be achieved in various ways: fewer bits for scale factors or timing, fewer critical bands, different thresholds, etc. This inevitably affects audio quality, so the LP2 and LP4 modes should be understood as a trade-off between quality and recording time. Of course, development never stops, and even within these very low bit-rate modes there is a continuous search for improvement.

Data format

Basic format

The data format of a MiniDisc is similar to CD-ROM mode 2 (Yellow Book specification).

FIG. 38 Cluster layout.

FIG. 39 Detailed cluster layout.

In CD-ROM mode 2, data are divided into sectors. The same sector format is used in MiniDisc, but a further concept was added: the CD-ROM sector format is extended to a cluster format. The basic MD data format can be compared to a book:

• Chapter = cluster

• Paragraph = sector

• Sentence = sound group

• Word = data

One cluster comprises 36 sectors, of which 32 are used for compressed data and four for link and subdata. These four link/subdata sectors differ between pre-mastered and recordable discs.

• In pre-mastered discs, these four sectors are filled with subdata, which can be graphic data (for display), information on the disc, lyrics, etc. In other words, it is a feature area.

• In recordable discs, three sectors are used as link sectors and one for subdata. The link area is explained later.

We can now analyze the data format further: each sector contains 2352 bytes in total, of which 2332 are data bytes (comparable to the CD-ROM sector format). The remaining 20 bytes are used for synchronization, addressing and control.

A further division is the sound group, the smallest unit, containing 424 bytes (212 for the left channel, 212 for the right channel). One sound group holds the ATRAC-compressed equivalent of 512 samples per channel at a 44.1 kHz sampling rate.

Two sectors hold 11 sound groups. The first sector contains five full sound groups plus the left-channel half of the sixth; the second sector starts with the right-channel half of the sixth sound group and continues with five more full sound groups. The number of data bytes per sector is therefore:

424 × 5 + 424/2 = 2332 bytes
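The split of 11 sound groups over two sectors can be sketched in Python as follows. This assumes, purely for illustration, that each 424-byte sound group is stored as its 212 left-channel bytes followed by its 212 right-channel bytes; the actual byte order inside a sound group follows the ATRAC sound group structure (Table 5). The function name is hypothetical.

```python
SOUND_GROUP_BYTES = 424
HALF = SOUND_GROUP_BYTES // 2    # 212 bytes per channel
SECTOR_DATA_BYTES = 2332


def pack_two_sectors(groups: list[bytes]) -> tuple[bytes, bytes]:
    """Spread 11 sound groups over the data fields of two sectors.

    Sector A: groups 0-4 in full, then the left half of group 5.
    Sector B: the right half of group 5, then groups 6-10 in full.
    Assumes each group is laid out as 212 left bytes + 212 right bytes.
    """
    if len(groups) != 11 or any(len(g) != SOUND_GROUP_BYTES for g in groups):
        raise ValueError("need exactly 11 sound groups of 424 bytes")
    sector_a = b"".join(groups[:5]) + groups[5][:HALF]
    sector_b = groups[5][HALF:] + b"".join(groups[6:])
    # 5 x 424 + 212 = 2332 bytes in each sector
    assert len(sector_a) == len(sector_b) == SECTOR_DATA_BYTES
    return sector_a, sector_b
```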

At this point we can confirm the compression rate used in MiniDisc:

• Each ATRAC-compressed sound group contains 424 bytes.

• Before compression, each sound group represents 512 samples of 16 bits in two channels: 512 × 16 × 2 = 16 384 bits = 2048 bytes.

• Conclusion: 2048 bytes are compressed into 424 bytes, a compression ratio of about 1:4.8, usually rounded to 1:5.
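The same arithmetic in a few lines of Python, as a quick check of the figures above (the variable names are only illustrative):

```python
SAMPLES = 512           # samples per channel in one sound group
BITS_PER_SAMPLE = 16
CHANNELS = 2
COMPRESSED_BYTES = 424  # one ATRAC sound group

pcm_bytes = SAMPLES * BITS_PER_SAMPLE * CHANNELS // 8   # 2048 bytes of PCM
ratio = pcm_bytes / COMPRESSED_BYTES                     # about 4.83
print(f"{pcm_bytes} bytes -> {COMPRESSED_BYTES} bytes, ratio 1:{ratio:.2f}")
# prints: 2048 bytes -> 424 bytes, ratio 1:4.83
```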

The link concept

In MiniDisc, clusters have to be 'linked' to each other: a transition area between them is required so that the end of one cluster does not erroneously overwrite the beginning of the next.

The need for linking comes from recordability. CD and CD-ROM, neither of which is re-recordable, use CIRC encoding and EFM modulation. MiniDisc uses the same EFM modulation, but extends CIRC into ACIRC (Advanced CIRC).

CIRC was taken as the starting point, and a longer interleave was added to give even better protection against burst errors.

As FIG. 40 shows, ACIRC differs from CIRC in that more delay and more interleave were included to improve error-correction capability. One consequence of this additional delay is that, although one sector input to the encoder still corresponds to 98 frames, its interleaved output is spread over 108 frames.

As a result, there must be a buffer zone between consecutive clusters to allow proper recording of this longer interleave sequence.

One cluster is the minimum unit that can be recorded on a MiniDisc. Because the interleave of the last sector of a cluster extends beyond that sector, while the interleave of the first sector of the next cluster must already begin, link sectors are needed to separate one cluster from the next.

Imagine a cluster layout without a linking area: we record one cluster, but because the interleave is not complete at the last sector of the cluster, the remaining interleaved and delayed data words are written into the first sector of the next cluster, destroying the information there. Obviously, this situation must be avoided.

FIG. 40 ACIRC-CIRC block diagram.

FIG. 41 Cluster linking.

Three link sectors are sufficient for this purpose. The first link sector and part of the second carry the remaining interleave of the preceding cluster; the rest of the second link sector and the third link sector are used for the start of the interleave of the next cluster.

This linking format is necessary only on recordable MiniDiscs. On a pre-mastered disc, data are written in one continuous pass and cannot be re-recorded, so no linking sectors are needed; as already mentioned, the free space is then used for subdata.

Address structure

First of all, why is addressing needed? It is obvious that any data, be it audio data or computer data or whatever, which is recorded sequentially will need some addressing to make correct decoding possible afterwards. Just imagine a street with identical houses; no postman will be able to distribute the mail correctly if he has no way of checking addresses, names and numbers.

During the encoding of the disc formats, an address structure is inserted in the audio data. In CD, for example, address data are included in the main data just after CIRC encoding. After each CIRC-encoded frame, a control word is included. In this control word structure, address (and timing) information is included.

• Because MiniDisc uses the cluster format, as mentioned before, its basic addressing structure is based on clusters.

• Another very important point is the pre-groove on the recordable disc, which uses this same address format.

Earlier in this section, it was explained that, because of the pre-groove, the laser return signal contains a 22.05 kHz carrier frequency-modulated with ADIP (Address In Pre-groove) data. These ADIP data are the cluster and sector addresses. Here again we must differentiate between pre-mastered and recordable discs.

• The pre-mastered disc will carry the cluster/sector address in its subcodes which are in the main data stream.

• The recordable disc also carries the cluster/sector address in the main data (including the pre-mastered lead-in area), but additionally in the pre-groove.

• In the case of the recordable disc, it is obvious that the location of the address data in the pre-groove, which is physically included in the disc and therefore not changeable, has to coincide with the (re-recordable) address in the recorded data.

• As mentioned previously, each sector contains 20 bytes for address and control data.

• The cluster address is a 2-byte format.

• The sector address is a 1-byte format.

During lead-in, on pre-mastered as well as recordable discs, the cluster address increases and must end at the address FFFF;1F (cluster;sector, in hexadecimal). The lead-in area is the inner area of the disc where read-out starts and where no audio data are recorded.

After lead-in, cluster/sector addressing starts at 0000;FC and increases sequentially, without interruption, up to the end of the disc.
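The address progression can be sketched as a small Python function. This is an illustration only, not player firmware: the names are hypothetical, and the in-cluster order (link/subdata sectors FC to FF first, then data sectors 00 to 1F) is inferred from the statement that addressing starts at 0000;FC together with the sector numbering given in the next paragraph. It also assumes the 2-byte cluster counter simply wraps from FFFF to 0000, which reproduces the FFFF;1F to 0000;FC transition at the end of lead-in.

```python
# Sector order within one cluster: FC..FF (link/subdata), then 00..1F (data).
SECTOR_ORDER = [0xFC, 0xFD, 0xFE, 0xFF] + list(range(0x00, 0x20))


def next_address(cluster: int, sector: int) -> tuple[int, int]:
    """Return the (cluster, sector) address that follows the given one."""
    i = SECTOR_ORDER.index(sector)
    if i + 1 < len(SECTOR_ORDER):
        return cluster, SECTOR_ORDER[i + 1]
    # Sector 1F ends the cluster: go to sector FC of the next cluster.
    # Assumption: the 16-bit cluster number wraps from FFFF to 0000.
    return (cluster + 1) & 0xFFFF, SECTOR_ORDER[0]


def fmt(address: tuple[int, int]) -> str:
    """Format an address as CLUSTER;SECTOR in hexadecimal."""
    return f"{address[0]:04X};{address[1]:02X}"
```

For example, `fmt(next_address(0xFFFF, 0x1F))` gives `"0000;FC"`, the first address after lead-in.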

Since each cluster contains 36 sectors, of which the first 32 are data sectors, the data sector addresses run from 00 to 1F. The four link/subdata sectors are addressed from FC up to FF. (All addresses are hexadecimal.)

In the case of ADIP, the full sequence is as follows: each ADIP address is a 42-bit block containing four sync bits, 16 cluster address bits (2 bytes), eight sector address bits (1 byte) and a 14-bit cyclic redundancy check (CRC). These 42 bits are bi-phase modulated with a 6300 Hz bi-phase clock and then FM-modulated onto the 22.05 kHz carrier. The bit rate is 3150 bit/s, so the address (sector) rate is 3150/42 = 75 Hz, the same as for CD and CD-ROM.
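The ADIP bit budget and timing can be checked with a short Python sketch. The packing function is hypothetical: it assumes the fields are sent in the order listed above (cluster, sector, CRC), most significant field first, and it takes the 14-bit CRC value as an input because the generator polynomial is not given here. The four sync bits are left out of the payload, since they mark the block boundary rather than carry address data.

```python
SYNC_BITS, CLUSTER_BITS, SECTOR_BITS, CRC_BITS = 4, 16, 8, 14
BLOCK_BITS = SYNC_BITS + CLUSTER_BITS + SECTOR_BITS + CRC_BITS  # 42 bits

BIPHASE_CLOCK_HZ = 6300
BIT_RATE = BIPHASE_CLOCK_HZ // 2      # two bi-phase cells per bit: 3150 bit/s
SECTOR_RATE = BIT_RATE / BLOCK_BITS   # 75 ADIP addresses per second


def adip_payload(cluster: int, sector: int, crc: int) -> int:
    """Pack cluster (16 bits), sector (8 bits) and a given 14-bit CRC
    into the 38 non-sync bits of one ADIP block (assumed field order)."""
    if not (0 <= cluster < 1 << 16 and 0 <= sector < 1 << 8
            and 0 <= crc < 1 << 14):
        raise ValueError("field value out of range")
    return (cluster << (SECTOR_BITS + CRC_BITS)) | (sector << CRC_BITS) | crc
```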

Address in pre-mastered disc

The block diagram already revealed that the data in MiniDisc are also EFM-modulated. It has also been noted that the general data structure is derived from CD and CD-ROM. It is therefore logical that in all pre-mastered areas the address data (cluster and sector) are inserted in the CD-like format. As already mentioned, each CD frame (after CIRC encoding) is supplemented with a control word. In CD, these control words are used for time indication, track number indication and so on. MiniDisc does not need as many bits as CD, but to stay within the same format the number of bits remains unchanged; the bits not needed in MiniDisc are set to zero.

Address in recordable area

The addresses in the recordable area are included in the data stream. Each sector has a fixed format, with fixed areas for synchronization, addresses and data bytes. Refer also to Table 4.

Data structure:

Each addressable block (each sector) contains 2352 bytes.

• The first 12 bytes are a unique sync pattern.

• The next 4 bytes are header bytes: three address bytes (two for the cluster, one for the sector) and a mode byte.

• The mode byte describes the nature of the data fields.

• Between the header bytes and the actual data bytes, four all-zero bytes are inserted.
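The sector layout above (12 + 4 + 4 + 2332 = 2352 bytes) can be sketched as a builder and parser in Python. Several details are assumptions for illustration: the sync pattern is taken to be the standard CD-ROM one (00, ten FF bytes, 00), the two cluster bytes are taken as big-endian, and the mode value used in the example is arbitrary. The function names are hypothetical.

```python
import struct

SECTOR_BYTES = 2352
DATA_BYTES = 2332
# Assumption: the CD-ROM sync pattern is reused as the "unique sync pattern".
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"


def build_sector(cluster: int, sector: int, mode: int, data: bytes) -> bytes:
    """Assemble sync (12) + header (4) + zero bytes (4) + data (2332)."""
    if len(data) != DATA_BYTES:
        raise ValueError("data field must be 2332 bytes")
    header = struct.pack(">HBB", cluster, sector, mode)  # 2 + 1 + 1 bytes
    return SYNC + header + bytes(4) + data


def parse_sector(raw: bytes) -> tuple[int, int, int, bytes]:
    """Split a sector back into cluster, sector, mode and data field."""
    if len(raw) != SECTOR_BYTES or raw[:12] != SYNC:
        raise ValueError("not a valid 2352-byte sector")
    cluster, sector, mode = struct.unpack(">HBB", raw[12:16])
    if raw[16:20] != bytes(4):
        raise ValueError("expected four all-zero bytes after the header")
    return cluster, sector, mode, raw[20:]
```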

FIG. 42 Subcode in pre-mastered disc.

Table 4 Data structure

Table 5 Sound group structure

Sound groups in the data structure

Each sector contains five full sound groups and one half sound group. Each sound group can be split into a left-channel and a right-channel part. The data structure of a sound group depends on the ATRAC encoding; refer to the ATRAC section for more information on this topic.

Each sound group is divided into sound parameter bytes, audio spectrum bytes and again sound parameter bytes. The second sound parameter bytes group is exactly the same as the first one, but reversed in order.
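The mirrored parameter layout can be illustrated with a small Python check. The number of sound parameter bytes is passed in as a hypothetical parameter `n_param`, because the actual field sizes belong to the sound group structure in Table 5; the function only demonstrates the "same bytes, reversed order" relationship described above.

```python
def split_sound_group(group: bytes, n_param: int) -> tuple[bytes, bytes]:
    """Split one channel's sound group into its leading sound parameter
    bytes and its audio spectrum bytes, checking that the trailing
    parameter bytes are the leading ones in reversed order."""
    if 2 * n_param > len(group):
        raise ValueError("parameter blocks larger than the sound group")
    head = group[:n_param]
    spectrum = group[n_param:len(group) - n_param]
    tail = group[len(group) - n_param:]
    if tail != head[::-1]:
        raise ValueError("trailing parameter bytes are not a reversed copy")
    return head, spectrum
```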

Table 6

Table 7

Table 8

Data structure in TOC and UTOC

The data structure in TOC and UTOC is basically the same as on the rest of the disc. The general structure remains the same, the sync is the same, the cluster, sector address and mode bytes are the same, but in the data field of course there is a different type of data. However, remember that TOC is pre-recorded and UTOC is recordable.

• A pre-mastered disc only contains TOC.

• A recordable disc contains TOC and UTOC.

The TOC starts with an indication of the disc type. This information is needed from the outset because the TOC content differs between pre-mastered and recordable discs. For pre-mastered discs, the TOC contains the data shown in Table 6.

In the case of recordable discs, the TOC will contain the data shown in Table 7.

FIG. 43 Anti-shock block diagram.

The UTOC area also contains a power calibration area and a reserved area, both after the real UTOC data.

Lead-out contains pre-mastered pits on the pre-mastered MD; again, this is the same as on a Compact Disc. On the recordable disc, the lead-out area contains no data, there is just the pre-groove. Detection of the start of lead-out can therefore be performed on the pre-groove ADIP data.

Anti-shock operation

One of the main features of the MiniDisc is its anti-shock capability, which depends on the amount of memory that is included in the set.

• The MiniDisc ATRAC decoder needs a data rate of about 0.3 Mbit/s (292 kbit/s) to produce a correct, continuous audio output.

• Remember that these 0.3 Mbit/s are compressed data corresponding to full-resolution digital audio as used in a CD player.

• The rate at which data are read from the disc depends on its linear velocity (Constant Linear Velocity, CLV).

• MiniDisc uses the same linear velocity as a CD player.

• The CLV speed of a Compact Disc yields a read-out rate of 1.4 Mbit/s, the minimum data rate needed for correct CD read-out.

• Because MiniDisc uses the same CLV speed, its input data rate is nearly five times higher than necessary (1.4 Mbit/s input, but only 0.3 Mbit/s needed).
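Both rates follow from figures given earlier in this section, as this short Python calculation shows (variable names are illustrative): 75 sectors of 2352 bytes per second on the read-out side, and one 424-byte sound group per 512 samples at 44.1 kHz on the decoder side.

```python
SECTOR_BYTES = 2352
SECTORS_PER_SECOND = 75
read_rate = SECTOR_BYTES * SECTORS_PER_SECOND * 8       # 1 411 200 bit/s

SOUND_GROUP_BITS = 424 * 8
SOUND_GROUP_SECONDS = 512 / 44_100                       # about 11.6 ms
atrac_rate = SOUND_GROUP_BITS / SOUND_GROUP_SECONDS      # about 292 kbit/s

print(f"read-out {read_rate / 1e6:.2f} Mbit/s, ATRAC {atrac_rate / 1e3:.0f} kbit/s, "
      f"surplus factor {read_rate / atrac_rate:.2f}")
# prints: read-out 1.41 Mbit/s, ATRAC 292 kbit/s, surplus factor 4.83
```

The surplus factor equals the 1:4.83 compression ratio because 2352 bytes × 75 per second is exactly the CD audio rate of 44 100 × 16 × 2 bit/s.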

FIG. 44 Principle of intermittent read-out.

FIG. 45 Shock recovery.

This surplus can be put to good use with a buffer memory.

CD and CD-ROM players are susceptible to shocks and vibration. They have a certain recovery range (otherwise there would be no portable or car CD players), but under heavy shocks or vibration they may lose correct tracking: the laser is forced onto the wrong track, and incorrect data or no data are read out. If the system cannot recover quickly enough, that is, return the laser to the place where the last correct data were read and restart read-out correctly, the result is a momentary loss of output, usually called 'skipping'.

This loss can be prevented by using a buffer RAM. The first generation of MD players uses a 4-Mbit RAM, giving a shock-proof time of about 10 seconds (part of the RAM is also used for other operations). In theory, the more RAM is used, the longer the shock-proof time, limited of course by design possibilities and cost.
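A rough check of the 10-second figure, as a Python sketch: it assumes a 4-Mbit RAM holds 4 × 2^20 bits and that the buffer stores ATRAC data consumed at about 292 kbit/s. The `reserved_bits` parameter is a hypothetical stand-in for the part of the RAM used for other operations, whose size the text does not give.

```python
def shockproof_seconds(ram_bits: float, atrac_rate: float,
                       reserved_bits: float = 0) -> float:
    """Seconds of audio a full buffer can play with no disc read-out."""
    return (ram_bits - reserved_bits) / atrac_rate


ATRAC_RATE = 424 * 8 * 44_100 / 512               # about 292 162 bit/s
full = shockproof_seconds(4 * 2**20, ATRAC_RATE)   # whole 4-Mbit RAM: about 14.4 s
```

The theoretical maximum of about 14.4 s is consistent with the quoted 10 seconds once part of the RAM is set aside for other tasks.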

The operation is as follows:

• When the system starts to operate, initially the buffer RAM will be empty.

• Data are read out by the laser unit and demodulated in the RF electronics, after which EFM and ACIRC decoding are performed. At this stage, data are read out at 1.4 Mbit/s.

• The data should now be ATRAC-decoded, but since the decoder needs only 0.3 Mbit/s, they are fed to the buffer RAM instead of directly to the ATRAC decoder.

• It is obvious that the buffer will be full after a short while, the input being about five times higher than the output.

• For this reason, the microcontroller (the main control and logic system) must be able to control the flow of input data.

• The microcontroller of the MD player controls tracking servo and buffer RAM in such a way that the amount of data in the buffer RAM is always as high as possible.

• In practice, the MD goes into PAUSE mode whenever the buffer RAM is close to overflowing.

• Pause mode means that the laser objective stays on the same track and no read-out is performed, while the ATRAC decoder continues to decode data from the RAM and still outputs music. It should not be confused with Pause mode as seen by the user, when no audio is output.

• During this internal pause, the RAM level decreases; the pause is then released and read-out resumes until a high level has been reached again, and so on.
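The intermittent read-out cycle can be simulated with a simple two-threshold (hysteresis) controller in Python. This is a sketch under stated assumptions: the thresholds of 95 % and 80 % of capacity, the time step and the function name are hypothetical choices, since the text only says that read-out pauses when the RAM is nearly full and resumes after the level has dropped.

```python
def simulate_buffer(seconds, capacity_bits, high=0.95, low=0.80,
                    read_rate=1_411_200.0, decode_rate=292_162.5, dt=0.01):
    """Return (reading, level_bits) for each time step of length dt.

    Read-out pauses once the level reaches high x capacity and resumes
    once it falls to low x capacity; the ATRAC decoder drains the
    buffer continuously at decode_rate.
    """
    level, reading, trace = 0.0, True, []
    for _ in range(int(seconds / dt)):
        if reading and level >= high * capacity_bits:
            reading = False          # internal pause: laser holds its track
        elif not reading and level <= low * capacity_bits:
            reading = True           # pause released: read-out resumes
        if reading:
            level += read_rate * dt
        level = min(max(level - decode_rate * dt, 0.0), capacity_bits)
        trace.append((reading, level))
    return trace
```

With a 4-Mbit buffer, `simulate_buffer(60, 4 * 2**20)` shows short bursts of read-out separated by longer pauses, while the buffer level stays close to full after the initial fill.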

If a shock causes a track jump, the buffer RAM continues to send data to the ATRAC decoder, since enough data are still available. Meanwhile, the microcontroller detects that a shock has occurred (this can be detected in several ways; refer also to the Compact Disc section) and starts to recover.

This recovery action is as follows:

• The system has in its RAM the remaining data along with the addresses; based upon the addresses of the last correct data, the system will try to put the laser unit back onto the correct position and restart read-out.

• When this last address is found on the disc (in other words, the system has recovered from the shock), data are read out starting from the next address and sent to the buffer RAM.

• If this operation is performed within the timing limit of the RAM contents, the audio signal is reproduced continuously, glitchless and noiseless.

• The same buffer RAM is used during recording to store multiple clusters, and when a shock occurs during recording the system likewise tries to recover through RAM operation.
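The playback recovery condition described above reduces to a comparison between buffered audio and outage time, sketched here in Python. The function names are hypothetical, and the outage is assumed to cover the whole time from losing tracking until read-out restarts at the next address; the default rates are the 1.4 Mbit/s read-out and about 292 kbit/s ATRAC figures from this section.

```python
def survives_shock(buffer_bits: float, outage_s: float,
                   decode_rate: float = 292_162.5) -> bool:
    """True if the buffered audio outlasts the time needed to find the
    last good address again and restart read-out."""
    return buffer_bits >= outage_s * decode_rate


def refill_seconds(deficit_bits: float, read_rate: float = 1_411_200.0,
                   decode_rate: float = 292_162.5) -> float:
    """Time to rebuild the buffer after recovery: while reading, it fills
    at the difference between the read-out and decoding rates."""
    return deficit_bits / (read_rate - decode_rate)
```

For a full 4-Mbit buffer, a 10 s outage is bridged while a 15 s outage is not, and after recovery roughly 1.12 Mbit of missing data is restored in about one second of read-out.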

It should be noted, however, that the anti-shock operation is more critical during recording.
