Digital audio editing


Unlike analog, digital audio can take advantage of the freedom to store data in random access media and the signal processing techniques developed in computation. This has had an enormous impact on the way audio is edited, completely displacing traditional methods. This section shows how the digital edit process is achieved using combinations of storage media, processing and control systems.

1. Introduction

At its most basic, editing may be no more than a punch-in on a multitrack recorder, or the removal of hesitations from an interview. At a higher level, it includes assembling myriad sound effects and mixing them with timecode-locked dialogue in order to create a film soundtrack.

Mastering is a further form of editing where various tracks are put together to create the master recording from which an album will be made. Unlike vinyl disk cutting, where the operator controls the cutter parameters, the CD and MD 'cutting' process is just a data transfer and is independent of musical content, so responsibility for the subjective quality of the final disk falls entirely on those who make the master recording. The duration of each musical piece, the length of any pauses between pieces and the relative levels of the pieces on the disk have to be determined at the time of mastering. The master recording will be compiled from source media which may each contain only some of the pieces required on the final CD, in any order. The recordings will vary in level, and may contain several retakes of a passage which was unsatisfactory.

The purpose of the digital mastering editor is to take each piece, and insert sections from retakes to correct errors, and then to assemble the pieces in the correct order, with appropriate pauses between and with the correct relative levels to create the master tape. All this is done by copying in the digital domain. The source recordings need not be changed in any way, and degradation of quality is minimal. The master recording will also have contiguous timecode, and with the addition of the subcode information, it is ready for cutting the CD.

At the other end of the spectrum, editors may be used for audio post production of film or video sound tracks. The acoustic of most film sets precludes the use of live recording and the recording made on the set is used only as a guide. In ADR (automatic dialogue replacement), dialogue recorded later under better conditions is fitted to lip movement on the pictures. Sound effects from an effects bank will need to be triggered so that they coincide with visible events. Foreign-language dubs may require the time axis of the audio to be stretched or compressed without pitch change in order to achieve more convincing lip-sync. Each source recording may need equalization, compression or some effect such as reverberation before being added to the mix.

FIG. 1 The function of an editor is to perform a series of assembles to produce a master tape from source tapes.

Digital audio editors work in two basic ways, by assembling or by inserting sections of audio waveform to build the finished waveform.

Both terms have the same meaning as in the context of video recording.

Assembly begins with a blank master file or recording. The beginning of the work is copied from the source, and new material is successively appended to the end of the previous material. FIG. 1 shows how a master recording is made up by assembly from source recordings. Insert editing begins with an existing recording in which a section is replaced by the edit process. Punch-in in multi-track recorders is a form of insert editing.

2. Editing with random access media

In all types of audio editing the goal is the appropriate sequence of sounds at the appropriate times. In analog audio equipment, editing was almost always performed using tape or magnetically striped film. These media have the characteristic that the time through the recording is proportional to the distance along the track. Editing consisted of physically cutting and splicing the medium, in order mechanically to assemble the finished work, or of copying lengths of source medium to the master.

When this was the only way of editing, it did not need a qualifying name. Now that audio is stored as data, alternative storage media have become available which allow editors to reach the same goal but using different techniques. Whilst early open-reel digital audio tape formats were designed to support splice editing, this was an evolutionary dead end. In all other digital audio editing, samples from various sources are brought from the storage media to various pages of RAM. The edit is performed by crossfading between sample streams retrieved from RAM and subsequently rewriting on the output medium. Thus the nature of the storage medium does not affect the form of the edit in any way except the amount of time needed to execute it.

Tapes only allow serial access to data, whereas disks and RAM allow random access and so can be much faster. Editing using random access storage devices is very powerful as the shuttling of tape reels is avoided.

The technique is generally called non-linear editing because the time axis of the storage medium is non-linear.

FIG. 2 A single byte cannot be updated on a block-based medium. Instead the whole block is transferred to RAM, which is byte addressable. Following the change, the entire block is written back to the medium.

3. Editing on recording media

Audio editing requires the modification of source material in the correct real-time sequence to sample accuracy. However, all real digital storage media are block-based. Blocks are needed to allow the inclusion of preambles so that synchronized replay can begin and to allow an addressing mechanism. Most media have some form of error correction requiring an interleave, or reordering, of samples to reduce the impact of large errors. As a result, editing to sample accuracy simply cannot be performed directly on real media. Even if an individual sample could be located in a block, replacing the samples after it would destroy the codeword structure and render the block uncorrectable.

The only solution is to ensure that the medium itself is only edited at block boundaries so that entire error-correction codewords are written down. FIG. 2 shows that in order to edit to sample accuracy, entire blocks must be read from the medium and de-interleaved into RAM. The de-interleaved data are then modified in RAM by the edit process and re-interleaved for writing back on the medium. This technique is called read-modify-write and is only an extension of the technique used in word processors to correct a spelling mistake in a text file. The block containing the error is read from disk into RAM, the error (which may be a single byte) is corrected in RAM and the whole block is written back to disk.

In disks, blocks are often associated into clusters which consist of a fixed number of blocks in order to increase data throughput. When clustering is used, editing on the disk can only take place by rewriting entire clusters.
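The read-modify-write cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class BlockMedium, the function replace_sample and the eight-sample block size are hypothetical, and interleaving and error-correction coding are left out altogether.

```python
BLOCK_SIZE = 8  # samples per block; hypothetical, real media use far larger blocks


class BlockMedium:
    """Toy block-based store: data can only be read or written a whole block at a time."""

    def __init__(self, samples):
        samples = list(samples)
        if len(samples) % BLOCK_SIZE:
            raise ValueError("length must be a whole number of blocks")
        self._blocks = [samples[i:i + BLOCK_SIZE]
                        for i in range(0, len(samples), BLOCK_SIZE)]
        self.block_writes = 0

    def read_block(self, n):
        return list(self._blocks[n])  # a copy, standing in for the transfer to RAM

    def write_block(self, n, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("only entire blocks can be written")
        self._blocks[n] = list(data)
        self.block_writes += 1


def replace_sample(medium, index, value):
    """Change one sample by rewriting the whole block that holds it."""
    block_no, offset = divmod(index, BLOCK_SIZE)
    ram = medium.read_block(block_no)   # read the entire block
    ram[offset] = value                 # modify in addressable RAM
    medium.write_block(block_no, ram)   # write the entire block back


medium = BlockMedium(range(24))
replace_sample(medium, 13, -1)
print(medium.read_block(1))
```

Changing one sample costs one complete block write. With clustering, the same reasoning would apply with the cluster as the unit of transfer.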

4. The structure of an editor

The digital audio editor consists of three main areas. First, the various contributory recordings must enter the processing stage at the right time with respect to the master recording. This will be achieved using a combination of timecode, transport synchronization and RAM timebase correction. The synchronizer will take control of the various transports during an edit so that one section reaches its out-point just as another reaches its in-point.

FIG. 3 A digital audio editor requires an audio path to process the samples, and a timing and synchronizing section to control the time alignment of signals from the various sources. A supervisory control system acts as the interface between the operator and the hardware.

Second, the audio signal path of the editor must take the appropriate action, such as a crossfade, at the edit point. This requires some digital processing circuitry.

Third, the editing operation must be supervised by a control system which coordinates the operation of the transports and the signal processing to achieve the desired result.

FIG. 3 shows a simple block diagram of an editor. Each source device, be it disk or tape or some other medium, must have timecode locked to the audio samples in some way. The synchronizer section of the control system uses the timecode to determine the relative timing of sources and sends remote control signals to the transport to make the timing correct. The master recorder is also fed with timecode in such a way that it can make a contiguous timecode track when performing assembly edits. The control system also generates a master sampling rate clock to which contributing devices must lock in order to feed samples into the edit process. The audio signal processor takes contributing sources and mixes them as instructed by the control system. The mix is then routed to the recorder.

5. Timecode

Timecode is essential to editing, but the standardization of timecode for digital audio recorders has been hampered by the diversity of standards in video. Synchronization between timecode and the sampling rate is essential, otherwise there will be a conflict between the need to lock the various sampling rates in the system and the need to lock the timecodes.

This can only be resolved with synchronous timecode. The EBU timecode format relates easily to digital audio sampling rates of 48 kHz, 44.1 kHz and 32 kHz, but it is not so easy with the drop-frame SMPTE timecode necessary for NTSC recording due to the 0.1 percent slip between the actual field rate and 60 Hz.
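The difficulty can be made concrete by working out how many samples fall in one timecode frame. The sketch below assumes 25 frames per second for EBU timecode and the usual 30000/1001 frames per second for NTSC, which is the 0.1 percent slip mentioned above. The function name samples_per_frame is illustrative only.

```python
from fractions import Fraction

# Assumed frame rates: 25 frame/s for EBU, 30000/1001 frame/s for NTSC.
FRAME_RATES = {
    "EBU 25": Fraction(25),
    "NTSC 29.97": Fraction(30000, 1001),
}
SAMPLING_RATES = (48000, 44100, 32000)


def samples_per_frame(fs, frame_rate):
    """Exact number of audio samples in one timecode frame."""
    return Fraction(fs) / frame_rate


for name, rate in FRAME_RATES.items():
    for fs in SAMPLING_RATES:
        spf = samples_per_frame(fs, rate)
        kind = "whole" if spf.denominator == 1 else "fractional"
        print(f"{name:>10}  {fs:>5} Hz  {float(spf):10.4f} samples/frame ({kind})")
```

At 25 frames per second every one of the three sampling rates gives a whole number of samples per frame. At the NTSC rate none of them does; at 48 kHz, for example, a whole number of samples is reached only every five frames.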

The timecode used in a great deal of equipment follows the SMPTE standard for 525/60 and is shown in FIG. 4. EBU timecode is basically similar to the SMPTE code except that it is designed for 50 Hz field-rate systems. These timecode systems encode hours, minutes, seconds and frames as binary-coded decimal (BCD) numbers. In tape media, the timecode data may be serially encoded along with user bits into an FM channel code (see Section 6) which is recorded on a dedicated linear track. The user bits are not specified in the standard, but a common use is to record the take or session number.

Disks also use timecode for audio synchronization, but the timecode is not recorded on the disk as such. Instead timecode forms part of the access mechanism so that samples are retrieved by specifying the required timecode which the disk subsystem converts into a physical block address. This mechanism was detailed in Section 10.

A further problem with the use of video-based timecode is that the accuracy to which the edit must be made in audio is much greater than the frame boundary accuracy needed in video. A video frame lasts 33 or 40 ms and a DAT frame lasts 30 ms, whereas audio needs to be edited to an accuracy of a few samples. When the exact edit point is chosen in an audio editor, it will be described to great accuracy and is stored as hours, minutes, seconds, frames and the number of the sample within the frame.
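A sample-accurate edit point of this kind might be represented as follows. This is a minimal sketch assuming 48 kHz audio locked to 25 frame/s timecode, so that each frame holds exactly 1920 samples. The names EditPoint, to_sample_index and from_sample_index are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditPoint:
    hours: int
    minutes: int
    seconds: int
    frames: int
    sample: int  # number of the sample within the frame


def to_sample_index(p, fs=48000, fps=25):
    """Convert an edit point to an absolute sample count from time zero."""
    spf, remainder = divmod(fs, fps)
    if remainder:
        raise ValueError("sampling rate is not a whole multiple of the frame rate")
    if not 0 <= p.sample < spf:
        raise ValueError("sample number lies outside the frame")
    total_frames = ((p.hours * 60 + p.minutes) * 60 + p.seconds) * fps + p.frames
    return total_frames * spf + p.sample


def from_sample_index(index, fs=48000, fps=25):
    """Inverse conversion, back to timecode plus sample-within-frame."""
    spf = fs // fps
    total_frames, sample = divmod(index, spf)
    total_seconds, frames = divmod(total_frames, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return EditPoint(hours, minutes, seconds, frames, sample)


point = EditPoint(0, 1, 0, 12, 345)
print(to_sample_index(point))
```

The frame fields alone locate the edit to within 40 ms in this example; the final field narrows it to a single sample period.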

FIG. 4 In SMPTE standard timecode, the frame number and time are stored as eight BCD symbols. There is also space for 32 user-defined bits. The code repeats every frame. Note the asymmetrical sync word which allows the direction of tape movement to be determined.

FIG. 5 The use of a ring memory which overwrites allows storage of samples before and after the coarse edit point.

6. Locating the edit point

Digital audio editors must simulate the traditional 'rock and roll' process of edit-point location in analog tape recorders where the tape reels were moved to and fro by hand. Digital media cannot do this directly as they can only play at one bit rate in one direction. As with editing, the solution is to transfer the recording in the area of the edit point to RAM in the editor. Samples can be read from RAM at any speed in either direction and the precise edit point can then be conveniently found by monitoring audio from the RAM.

FIG. 5 shows how the area of the edit point is transferred to the memory. The source device is commanded to play, and the operator listens to replay samples via a DAC in the monitoring system. The same samples are continuously written into a memory within the editor. This memory is addressed by a counter which repeatedly overflows to give the memory a ring-like structure rather like that of a timebase corrector, but somewhat larger. When the operator hears the rough area in which the edit is required, he will press a button. This action stops the memory writing, not immediately, but one half of the memory contents later. The effect of this deliberate overrun is that the memory contains an equal number of samples before and after the rough edit point.
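The ring memory with its deliberate overrun can be sketched as below. The class RingMemory and its eight-sample size are hypothetical and chosen only to keep the output readable. The sketch also assumes the memory has been filled at least once before the button is pressed.

```python
class RingMemory:
    """Ring memory that stops writing half its size after the button is pressed."""

    def __init__(self, size):
        if size % 2:
            raise ValueError("size must be even")
        self.size = size
        self.data = [0] * size
        self.write_addr = 0      # address counter that wraps round on overflow
        self.remaining = None    # samples still to be written after the button press
        self.stopped = False

    def press_button(self):
        if self.remaining is None:
            self.remaining = self.size // 2   # overrun by half the memory

    def write(self, sample):
        if self.stopped:
            return
        self.data[self.write_addr] = sample
        self.write_addr = (self.write_addr + 1) % self.size
        if self.remaining is not None:
            self.remaining -= 1
            if self.remaining == 0:
                self.stopped = True

    def contents(self):
        """Stored samples in time order, oldest first."""
        return self.data[self.write_addr:] + self.data[:self.write_addr]


ring = RingMemory(8)
for n in range(100):          # samples 0..99 arrive from the source device
    ring.write(n)
    if n == 49:               # operator hears the rough edit point here
        ring.press_button()
print(ring.contents())
```

After the run the memory holds samples 46 to 53: four up to the moment the button was pressed and four after it.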

Typically an operator needs to be able to hear about 30 seconds of audio to be able mentally to synchronize to the rhythm and anticipate the edit point. In a stereo PCM system this calls for around five megabytes of memory. In early digital audio editors this represented a significant cost, and to reduce the size of memory needed, many early editors used some form of compression, reduced the sampling rate or mixed stereo material down to mono. With today's RAM prices this is no longer an issue. Samples which will be used to make the master recording need never pass through these processes; they are solely to assist in the location of the edit points. The sound quality in edit-point location mode can be impaired, but this does not affect the finished work.
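The memory figure is easily checked. The calculation below assumes 16-bit samples (two bytes each), which the text does not state explicitly, and the function name edit_memory_bytes is illustrative.

```python
def edit_memory_bytes(seconds, fs, channels=2, bytes_per_sample=2):
    """Memory needed to hold `seconds` of linear PCM audio."""
    return seconds * fs * channels * bytes_per_sample


for fs in (44100, 48000):
    size = edit_memory_bytes(30, fs)
    print(f"{fs} Hz: {size} bytes = {size / 1e6:.2f} MB")
```

Thirty seconds of stereo comes to about 5.3 megabytes at 44.1 kHz and about 5.8 megabytes at 48 kHz under these assumptions.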

Once the recording is in the memory, it can be accessed at leisure, and the constraints of the source device play no further part in the edit-point location. There are a number of ways in which the memory can be read. If the memory address is supplied by a counter which is clocked at the appropriate rate, the edit area can be replayed at normal speed, or at some fraction of normal speed repeatedly. In order to simulate the analog method of finding an edit point, the operator is usually provided with a scrub wheel or rotor, and the memory address will change at a rate proportional to the speed with which the rotor is turned, and in the same direction. Thus the sound can be heard forward or backward at any speed, and the effect is exactly that of manually rocking an analog tape past the heads of an ATR.

The operation of a scrub wheel encoder was shown in section 3.13.

Although the scrub wheel is a simple device, there are some difficulties to overcome. There are not enough pulses per revolution to create a clock directly, and the human hand cannot turn the rotor smoothly enough to address the memory directly without flutter. A phase-locked loop is generally employed to damp fluctuations in rotor speed and multiply the frequency. A standard sampling rate must be recreated to feed the monitor DAC, so a rate convertor, or interpolator, is necessary to restore the sampling rate to normal. These items can be seen in FIG. 6. In low-cost editors the function of the scrub wheel will be performed by the mouse or a trackball.
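Variable-speed reading of the edit memory can be sketched as follows. Linear interpolation is used here purely as a simple stand-in for a proper rate convertor, and the damping performed by the phase-locked loop is not modelled. The function scrub and its parameters are hypothetical.

```python
def scrub(memory, start, speeds):
    """Read `memory` at variable speed, producing one output sample per speed value.

    A speed of 1.0 is normal forward play and -0.5 is half-speed reverse.
    Linear interpolation between stored samples is an assumption of this sketch.
    """
    out = []
    last = len(memory) - 1
    pos = float(start)
    for speed in speeds:
        pos = min(max(pos, 0.0), float(last))   # stay inside the memory
        i = int(pos)
        frac = pos - i
        following = memory[min(i + 1, last)]
        out.append((1 - frac) * memory[i] + frac * following)
        pos += speed                            # address advances with rotor speed
    return out


memory = [0, 10, 20, 30, 40]
print(scrub(memory, 0, [0.5] * 4))   # half speed forward
print(scrub(memory, 4, [-1] * 3))    # normal speed in reverse
```

The output always has one value per output sample period, whatever the rotor speed, which is what allows the monitor DAC to run at a standard rate.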

FIG. 6 In order to simulate the edit location of analog recorders, the samples are read from memory under the control of a hand-operated rotor.

The act of pressing the coarse edit-point button stores the timecode of the source at that point, which is frame-accurate. As the rotor is turned, the memory address is monitored, and used to update the timecode to sample accuracy within the frame.

Before assembly can be performed, two edit points must be determined, the out-point at the end of the previously recorded signal and the in-point at the beginning of the new signal. The editor's microprocessor stores these in an edit decision list (EDL) in order to control the automatic assemble process.

Edit-point location can also be done on the fly by reading the musical score as the recording plays, and pressing the edit button at the right instant.

However the edit point is established, the subjective effect can be assessed in a preview process and if the outcome is unsatisfactory the in or out-point can be trimmed any number of times until the desired result is obtained.

7. Editing with disk drives

Using one or other of the above methods, an edit list can be made which contains an in-point, an out-point and an audio filename for each of the segments of audio which need to be assembled to make the final work, along with a crossfade period and a gain parameter. This edit list will also be stored on disk. When a preview of the edited work is performed, the edit list is used to determine what files will be necessary and when, and this information drives the disk controller.
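An edit list of this kind might be modelled as below. The field names, the file names and the convention that each crossfade overlaps the end of the preceding segment are assumptions made for illustration, not a description of any particular editor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    filename: str      # audio file holding the source material
    in_point: int      # first sample used, as an absolute sample index
    out_point: int     # one past the last sample used
    crossfade: int     # crossfade period into this segment, in samples
    gain: float = 1.0  # gain applied to this segment

    @property
    def length(self):
        return self.out_point - self.in_point


def files_needed(edl):
    """Files in the order in which the disk controller will first need them."""
    seen = []
    for segment in edl:
        if segment.filename not in seen:
            seen.append(segment.filename)
    return seen


def total_length(edl):
    """Length of the assembled work; each crossfade overlaps the previous segment."""
    return sum(s.length for s in edl) - sum(s.crossfade for s in edl[1:])


edl = [
    Segment("take1.pcm", 0, 48000, crossfade=0),
    Segment("take3.pcm", 96000, 144000, crossfade=480, gain=0.8),
]
print(files_needed(edl), total_length(edl))
```

The list holds only references and parameters, so it is tiny compared with the audio it describes.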

FIG. 7 shows the events during an edit between two files. The edit list causes the relevant audio blocks from the first file to be transferred from disk to memory, and these will be read by the signal processor to produce the preview output. As the edit point approaches, the disk controller will also place blocks from the incoming file into the memory.

It can do this because the rapid data-transfer rate of the drive allows blocks to be transferred to memory much faster than real time, leaving time for the positioner to seek from one file to another. In different areas of the memory there will be simultaneously the end of the outgoing recording and the beginning of the incoming recording.

FIG. 7 In order to edit together two audio files, they are brought to memory sequentially. The audio processor accesses file pages from both together, and performs a crossfade between them. The silo produces the final output at a constant, steady sampling rate.

Using timecode alone, the editor can only change the relative timing of the two recordings in frame increments. However, timing to sample accuracy can be obtained by using an area of RAM as a variable delay.

The signal processor will use the fine edit-point parameters to work out the relationship between the actual edit points and the cluster or block boundaries. The relationship between the cluster on disk and the RAM address to which it was transferred is known, and this allows the memory read addresses to be computed in order to obtain samples with the correct timing.
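The address computation can be sketched as follows. It assumes, for illustration only, that clusters are loaded into consecutive areas of RAM, that addresses are counted in samples and that the cluster size is 4096 samples. The function ram_read_address is hypothetical.

```python
def ram_read_address(edit_sample, cluster_samples, first_cluster, ram_base):
    """RAM address of a given sample of a file.

    edit_sample     : position of the wanted sample within the file
    cluster_samples : samples held in one cluster (assumed fixed)
    first_cluster   : number of the first cluster transferred to RAM
    ram_base        : RAM address at which that cluster was placed
    """
    cluster, offset = divmod(edit_sample, cluster_samples)
    if cluster < first_cluster:
        raise ValueError("sample lies before the clusters held in RAM")
    return ram_base + (cluster - first_cluster) * cluster_samples + offset


# The edit point at sample 10000 lies 1808 samples into cluster 2.
print(ram_read_address(10000, 4096, first_cluster=2, ram_base=4096))
```

Timing to sample accuracy then amounts to choosing the starting read address, although the disk itself was only ever accessed in whole clusters.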

Prior to the edit point, only samples from the outgoing recording are accessed, but as the crossfade begins, samples from the incoming recording are also accessed, multiplied by the gain parameter and then mixed with samples from the outgoing recording according to the crossfade period required. The output of the signal processor becomes the edited preview material, which can be checked for the required subjective effect. If necessary the in- or out-points can be trimmed, or the crossfade period changed, simply by modifying the edit-list file. The preview can be repeated as often as needed, until the desired effect is obtained. At this stage the edited work does not exist as a file, but is re-created each time by a further execution of the EDL. Thus a lengthy editing session need not fill up the disk.
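The crossfade itself can be sketched in a few lines. A linear fade law is assumed here because the text does not specify one, and the function crossfade_edit is illustrative only.

```python
def crossfade_edit(outgoing, incoming, fade_len, gain=1.0):
    """Join `outgoing` to `incoming` with a linear crossfade of `fade_len` samples.

    The last `fade_len` samples of the outgoing recording overlap the first
    `fade_len` samples of the incoming one. Neither input list is modified.
    """
    if fade_len > min(len(outgoing), len(incoming)):
        raise ValueError("crossfade is longer than the available material")
    split = len(outgoing) - fade_len
    result = list(outgoing[:split])           # outgoing recording only
    for n in range(fade_len):
        w = (n + 1) / (fade_len + 1)          # weight of the incoming recording
        result.append((1 - w) * outgoing[split + n] + w * gain * incoming[n])
    result.extend(gain * x for x in incoming[fade_len:])   # incoming only
    return result


outgoing = [1.0] * 6
incoming = [0.0] * 6
print(crossfade_edit(outgoing, incoming, fade_len=3))
```

Because the function builds a new list and only reads its inputs, trimming an edit point or changing the crossfade period means calling it again with different parameters.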

It is important to realize that at no time during the edit process were the original audio files modified in any way. The editing was done solely by reading the audio files. The power of this approach is that if an edit list is created wrongly, the original recording is not damaged, and the problem can be put right simply by correcting the edit list. The advantage of a disk-based system for such work is that location of edit points, previews and reviews are all performed almost instantaneously, because of the random access of the disk. This can reduce the time taken to edit a program to a quarter of that needed with a linear tape machine [1].

During an edit, the disk drive has to provide audio files from two different places on the disk simultaneously, and so it has to work much harder than for a simple playback. If there are many close-spaced edits, the drive may be hard-pressed to keep ahead of real time, especially if there are long crossfades, because during a crossfade the source data rate is twice as great as during replay. A large buffer memory helps this situation because the drive can fill the memory with files before the edit actually begins, and thus the instantaneous sample rate can be met by the memory's emptying during disk-intensive periods. In practice crossfades measured in seconds can be achieved easily.
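A rough feel for the buffering involved can be had from a simple rate calculation. The model below is a deliberate simplification: it assumes 16-bit stereo at 48 kHz and treats the drive as delivering a steady average rate, seeks included. The figure of 300000 bytes per second is hypothetical.

```python
def source_rate(fs=48000, channels=2, bytes_per_sample=2, crossfading=False):
    """Bytes per second the signal processor consumes; doubled during a crossfade."""
    rate = fs * channels * bytes_per_sample
    return 2 * rate if crossfading else rate


def buffer_needed(fade_seconds, disk_rate):
    """Bytes that must already be in memory to sustain a crossfade.

    disk_rate is the average rate the drive can deliver, seeks included
    (a hypothetical figure in this sketch).
    """
    deficit = source_rate(crossfading=True) - disk_rate
    return max(0, deficit) * fade_seconds


print(source_rate(), source_rate(crossfading=True))
print(buffer_needed(fade_seconds=2, disk_rate=300000))
```

In this example a drive that could easily sustain ordinary replay still needs 168000 bytes of audio buffered in advance to get through a two-second crossfade.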

Disk formats which handle defects dynamically, using techniques such as defect skipping, will also be superior to those using bad-block files when throughput is important. Some drives rotate the sector addressing from one cylinder to the next so that the drive does not lose a revolution when it moves to the next cylinder. Disk-editor performance is usually specified in terms of peak editing activity which can be achieved, but with a recovery period between edits. If an unusually severe editing task is necessary where the drive just cannot access files fast enough, it will be necessary to rearrange the files on the disk surface so that files which will be needed at the same time are on nearby cylinders [2].

An alternative is to spread the material between two or more drives so that overlapped seeks are possible.

Once the editing is finished, it will generally be necessary to transfer the edited material to form a contiguous recording so that the source files can make way for new work. If the source files already exist on tape the disk files can simply be erased. If the disks hold original recordings they will need to be backed up to tape if they will be required again. If editing is not complete, and the editor is required for another purpose, the disk files can be transferred in their entirety to tape so that the disk data can be exactly restored at a later time. In large broadcast systems, the edited work can be broadcast directly from the disk file. In smaller systems it will be necessary to output to some removable medium, since the Winchester drives in the editor have fixed media. It is only necessary to connect the AES/EBU output of the signal processor to any type of digital recorder, and then the edit list is executed once more. The edit sequence will be performed again, exactly as it was during the last preview, and the results will be recorded on the external device.

8. CD mastering

At the time of the introduction of the Compact Disc, mastering was carried out using a PCM adaptor and U-matic rotary-head VCRs. Each frame on tape was treated as a data block addressed by timecode, and FIG. 8 shows that editing to sample accuracy was achieved by setting the VCR into record at a frame boundary, and rerecording what was already on the tape up to the edit point, where the new recording would appear to commence. A crossfade of appropriate length was carried out in the digital domain. The basic operation of the editor was exactly as described above for disk-based editing. As U-matic VCRs had a habit of locking up one frame early or late, a page of RAM was used to resynchronize the data by inserting or removing frame delays so that the edit would not have to be aborted.

The first CD cutters could only operate at normal playing speed and a U-matic master tape which could only play at normal speed was acceptable. However, CD cutting is only a data transfer process, and the most economic data transfer rate is as fast as possible so that one CD cutter can do more work. Today, CD cutters can operate many times faster than real time. Master recordings can be supplied on any economic medium, typically a computer-type data tape. The tape is read into a disk store and checked for data integrity and the disk store then delivers data to the cutter using a RAM TBC to deliver an exactly constant bit rate. Instead of delivering a physical medium, the master recording can also be transferred to the cutter as a data file over a communications network.

FIG. 8 Video recorders can only start recording at the beginning of the frame; fine position of the edit point is determined by rerecording the old data up to the edit point.

9. Editing in DAT

In order to edit a DAT tape, many of the constraints of pseudo-video editing apply. Editing can only take place at the beginning of an interleave block, known as a frame, which is contained in two diagonal tracks. The transport would need to perform a pre-roll, starting before the edit point, so that the drum and capstan servos would be synchronized to the tape tracks before the edit was reached. Fortunately, the very small drum means that mechanical inertia is minute by the standards of video recorders, and lock-up can be very rapid. One way in which a read-modify-write edit could be performed would be to use an editor of the type designed for PCM adaptors. This would permit editing on a DAT machine which could only record or play.

A better solution used in professional machines is to fit two sets of heads in the drum. The standard permits the drum size to be increased and the wrap angle to be reduced provided that the tape tracks are recorded to the same dimensions. In normal recording, the first heads to reach the tape tracks would make the recording, and the second set of heads would be able to replay the recording immediately afterwards for confidence monitoring. For editing, the situation would be reversed. The first heads to meet a given tape track would play back the existing recording, and this would be de-interleaved and corrected, and presented as a sample stream to the record circuitry. The record circuitry would then interleave the samples ready for recording. If the heads are mounted a suitable distance apart in the scanner along the axis of rotation, the time taken for tape to travel from the first set of heads to the second will be equal to the decode/encode delay. If this process goes on for a few blocks, the signal going to the record head will be exactly the same as the pattern already on the tape, so the record head can be switched on at the beginning of an interleave block. Once this has been done, new material can be crossfaded into the sample stream from the advanced replay head, and an edit will be performed.

If insert editing is contemplated, following the above process, it will be necessary to crossfade back to the advanced replay samples before ceasing rerecording at an interleave block boundary. The use of overwrite to produce narrow tracks causes a problem at the end of such an insert.

FIG. 9 shows that this produces a track which is half the width it should be. Normally the error-correction system would take care of the consequences, but if a series of inserts were made at the same point in an attempt to make fine changes to an edit, the result could be an extremely weak signal for one track duration. One solution is to incorporate an algorithm into the editor so that the points at which the tape begins and ends recording change on every attempt. This does not affect the audible result as this is governed by the times at which the crossfader operates.

FIG. 9 When editing a small track-pitch recording, the last track written will be 1.5 times the normal track width, since that is the width of the head. This erases half of the next track of the existing recording.

FIG. 10 The four stages of an insert (punch-in/out) with interleaving: (a) rerecord existing samples for at least one constraint length; (b) crossfade to incoming samples (punch-in point); (c) crossfade to existing replay samples (punch-out point); (d) rerecord existing samples for at least one constraint length. An assemble edit consists of steps (a) and (b) only.

10. Editing in open-reel digital recorders

On many occasions in studio multitrack recording it is necessary to replace a short section of a long recording, because a wrong note was played or something fell over and made a noise. The tape is played back to the musicians before the bad section, and they play along with it. At a musically acceptable point prior to the error, the tape machine passes into record, a process known as punch-in, and the offending section is rerecorded. At another suitable time, the machine ceases recording at the punch-out point, and the musicians can subsequently stop playing.

The problem with an interleaved recording is that it is not possible to point to a specific place on the tape and say that it represents the recording at a particular instant. It is not possible just to begin recording at some arbitrary place, as the interleave structure would be destroyed. Once more, a read-modify-write approach is necessary, using a record head positioned after the replay head. The mechanism necessary is shown in FIG. 10. Prior to the punch-in point, the replay-head signal is de-interleaved, and this signal is fed to the record channel. The record channel re-interleaves the samples, and after some time will produce a signal which is identical to what is already on the tape. At a block boundary the record current can be turned on, when the existing recording will be rerecorded. At the punch-in point, the samples fed to the record encoder will be crossfaded to samples from the ADC. The crossfade takes place in the non-interleaved domain. The new recording is made to replace the unsatisfactory section, and at the end, punch-out is commenced by returning the crossfader to the samples from the replay head. After some time, the record head will once more be rerecording what is already on the tape, and at a block boundary the record current can be switched off. The crossfade duration can be chosen according to the nature of the recorded material. If a genuine silence appears between notes played in a dead acoustic, a rapid crossfade may be optimum. With a large chorus in reverberant surroundings, a long crossfade might go unnoticed. It is possible to rehearse the punch-in process and monitor what it would sound like by feeding headphones from the crossfader, and doing everything described except that the record head is disabled. The punch-in and punch-out points can then be moved to give the best subjective result. The machine can learn the sector addresses at which the punches take place, so the final punch is fully automatic.
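The sequence of events in a punch-in and punch-out can be sketched as follows. The sketch works on de-interleaved samples only. It assumes a linear crossfade and uses plain block boundaries to decide where the record current switches; a real machine would also have to allow for the constraint length of the interleave. The function punch and its parameters are hypothetical.

```python
def punch(existing, new, punch_in, punch_out, fade, block):
    """Simulate an insert edit; returns (tape, record_on, record_off).

    existing : samples already on tape, as seen by the replay head
    new      : time-aligned samples from the ADC
    fade     : crossfade length in samples (linear, an assumption)
    block    : block size; record current only switches at block boundaries
    """
    if punch_out < punch_in + fade:
        raise ValueError("punch-out must follow the end of the punch-in crossfade")
    record_on = (punch_in // block) * block
    record_off = min(-(-(punch_out + fade) // block) * block, len(existing))
    tape = list(existing)
    for n in range(record_on, record_off):
        if n < punch_in:
            w = 0.0                                       # (a) rerecord existing samples
        elif n < punch_in + fade:
            w = (n - punch_in + 1) / (fade + 1)           # (b) crossfade to new samples
        elif n < punch_out:
            w = 1.0                                       # new recording
        elif n < punch_out + fade:
            w = 1.0 - (n - punch_out + 1) / (fade + 1)    # (c) crossfade back
        else:
            w = 0.0                                       # (d) rerecord existing samples
        tape[n] = (1 - w) * existing[n] + w * new[n]
    return tape, record_on, record_off


existing = [1.0] * 32
new = [0.0] * 32
tape, on, off = punch(existing, new, punch_in=10, punch_out=20, fade=2, block=8)
print(on, off, tape)
```

In the example the record current is on from sample 8 to sample 24, yet the tape differs from the original only between the punch-in point and the end of the second crossfade. A rehearsal corresponds to computing the result without writing it back.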

Assemble editing, where parts of one or more source tapes are dubbed from one machine to another to produce a continuous recording, is performed in the same way as a punch-in, except that the punch-out never comes. After the new recording from the source machine is faded in, the two machines continue to dub until one of them is stopped. This will be done some time after the next assembly point is reached.

11. Jump editing

Conventional splice handling in stationary-head recorders was detailed in Section 9. An extension to the principle has been suggested by Lagadec [3] in which the samples from the area of the splice are not heard. Instead an electronic edit is made between the samples before the splice and those after.

FIG. 11 Jump editing. (a) Splice approaches, capstan is advanced, and audio is delayed. (b) Splice passes head, and error burst travels down delay. (c) Crossfader fades to signal after splice. (d) Capstan accelerates, and delay increases. When the delay tap reaches the end, the crossfader can switch back ready for the next splice.

In this system, a tape splice is made physically with excess tape adjacent to the intended edit points. The timebase corrector has two read-address generators which can access the memory independently.

It will be seen in FIG. 11 that when the machine plays the tape, the capstan is phase-advanced so that the timebase corrector is causing a long delay to compensate. As the splice is detected, the corruption due to the splice enters the TBC memory and travels towards the output. As the splice nears the end of the memory, the machine output crossfades to a signal from the second TBC output which has been delayed much less. The data in the area of the tape splice are thus omitted. The capstan will now effectively be lagging because the delay has been shortened, and it will speed up slightly for a short period until the lead condition is re-established. This can be done without ill effect since the sample rate from the memory remains constant throughout. Although the splice is an irrevocable mechanical act, the precise edit timing can be changed at will by controlling the sector address at which the TBC jumps, which determines the out-point, and the address difference, which determines the length of tape omitted, and thus controls the in-point. The size of the jump is limited by the available memory.
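The effect of the two read addresses can be sketched on a list of samples. This illustrates only the memory jump; capstan control and splice detection are not modelled, the crossfade is assumed to be linear, and the function jump_edit is hypothetical.

```python
def jump_edit(samples, out_point, jump, fade):
    """Omit `jump` samples starting at `out_point`.

    out_point : address at which the read address jumps
    jump      : address difference between the two read-address generators
    fade      : length of the crossfade between the two outputs, in samples
    """
    in_point = out_point + jump
    if in_point + fade > len(samples):
        raise ValueError("jump runs past the end of the available memory")
    out = list(samples[:out_point])
    for n in range(fade):
        w = (n + 1) / (fade + 1)     # weight of the less-delayed output
        out.append((1 - w) * samples[out_point + n] + w * samples[in_point + n])
    out.extend(samples[in_point + fade:])
    return out


samples = [float(n) for n in range(20)]
print(jump_edit(samples, out_point=5, jump=8, fade=0))
print(jump_edit(samples, out_point=5, jump=8, fade=1))
```

Changing out_point moves the out-point of the edit, and changing jump moves the in-point, without touching the recording itself.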

If only a short section of audio is to be removed, no splice is necessary at all as a memory jump can be used to omit a short length of the recording. Such a system would be excellent for news broadcasts where it is often necessary to remove many short sections of tape to eliminate hesitations and unwanted pauses from interviews. Control of the jumping could be by programming a CPU to recognize timecode or sector addresses and insert the commands, or, as suggested by Lagadec, inserting the jump distance in the reference track prior to the splice. In either case machines not equipped to jump would handle any splices with mechanically determined timing.

Jump editing can also be used in rotary-head recorders such as DAT and the Nagra-D. Rotary-head machines have a low linear tape speed and so can accelerate the tape to omit quite long sections whilst replay continues from memory.

References

1. Todoroki, S., et al., New PCM editing system and configuration of total professional digital audio system in near future. Presented at the 80th Audio Engineering Society Convention (Montreux, 1986), Preprint 2319(A8).

2. McNally, G.W., Gaskell, P.S. and Stirling, A.J., Digital audio editing. BBC Research Dept. Report, RD 1985/10.

3. Lagadec, R., Current status in digital audio. Presented at the IERE Video and Data Recording Conference (Southampton, 1984).
