VIDEO DATA REDUCTION

Just a few months ago, I attended a day-long seminar jointly sponsored by the University of Southern California and the Society of Motion Picture and Television Engineers on the subject of data reduction in video transmission. Video data reduction is important as we look forward to digital transmission of video and the coming of the Compact Disc as a video carrier. The small-dish satellite transmission systems that are now available to consumers already make use of video data-reduction techniques.

Readers of Audio have been exposed to data reduction as it applies to audio signals through articles on the PASC and ATRAC algorithms used in the Digital Compact Cassette and MiniDisc. You will recall that these systems rely primarily on masking effects, in which a loud signal in one frequency band masks softer signals in adjacent bands, making it possible to encode them with fewer bits, or in some cases to ignore them altogether. The net result in DCC and MD is a 4- or 5-to-1 data-reduction ratio.

Video has its own requirements and opportunities for data reduction, and these are driven by the high rate of signal redundancy in normal video transmission. As explained by Charles Poynton of Sun Microsystems, there are techniques for reducing the data required to transmit a single frame (spatial techniques) as well as techniques for reducing the data that is common to several consecutive frames (temporal techniques).

Let us first consider the temporal aspect. Video is transmitted in the U.S. at 30 frames per second, and on average the difference between consecutive frames is quite small. So here is the first opportunity for data reduction: transmit only the actual differences between the frames, not each new frame. This technique can be expanded to allow for relatively slow panning of the scene.
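The principle can be sketched in a few lines of Python. This is only an illustration of the idea, not an actual broadcast coder: real systems work on two-dimensional images and entropy-code the residual, and the frame data below is invented for the example.

```python
# Illustrative sketch of temporal data reduction by frame differencing.
# Frames are modeled as flat lists of pixel values (0-255).

def frame_delta(prev_frame, new_frame):
    """Encode only the pixels that changed between consecutive frames."""
    return [(i, new) for i, (old, new) in enumerate(zip(prev_frame, new_frame))
            if new != old]

def apply_delta(prev_frame, delta):
    """Reconstruct the new frame from the previous one plus the delta."""
    frame = list(prev_frame)
    for i, value in delta:
        frame[i] = value
    return frame

frame1 = [10, 10, 10, 200, 200, 10, 10, 10]
frame2 = [10, 10, 10, 10, 200, 200, 10, 10]   # the bright patch moved right

delta = frame_delta(frame1, frame2)
print(delta)                                  # only 2 of 8 pixels changed
print(apply_delta(frame1, delta) == frame2)   # True
```

With mostly static scenes, the delta is far smaller than the frame itself, which is exactly the redundancy the temporal techniques exploit.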
Here, motion vectors can be determined by analyzing consecutive frames, and only the new data entering the scene at the leading edges of the picture needs to be encoded. In a fast-moving program, the technique is modified so that every 15 frames, for example, the complete frame is updated. And every time a scene changes, the entire frame must be updated as well.

The spatial aspects of data reduction have much to do with the psychological aspects of vision: what we are most likely to be unaware of and what we are most likely to see. Studies have been conducted to determine the number of levels, or shades, of a given color that are necessary to provide the eye with a continuum of response. In most cases, a nonlinear representation of these levels will offer a better overall effect, enabling fewer bits to be used to encode the entire range. It is useless to provide more information than this. Studies have also been made of just how much sharpness in the picture is necessary. At the seminar, the work of William Glenn was cited to show that the eye is most sensitive to contrast in the range of 2 to 5 cycles (or lines) per degree; above and below this range, the number of bits assigned to luminance transmission can be reduced accordingly.

As Poynton stated, the best part of data reduction is "representing the image in the most efficient way to begin with." We normally think of video as a time-varying signal. However, if we think of it as a frequency-based signal, and analyze it that way, we can more easily carry out some of the visually optimized data-reduction tricks. The key here is to represent the video signal by means of the discrete cosine transform. In applying this, a frame is broken down into many regions, or blocks, each made up of an eight-by-eight set of pixels (a pixel, or picture element, is the smallest element that can be addressed).
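A minimal pure-Python sketch of the 8-by-8 discrete cosine transform shows why this representation is so efficient; the implementation below is the textbook 2-D DCT-II (real coders use heavily optimized hardware, and the flat test block is invented for the example).

```python
import math

N = 8  # block size used in DCT-based video coding

def dct2(block):
    """2-D DCT-II of an 8x8 block, as used in JPEG/MPEG-style coders."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

# A flat (constant) block: all of its energy lands in a single
# coefficient, so one number describes all 64 pixels.
flat = [[128] * N for _ in range(N)]
coeffs = dct2(flat)
nonzero = sum(1 for row in coeffs for x in row if abs(x) > 1e-6)
print(nonzero)   # 1 -- only the DC coefficient is needed
```

For typical picture blocks the energy concentrates in a handful of low-frequency coefficients, and the rest can be coarsely quantized or dropped.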
The transform then makes use of "spatial redundancy, which is the statistical similarity of neighboring samples." Typically, a small number of transform coefficients can be used to describe the block sufficiently.

I hope you've by now seen that video data reduction is a collection of techniques operating on different levels and in different domains. In fact, some of the standard systems in use today are open-ended and can operate at several basic levels of data reduction, depending on application. Reduction ratios can run anywhere from 10 to 1 up to more than 100 to 1. Surprisingly, these systems do very well, and it is rare to see an artifact that is really bad. At the SMPTE/USC seminar, I saw a number of different systems, all operating at different data rates. In rapidly changing scenes at low data rates, there was some evidence of "blocking," the tendency of individual data blocks in the scene to become obvious as such, due to the fact that the system was simply being taxed beyond its limits. This does not happen often, and it does not happen for long.

As a fitting close to the seminar, a group of producers and engineers who work in the creative sides of film and video discussed the pros and cons of data reduction. As we have seen in the audio arts, there is a good bit of reluctance to throw out past gains in old technology simply because there is something new. As with analog audio, the art and technology of film and video have co-evolved over many years, and technical limitations were more than once shaped into aesthetic advantages. Digital processing and perceptual encoding are still rather uncomfortable topics for many creative people, primarily because there is a history of standards hastily arrived at and not easily undone. There is also the analog-based conviction that there is something inherently substantial about their medium (film, tape, or vinyl) that shapes the message.
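To put those ratios in perspective, a back-of-the-envelope calculation helps; the raster dimensions and bit depth below are illustrative assumptions for a digital studio signal, not figures given at the seminar.

```python
# Back-of-the-envelope video data rates; parameters are illustrative.
width, height = 720, 480        # a common digital studio raster
bits_per_pixel = 16             # 4:2:2 sampling, 8 bits per component
frames_per_second = 30

raw_bits_per_second = width * height * bits_per_pixel * frames_per_second
print(raw_bits_per_second / 1e6)   # roughly 166 Mbit/s uncompressed

# What the channel must carry at various reduction ratios:
for ratio in (10, 50, 100):
    print(ratio, round(raw_bits_per_second / ratio / 1e6, 2), "Mbit/s")
```

At 100-to-1, the uncompressed 166 Mbit/s shrinks to well under 2 Mbit/s, which is what makes small-dish satellite delivery and disc-based video practical at all.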
All of this is about to change as the economic demands of information and video transmission double or quadruple in the near future.

(adapted from Audio magazine, Nov. 1994)