Critical listening, the practice of evaluating the quality of audio equipment by careful analytical listening, is very different from listening for pleasure. The goal isn’t to enjoy the musical experience, but to determine whether a system or component sounds good or bad, and what specific characteristics of the sound make it good or bad. You want to critically examine what you’re hearing so that you can form judgments about the reproduced sound. You can then use this information to evaluate and choose components, and to fine-tune a system.
Listening vs. Measurement
Evaluating audio equipment by ear is essential—today’s technical measurements simply aren’t advanced enough to characterize the musical performance of audio products. The human hearing mechanism is vastly more sensitive and complex than the most sophisticated test equipment now available. Though technical performance is a valid consideration when choosing equipment, the ear should always be the final arbiter of good sound. Moreover, the musical significance of sonic differences between components can only be judged subjectively.
Good technical performance can contribute to high-quality musical performance, but it doesn’t tell you what you really want to know: how well the product communicates the musical message. To find that out, you must listen. I have auditioned hundreds of audio products for review and measured their technical performance in a test laboratory. My experience overwhelmingly indicates that much more about the quality of an audio component can be learned in the listening room than in the test lab.
Many newcomers to high-performance music reproduction—and even a small fringe group of experienced audiophiles—question the need for listening to evaluate products. They believe that measurements can tell them everything they need to know about a product’s performance. And since these measurements are purely “objective,” why interject human subjectivity through critical listening?
The answer is that the common measurements in use today were created decades ago as design tools, not as descriptors of sound quality. The test data generated by a typical mix of audio measurements were never meant to be a representation of musical reality, only a rough guide when designing. For example, an amplifier circuit that had 1% harmonic distortion was probably better than one with 10% harmonic distortion. It doesn’t follow that a harmonic-distortion specification in any way describes the sound of that amplifier.
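For readers unfamiliar with the figure quoted above, total harmonic distortion is conventionally stated as the combined (root-sum-square) level of the harmonics relative to the level of the fundamental. The sketch below only illustrates that arithmetic. The function name and the harmonic amplitudes are hypothetical, chosen for the example, and are not measurements of any real amplifier.

```python
import math

def thd_percent(fundamental, harmonics):
    """Total harmonic distortion as a percentage of the fundamental.

    fundamental: amplitude of the test tone at the output
    harmonics:   amplitudes of the 2nd, 3rd, ... harmonics (same units)
    """
    harmonic_level = math.sqrt(sum(h * h for h in harmonics))
    return 100.0 * harmonic_level / fundamental

# Hypothetical amplitudes, chosen only to illustrate the arithmetic.
only_second = thd_percent(1.0, [0.01])           # all distortion in one harmonic
second_third = thd_percent(1.0, [0.006, 0.008])  # spread over two harmonics
ten_times_worse = thd_percent(1.0, [0.06, 0.08])

print(round(only_second, 3), round(second_third, 3), round(ten_times_worse, 3))
```

The first two hypothetical amplifiers arrive at the same single figure (about 1%) from different mixes of harmonics, which is one way to see why the number alone says little about how an amplifier sounds.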
A second problem is that audio test-bench measurements attempt to quantify a variety of two-dimensional phenomena: how much distortion the product introduces, its frequency response, noise level, and other factors. But music listening is a three-dimensional experience that is much more complex than any set of numbers can hope to quantify. How can you reduce to a series of mathematical symbols the ability of one power amplifier, and not another, to make the hair on your arms stand up? Or the feeling that a vocalist is singing directly to you?
No matter how many measurements are gathered about the product’s technical performance, they still don’t tell you how well that product communicates the music. If I had to choose between two unknown CD players as my main source of music for the next five years, I’d rather have ten minutes with each player in the listening room than ten hours with each in the test lab. Today’s measurements are crude tools that are inferior to the most powerful test instrument ever devised: the human brain.
Introduction to Critical Listening
Knowing what sounds good and what doesn’t is easy; most people can tell the difference between excellent and poor sound. But discovering why a product is musically satisfying or not, and the ability to recognize and describe subtle differences in sound quality are learned skills. Like all skills, that of critical listening improves with practice: The more you listen, the better a listener you’ll become. As your ear improves, you’ll be able to distinguish smaller and smaller differences in reproduced sound quality -- and be able to describe how two presentations are different, and why one is better.
This section defines the language of critical listening, describes what to listen for, and outlines the procedures for setting up valid listening comparisons. It will either get you started in critical listening, or help you become a more highly skilled listener.
A general discussion of audiophile values is important in understanding the rest of this section. Here are some broad statements about what distinguishes good from superlative sound quality, and about audiophile values in reproduced music.
Good sound is only a means to the end of musical satisfaction; it is not the end itself. If a neighbor or colleague invites you over to hear his hi-fi system, you can tell immediately whether he’s a music lover or a “hi-fi buff” more interested in sound than in music. If he plays the music very loud, then turns it down after 30 seconds to seek your opinion (approval), he’s probably not a music lover. If, however, he sits you down, asks what kind of music you like, plays it at a reasonable volume, and says or does nothing for the next 20 minutes while you both listen, it’s likely that this person holds audiophile values or simply cares a lot about music.
In the first example, the acquaintance tried to impress you with sound. In the second case, your friend also wanted to impress you with his system, but by its ability to express the music, not to shake the walls. This is the fundamental difference between “hi-fi enthusiasts” and music lovers. (You can use the same test to immediately tell what kind of hi-fi store you’re in. If anyone pulls out a CD of trains, sonic booms, Shuttle launches, or jet takeoffs, run for cover.)
All audio components affect the signal passing through them. Some products add artifacts (distortion) such as a grainy treble or a lumpy bass. Others subtract parts of the signal—for example, a loudspeaker that doesn’t go very low in the bass. (Listening terms are defined later in this section.) A fundamental audiophile value holds that sins of commission (adding something to the music) are far worse than sins of omission (removing something from the music). If parts of the music are missing, the ear/brain system subconsciously fills in what isn’t there; you can still enjoy listening. But if the playback system adds an artificial character to the sound, you are constantly reminded that you’re hearing a reproduction and not the real thing.
Let’s illustrate this sins-of-commission/omission dichotomy with two loudspeakers. The first loudspeaker, a three-way system with a 15” woofer in a very large cabinet, sells for a moderate price in a mass-market appliance store; it plays loudly and develops lots of bass. The second loudspeaker sells for about the same price, but is a small two-way system with a 6” woofer. It doesn’t play nearly as loudly, and produces much less bass. While you need a refrigerator dolly to move the first loudspeaker, you can almost hold the second loudspeaker in your outstretched hand.
The behemoth loudspeaker has some problems: The bass is boomy, thick, and overwhelming. All the bass notes seem to have the same pitch. The very prominent treble is coarse and grainy, and the midrange has a big peak of excess energy that makes singers sound as if they have colds.
The small loudspeaker has no such problems. The treble is smooth and clear, and the midrange is pure and open. It has, however, very little bass by comparison, won’t play very loudly, and doesn’t produce a physical sensation of sound hitting your body.
The first loudspeaker commits sins of commission, adding unnatural artifacts to the sound. The bass peaks that make it sound boomy, the grain overlaying the treble, and the midrange colorations are all additive distortions.
The second loudspeaker’s faults, however, are of omission. It removes certain elements of the music—low bass and loud peaks—but leaves the remainder of the music intact. It doesn’t add grain to the treble, thickness to the bass, or colorations to the midrange.
There’s no doubt that the second loudspeaker will be more musically satisfying. The first loudspeaker’s additive distortions are not only much more musically objectionable, they also constantly remind you that you are listening to artificially reproduced music. The second loudspeaker’s flaws are of a nature that allows you to forget that you’re listening to loudspeakers. In the reproduction of music, addition is far worse than subtraction.
Another audiophile value holds that even small differences in the quality of the musical presentation are important. Because music matters to us, we get excited by any improvement in sound quality. Moreover, there isn’t a linear relationship between the magnitude of a sonic difference and its musical significance. A quality difference can be sonically small but musically large.
While reviewing a revelatory new state-of-the-art digital-to-analog converter, I listened to a piece of music I’d heard hundreds of times before. The piece, performed by a five-member group, had vocals and very long instrumental breaks. During the instrumental breaks, the vocalist played percussion instruments. Through lesser-quality digital processors, the percussion had always been just another sound fused into the music’s tapestry; I’d never heard it as a separate instrument played by the vocalist. The group seemed to become a four-piece ensemble when the vocalist wasn’t singing; I never heard the percussion as separate from the rest of the music.
The new digital processor was particularly good at resolving individual instruments and presenting them not as just more sounds homogenized into the overall musical fabric, but as distinct entities. Consequently, when the instrumental break came, I heard the percussion as a separate, more prominent instrument. In my mind’s eye, and for the first time, the vocalist never left—she remained “on stage,” playing the percussion instruments. By just this “small” change in the presentation, the band went from being a quartet to a quintet during the instrumental breaks. The “objective” difference in the electrical signal must have been minuscule; the subjective musical consequences were profound.
This is why small differences in the musical presentation are important—if you care deeply enough about music and about how well it is reproduced. “Small” improvements can have large subjective consequences. This example highlights the inability of measurements alone to characterize audio equipment performance. Measurements on the digital-to-analog converter in question indicated no technical attributes that would have contributed to my perceptions. More fundamentally, how can a number representing some aspect of the digital converter’s technical performance begin to describe the musical significance of the change I heard?
Much of music’s expression and meaning can be found in such minutiae of detail, subtlety, and nuance. When such subtlety is conveyed by the playback system, you feel a vastly deeper communication with the musicians. Their intent and expression are more vivid, allowing you to more deeply appreciate their artistry. For example, if you compare two performances of Vivaldi’s Sonata in D Major for solo violin— one competent, the other superlative—you could say that, on an objective basis, they were virtually identical. Both performers played the same notes at about the same tempo. The difference in expression is in the nuances—the inspired subtleties of rubato, tempo, emphasis, articulation, and dynamics that bring the performance to life and convey the piece’s musical meaning and intent. This example is analogous to the difference between mediocre and superb music playback systems, and why small differences in sound quality can matter so much. High-end audio is about reproducing these nuances so that you can come one step closer to the musical expression.
The sad but universal truth about audio equipment is that, any time you put a signal into an audio component, it never comes out better at the other end. You therefore want to keep the signal path as simple as possible, to remove any unnecessary electronics from between you and the music. This is why inserting equalizers and other such “enhancers” into the signal chain is usually a bad idea: the less done to the signal, the better. The advent of digital technology, however, has made possible some beneficial signal processing. (An example is digital room correction, described in Section 12.)
Downsides of Becoming a Critical Listener:
There are dangers inherent in developing critical listening skills. The first is an inability to distinguish between critical listening and listening for pleasure. Once started on the path of critiquing sound quality, it’s all too easy to forget that the reason you’re involved in audio is because you love music, and to start thinking that every time you hear music, you must have an opinion about what’s right and what’s wrong with the sound. This is the surest path toward a condition humorously known as “Audiophilia nervosa”. Symptoms of Audiophilia nervosa include constantly changing equipment, playing only one track of a CD or LP at a time instead of the whole record, changing cables for certain music, refusing to listen to great music if it happens to be poorly recorded, and in general “listening to the hardware” instead of to the music.
But high-end audio is about making the hardware disappear. When listening for pleasure—which should be the vast majority of your listening time—forget about the system. Forget about critical listening. Shift into critical-listening mode only when you need to make a judgment, or just for practice to become a better listener. Draw the line between critical listening and listening for pleasure—and know when to cross it and when not to cross it.
There is also the related danger that your standards of sound quality will rise to such a height that you can’t enjoy music unless it’s “perfectly” reproduced—in other words, to the point that you can’t enjoy music, period. Although it’s not very high-quality reproduction, I get a great deal of pleasure from my car stereo and iPod—don’t let being an audiophile interfere with your enjoyment of music, anytime, anywhere. When you can’t control the sound quality, lower your expectations.
Sonic Descriptions and their Meanings
The biggest problem in critical listening is finding words to express our perceptions and experiences. We hear things in reproduced music that are difficult to identify and put into words. A listening vocabulary is essential not only to conveying to others what we hear, but also to recognizing and understanding our own perceptions. If you can attach a descriptive name to a perception, you can more easily recognize that perception when you experience it again.
By describing in detail the specific sonic characteristics of how electronic components change the sound of music passing through them, I hope to attune you to recognizing those same characteristics when you listen. After reading this next section, listen to two products for yourself and try to hear what I’m describing. It can be any two products: if you have a portable CD player, hook it up to your system and compare it to your home CD player. Even comparing a CD and an MP3 file made from that CD will get you started. The important thing is to start listening analytically. If you don’t hear the sonic differences immediately, keep listening. The more you listen, the more sensitive you’ll become to those differences.
I notice this first-hand when I occasionally spend time listening critically in my listening room with visiting manufacturers and designers of high-end equipment, many of them highly skilled listeners. While we share many commonalities in determining what sounds good, there is a wide range of perception about what aspects of the presentation are most important.
You should also know that recordings made with audiophile techniques are more revealing of some aspects of reproduced sound than recordings made for mass consumption. For example, a recording of classical music made in a concert hall with very few microphones, a simple signal path, and high-quality recording equipment will likely reveal more about a component’s soundstaging performance than a pop recording made in a studio. Similarly, most mass-market recordings have almost no dynamic range so that they sound “good” on a 4” car-stereo speaker. For these reasons, some of the sonic terms described in this section apply much more to audiophile-quality recordings than to mass-market ones.
It’s also useful to understand the broad terms that describe the audio frequency band. The range of human hearing, which spans ten octaves from about 16Hz (cycles per second) to 20,000Hz, or 20 kilohertz (20kHz), can be divided into the specific regions described below. Note that these divisions are somewhat arbitrary; you can’t say specifically that the lower treble begins at 2000Hz and not 2500Hz, for example. The table nonetheless provides a rough guideline for understanding the relationship between frequency ranges and their descriptive names.
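The “ten octaves” figure can be checked with a little arithmetic, since an octave is a doubling of frequency. The following is a minimal sketch; the function name is hypothetical and the endpoints are simply the approximate limits quoted above.

```python
import math

def octaves_between(low_hz, high_hz):
    """Number of octaves (doublings of frequency) from low_hz to high_hz."""
    return math.log2(high_hz / low_hz)

# Approximate limits of hearing quoted in the text.
span = octaves_between(16, 20_000)

# Upper edge of each successive octave, starting from 16 Hz.
octave_edges = [16 * 2 ** n for n in range(1, 11)]

# The 2000 Hz vs. 2500 Hz example is only a fraction of one octave.
boundary_gap = octaves_between(2_000, 2_500)

print(round(span, 2), octave_edges, round(boundary_gap, 2))
```

Ten doublings from 16Hz reach 16,384Hz, so the quoted range works out to a little over ten octaves. The gap between 2000Hz and 2500Hz is roughly a third of an octave, which gives a sense of how loosely the region boundaries are drawn.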
This rough guide will help you understand the following terms and definitions. A full characterization of how a product “sounds” will include aspects of each of the following sonic qualities.
Tone and Balance:
The first aspect of the musical presentation to listen for is the product’s overall tonal balance. How well balanced are the bass, midrange, and treble? If it sounds as though there is too much treble, we call the presentation bright. The impression of too little treble produces a dull or rolled-off sound. If the bass overwhelms the rest of the music, we say the presentation is heavy or weighty. If we hear too little bass, we call the presentation thin, lightweight, uptilted, or lean.
A product’s tonal balance is a significant—and often overwhelming—aspect of its sonic signature.
Perspective:
The term perspective describes the apparent distance between the listener and the music. Perspective is largely a function of the recording (particularly the distance between the performers and the microphones), but is also affected by components in the playback system. Some products push the presentation forward, toward the listener; others sound more distant, or laid-back. The forward product presents the music in front of the loudspeakers; the laid-back product makes the music appear slightly behind the loudspeakers. Put another way, the forward product sounds as though the musicians have taken a few steps toward you; the laid-back product gives the impression that the musicians have taken a few steps back.
Another way of describing perspective is by row number in a concert hall. Some products seem to “seat” the listener at the front of the hall (in Row D, say). Others give you the impression that you’re sitting farther back; say, in Row S. Several other terms describe perspective. Dry generally means lacking reverberation and space, but can also apply to a forward perspective. Other watchwords for a forward presentation are immediate, incisive, vivid, aggressive, and present. Terms associated with laid-back include lush, easy-going, and gentle.
Products with a forward presentation produce a greater sense of an instrument’s presence before you, but can quickly become fatiguing. Conversely, if the presentation is too laid-back, the music is uninvolving and lacking in immediacy.
A laid-back presentation invites the listener in, pulling her gently forward into the music, allowing her the space to explore its subtleties. It’s like the difference between having a conversation with someone who is aggressive, gets in your face, and talks too loudly, compared with someone who stands back, speaking quietly and calmly.
In loudspeakers, perspective is often the result of a peak or dip in the midrange (a peak is too much energy, a dip is too little). In fact, the midrange between 1kHz and 3kHz is called the presence region because it provides a sense of presence and immediacy. The harmonics of the human voice span the presence region; thus, the voice is greatly affected by a product’s perspective.
Treble:
Good treble is essential to high-quality music reproduction. In fact, many otherwise excellent audio products fail to satisfy musically because of poor treble performance.
The treble characteristics we want to avoid are described by the terms bright, forward, aggressive, hard, brittle, edgy, dry, white, bleached, wiry, metallic, sterile, analytical, screechy, and grainy. Treble problems are pervasive; look how many adjectives we use to describe them.
If a product has too much apparent treble, it overstates sounds that are already rich in high frequencies. Examples are overemphasized cymbals, excessive sibilance (s and sh sounds) in vocals, and violins that sound thin. A product with too much apparent treble is called bright. Brightness is a prominence in the treble region, primarily between 3kHz and 6kHz. Brightness can be caused by a rising frequency response in loudspeakers, or by poor electronic design. Many CD players and solid-state amplifiers that measure as having a flat (accurate) frequency response nevertheless add prominence to the treble.
Tizziness describes too much upper treble (6kHz - 10kHz), characterized as a whitening of the treble. Tizzy cymbals have an emphasis on the upper harmonics, the sizzle and air that rides over the main cymbal sound. Tizziness gives cymbals more of an ssssss than a sssshhhh sound.
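The frequency ranges quoted so far (the presence region, brightness, and tizziness) can be collected into a small lookup for reference. This is only a sketch: the names `REGIONS` and `regions_for` are hypothetical, the limits are the approximate figures given in the text, and the edges are treated as inclusive because these boundaries are not sharp.

```python
# Approximate frequency ranges quoted in the text, in Hz.
REGIONS = [
    ("presence region", 1_000, 3_000),
    ("brightness", 3_000, 6_000),
    ("tizziness", 6_000, 10_000),
]

def regions_for(freq_hz):
    """Return the names of the quoted regions that contain freq_hz.

    Edges are inclusive, so a frequency on a shared boundary
    is reported as belonging to both neighboring regions.
    """
    return [name for name, low, high in REGIONS if low <= freq_hz <= high]

for freq in (500, 2_000, 3_000, 8_000):
    print(freq, regions_for(freq))
```

A frequency that falls on a shared edge, such as 3kHz, is reported under both neighboring names, which matches the point that these divisions overlap in practice.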
Forward, if applied to treble, is very similar to bright; both describe too much treble. A forward treble, however, also tends to be dry, lacking space and air around it.
Many of the terms listed above have virtually identical meanings. Hard, brittle, and metallic all describe an unpleasant treble characteristic that reminds one of metal being struck. In fact, the unique harmonic structure created from the impact of metal on metal is very similar to the distortion introduced by a solid-state power amplifier when it is asked to play louder than it is capable of playing.
A particularly annoying treble characteristic is graininess. Treble grain is a coarseness overlaying treble textures. I notice it most on solo violin, massed violins, flute, and female voice. On flute, treble grain is recognizable as a rough or fuzzy sound that seems to ride on top of the flute’s dynamic envelope. (That is, the grain follows the flute’s volume.) Grain makes violins sound as though they’re being played with hacksaw blades rather than bows—a gross exaggeration, but one that conveys the idea of the coarse texture added by grain.
The most common sources of these problems are, in rough order of descending magnitude: tweeters in loudspeakers, overly reflective listening rooms, digital source components (usually the CD player or digital processor), preamplifiers, power amplifiers, cables, and dirty AC power sources.
So far, I’ve discussed only problems that emphasize treble. Some products tend to make the treble softer and less prominent than live music. This characteristic is often designed into the product, either to compensate for treble flaws in other components in the system, or to make the product sound more palatable. Deliberately softening the treble is the designer’s shortcut; if he can’t get the treble right, he just makes it less offensive by softening it.
The following terms, listed in order of increasing magnitude, describe good treble performance: smooth, sweet, soft, silky, gentle, liquid, and lush. When the treble becomes overly smooth, we say it is romantic, rolled-off or syrupy. A treble described as “smooth, sweet, and silky” is being complimented; “rolled-off and syrupy” suggests that the component goes too far in treble smoothness, and is therefore colored.
A rolled-off and syrupy treble may be blessed relief after hearing bright, hard, and grainy treble, but it isn’t musically satisfying in the long run. Such a presentation tends to become bland, uninvolving, slow, thick, closed-in, and lacking detail. All these terms describe the effects of a treble presentation that errs too far on the side of smoothness. The presentation will lack life, air, openness, extension, and a sense of space if the treble is too soft. The music sounds closed-in rather than being big and open.
The best treble presentation is one that sounds most like real music. It should have lots of energy (cymbals can, after all, sound quite aggressive in real life), yet not have a synthetic, grainy, or dry character. We don’t hear these characteristics in live music; we shouldn’t hear them in reproduced music. More important, the treble should sound like an integral part of the music, not a detached noise riding on top of it. If a component has a colored treble presentation, however, it is far less musically objectionable if it errs on the side of smoothness rather than brightness.
Midrange:
J. Gordon Holt, Stereophile magazine’s founder and the father of observational audio equipment evaluation, once wrote, “If the midrange isn’t right, nothing else matters.”
The midrange is important for several reasons. First, most of the musical energy is in the midrange, particularly the important lower harmonics of most instruments. Not only does this region contain most of the musical energy, but the human ear is much more sensitive to midrange and lower treble than to bass and upper treble. Specifically, the ear is most sensitive to sounds between about 800Hz and 3kHz, and to small changes in both volume and frequency response within this band. The ear’s threshold of hearing—i.e., the softest sound we can hear—is dramatically lower in the midband than at the frequency extremes. We’ve developed this additional midband acuity probably because the energy of most of the sounds we heard every day for hundreds of thousands of years—the human voice, rustling leaves, the sounds of other animals—is concentrated in the midrange.
Midrange colorations can be extremely annoying. Loudspeakers with peaks and dips in the mids sound very unnatural; the midrange is absolutely the worst place for loudspeaker imperfections. Confining our discussion to loudspeakers for the moment, midrange colorations overlay the music with a common characteristic that emphasizes certain sounds. The male speaking voice is particularly revealing of midrange anomalies, which are often described by comparisons with vowel sounds. A particular coloration may impart an aaww sound; a coloration lower in frequency may emphasize ooohhh sounds; a higher-pitched coloration may sound like eeeee; another coloration might sound hooty.
Some midrange colorations can be likened to the sound of someone speaking through cupped hands. Try reading this sentence while cupping your hands around your mouth. Open and close your hands while listening to how the sound of your voice changes. That’s the kind of midrange coloration we sometimes hear from loudspeakers, particularly mass-market ones.
In short, if recordings of male speaking voice sound monotonous, tiring, and resonant, it’s probably the result of peaks and dips in the loudspeaker’s frequency response. (These colorations are most apparent on male voice when listening to just one loudspeaker.)
Terms to describe poor midrange performance include peaky, colored, chesty, boxy, nasal, congested, honky, and thick. Chesty describes a lower-midrange coloration that makes vocalists sound as though they have colds. Boxy refers to the impression that the sound is coming out of a box instead of existing in open space. Nasal is usually associated with an excess of energy that spans a narrow frequency range, producing a sound similar to talking with your nose pinched. Honky is similar to nasal, but higher in frequency and spanning a wider frequency range.
As described previously under “Perspective,” too much midrange energy can make the presentation seem forward and “in your face.” A broad dip in the midrange response (too little midrange energy over a wide frequency span) can give an impression of greater distance between you and the presentation.
When choosing loudspeakers, be especially attuned to the midrange colorations described. What is a very minor—even barely noticeable—problem heard during a brief audition can turn into a major irritant over extended listening.
The preceding descriptions apply primarily to midrange problems introduced by loudspeakers. Expanding the discussion to include electronics (preamps and power amps) and source components (LP playback or a digital source) introduces different aspects of midrange performance that we should be aware of.
An important factor in midrange performance is how instrumental textures are reproduced. Texture is the physical impression of the instrument’s sound: its fabric rather than its tone. The closest musical term for texture is timbre, defined as “the quality given to a sound by its overtones; the quality of tone distinctive of a particular singing voice or instrument.” Sonic artifacts added by electronics often affect instrumental and vocal textures.
The term grainy, introduced in the description of treble problems, also applies to the midrange. In fact, midrange grain can be more objectionable than treble grain. Midrange grain is characterized by a coarseness of instrumental and vocal textures; the instrument’s texture is granular rather than smooth.
Midrange textures can also sound hard and brittle. Hard textures are apparent on massed voices; a choir sounds glassy, shiny, and synthetic. This problem gets worse as the choir’s volume increases. At low levels, you may not hear these problems. But as the choir swells, the sound becomes hard and irritating. Piano is also very revealing of hard midrange textures, the higher notes sounding annoyingly brittle. When the midrange lacks these unpleasant artifacts, we say the textures are liquid, smooth, sweet, and lush.
Bass:
Bass performance is the most misunderstood aspect of reproduced sound, among the general public and hi-fi buffs alike. The popular belief is that the more bass, the better. This is reflected in ads for “subwoofers” that promise “earthshaking bass” and the ability to “rattle pant legs and stun small animals.” The ultimate expression of this perversity is boom trucks that have absurd amounts of extraordinarily bad bass reproduction.
But we want to know how the product reproduces music, not earthquakes. What matters to the music lover isn’t quantity of bass, but the quality of that bass. We don’t just want the physical feeling that bass provides; we want to hear subtlety and nuance. We want to hear precise pitch, lack of coloration, and the sharp attack of plucked acoustic bass. We want to hear every note and nuance in fast, intricate bass playing, not a muddled roar. If Ray Brown, Stanley Clarke, John Patitucci, Dave La Rue, Dave Holland, or Eddie Gomez is working out, we want to hear exactly what they’re doing. In fact, if the bass is poorly reproduced, we’d rather not hear much bass at all.
Correct bass reproduction is essential to satisfying musical reproduction. Low frequencies constitute music’s tonal foundation and rhythmic anchor. Unfortunately, bass is difficult to reproduce, whether by source components, power amplifiers, or— especially—loudspeakers and rooms.
Perhaps the most prevalent bass problem is lack of pitch definition or articulation. These two terms describe the ability to hear bass as individual notes, each having an attack, a decay, and a specific pitch. You should hear the texture of the bass, whether it’s the sonorous resonance of a bowed double bass or the unique character of a Fender Precision. Low frequencies contain a surprising amount of detail when reproduced correctly.
When the bass is reproduced without pitch definition and articulation, the low end degenerates into a dull roar underlying the music. You hear low-frequency content, but it isn’t musically related to what’s going on above it. You don’t hear precise notes, but a blur of sound; the dynamic envelopes of individual instruments are completely lost. In music in which the bass plays an important rhythmic role (rock, electric blues, and some jazz), the bass guitar and kick drum seem to lag behind the rest of the music, putting a drag on the rhythm. Moreover, the kick drum’s dynamic envelope (what gives it the sense of sudden impact) is buried in the bass guitar’s sound, obscuring its musical contribution. These conditions are made worse by the common mid-fi affliction of too much bass.
Terms descriptive of this kind of bass include muddy, thick, boomy, bloated, tubby, soft, congested, loose, and slow.
Terms that describe excellent bass reproduction include taut, quick, clean, articulate, agile, tight, and precise. Good bass has been likened to a trampoline stretched taut; poor bass is a trampoline hanging slackly.
The amount of bass in the musical presentation is very important; if you hear too much, the music is overwhelmed. Excessive bass is a constant reminder that you’re listening to reproduced music. This overabundance of bass is described as heavy.
If you hear too little bass, the presentation is thin, lean, threadbare, or over-damped. An overly lean presentation robs music of its rhythm and drive—the full, purring sound of bass guitar is missing, the depth and majesty of double bass or cello are gone, and the orchestra loses its sense of power. Thin bass makes a double bass sound like a cello, a cello like a viola. The rhythmically satisfying weight and impact of bass drum are reduced to shadows of their former power. Instruments’ harmonics are emphasized in relation to the fundamentals, giving the impression of well-worn cloth that’s lost its supporting structure. A thin or lean presentation lacks warmth and body. As described earlier in this section in the discussion of audio sins of commission and omission, an overly lean bass is preferable to boomy bass.
Two terms related to the quantity of bass are extension and depth. Extension is how deep the bass goes: not the bass and upper bass described by lean or weighty, but the very bottom end of the audible spectrum. This is the realm of kick drum and pipe organ. All but the very best systems roll off (reduce in volume) these lowermost frequencies. Fortunately, deep extension isn’t a prerequisite to high-quality music reproduction. If the system has good bass down to about 35Hz, you don’t feel that much is missing. Pipe-organ enthusiasts, however, will want deeper extension and are willing to pay for it. Reproducing the bottom octave correctly can be very expensive.
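To put rough numbers on what "rolling off" the lowermost frequencies means, here is a minimal sketch. It assumes the system behaves like an idealized second-order Butterworth high-pass filter whose corner (the point where output is down about 3dB) sits at 35Hz. Both the filter shape and the function name `highpass_rolloff_db` are illustrative assumptions, not a description of any particular loudspeaker.

```python
import math

def highpass_rolloff_db(freq_hz: float, corner_hz: float = 35.0, order: int = 2) -> float:
    """Level in dB (relative to the passband) of an idealized Butterworth
    high-pass response. The 35 Hz corner and 2nd-order slope are assumptions
    chosen only to illustrate low-frequency roll-off."""
    if freq_hz <= 0 or corner_hz <= 0:
        raise ValueError("frequencies must be positive")
    # Butterworth high-pass: |H|^2 = 1 / (1 + (fc / f)^(2 * order))
    ratio = (corner_hz / freq_hz) ** (2 * order)
    return -10.0 * math.log10(1.0 + ratio)

if __name__ == "__main__":
    # Print the relative level at a few bass frequencies.
    for f in (20.0, 35.0, 50.0, 100.0):
        print(f"{f:6.1f} Hz: {highpass_rolloff_db(f):6.2f} dB")
```

Under these assumptions the response is about 3dB down at 35Hz and falls increasingly quickly below that, which is why the bottom octave is the first thing to go in a system with limited extension.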
Much of music’s dynamic power—the ability to convey wide differences between loud and soft—is contained in the bass. Though I’ll discuss dynamics later in this section, bass dynamics bear special discussion—they are that important to satisfying music reproduction.
A system or component that has excellent bass dynamics will provide a sense of sudden impact and explosive power. Bass drum will jump out of the presentation with startling power. The dynamic envelope of acoustic or electric bass is accurately conveyed, allowing the music full rhythmic expression. We call these components punchy and use the terms impact and slam to describe good bass dynamics.
A related aspect is speed, though, as applied to bass, “speed” is somewhat of a misnomer. Low frequencies inherently have slower attacks than higher frequencies, making the term technically incorrect. But the musical difference between “slow” and “fast” bass is profound. A product with fast, tight, punchy bass produces a much greater rhythmic involvement with the music. (This is examined in more detail later.)
Although reproducing the sudden attack of a bass drum is vital, equally important is a system’s ability to reproduce a fast decay; i.e., how a note ends. The bass note shouldn’t continue after a drum whack has stopped. Many loudspeakers store energy in their mechanical structures and radiate that energy slightly after the note itself. When this happens, the bass has overhang, a condition that makes kick drum, for example, sound bloated and slow. Music in which the drummer uses double bass drums is particularly revealing of bass overhang. If the two drums merge into a single sound, overhang is probably to blame. You should hear the attack and decay of each drum as distinct entities. Components that don’t adequately convey the sudden dynamic impact of low-frequency instruments rob music of its power and rhythmic drive.
Soundstaging is the apparent physical size of the musical presentation. When you close your eyes in front of a good playback system, you can “see” the instrumentalists and singers before you, often existing within an acoustic space such as a concert hall. The soundstage has the physical properties of width and depth, producing a sense of great size and space in the listening room. Soundstaging overlaps with imaging, or the way instruments appear as objects hanging in three-dimensional space within the recorded acoustic. As mentioned previously in this section, a large and well-defined soundstage is most often heard when playing audiophile-grade recordings made in a real acoustic space such as a concert hall or church.
The most obvious descriptions of the soundstage are its physical dimensions—width and depth. You hear the musical presentation as existing beyond the left and right loudspeaker boundaries, and extending farther away from you than the wall behind the loudspeakers.
Of all the ways music reproduction is astounding, soundstaging is without question the most miraculous. Think about it: The two loudspeakers are driven by two-dimensional electrical signals that are nothing more than voltages that vary over time. From those two voltages, a huge, three-dimensional panorama unfolds before you. You don’t hear the music as a flat canvas with individual instruments fused together; you hear the first violinist to the left front of the presentation, the oboe farther back and toward the center, the brass behind the basses on the right, and the tambourine behind all the other instruments at the very rear. The sound is made up of individual objects existing within a space, just as you would hear at a live performance. Moreover, you hear the oboe’s timbre coming from the oboe’s position, the violin’s timbre coming from the violin’s position, and the hall reverberation surrounding the instruments. The listening room vanishes, replaced by the vast space of the concert hall, all from two voltages.
A soundstage is created in the brain by the time and amplitude differences encoded in the two audio channels. When you hear instrumental images toward the rear right of the soundstage, the ear/brain is synthesizing those aural images by processing the slightly different information in the two signals arriving at your ears. Visual perception works the same way: there is no depth information present on your retinas; your brain extrapolates the appearance of depth from the differences between the two flat images.
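As a small illustration of how an amplitude difference between the two channels can encode left-right position, here is a sketch of constant-power panning, a common studio technique. It is an assumption for illustration only: recordings made in a real acoustic space capture time and amplitude differences through the microphones rather than through a pan control, and the function names here are hypothetical.

```python
import math

def constant_power_pan(sample: float, position: float) -> tuple[float, float]:
    """Split a mono sample into (left, right) channel values.

    position runs from -1.0 (hard left) through 0.0 (center) to +1.0
    (hard right). The sine/cosine law keeps left^2 + right^2 constant,
    so perceived loudness stays roughly the same as the image moves.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    angle = (position + 1.0) * math.pi / 4.0  # maps to 0 .. pi/2
    return sample * math.cos(angle), sample * math.sin(angle)

def level_difference_db(left: float, right: float) -> float:
    """Inter-channel level difference in dB (positive means right is louder).
    Both channel values must be non-zero."""
    return 20.0 * math.log10(abs(right) / abs(left))

if __name__ == "__main__":
    # Show the level difference for a centered and two right-leaning images.
    for pos in (0.0, 0.25, 0.5):
        left, right = constant_power_pan(1.0, pos)
        print(f"position {pos:+.2f}: {level_difference_db(left, right):5.2f} dB")
```

A centered image has no level difference between channels; as the position moves right, the right channel grows louder relative to the left, and the ear/brain places the image accordingly.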
Audio components vary greatly in their abilities to present these spatial aspects of music. Some products shrink soundstage width and shorten the impression of depth. Others reveal the glory of a fully developed soundstage. We find good soundstage performance crucial to satisfying musical reproduction. Unfortunately, many products destroy or degrade the subtle cues that provide soundstaging.
Terms descriptive of poor soundstage width are narrow and constricted—the music, squeezed together between the loudspeakers, does not envelop the listener. A soundstage lacking depth is called flat, shallow, or foreshortened. Ideally, the soundstage should maintain its width over its entire depth. A soundstage that narrows toward the presentation’s rear robs the music of its size and space.
The illusion of soundstage depth is aided by resolution of low-level spatial cues such as hall reflections and reverberation. In particular, the reverberation decay after a loud climax followed by a rest helps define the acoustic space. The loud signal is like a flash of light in a dark room; the space is momentarily illuminated, allowing you to see its dimensions and characteristics.
Now that we’ve covered space and depth, let’s discuss how the instrumental images appear within this space. Images should occupy a specific spatial position in the soundstage. The sound of the bassoon, for example, should appear to emanate from a specific point in space, not as a diffuse and borderless image. The same could be said for guitar, piano, sax, or any other instrument in any kind of music. The lead vocal should appear as a tight, compact, definable point in space exactly between the loudspeakers. Some products, particularly large loudspeakers, distort image size by making every instrument seem larger than life: a classical guitar suddenly sounds ten feet wide. A playback system should reveal somewhat correct image size, from a 60’-wide symphony orchestra to a solo violin. I say “somewhat” because it is impossible to recreate the correct spatial perspectives of such widely divergent sound sources through two loudspeakers spaced about 8’ apart. Although image size and placement are characteristics inherent in the recording, they are dramatically affected by components in the playback system.
Terms that describe a clearly defined soundstage are focused, tight, delineated, and sharp. Image specificity also describes tight image focus and pinpoint spatial accuracy. A poorly defined soundstage is described as homogeneous, blurred, confused, congested, thick, and lacking focus.
Some products produce a crystal-clear, see-through soundstage that allows the listener to hear all the way to the back of the hall. Such a transparent soundstage has a lifelike immediacy that makes every detail clearly audible. Conversely, an opaque soundstage is thick or murky, with less of an illusion of “seeing” into space. Veiling is often used to describe a lack of transparency.
Finally, superb soundstaging is relatively fragile. You need to sit directly between the loudspeakers, and every component in the playback chain must be of high quality. Soundstaging is easily destroyed by low-quality components, a bad listening room, or poor loudspeaker placement. This isn’t to say you have to spend a fortune to get good soundstaging; many very-low-cost products do it well, but it is more of a challenge to find those bargains.
The dynamic range of an audio system isn’t how loudly it will play, but rather the difference in level between the softest and loudest sounds that the system can reproduce. It is often specified technically as the difference between the component’s noise level and its maximum output level. A symphony orchestra has a dynamic range of about 100 decibels, or dB; a typical rock recording’s dynamic range is about 10dB. In other words, comparatively speaking, the rock band is always loud; it has little dynamic range.
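The technical definition above reduces to a simple ratio expressed in decibels. The sketch below assumes the two levels are amplitudes (voltages, for example), so it uses the 20·log10 convention; the function names and the sample values are hypothetical.

```python
import math

def dynamic_range_db(max_level: float, noise_level: float) -> float:
    """Dynamic range in dB between a maximum output level and a noise
    level, both given as amplitudes (e.g. volts), not powers."""
    if max_level <= 0 or noise_level <= 0:
        raise ValueError("levels must be positive")
    return 20.0 * math.log10(max_level / noise_level)

def amplitude_ratio(db: float) -> float:
    """Inverse conversion: the amplitude ratio that a dB figure represents."""
    return 10.0 ** (db / 20.0)

if __name__ == "__main__":
    # Hypothetical component: 2 V maximum output, 20 microvolts of noise.
    print(f"{dynamic_range_db(2.0, 20e-6):.1f} dB")
    # Amplitude ratios behind the 100 dB and 10 dB figures quoted above.
    print(f"{amplitude_ratio(100.0):.0f} : 1")
    print(f"{amplitude_ratio(10.0):.2f} : 1")
```

Seen this way, a 100dB range means the loudest passages have 100,000 times the amplitude of the softest, while a 10dB range is a ratio of only a little more than 3 to 1.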
Dynamics are a very important part of music reproduction. They propel the music forward and involve the listener. Much of music’s expression is conveyed by dynamic contrast, from pp (pianissimo) to fff (triple forte).
There are two distinct kinds of dynamics. Macro-dynamics refers to the presentation’s overall sense of slam, impact, and power: bass-drum whacks and orchestral crescendos, for example. If the system has poor macro-dynamics, we say the sound is compressed or squashed.
Micro-dynamics occur on a smaller scale. They don’t produce a sense of impact, but are essential to providing realistic dynamic reproduction. Micro-dynamics describe the fine dynamic structure in music, from the attack of a triangle or other small percussion instruments in the back of the soundstage to the suddenness of a plucked string on an acoustic guitar. Neither sound is very loud in level, but both have dynamic structures that require agility and speed from the playback system.
Products with good dynamics—macro and micro—make the music come alive, allowing a vibrancy and life to emerge. Dynamic changes are an important vehicle of musical expression; the more you hear the musicians’ intent, the greater the musical communication between performers and listener. Some otherwise excellent components fail to convey the broad range of dynamic contrast.
These characteristics are associated with transient response, a system’s ability to quickly respond to an input signal. A transient is a short-lived sound, such as that made by percussion instruments. Transient response describes an audio system’s ability to faithfully reproduce the quickness of transient signals. For example, a drum being struck produces a waveform with a very steep attack (the way the sound begins) and a fast decay (the way a sound stops). If any component in the playback system can’t respond as quickly as the waveform changes, a distortion of the music’s dynamic envelope occurs, and the steepness is slowed. Audio components described as quick or fast reproduce the suddenness of transient signals.
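To see how a component that cannot keep up with the waveform "slows the steepness" of an attack, consider this minimal sketch. It models the component as a simple one-pole low-pass filter and feeds it an ideal instantaneous step. The filter model, the coefficient values, and the function names are assumptions chosen for illustration; real components are considerably more complicated.

```python
def one_pole_lowpass(signal: list[float], alpha: float) -> list[float]:
    """Smooth a signal with a one-pole low-pass filter.

    alpha (0 < alpha <= 1) sets how quickly the output follows the input;
    smaller values model a slower-responding component.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)  # move a fraction of the way toward the input
        out.append(y)
    return out

def samples_to_reach(signal: list[float], threshold: float):
    """Index of the first sample at or above threshold, or None if never reached."""
    for i, value in enumerate(signal):
        if value >= threshold:
            return i
    return None

if __name__ == "__main__":
    step = [1.0] * 200  # idealized attack: the input jumps straight to full level
    fast = one_pole_lowpass(step, alpha=0.5)
    slow = one_pole_lowpass(step, alpha=0.05)
    print("fast component reaches 90% at sample", samples_to_reach(fast, 0.9))
    print("slow component reaches 90% at sample", samples_to_reach(slow, 0.9))
```

The slower filter takes many more samples to reach 90% of the step's level. In listening terms, the leading edge of the drum stroke is rounded off, and the dynamic envelope no longer matches the original.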
But just because a component or system can reproduce loud and soft levels doesn’t necessarily mean it has good dynamics. We’re looking for more than a wide dynamic range. The system must be capable of expressing fine gradations of dynamics, not just loud and soft. As the music changes in level (which, except in many rock recordings, it’s doing most of the time), you should hear loudness changes along a smooth continuum, not as abrupt jumps in levels.
Detail refers to the small or low-level components of the musical presentation. The fine inner structure of an instrument’s timbre is one kind of detail. The term is also associated with transient sounds (those with a sudden attack) at any level, such as those made by percussion instruments. A playback system with good resolution of detail will infuse music with the sense that there is simply more music happening.
Assembling a good-sounding music system or choosing between two components can often be a tradeoff between smoothness and the resolution of detail. Many audio components hype detail, giving transient signals an etched character. Etch is an unpleasant hardness of timbre on transients that emphasizes their prominence. Sure, you can hear all the information, but the presentation becomes too aggressive, analytical, and fatiguing. Low-level information is brought up and thrust at you, and you feel a sense of relief when the music is turned down or off, which is not a good sign.
Components that err in the opposite direction don’t have this etched and analytical quality, but neither do they resolve all the musical information in the recording. These components are described as overly smooth, or having low resolution. They tend to make music bland by removing parts of the signal needed for realistic reproduction. These kinds of components don’t rivet your attention on the music; they are uninvolving and dull. You aren’t offended by the presentation, as you are with an analytical system, but something is missing that you need for musical satisfaction.
It is a rare product indeed that presents a full measure of musical detail without sounding etched. The best products will reveal all the low-level cues that make music interesting and riveting, but not in a way that results in listening fatigue—that sense of tiredness after a long listening session. The music playback system must walk the very fine line between resolution of real musical information and sounding analytical.
Finally, we get to the most important aspect of a system’s presentation—musicality. Unlike the previous characteristics, musicality isn’t any specific quality that you can listen for, but the overall musical satisfaction the system provides. Your sensitivity to musicality is destroyed when you focus on a certain aspect of the presentation; i.e., when you listen critically. Instead, musicality is the gestalt, the whole of your reaction to the reproduced sound. We also use the term involvement to describe this oneness with the music. A sure indication that a component or a system has musicality is when you sit down for an analytical listening session and minutes later find yourself immersed in the music and abandoning the critical listening session. This has happened many times to me as a reviewer, and is a good measure of the product’s fundamental “rightness.” Ultimately, musicality—not dissecting the sound—is what high-end audio is all about.