Allan R. Broadhurst and Donald K. Darnell

From Allan R. Broadhurst and Donald K. Darnell, "Introduction to Cybernetics and Information Theory," Quarterly Journal of Speech, 1965, 51, 442-453. Reproduced with permission of the authors and publisher.

1. Norbert Wiener, The Human Use of Human Beings: Cybernetics and Society (New York, 1954), p. 16.
2. Wayne N. Thompson, "A Conservative View of a Progressive Rhetoric," QJS, XLIX (February 1963), 5.
3. F. H. George, Automation, Cybernetics, and Society (New York, 1959), pp. 45-46.
4. Wiener, p. 17.

It no longer seems strange for rhetoricians, experimental psychologists, communication theorists, and communication engineers to work side by side on common problems. One of the principal contributions in the last two decades to the field of communications has been in the area of cybernetics, from which information theory is derived. Cybernetics is a field of study which seems destined more and more to find its way into the periodicals and literature of the speech scholar. Many such scholars, with a background emphasizing classical rhetoric, have neither the mathematical sophistication nor the unlimited time necessary to achieve a total comprehension of this new field. This paper is an attempt to provide fundamental information for those who have only a general interest in "cybernetics" and "information theory," and to prepare those with an abiding interest for the more thorough, detailed works.

The late Norbert Wiener, professor of mathematics at the Massachusetts Institute of Technology, first coined the term "cybernetics" from the Greek word kubernetes, or "steersman," the same Greek root from which we get our word "governor." Though the word was first used publicly by Wiener in 1948, it has now been used retrospectively to cover the whole field of communication and control which originated years before.

Cybernetics is a philosophy which insists that, from the point of view of communication, the human organism is not essentially different from a machine. It emphasizes the resemblances between living organisms and man-constructed machinery, and points out that even though the components differ, in theory their operation is essentially the same. In effect, this means that the scientist can treat the human communication process as if it were being conducted by machines, and he is concerned with the building of machines that can "think," "learn," and "communicate."³ As Norbert Wiener commented, "When I give an order to a machine, the situation is not essentially different from that which arises when I give an order to a person. In other words, as far as my consciousness goes I am aware of the order that has gone out and of the signal of compliance that has come back. To me, personally, the fact that the signal in its intermediate stages has gone through a machine rather than through a person is irrelevant and does not in any case greatly change my relation to the signal."⁴ It is not surprising, then, that Mr. Wiener thought of messages between "man and machines," "machines and man," and "machine and machine" as playing an ever-increasing part in our society.

With any new word or idea, widespread recognition is slow in being achieved. This is particularly true in regard to the term "cybernetics" because we are disposed to think in emotional ways about the words "machine" and "mechanical." In the past "machine" has applied merely to the simple constructions of man. Thus, bicycles, automobiles, airplanes, industrial equipment, and motors are quickly brought to mind when the term "machine" is used. "Mechanical" is a word "used to describe actions that are automatic and unthinking, precisely in opposition to the human characteristics of reflection, thoughtfulness, insight, and so on."⁵ Therefore, to say that a machine or mechanical object can think and communicate raises a barrier that many find difficult to overcome. However, cyberneticians do see the similarity between machines and organisms as being so great that the specifications for one appear to include the specifications for the other. They mean the word "machine" now to apply to something far more complex than any existing machine. In effect, a better way to put it would be to say that they feel organisms are essentially constructible. As F. H. George puts it, "There is no reason, in principle, why we should not ourselves construct human beings. Needless to say, many cyberneticians would stop short of this claim, and their work does not in any way depend on it. What is important is that we can build machines that will perform all the more mechanical tasks that humans perform. This is the extent to which automation is fostered, but cybernetics casts its net much further afield than automation. It implies a whole philosophy of science and, taken in the broader social context, a human philosophy of life."⁶

The Second World War highlighted the need for extensive research in the area of cybernetics. The war provided a series of problems never before encountered. For the first time in the history of warfare, missiles, rockets, and planes were approaching and passing the speed of sound. The main problem became one of range-finding for anti-aircraft guns in high-speed aerial warfare. The older systems of manually controlled buttons, gauges, and calculations were totally inadequate in this new war. The need now was for speed and accuracy in the overall tracking of objects speeding through the air: predicting their direction, height, and velocity so that this information could aid in their destruction. To meet this need, machines were programmed in such a way that they could make decisions and instruct the various parts of the anti-aircraft guns to operate on the basis of their decisions.
In other words, machines were "thinking" and "communicating."

5. George, p. 46.
6. Ibid.

The advancements made during the war undoubtedly led to the postwar boom in computing machines. Computers are one of the chief reasons for the cybernetics hypothesis about machinery and organisms. These and similar machines are capable of making deductive inferences, solving mathematical problems, classifying information, and making predictions that are based on inductive reasoning. In effect, they are being taught to "think," "learn," and "communicate."

Thus, one notes that a new theory of information was being developed. The theory was concerned with the problem of defining the quantity of information contained in a message to be transmitted, and how to go about measuring the amount of information communicated by a system of electronic or machine-like signals. Largely through the independent efforts of Norbert Wiener and Claude Shannon, this new theory of information has been applied to a mathematical theory of communication.

This mathematical theory of communication refers to every conceivable kind of transmission of information, from the first words of a baby to the complicated theories of an atomic scientist. The units of information are numerous: muscle contractions, sine waves, phonemes, morphemes, syllables, words, phrases, clauses, sentences, paragraphs, letters of the alphabet, numbers, parts of speech, et cetera. The only real restriction is that one must be able to recognize the unit whenever it occurs so it can be mathematically programmed and fed into the machine.⁷

The work of the information theorist can be likened to that of the mapmaker who presents a traveler with a record of the important towns, highways, and sites of historical interest. But the towns are only dots and the rivers are only lines and all the exciting adventures along the way are missing; the interesting details and beautiful scenery are deliberately omitted.
In a similar way information theory does not involve the value judgments of the human element. The engineer who designs a telephone system does not care whether this link is going to be used for transmission of gossip, for stock exchange quotations, or for diplomatic messages. The technical problem is always the same: to transmit the information accurately and correctly, whatever it may be. At present, then, information theory treats information as a physically measurable quantity, but it cannot distinguish between information of great importance and a piece of news of no great value for the person who receives it.⁸

What then is this "physically measurable quantity," and how is it achieved? It is what is frequently referred to by communication theorists as a "bit" of information and is achieved through the use of a binary coding system. Gustav Herdan states in his book Type-Token Mathematics that "our brain and nervous system work in such a way that a nerve cell is either excited or not, which means that it can assume either of two mutually exclusive states, but not both at the same time."⁹ In other words, the nerve cell can make a Yes-No decision, an All-or-Nothing decision, or a 0-1 alternative. This is in accordance with the principle known in logic as that of Contradiction, and, "since language is, in the last resort, only the formulation of logical relations in terms of linguistic forms, it seems sensible to conclude that a dual or dyadic symbol system will be appropriate for language."¹⁰

7. George A. Miller, Language and Communication (New York, 1951), p. 82.
8. Leon Brillouin, Science and Information Theory (New York, 1956), p. xi.
9. Gustav Herdan, Type-Token Mathematics, A Textbook of Mathematical Linguistics (The Hague, Netherlands, 1960), p. 173.
10. Ibid.

A coding system with only two symbols falls neatly into this Yes-No principle. An application is the Morse Code of telegraphy with its Dot-Dash symbol system.
If for a number system of this kind one chooses 0 and 1 as symbols, one has the so-called Dual, Dyadic, or Binary number system in which all numbers are written by means of 0 and 1 only. In contrast to this, one may consider our decimal number system which makes use of ten digits, 0-9 inclusive. The binary system, however, can be used as the basis for all number codes or alphabets. For example, a possible binary system to replace our decimal system could be as follows:

Decimal system    Binary system
0                 0
1                 1
2                 10
3                 11
4                 100
5                 101
6                 110
7                 111
8                 1000
9                 1001
etc.              etc.

A possible binary code for the letters of the English language could be:

Letter    Binary code
e         100
t         1000
o         1100
a         10000
n         10100
i         11000
?         11100
a         101000
h         101100
d         110000
etc.      etc.

Any determined units of information could be signified by such a binary coding system, whether the units be sine waves, phonemes, syllables, paragraphs, or letters of the alphabet. The only real restriction, again, is that the units must be defined in such a way that they can be recognized whenever they occur so they can be mathematically programmed and fed into the machine.

There are also experimental devices which give the possibility for using positive or negative pulses, in addition to no pulse. This is the case for magnetic tape and for systems using positive and negative current. These systems lead to the possibility of a ternary code which is based on -1, 0, +1. One of the three signals (-1, for example) may be used for the letter space, thus leaving the two remaining signals to be used as binary coding for the letters.¹¹ The coding would look like the following:

Letter    Code
e         0,-1
          00,-1
o         01,-1
a         10,-1
          11,-1
          000,-1
          001,-1
etc.      etc.

Used in the programming of information, mathematics becomes a precise and well-structured language. It is a language which seems to be quite basic to our descriptions of the world. It is a language that picks out the most general structural relations in any situation capable of description.
But it can also be an abstract language capable of dealing with structural relations that exist only as we define them, such as morphemes, phonemes, and syllables.

Information theory, therefore, is not concerned with information at all, at least not in the common meaning of the term "information." Information theory does not deal with meaning, with message content, with knowledge about a subject. Why, then, is information theory so important to communication? It is because the transmission of "information," eliciting meanings in others, requires a code (a set of symbols and a set of rules for combining them), and information theory is concerned with codes and the capacities of channels.

11. Brillouin, p. 53.
12. Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication (Urbana, Ill., 1949), p. 103.

Information is something we need when we face a choice, and the amount of information needed to make a decision depends on the complexity of the choice involved. If we face many different equally likely alternatives (in other words, one is just as apt to happen as the other), then we need more information than if we faced a simple choice between just two alternatives, either this or that.

As Claude Shannon uses the term, information refers to knowledge that one does not have about what is coming next in a sequence of symbols. He thus associates information with entropy, which is nothing more than randomness, or the lack of predictability. Warren Weaver referred to entropy in the physical sciences as being a measure of the degree of randomness or "shuffledness" in a situation.¹² He goes on to point out that there is a tendency for physical systems to become less and less organized, and for entropy to increase. As Norbert Wiener stated it, "The commands through which we exercise our control over our environment are a kind of information which we impart to it. Like any form of information, these commands are subject to disorganization in transit.
They generally come through in less coherent fashion and certainly not more coherently than they were sent. In control and communication we are always fighting nature's tendency to degrade the organized and to destroy the meaningful; the tendency, as Gibbs has shown us, for entropy to increase."¹³

The relationship between the two terms "information" and "entropy" can perhaps be made clear by the following example. Consider a source who is successively selecting discrete symbols from a set of symbols; if those symbols are independent and equally probable, that source has a maximum freedom of choice, the uncertainty about what the next symbol will be is maximum (therefore, there is maximum entropy in the situation), and X bits of information are required to make the next symbol predictable. Once the source has made his choice there is no longer freedom of choice, no uncertainty or randomness (thus, no entropy), and it is a short leap to the assumption that X bits of information have been transmitted (or received). That assumption, however, is the trigger to a painful and frustrating semantic trap. Having made that assumption, one begins to talk (and think) about information transmitted, and slowly, but almost inevitably, he loses the ability to discriminate between information and "information."

A useful memory device is to think of the information of Claude Shannon as coming in units referred to as "bits." The word "bit" as in a "bit of information" is not to be confused with the popular use of the term. "Bit" in information theory is a combination of two shortened words, binary digit, and has a precise mathematical value; whereas ordinary information comes in "pieces" which are vague, undefinable chunks. To demonstrate how important this difference is, one may consider a source who is going to draw a card from a shuffled deck of playing cards. The deck of playing cards represents a set of fifty-two equally probable, discrete symbols.
How many pieces of "information" would an astute observer obtain by watching the drawing of one card? It might be said that he obtains one piece (the card which was drawn) or fifty-two pieces, because he knows not only the one card that was drawn but the fifty-one cards that were not drawn. He might also observe that the card just drawn is a "higher card" than one drawn previously, that our source is the "winner," that $10.00 will now change hands, and so on. For now, the important point is that information theory does not concern itself with this kind of information. It is more concerned with "bits."

13. Wiener, p. 17.

Now, how many bits of information are there in this situation? A bit is defined as that amount of information required to halve the alternatives; i.e., the number of bits of information in a set of equally probable alternatives is equal to the number of times the set must be divided in half to leave only one alternative. If there were two choices the set could be halved once; thus, there would be one bit of information. If there were four choices the set would have to be cut in two twice to reduce the alternatives to one; that is, two bits of information are required if the source has four equally probable choices. In the deck of cards, fifty-two equally probable choices exist, and it becomes difficult to obtain the number of bits by the halving procedure.

There is, however, a simplified procedure for obtaining the answer desired. We find that two to the first power is two, and that two squared is four. We know (through our definition above) that a two-choice situation contains one bit of information, and a four-choice situation contains two bits, so we could infer that the power of two required to produce the number of alternatives is the number of bits of information in the system. As it works out, the power function is an inverse logarithmic function. That is, the logarithm to the base two ($\log_2$) of two is one, and $\log_2 4$ is 2.
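
The halving definition can be checked mechanically. The sketch below is illustrative only: the function name `halvings` is an assumption introduced here, not something taken from the article. It counts how many times a set of equally probable alternatives can be cut in half before one alternative remains, and compares that count with the base-two logarithm for set sizes that are exact powers of two.

```python
import math


def halvings(alternatives: int) -> int:
    """Count how many times a set can be cut in half until one item is left.

    Hypothetical helper for illustration; exact only for powers of two.
    """
    count = 0
    while alternatives > 1:
        alternatives //= 2  # keep one half of the remaining alternatives
        count += 1
    return count


# For powers of two the halving count and the base-two logarithm agree.
for n in (2, 4, 8, 64):
    print(n, halvings(n), math.log2(n))
```

For a set such as fifty-two, which is not a power of two, the halves are uneven, which is why the logarithm is the more convenient route.
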
So, we find that for a set of equally probable alternatives the $\log_2$ of the number of alternatives is the number of bits of information required to predict the one that will be chosen. Since tables are available for this logarithm, one can easily find that $\log_2 52$ is 5.70044, and that approximately 5.7 bits of information are required to predict the card that will be drawn from a shuffled deck of playing cards.

To put this another way, one would have to ask, on the average, about 5.7 questions to be answered "yes" or "no" by some all-knowing power before he could predict with complete certainty the next card to be drawn; and each question asked would be designed to halve the alternatives. Thus, he might first ask: "Is it red?" The answer will convey to him one bit of information. If the answer is "yes," he obviously knows the card is either a heart or a diamond. If the answer is "no," he knows it is a club or a spade. For purposes of the example, let us suppose that the answer is "yes." He then would ask, "Is it a heart?" The reply is "no"; therefore, two bits of information have been conveyed, and he now knows the card is a diamond. He knows there are thirteen diamonds in a deck and now has the task of finding, with as few questions as possible, which diamond it is. The third question would be "Is the card lower in rank than number eight?" The answer is "no" (third bit). He knows the card is between eight and the ace. Again he halves these alternatives by the question "Is it lower in rank than Jack?" The answer "no" provides him with his fourth bit of information. The fifth question would be "Is it lower in rank than King?" and the reply is "yes" (fifth bit). He now knows the card is either the Jack or the Queen of Diamonds. By now asking "Is it the Jack of Diamonds?" he can be certain by the "yes" or "no" reply (sixth bit) what it really is. If the answer is "no," then he knows it is the Queen of Diamonds.
If the answer is "yes," then he obviously knows it is the Jack of Diamonds. Thus, six bits of information have carried him to the solution. With favorable answers he can get it in five, which brings the average down to about 5.7.

It can be seen, then, that bits of information are not the same as pieces, and it can be observed that the bit is an arbitrarily defined unit which serves to quantify the uncertainty in predicting, or the information needed to predict, the next symbol to be drawn from a set of symbols.

So far, we have considered only that situation in which the alternatives are equally probable (i.e., the maximum entropy situation). We should emphasize that the 5.7 bits mentioned above in relation to the deck of cards is only accurate if the fifty-two alternatives are equally probable, as in a shuffled deck of cards. We have observed that the number of bits of information is reduced as the number of alternatives is reduced (from fifty-two to four to two). Now, let us observe what happens when the probabilities are shifted.

To simplify the computations in the coming discussion, let us assume that we are interested only in the four suits in our deck of cards. That is, we now define our situation as having four alternatives, each with a probability of 0.25. We could say from the previous discussion that in this situation, with four equally probable choices, two bits of information are required to predict the suit that will be drawn.

To simplify further our computations, let us make one other observation. Four (the number of alternatives) times .25 (the probability of each alternative) times the $\log_2$ of .25 equals two in magnitude. That is, $p_1 \log_2 p_1 + p_2 \log_2 p_2 + p_3 \log_2 p_3 + p_4 \log_2 p_4$ has magnitude 2, or, to state the general case, $\sum_i p_i \log_2 p_i$ gives, apart from its sign, the number of bits of entropy in a given set of symbols. (Actually, the logarithm of any number less than one is a negative number, so we need a minus sign in our formula to make the result positive.)
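
As a worked check of the arithmetic for the four equally probable suits, written here in standard notation with the minus sign made explicit (the intermediate step $\log_2 0.25 = -2$ is supplied for illustration):

$$
-\sum_{i=1}^{4} p_i \log_2 p_i \;=\; -\,4 \times 0.25 \times \log_2 0.25 \;=\; -\,4 \times 0.25 \times (-2) \;=\; 2 \text{ bits}
$$
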
The formula for information, entropy, uncertainty, can now be stated (letting $H$ stand for information and $i$ for any alternative):

$$H = -\sum_i p_i \log_2 p_i$$

With this formula and a table of values for $-p \log_2 p$ we can now compute the entropy in any set of alternatives with any set of probabilities.¹⁴

Suppose we now take our deck of cards and withdraw four spades and two clubs and add six hearts. Again we are concerned with predicting the suit that will be exposed on the next draw. The probability that the next card will be a heart is approximately .37, a diamond .25, a club .21, and a spade .17. If we insert these values in the formula given above we find:

$-.37 \log_2 .37$  =   .5307
$-.25 \log_2 .25$  =   .5000
$-.21 \log_2 .21$  =   .4728
$-.17 \log_2 .17$  =   .4346
Total                 1.9381

14. Edwin B. Newman, "Computational Methods Useful in Analyzing a Series of Binary Data," American Journal of Psychology, LXIV (April 1951), 252-262.

Thus, the four-choice situation with the probabilities .37, .25, .21, and .17 contains approximately 1.94 bits of information, entropy, uncertainty. We could also show by working some more examples that the more disparate the probabilities become, the less entropy there is in the system and the less uncertainty there is in predicting the next symbol. If, for example, the probabilities were .75, .10, .08, and .07, only 1.2 bits of information are required to reduce the uncertainty to zero.

The next thing we might want to know is, "How does this absolute value, computed for a specific set of alternatives with a specific set of probabilities, compare with the maximum value that could obtain if the set of alternatives were equally probable?" We know that the maximum value for a four-choice set is 2 bits, and we have just computed an absolute value of 1.94 bits. If we divide 1.94 by 2.00 we will obtain .97.
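
The table lookup described above can also be done directly. The following is a minimal sketch, assuming the standard form of the entropy sum with an explicit minus sign; the function name `entropy_bits` is hypothetical and chosen here only for illustration. It recomputes the two four-suit examples and the ratio of the absolute value to the two-bit maximum.

```python
import math


def entropy_bits(probabilities):
    """Return -sum(p * log2(p)) in bits, skipping zero-probability items."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Suit probabilities after the deck has been altered (hearts, diamonds,
# clubs, spades), rounded as in the text.
altered_deck = [0.37, 0.25, 0.21, 0.17]
absolute = entropy_bits(altered_deck)
maximum = math.log2(len(altered_deck))  # four equally probable suits

print(round(absolute, 4))            # 1.9381
print(round(absolute / maximum, 2))  # 0.97

# A more lopsided distribution carries less uncertainty.
print(round(entropy_bits([0.75, 0.10, 0.08, 0.07]), 1))  # 1.2
```

The ratio printed on the second line is the same .97 obtained above by dividing 1.94 by 2.00.
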
This value we call the relative entropy or relative uncertainty, and we can say that a four-choice situation which yields an absolute uncertainty value of 1.94 bits is .97 (97 percent) as uncertain as it might be (i.e., as it would be if the alternatives were equally probable). With the value we have called relative entropy we can compare systems with different numbers of alternatives and with different probabilities, but it should always be kept in mind that the relative entropy of a system of symbols (or the relative uncertainty of a source who is selecting symbols from the set) defines a relationship between the absolute and maximum entropy of a given set. Thus, a set of sixty-four alternatives and a set of eight alternatives could both have a relative entropy of .50, while the absolute entropy of the first set is twice that of the second.

One other concept from information theory can now be meaningfully defined, and that is relative redundancy. Relative redundancy is, simply, one (1) minus the relative entropy. Thus, in the example above where the absolute entropy was 1.94 bits, the maximum 2 bits, and the relative entropy .97, the relative redundancy would be .03. The redundancy figure represents the degree to which the next symbol in a sequence is determined, or the degree of certainty we might have about what the next card is going to be.

To summarize: given a source who is successively selecting discrete symbols from a set of symbols, if the symbols in the set have equal probabilities of being chosen next, that source has maximum uncertainty, or there is maximum entropy in the situation. Maximum entropy is defined for a given set of symbols as the logarithm to the base two of the number of alternatives ($\log_2 n$). If there is some dispersion in the probabilities of the alternatives, the absolute entropy is computed by the formula $-\sum_i p_i \log_2 p_i$. In the equally probable set both formulas produce the same result.
Relative entropy is obtained by dividing the absolute entropy by the maximum entropy and has a value of one (1) when the alternatives are equally probable and a value of zero (0) when any alternative has a probability of unity. One minus the relative entropy is the relative redundancy, which is an index of the predictability of any given symbol drawn from the set.

When the set of symbols with which one is dealing is the alphabet or vocabulary of a natural language, and the task of the source is the composition of a message in that natural language, it is readily observable that the probabilities of the alternatives fluctuate as the source proceeds with his sequential selection. The fact that choices made in sequence may not be independent requires that, for the computation of the uncertainty involved in any particular choice, the probabilities of the various alternatives must be determined taking into account all that has gone before. For example, it has been estimated that the average redundancy of English is approximately 50 percent. However, given the letter "q" the probability of the next letter being "u" is 1.00; therefore, the redundancy of that particular choice is 1.00. This situation, in which the probabilities involved in a given choice are dependent on the previous choices, is called a Markoff process. It is a term frequently encountered in the study of information theory.

In addition to the problems associated with the coding of the message, the information theory engineer has the problem of determining the channel capacity. He looks on information theory as a tool for dealing with the efficiency of coding and code transmission. Let us consider a source capable of transmitting $n$ symbols per second; and let us assume that this source is selecting his symbols from a set of $k$ equally probable symbols. We can characterize this source as having a capacity for handling $(\log_2 k)\,n$ bits of information per second.
We can perhaps simplify the definition somewhat by recognizing the fact that $\log_2 k$ represents the maximum entropy case for a given set of symbols; that is, each symbol in this set represents $m = \log_2 k$ bits of uncertainty, and $C$ (the capacity of a channel for handling bits of information) is equal to $m$ times $n$ ($C = mn$). Let us now suppose that $n$ is a constant; if the statistical properties of the symbol set are such that the average relative entropy of the set is less than one (1), the peak operation level of our source in bits per second handled is necessarily less than $mn$; it is $rmn$ (where $r$ stands for the average relative entropy of the symbol set). We can say that the maximum rate at which this source, with this symbol set, can handle information is $rmn$ bits per second, and we redefine $C$ as $rmn$.

Perhaps this would all be more meaningful with a real live example. Suppose we have a typist who is capable of typing 300 symbols per minute, or 5 symbols per second, and let us further suppose that this typist is composing a message in English using the 26-letter alphabet. If those letters were equally probable there would be 4.7 bits of information associated with the selection of each symbol (1 less than in the deck of cards, since 26 is exactly half of 52), and at a rate of 5 symbols per second, we would say that this typist can handle 23.5 bits per second, or 1,410 bits per minute. We would say that 23.5 bits per second is the capacity $C$ of this typist. However, if we recognize that the letters of the English alphabet are not equally probable in "real" English, and if our typist is limited to 5 symbols per second, the performance level measured in bits per second would necessarily be less than 23.5.
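
The typist figures can be reproduced with a few lines of arithmetic. This is a sketch under the definitions given above ($m = \log_2 k$ and $C = rmn$); the function name `capacity_bits_per_second` and its parameter names are assumptions made here for illustration.

```python
import math


def capacity_bits_per_second(k: int, n: float, r: float = 1.0) -> float:
    """Capacity C = r * m * n, where m = log2(k) bits per symbol.

    k: number of symbols in the set
    n: symbols handled per second
    r: average relative entropy of the set (1.0 means equally probable)
    """
    m = math.log2(k)
    return r * m * n


# 26 equally probable letters typed at 5 symbols per second.
c_max = capacity_bits_per_second(k=26, n=5)
print(round(c_max, 1))    # 23.5 bits per second
print(round(c_max * 60))  # 1410 bits per minute
```

Passing any relative entropy `r` below 1.0 scales the result down proportionally, which is the sense in which real English falls short of the 23.5 figure.
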
If we accept Shannon's suggestion that the relative entropy of English is about .50, we could say that the average uncertainty associated with the selection of each symbol is only 2.35 bits, and with this symbol set, our 5-symbol-per-second typist could handle no more than 11.75 bits per second (i.e., if $m = 4.7$, $n = 5$, and $r = .5$, then $C = rmn = 11.75$).

Thus, given a source or channel that can handle $n$ symbols per second: if the symbols are equally probable, maximum entropy prevails, and the source can handle $m$ bits of information per symbol or $mn$ bits per second. We find, however, that we can remove the assumption of equal probability of symbols by multiplying by the relative entropy ($r$) of the symbol set and define the capacity, $C$, for the general case as $rmn$. $C$ now accounts for the symbol transmission capacity, the size of the symbol set, and the distribution of the probabilities of the symbols.

The above refers to the formula for the capacity of the source or channel for handling "bits" of information. We have assumed in this example that the symbol transmission rate $n$ is a constant, but there is reason to believe that the capacity $C$ is more stable in human communicators; that is, that $n$ actually decreases as $rm$ increases.

One other concept that should be treated is the concept of noise. There is no man-made or natural communication system which does not have in it the potentialities for error. The electronic signal, the written word, or the spoken word all admit the possibility of foreign elements which will get in the way of the intended meaning: a cough, an illegible handwriting, random fluctuations or perturbations in the mechanical signal. These interferences are referred to as "noise." Noise in its simplest form is the addition or omission of a symbol in the communication chain which results in a discrepancy between the message transmitted and the message received (or, in more human terms, the message intended and the message perceived).
The fact that communication is carried on through channels in which noise is possible makes redundancy useful. In the definition of channel capacity, redundancy appeared to be of negative value, acting as a limiting factor on the channel efficiency. However, if we consider a source or channel which is subject to error, it is quite evident how redundancy can be beneficial. For example, if we have a source that tends to omit every tenth symbol, but has an average relative redundancy equal to or greater than 10 percent, the receiver can replace the missing symbols with a rather high probability of success. The exact amount of redundancy necessary to fully compensate for a given amount and kind of noise must be determined for each new channel, source, and purpose, but perhaps the general relation which we have suggested is sufficient for this introductory paper.

By way of conclusion, let us at this point review some of the major concepts discussed and speculate about their application in the social sciences. "Cybernetics" as a word can be compared to the term "behaviorism." It is a method of approach rather than a subject matter treated by the method. The method encompasses the fields of language, logic, mathematics, biology, psychology, physiology, anthropology, and sociology. Physiology, psychology, and the biological sciences have taken notice of cybernetics to investigate human behavior and general physiological function from the machine point of view. They have been able to do this because of a new "theory of information" which is based on a precise mathematical concept. This new theory of information is more concerned with the technical problem of transmitting signals accurately than with the semantic problem of the precision with which transmitted signals convey a desired meaning, or the effectiveness with which the received meaning affects the conduct of the recipient.
Its appeal is due, in part, to the fact that variables can now be quantified which up to this time had defied quantification. Therefore, to some degree it can be stated that a scientific model of communication is replacing "intuitive" or "clinical" models of communication, and predictions of human behavior are being based on "sound thinking" rather than "mere intuition."

The needs and complexities of today's societies emphasize more and more man's dependency on information. "To live effectively," as Norbert Wiener has put it, "is to live with adequate information." Information theory is an attempt to help man understand and provide "adequate information." The theory is so general that it is applicable to any type of communication, whether written letters or words, spoken words, musical notes, symphonic music, pictures, or electrical impulses. It is an imaginative theory that attempts to get at the real inner core of communication problems, "with those basic relationships which hold in general, no matter what special form the actual case may take." 15 It is a theory which bases its design on the statistical character of the source, and its aim is to strike a rational and profitable compromise between excessive redundancy or bulk on the one hand and excessive sensitivity to noise on the other.

In view of this theory we should note that in the sciences, highly accurate quantitative information often brings to light important qualitative information. For example, it was the highly accurate measurements of the masses of isotopes which brought to light the mass defects, which in turn provided the starting point for the study of nuclear energies. 16 It is an accepted fact that the theory contributes to the study of cryptography and the problems of translation from one language to another. Similarly, this theory contributes to a better understanding of computer design.
Warren Weaver suggests that "this analysis has so penetratingly cleared the air that one is now, perhaps for the first time, ready for a real theory of meaning." 17

15. Shannon and Weaver, pp. 114-115.
16. Stanford Goldman, Information Theory (New York, 1953), p. 64.
17. Shannon and Weaver, pp. 116-117.

Can we further generalize with the concepts herein discussed? Let us take an individual who is concerned with predicting an event, any event. Isn't his ability to predict dependent on the number of possible events and his knowledge of their probabilities? And, does it not seem reasonable to say that his uncertainty is greatest when the possible events are equally probable, and least when one event has a probability of one? What about an individual who is forced to make a series of predictions, all of them under maximum uncertainty conditions: would this lead to a state of anxiety or depression? Is there an optimum level of "redundancy" under which an individual or a society functions most efficiently? Can information theory help to explain human behavior? We think the answer to all of these questions is a qualified "yes."

But, the important thing as far as rhetoric or human communication is concerned is that information theory provides a basis for a comprehensive theory of organization. That theory is not yet fully elaborated, but the lines of development seem clear. Let us assume that the purpose (or function) of the "patterns of organization" consistently recommended by rhetoricians to beginning speakers is to increase the predictability (redundancy) of the message. That is, a "well-organized" message is one that, when transmitted at a normal rate, is redundant enough not to exceed the "channel capacity" of the receiver. In this sense, good spelling or articulation, acceptable grammar, and a "logical order" of ideas are all part of the same system: all indicate compliance on the part of the source with a set of restrictions which are imposed on the source.
Thus, any set of restrictions on the source will serve to reduce the "freedom of choice" of the source, but only those restrictions which are familiar or can be explained to the receiver will reduce his "uncertainty." From this it follows that a message that is "organized" for one person may not be "organized" for a second person. There probably are, however, some patterns of restrictions that are more likely to produce organization for whatever audience may receive the message. Following the thread, it seems reasonable to say that titles, introductions, orienting material, subject sentences, as well as patterns of main heads serve both to restrict the production of the source and to make the audience aware of existing restrictions, either of which would tend to increase the ability of the audience to predict what the speaker is going to say next, that is, to reduce the receiver's uncertainty about what to expect.

From this rudimentary extension of information theory it should be clear that the patterns of organization we teach may not work outside the classroom; they may not be necessary "among friends"; and there may be other ways of accomplishing the same ends that we have not considered. It is our hope that through this paper enough recruits can be enlisted to attack the problems suggested with some probability of success.
