ISSA Proceedings 2002 – Reasoning In Listening

1. Introduction
Our main thesis is that reasoning plays a different role in understanding oral discourse than it does in understanding written discourse [i]. In particular, this seems to be the case for listening to lectures, speeches, and other forms of monologue, as opposed to reading comparably long texts. The reason for this difference, as we shall see, is that listening takes place in “real time,” in the sense that one is not free to look ahead or back as one is in reading. (We shall not deal explicitly with dialogue, which is the other main form of oral discourse, except to note here that it has a written counterpart, viz., the internet medium of “Instant Messenger” (IM), which is a kind of hybrid, in that, while it takes place in real time, it does permit the user to look backwards, though not forwards.)
If listening does make different demands on reasoning than reading does, this may account for some of the differences between oral and literate cultures. It is sometimes assumed that oral cultures are generally less sophisticated than literate ones, but this assumption can hardly survive exposure to history. Havelock, writing about Greece in the time of Homer, offers an admittedly speculative corrective to such a view:
We can hazard the guess, in short, that that specific and unique Hellenic intelligence, the source or cause of which has baffled all historians, received its original nurture in communities in which the oral technique of preserved communication threw power and so prestige into the hands of the orally more gifted. It made the competition for power, endemic among all human beings, identifiable with the competition for intelligence. The total nonliteracy of Homeric Greece, so far from being a drawback, was the necessary medium in which the Greek genius could be nursed to its maturity. (Havelock, 1963, 127)

The classical civilizations retained an oral character long after the development of literacy. A modern listener would find it difficult to follow the oratory of Cicero, with its long sentences, or periods, characterized by subordinate clauses, often nested within one another. In The Art of Memory, Frances Yates describes the elaborate methods employed by ancient orators to commit their speeches to memory. But it is unlikely, to say the least, that the short-term memory of ancient listeners was more capacious than our own. Cognitive scientists have found severe and apparently universal limits on short-term memory. Consequently, if ancient listeners were more proficient at processing complex oral communications, it is probably because they employed different strategies than we are accustomed to. When Mark Twain made fun of Germans waiting with rapt attention for the verb at the end of a sentence, he was, of course, exaggerating for comic effect. But apprehending the ornate periods of a Cicero in real time must have involved the sort of suspense Twain describes.

Literacy expanded in stages. Readers of the Confessions will recall the surprise of Augustine and his companions at St Ambrose’s uncanny ability to read silently. It seems likely that the advance of literacy resulted in a certain atrophy of the strong listening ability manifested by ancient audiences. By way of compensation, it permitted a deeper level of understanding than listening made possible. Avicenna reported having read Aristotle forty times before he began to understand him, and then only with the aid of a book by Alfarabi. Snow, Burns, and Griffin (1998, 64) cite empirical studies indicating that reading comprehension exceeds listening comprehension for college-age students but not for younger students. They propose to demarcate the boundary between mature and immature listening “when the advantage of listening over written comprehension disappears, in seventh or eighth grade.” It is extremely hard to follow an intricate argument or proof presented orally without visual aids. We believe that this is chiefly owing to the difficulty of recalling individual propositions, let alone sentences, from hearing them. One of our central claims is that a successful listener discards sentences and propositions once they have played their role in updating an internal model of the subject matter of a discourse.

2. Basic Differences Between Listening and Reading
Listening and reading are of course both acts of decoding messages to extract their meaning. They thus involve many of the same underlying abilities. Consequently, many of the kinds of questions we would ask to determine if someone is a good listener or a good reader will be the same. For example, we would want to know whether the listener or reader grasped the discourse’s main point. In the case of discourse with intellectually demanding material, we would want to know whether the listener or reader was able to follow an extended argument.
But apart from the obvious difference in sense modality, the most fundamental difference between listening and reading would seem to be their relation to time. It is what chiefly accounts for the fact, widely acknowledged in the literature (e.g., Bostrom, 1984 or Richards, 1983), that they place very different demands upon memory. Listening takes place in ‘real time’ in the sense that the listener is not free to look ahead or back the way a reader is. It is a commonplace that reading is not always a very linear task. Listening also places a higher premium on the ability to anticipate. Both reading and listening involve reasoning in that they involve the framing of hypotheses about the direction in which the discourse is headed, the construction of a kind of theory of the discourse. But since a reader can skip ahead to see if his or her hypothesis is correct, and can also look back to see both where a disconfirmed hypothesis went wrong and how to replace it with a better one, the reader’s stake in getting it right the first time is not as great as that of the listener, who runs the risk of getting hopelessly lost. Suppose that, in a book on vector spaces, we read the following:
Let W be a subspace of vector space V. We show that every basis for W is a subset of some basis for V. (Adapted from Geroch, 1985, 56.)
(Familiarity with vector spaces is not required, or perhaps even desirable, for understanding this example. You can treat the unfamiliar terms as dummy variables.)

We may immediately feel uneasy because of the potential quantifier ambiguity. Is the author offering to prove that there is some basis X for V such that every basis for W is a subset of X, or only that for every basis Y for W there is some basis or other for V of which Y is a subset? Assailed by this doubt we will probably scan the proof to see which of these propositions it establishes, if either. Then we will go back and read the proof from the beginning, secure in our knowledge of what it is about. Contrast this with the case in which the sentence appears in a lecture. We may be able to ask the speaker for clarification, but in some formal lectures this is not permitted. Then our best strategy may be to hold both meanings in suspension until the ambiguity is resolved. The greater demands placed by listening not only upon memory but on active hypothesis formation account for some of the typical differences between written and spoken discourse, for example, between a paper read aloud and a good lecture. The lecture has to incorporate both redundancy and explicit signposts of the direction in which the talk is going.
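The two readings distinguished above can be set out symbolically. The notation in this sketch is ours rather than Geroch’s: we assume that B(U) stands for the set of all bases of a space U.

```latex
% Strong reading: a single basis X of V contains every basis Y of W.
\exists X \in \mathcal{B}(V)\;\; \forall Y \in \mathcal{B}(W)\;\; (Y \subseteq X)

% Weak reading: each basis Y of W is contained in some basis X of V,
% where X may vary with Y.
\forall Y \in \mathcal{B}(W)\;\; \exists X \in \mathcal{B}(V)\;\; (Y \subseteq X)
```

The strong reading entails the weak one, but not conversely. A listener who holds both in suspension is therefore waiting to learn whether the speaker intends the stronger claim or only the weaker.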

It is no doubt in large part because of the above differences that oral language itself is different from written language. For one thing, it is usually syntactically simpler. In particular, it has often been pointed out (for example, by Richards, 1983) that the basic unit of oral speech is the clause, rather than the sentence, with the listener being left to infer connections that would be made explicit in written prose. Colloquialisms are tolerated and such devices as contractions are actually preferred. Even incomplete sentences are common. Rubin and Rafoth (1984, 17, cited in Rhodes, Watson, and Barker, 1990, 72) go so far as to deny that the medium of delivery is what is essential: “oral language is not defined by the channel in which a message happens to be transmitted, but rather by specific syntactic and text-level features and by its power to evoke a sense of situation.” Written language, by contrast, is not designed to be processed aurally, as anyone can testify who has tried to follow a paper read aloud.

3. Mental Models
Cognitive psychologists have discovered that verbatim memory of a sentence typically persists only for a few seconds, in what is called short-term memory (Witkin, 1990, esp. 13-14). Once its meaning has been apprehended the sentence is discarded like a booster rocket. When you are asked about the contents of a passage or talk some time after reading or hearing it, your sentences rarely stand in a one-one correspondence to those of the original. Instead, you rely on an internal representation of the content. This representation is constructed incrementally as the discourse unfolds.
Consider that short-term memory has limited capacity. In a famous paper, Miller (1956) summarized a body of research that showed that it can hold only about seven items at a time, though this limitation can be overcome to some extent by “chunking,” that is, encoding several items into a single item that can later be decoded. Acronyms are simple examples of such chunking. The limitations of short-term memory also apply to mental activities that depend upon it, such as inference, which typically requires mentally juggling several items at once. Hence, another term for short-term memory is ‘working memory’, which is intended to suggest that it is the scratchpad, as it were, on which conscious work is carried out.
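Chunking as just described can be illustrated with a small sketch. The digit string, the chunk size of four, and the function name `chunk` are our own illustrative assumptions; nothing here is taken from Miller’s experiments.

```python
def chunk(items, size):
    """Group a flat sequence into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Twelve separate digits: more than the "about seven" items mentioned above.
digits = list("149217761066")

# Encoding: the same digits regrouped as three items (here, familiar years).
chunks = ["".join(group) for group in chunk(digits, 4)]

# Decoding: the original twelve digits are recovered from the three chunks.
decoded = list("".join(chunks))

print(len(digits), "items before chunking;", len(chunks), "items after")
```

The point of the sketch is only that the number of items to be held drops while no information is lost, since the chunks can be decoded back into the original sequence.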
Because short-term or working memory holds what is needed for a current task, most people do not even think of it as memory. What most people call ‘memory’ is really long-term memory (Bostrom, 1990, 6). Information undergoes a transformation before being stored in long-term memory. Barring conscious memorization, which usually involves extensive repetition, we do not typically recall the exact words in which information comes to us. “Permanent, or long-term memory works with meaning, not with form. The propositional meaning of sentences is retained, not the actual words or grammatical devices that were used to express it” (Richards, 1983, 221). There is usually a time lag of 60 seconds or more between the presentation of a stimulus and the activation of long-term memory, which may depend on rehearsal in the meantime, and “entry into long-term memory may be dependent on both rehearsal and organizational schemes” (Bostrom, 1990, 6).

Yet the world itself is not a set of propositions; it can be more accurately regarded as a system of objects having various properties and standing in various relations to each other. For example, in the sentences ‘Venus is the second planet from the sun’, ‘Venus is approximately the same size as the earth’, and ‘Venus is covered with dense clouds’ the name ‘Venus’ occurs three times as subject, each time with a different predicate. In most theories, a similar subject-predicate structure can be defined for the corresponding propositions, even though these propositions are not themselves linguistic objects. But in the solar system Venus ‘occurs’ only once, replete with all its properties and its nexus of relations to other heavenly bodies. Johnson-Laird (1983) has argued that much of our knowledge is represented in the mind in a form that corresponds more closely to structures in the world itself than to the discursive propositions and sets of propositions we use to communicate that knowledge. He refers to such representations as “mental models”:
Unlike a propositional representation, a mental model does not have an arbitrarily chosen syntactic structure, but one that plays a direct representational role since it is analogous to the structure of the corresponding state of affairs in the world – as we perceive or conceive it. However, the analogical structure of mental models can vary considerably. Models of quantified assertions may introduce only a minimal degree of analogical structure, such as the use of separate elements to stand for individuals. Alternatively, models of spatial layouts such as a maze may be two- or three-dimensional; they may be dynamic and represent a sequence of events; they may take on an even higher number of dimensions in the case of certain gifted individuals. One advantage of their dimensional structure is that they can be constructed, and manipulated, in ways that can be controlled by dimensional variables. But a propositional representation, as Simon (1972) pointed out, can be scanned in only those directions that have been encoded in the representation. Simon also drew attention to the fact that people who know perfectly how to play noughts-and-crosses (tic-tac-toe) are unable to transfer their tactical skill to number scrabble, a game that is isomorphic to noughts-and-crosses. Just as they can scan an external noughts-and-crosses array, so they can scan its internal representation, but that process is irrelevant to the game of number scrabble (156-157).
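The isomorphism mentioned in the quoted passage can be checked mechanically. The sketch below rests on assumptions not stated in the passage itself: that number scrabble is played, as it is usually described, by drawing in turn from the numbers 1 to 9, the winner being the first to hold three numbers that sum to 15, and that the correspondence is exhibited by the familiar 3-by-3 magic square. The names `MAGIC`, `board_lines`, and `winning_triples` are ours.

```python
from itertools import combinations

# A 3x3 magic square: every row, column, and diagonal sums to 15.
MAGIC = [
    [2, 7, 6],
    [9, 5, 1],
    [4, 3, 8],
]


def board_lines(square):
    """Return the eight winning lines of a noughts-and-crosses board as sets."""
    rows = [frozenset(row) for row in square]
    cols = [frozenset(col) for col in zip(*square)]
    diagonals = [
        frozenset(square[i][i] for i in range(3)),
        frozenset(square[i][2 - i] for i in range(3)),
    ]
    return set(rows + cols + diagonals)


def winning_triples():
    """Return every set of three distinct numbers from 1..9 that sums to 15."""
    return {frozenset(t) for t in combinations(range(1, 10), 3) if sum(t) == 15}


lines = board_lines(MAGIC)
triples = winning_triples()
print("Same winning sets in both games:", lines == triples)
```

Under these assumptions the winning sets of the two games coincide exactly, yet the coincidence becomes visible only once the numbers are laid out spatially. This is one way of seeing why skill in scanning an array does not carry over to a representation that encodes no spatial directions.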

The main thrust of Johnson-Laird’s work concerns inference. He provides a substantial body of argument and empirical evidence that even simple syllogistic inferences proceed by manipulating mental models rather than by combining propositions by means of rules of inference, which is more abstract. Johnson-Laird is not committed to the claim that mental models constitute an irreducible level of representation. On the contrary, he acknowledges the possibility that, just as higher-level languages in a computer are ultimately realized as strings of 0’s and 1’s in machine code, so all mental representations, including mental models, may be realized in a mental ‘machine code’, which may, for all we know, consist of finite strings of symbols (155). His point is just that at some level we manipulate mental models and their contents as such.
Johnson-Laird gives an example from a Sherlock Holmes story (158-160) to show how a certain kind of question about a passage can be answered only by constructing a mental model, and not by employing a purely propositional representation of the content. In the story, Holmes and Watson break into the house of a blackmailer. Their progress through the house is described in some detail. The question is whether they proceeded from right to left or from left to right. The passage does not say explicitly. If one makes a mental model of the house as one reads the passage, especially with the question in mind, one can answer the question fairly easily, though it would take many steps to derive it logically.

It seems a promising hypothesis that both reading and listening comprehension rely on mental models. Evidence is provided by the fact that we do not typically recall the exact propositions making up a passage or talk, but we can often reconstruct the content. It is plausible that good readers and listeners are ones who constantly update their mental models of the content by integrating new propositional information into them. But listening would seem to be more dependent on such models than reading, because a listener, unlike a reader, cannot look back to recover the exact propositional content of the stimulus.

The listener also seems to have a greater need for coping strategies, for example, in cases of indeterminacy, where the discourse allows too many possible models, and of inconsistency, where there is no possible model. There are two approaches to indeterminacy. One is to represent an indeterminate object by a set of completely determinate ones; the other is to embrace partiality. The former approach is adopted by possible worlds semantics in its identification of propositions with sets of possible worlds, the latter by situation theory, which is basically a theory of partial worlds. Which of these strategies we actually employ on a given occasion is to some extent a topic for empirical research. But it seems unlikely that we are able to entertain complete mental models of any but the simplest states of affairs, so that much of the time our models must of necessity be incomplete. And it also seems unlikely on the face of it that we can entertain more than a couple of models, however simple, at the same time. Inconsistency comes in two strengths. The weaker occurs when we have opted for a particular model which is ruled out by the subsequent direction taken by the discourse. In that case, we may have to replace it with a model, if one is available, that is compatible with the new information (Follesdal discusses cases like this in connection with Husserl, saying that in this case what Husserl calls the “noema” explodes and is replaced with another one). The stronger kind of inconsistency occurs when the discourse is actually self-contradictory. In this case, of course, it has no model. But if the contradiction is not central we will not be prevented from forming a partial model of the discourse. For example, the author may carelessly attribute two different eye colors to a character without seriously impairing the integrity of the narration.

4. Pragmatics
The assumption that every sentence expresses a determinate proposition is of course an oversimplification (e.g., Perry, 1977). The same sentence can express different propositions in different contexts of use. This is because sentences often contain so-called ‘indexical’ elements, such as pronouns and tense. The study of such contextual aspects of language is called ‘pragmatics’. The scope of pragmatics is rather broad, since it encompasses all the facts surrounding an utterance, including the speaker and addressees. Pragmatics is generally taken to include speech act theory, which concerns itself with the so-called ‘illocutionary force’ of utterances, namely, the kinds of acts utterances are used to perform (Searle, 1969). Most or all languages grammatically mark the distinction between declaratives, interrogatives, and imperatives. But this distinction corresponds only somewhat loosely to illocutionary force. For example, the sentence ‘Could you pass the salt?’, which is grammatically a yes-or-no question, is more commonly used to make a request than to elicit information. Simply answering ‘Yes’ would be inappropriate. Moreover, the distinction between assertions, questions, and commands only scratches the surface. Stalnaker (1972, 178) gives an idea of the kinds of problems involved:
Assertions, commands, counterfactuals, claims, conjectures and refutations, requests, rebuttals, predictions, promises, pleas, speculations, explanations, insults, inferences, guesses, generalizations, answers, and lies are all kinds of linguistic acts. The problem of analysis in each case is to find necessary and sufficient conditions for the successful (or in some cases normal) performance of the act. The problem is a pragmatic one since these necessary and sufficient conditions will ordinarily involve the presence or absence of various properties of the context in which the act is performed, for example, the intentions of the speaker, the knowledge, beliefs, expectations, or interests of the speaker and his audience, other speech acts that have been performed in the same context, the time of utterance, the effects of the utterance, the truth-value of the proposition expressed, the semantic relations between the proposition expressed and some others involved in some way.

Pragmatic concerns loom much larger in standard or paradigmatic listening situations than they do in paradigmatic reading situations. This is true because in the standard listening situation, the speaker and audience are in the same place at the same time; the speaker can thus exploit this shared context in ways that a writer cannot. In understanding oral discourse then, the task for the listener is to use this shared context or (following Barwise and Perry, 1983) “discourse situation” to determine the types of speech acts and the interpretations of indexical elements of the discourse. For example, consider a speaker who in a talk uses the word ‘here’ in its nondemonstrative sense. It is commonly accepted that the meaning of this word, as for any indexical, consists of a rule that specifies the referent for any utterance of it: it refers to the place of its utterance (e.g., Kaplan, 1989; Plumer, 1993). The reasoning task for the listener is the simple one of using Universal Instantiation in applying the rule to determine the referent. In contrast, a writer typically refers to places by using proper names, descriptions, or some spatial coordinate system. And if, say, the author of a travelogue uses an indexical such as ‘here’, the referent will have been previously established by one or more of these means.
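The idea that the meaning of an indexical is a rule taking each utterance to its referent can be given a toy rendering. The sketch below is a deliberate simplification and not Kaplan’s or Plumer’s formal apparatus; the class `Context`, the table `RULES`, and the sample contexts are hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A hypothetical context of utterance: who speaks, where, and when."""
    speaker: str
    place: str
    time: str


# Each indexical is paired with a general rule from contexts to referents.
RULES = {
    "I": lambda context: context.speaker,
    "here": lambda context: context.place,  # nondemonstrative 'here'
    "now": lambda context: context.time,
}


def referent(word, context):
    """Apply the general rule for `word` to one particular context."""
    return RULES[word](context)


lecture = Context(speaker="the lecturer", place="the lecture hall", time="3 pm")
seminar = Context(speaker="a student", place="the seminar room", time="5 pm")

print(referent("here", lecture))
print(referent("here", seminar))
```

Applying the one general rule to a particular context corresponds to the step of Universal Instantiation described above: the listener in the shared situation already possesses the values the rule requires, whereas a reader must be supplied with them by description.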

The logic of indexicals includes certain straightforward validities such as the sentence ‘I am here now’ (which may be regarded as analytically true): every possible utterance of the sentence is true, unlike for the sentence formed by replacing ‘I’ with any proper name or definite description (in which no indexical is used) (cf. Kaplan, 596; Plumer, 203). But the logic also includes some rather complex reference-fixing rules. Consider this proposed statement of the rule for ‘here’ in its demonstrative sense (which is more or less equivalent to ‘there’): an utterance of it refers by relating a place to the place that would have been referred to had ‘here’ in its nondemonstrative sense been uttered instead, where this relating is accomplished through an act of ostension or focusing of sensory attention carried out by the utterer (adapted from Plumer, 205). Nevertheless, there are some very unsophisticated or basic elements of this, viz., an act of ostension or focusing of sensory attention. To a large measure these define the meaning of any demonstrative, yet they are certainly something of which nonhuman animals are capable.
In a linguistic study it may be difficult, as Stalnaker puts it, “to find the necessary and sufficient conditions for the successful… performance of the act,” but this does not mean that for the user of the language the rules are difficult to assimilate or apply. For example, with respect to the speech act of promising, Searle argues that one of the necessary conditions for “sincerely and nondefectively” performing it is: “It is not obvious to both S [the speaker] and H [the hearer] that S will do A [the action] in the normal course of events” (1969, 57, 59). Given the right information, it may be easy to see whether this condition is instantiated in the particular case. Yet the information may unfold or be revealed in quite different ways in a listening as compared to a reading situation, as for example where S is a speaker giving a talk or a character in a novel, respectively. Typically, in the reading situation the information is conveyed through description, but in a listening situation much information is implicit or inherent in the context as events occur, as in the case of a verbose speaker’s promise to finish on time.

5. Conclusions
Both reading and listening involve the construction of a theory or model of the underlying discourse. They are thus far from passive, but involve active reasoning, consisting notably in the forming and testing of hypotheses at every stage. But because listening takes place in real time, it places a greater premium on flexibility. Moreover, the reasoning that takes place in listening is likely to be more semantic in nature, consisting in the manipulation and updating of mental models, not in the combining of sentences or even propositions. Because even relatively formal listening is situated in a context, this context can typically be exploited to relieve some of the burden on mental representation. This, too, is a kind of reasoning, though apparently of a low level. It would thus appear that reasoning plays a greater role in basic listening comprehension than in basic reading comprehension. This is perhaps especially true of comprehension of discourse that is not itself a record of reasoning. But writing can record much more complex chains of reasoning than speech can, and the comprehension of such texts involves, as Brouwer pointed out, the recreation in the mind of the reader of the reasoning they record.

NOTES
[i] We are grateful to Lori Davis for help with this paper.

REFERENCES
Barwise, J. & Perry, J. (1983). Situations and attitudes. Cambridge, MA: MIT.
Bostrom, R.N. (1984). Conceptual approaches to measuring listening behavior. Typescript.
Bostrom, R.N. (1990). Listening behavior: Measurement and applications. New York: Guilford.
Geroch, R. (1985). Mathematical physics. Chicago: The University of Chicago Press.
Havelock, E.A. (1963). Preface to Plato. Cambridge, MA: Belknap.
Johnson-Laird, P.N. (1983). Mental Models. Cambridge, MA: Harvard.
Kaplan, D. (1989). Demonstratives: An essay on the semantics, logic, metaphysics, and epistemology of demonstratives and other indexicals. & Afterthoughts. In: J. Almog, J. Perry & H. Wettstein (Eds.), Themes from Kaplan (pp. 481-614). Oxford University Press.
Miller, G.A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological review, 63, 81-97.
Perry, J. (1977). Frege on demonstratives. Philosophical review, 86 (4), 474-497.
Plumer, G. (1993). A here-now theory of indexicality. Journal of philosophical research, 18, 193-211.
Richards, J.C. (1983). Listening comprehension: Approach, design, procedure. TESOL quarterly, 17, 219-240.
Rhodes, S.C., Watson, K.W., & Barker, L.L. (1990). Listening assessment: Trends and influencing factors in the 1980s. Journal of the international listening association, 62-82.
Rubin, D. & Rafoth, B. (1984). How to recognize oral prose with your eyes closed. Paper presented at the 29th annual convention of the International Reading Association, Atlanta. (Cited in Rhodes, Watson, & Barker.)
Searle, J. (1969). Speech acts: An essay in the philosophy of language. Cambridge: Cambridge University Press.
Simon, H.A. (1972). What is visual imagery? An information processing interpretation. In: L.W. Gregg (Ed.), Cognition in learning and memory. New York: Wiley.
Snow, C.E., Burns, M.S., & Griffin, P., Eds. (1998). Preventing reading difficulties in young children. Washington: National Academy Press.
Stalnaker, R.C. (1972). Pragmatics. In: A.P. Martinich (Ed.), The philosophy of language (2d ed.). Oxford: Oxford University Press, 1990.
Witkin, B.R. (1990). Listening theory and research: The state of the art. Journal of the international listening association, 4, 7-32.
Yates, Frances A. (1974). The art of memory. Chicago: The University of Chicago Press.