How We Talk

The Inner Workings of Conversation


By N. J. Enfield

An expert guide to how conversation works, from how we know when to speak to why huh is a universal word

We all had teachers who scolded us over the use of um, uh-huh, oh, like, and mm-hmm. But as linguist N. J. Enfield reveals in How We Talk, these “bad words” are fundamental to language. Whether we are speaking with the clerk at the store, our boss, or our spouse, language is dependent on things as commonplace as a rising tone of voice, an apparently meaningless word, or a glance. These signals are so small that we hardly pay them any conscious attention. Nevertheless, they are the essence of how we speak. From the traffic signals of speech to the importance of um, How We Talk revolutionizes our understanding of conversation. In the process, Enfield reveals what makes language universally, and uniquely, human.





Here are some facts about how we talk:

• The average time that people take to respond to a question is about the same time that it takes to blink the eye: 200 milliseconds.

• A "no" answer to a question will come slower than a "yes" answer, no matter which language is spoken.

• There is a standard one-second time window for responding in conversation: It helps us gauge whether a response is fast, on time, late, or unlikely to arrive at all.

• Every 84 seconds in conversation, someone will say "Huh?," "Who?," or something similar to check on what someone just said.

• One out of every 60 words we say is "um" or "uh."

I want to argue that these facts, and others like them, take us to the core of what defines our species' unique capacity for language. This claim may seem surprising, given the more bookish concerns of mainstream research on language, such as the meanings of words and the rules of grammar. But if the fine timing of answering questions or the functions of "mm-hmm" and "Huh?" seem trivial, let me borrow from Charles Darwin's remarks on the habits of earthworms: "The subject may appear an insignificant one, but we shall see that it possesses some interest."1 Darwin is being coy. He knows the importance of his earthworm observations—worms are essential plowers of the land—and by the end of the book he does not hold back: "It may be doubted whether there are many other animals which have played so important a part in the history of the world, as have these lowly organized creatures."2 I feel this way about the "lowly organized" elements of language that are the topic of this book: the rules we follow when taking turns in conversation, the on-the-fly ways we deal with errors and misunderstandings, and the functions of little utterances such as "uh," "mm-hmm," and "Huh?"

Researchers in disciplines from philosophy to psychology to anthropology to linguistics have long aimed to uncover the properties of the human mind that make language possible. They have focused on trying to understand how language works: what it's like, how children learn it, how it is processed in the mind. But they have had surprisingly little to say about what language is like in the back-and-forth of everyday conversation. This makes little sense, given that conversation is where language lives and breathes. Conversation is the medium in which language is most often used. When children learn their native language, they learn it in conversation. When a language is passed down through generations, it is passed down by means of conversation. Written language is many a researcher's first point of reference, but it should not be: Most languages do not have a written form at all, and in any case, written forms—from blogs to street signs to instruction manuals—are ultimately derived from the spontaneous, self-organizing system of dialogue that we call conversation.

This means that our current scientific knowledge of language, with its emphasis on decontextualized words, phrases, and sentences, is badly out of kilter. I want to show you some of what has been overlooked or set aside in the mainstream science of language. I will argue that the inner workings of conversation have their rightful place at the center of the language sciences.

It may seem strange that linguistics—the line of research responsible for understanding language—is not the source of many of the findings I will describe in this book. In its long history, linguistics has produced extensive and reliable information for many—but not all—features of language. Linguists can say a lot about things we might observe in written documents and in monologues, such as the formal structure of sentences. But for other types of information—especially those features of language that are seen only in the wilds of interaction—surprisingly little reliable information is available in linguistic reference books.3

In my own research on the Lao language, I often go to my bookshelf and pick out the authoritative two-volume Lao-English Dictionary compiled by Allen D. Kerr and published in 1972. This wonderful book has more than 1,200 pages filled with detailed entries on Lao words, including many infrequent words with meanings like "exhalation," "necessities," "collapse," and "custard apple." But there is no entry for the word "Huh?," even though this is one of the most frequent words in spoken Lao (it occurs once every six minutes in Lao conversation). This is not the author's fault. Dictionaries and grammar books on most languages tend not to record the so-called imperfections of spoken language.4

When I look up a word in Kerr's dictionary, it takes me a few seconds. But as Kerr explains in his preface to the work, it took him twelve years—from 1960 to 1971—to write the book. To reduce the labor of secondary researchers like me to a few seconds each time we have a question about a Lao word, Kerr invested years of his life. Now if it happens that I reach for Kerr's dictionary but cannot find what I am after—in this case, how do speakers of Lao say "Huh?"—then I'm out of luck. I would have to go out and find a living speaker of Lao. This might be feasible if I happen to live in Fresno, California, or Sydney, Australia. I could visit a Lao restaurant and talk to the cook or the waiter. Otherwise, I'm going to have to go all the way to Laos to ask my question.

In linguistics, we rely heavily on the published findings of long-term fieldwork by predecessors. This is no different from the situation in research on any other biological phenomenon, for example the behavior of earthworms in Darwin's careful studies. Darwin not only gathered observations and reports of earthworms' behavior; he also carried out firsthand systematic experiments on his worms to find out things that could not have been known by simply looking at them. When he wondered whether worms had a sense of hearing, he did experiments on them: "They took not the least notice of the shrill notes from a metal whistle, which was repeatedly sounded near them; nor did they of the deepest and loudest tones of a bassoon."5 His experiments continued, and he found that his earthworms were highly sensitive to vibration: "When the pots containing two worms which had remained quite indifferent to the sound of the piano, were placed on this instrument, and the note C in the bass clef was struck, both instantly retreated into their burrows. After a time they emerged, and when G above the line in the treble clef was struck they again retreated."6

Darwin's systematic experiments embody the controlled hypothesis testing that is standard in behavioral science. A necessary prerequisite to devising these tests was careful and long-term observation of worms in their natural environment. Darwin's book is full of reports of what he and many others had observed of worms in their natural habitat. In this way, for every species, and for every kind of behavior, a period of close observation and description must come first. So it is with the human behavior known as language.

In some lines of linguistic research, researchers are lucky that others have already gone out and done the required years of work, writing dissertations on, or even devoting careers to, questions that others may later want answers to. When we want those answers, we go to the library. But anyone hoping to find reliable data on aspects of human conversation in the library will encounter two major problems.

The first problem is that many descriptions of languages lack any information about things like turn taking, "repair,"7 and timing in conversation. These aspects of language are often regarded as incidental to the core concerns of linguistics. The seemingly messy back-and-forth of conversation is thought to show only imperfections or perturbations of language, without intrinsic structure or merit. Here is a famous 1965 passage by Noam Chomsky: "Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance."8 This proclamation effectively ruled out the study of topics such as conversational repair in linguistics for decades, with the result that even the most accomplished linguists have little to say about how language is used in its natural habitat.

A second problem is that when information about features of conversation is actually offered by linguists, the information is notoriously unreliable. This is because linguistic researchers do not often base their research on systematic observation of firsthand recordings of free-flowing conversation. It is difficult to collect conversational data, and even when one has such data, they are difficult to study. Moreover, people often have poor intuitions about what actually happens in language. People's beliefs about language are tainted by values instilled from formal education and by social stereotypes about what is good language and what is bad. A language teacher might say that "Huh?" is not used, or should not be used, but rather one should say "Pardon?" or "Excuse me?" But these are prescriptive statements, not descriptions of what actually happens in English. They are about what somebody thinks should or should not happen in language. When we obtain a firsthand recording of informal conversation in the language, we hear "Huh?" within a few minutes.

The upshot is that if you want to work on aspects of language that are informal or conversational, you can't rely on dictionaries and grammar books for the data you need. To find out how people really talk, a researcher needs a special kind of direct access to language in its wild environment. The findings and insights that we will discuss in this book are possible because of the use of sound and video recordings of social interaction in everyday life. With these recordings, we can slow conversation down, look at it repeatedly, and catch every otherwise fleeting detail. Only then do we notice the defining components of language in the wild. This is one reason why "mm-hmm" and "Huh?" have not been widely studied. Another reason is that they are not the kinds of words that many linguists have recorded and studied in detail. These words tend not to occur in more formal registers of language. They rarely occur in writing. And they tend not to be taken seriously by either scholars or native speakers. Like slang words, they are often not even considered to be real language.

We need to study conversation seriously in the science of language.

An individual's ability to learn and process language is an unbeatable skill in the animal world, but it is the teamwork of dialogue that reveals the true genius of language. Even the simplest conversation is a collaborative and precision-timed achievement by the people involved. As we shall see in this book, when two people talk, they each become an interlocking piece in a single structure, driven by something that I will call the conversation machine.9

The conversation machine consists of a set of powerful social and interpretive abilities of individuals in tandem with a set of features of communicative situations—such as the unstoppable passage of time—that puts constraints on how we talk. We will look closely at how people talk, and we will see the conversation machine in operation.

Most researchers who have studied conversation have done so from outside of linguistics,10 yet their findings suggest good answers to the deepest question that linguists have asked: What is it that humans have, and that animals lack, that explains why only our species has language? The conversation machine provides an answer to this question. The research findings reviewed in this book show that the concept of a human conversation machine defines a universal core for language, cutting across the great variety of structure in languages worldwide.

People often say that styles of social interaction differ greatly around the world, so the claim that a universal core of language is seen in conversation might seem unlikely. But I will argue that reports of radical cultural differences in how people talk have been exaggerated, at least with respect to the essential workings of conversation. We shall see that while cultural differences in conversational style can seem striking from our subjective point of view, objectively they are minimal. Differences in, say, the way in which conversational turn-taking is organized across languages are trivial in comparison to the radical differences between languages in formal structure, at every level from sound to vocabulary to grammar.

Chomsky suggested that if a Martian scientist were to study human communication, this observer would conclude that "Earthlings speak a single language."11 I think this is the correct conclusion, but for reasons completely different from Chomsky's. His idea was that the Martian would detect underlying commonality in the structure of grammatical phrases, despite the fact that the world's languages organize their grammatical structures in a bewildering variety of ways. But it seems obvious that this bewildering variety would be more striking to our Martian observer: Languages sound (and look, in the case of sign languages) very different from place to place, with more than 6,000 distinct tongues spoken around the world. An abstract deep structure of grammar is not where our Martian would likely detect a single "Earthling language." Instead, the Martian scientist would readily observe that from Cape Horn to Siberia, from Tasmania to Tierra del Fuego, language is strikingly similar in the back-and-forth of conversation.

Our Martian would see the hallmarks of conversation in the same form everywhere: a rapid system of turn-taking in which, mostly, one person is talking at a time; an exquisite sensitivity to the passage of time in dialogue, with a universal one-second window defining subtle distinctions between being early, on time, or late to respond; and a heavy reliance on small utterances such as "mm-hmm," "um," and "Huh?" to orchestrate the proceedings. And while our Martian scientist would see these features in all human conversation, were this observer to look for these features in communication among other species, it would not find them.

Happily, we do not need to imagine what an interplanetary observer would see when looking at language in the wild. A growing number of Earthlings are studying the conversation machine in action. We now know a lot about the fine timing of behavior in conversation and about the meanings and functions of many informal words that are crucial if conversation is going to stay on course. And our knowledge of what makes conversation universal goes deeper than these surface observations, into the shared cognition that people bring to social interaction.

Language would not be what it is without our species' highly cooperative and morally grounded ways of thinking. For the conversation machine to operate, humans apply high-level interpersonal cognition: We infer others' intentions beyond the explicit meanings of their words (in ways animals can't manage), we monitor others' personal and moral commitment to the interaction and if necessary hold them to account for that commitment, and we cooperate with others by opting for the most efficient, and usually most helpful, kinds of responses. We help each other, where necessary and possible, to stay on track in conversation. This requires not only a good deal of attention and effort; it also requires social cognitive skills that are unique to our species.

The cognition that people need for language must of course be found in the head, and in that sense, cognition for language is located in individuals. But much research on how the mind works has shown that cognition is radically distributed.12 Much of our thinking and reasoning is not done solely between our ears. When we use our brains, we often hook them up to external systems. These may be physical objects, such as pencil and paper or smartphones, that supersize our capacities for memory and reasoning. In conversation, the external systems to which we hook ourselves up are the bodies and minds of other people.

The cognition needed for language is especially attuned to what others think, feel, and mean, and it is oriented to what the members of the social unit (the "us" currently having a conversation) are collectively doing or at least trying to do. Cognition for language is intrinsically dialogic. This point is crucial to understanding the idea of a conversation machine at the heart of language. When we talk, we do not drive the conversation machine. The conversation machine drives us.

In the chapters that follow, we will learn what humans have that makes us able to carry out the remarkable feats of everyday dialogue. We will find out what the conversation machine is, and what it does. A good place to begin is with the idea that conversation has rules, of a kind that demands a unique brand of morally grounded social cognition.



At school we learn that language has rules. There are subjects and objects, conjugations and declensions, phrases and sentences. We know thousands of words, but alone they are not enough: We have rules for taking those words and combining them into sentences. These are the rules we refer to as grammar. Most people are not able to state many of these rules explicitly, yet people everywhere subconsciously follow the rules closely when speaking, making only occasional errors.

Besides grammar, there is another dimension to the rules that guide language: the norms of conversation. So, when someone asks a question, you should answer it. If you can't answer it, you should still respond (e.g., give a reason why you can't answer). If a third person had been asked the question, you shouldn't answer for them.

We think of these not so much as rules but as simple good manners. Yet they are more than this. These are not rules for how a person should act. They are rules for how a team player should act. The rules make sense if you think of their function as regulating the flow of conversation as a kind of group activity. In conversation, everyone involved has a set of implicit rights and duties in the interaction. This is because conversation is inherently cooperative. It is a form of joint action.

Humans' capacity for cooperative joint action is one of the defining capacities of our species' form of social life. When we cooperate, we enter into a (usually unspoken) pact to join forces toward a common goal. Through this pact, we become morally accountable to that commitment and to seeing that commitment through. Joint action is not just a way of behaving; it implies a special way of thinking. The philosopher John Searle1 imagines a scene in which a number of people are running from different directions to take cover under a shelter in the middle of a park. He suggests two scenarios in which this might happen. In the first scenario, it has just started to rain. The people are unrelated individuals running to take shelter. Each of them is motivated to run to the same place for the same reason, and while their behavior appears coordinated, they are in fact behaving independently. They are not acting as a group. In the second scenario, the people are members of an outdoor ballet troupe, and they are engaged in a public performance that calls for them to converge on this same spot at a chosen moment. The key difference between the two scenarios has to do with what the individuals think they are doing. In the first case, the appropriate thought is that "I" am running to the shelter (and, incidentally, others happen to also be doing so). In the second case, "we" are doing it. This distinction might seem academic, but it has an important consequence: It introduces a moral commitment among the people involved.

The philosopher Margaret Gilbert explored this moral consequence of joint action in one of the simplest examples she could think of: going for a walk together.2 Her interest was in social phenomena in general, and she considered the scenario of going for a walk together to be paradigmatic of all social phenomena. Again, we can contrast two situations that look similar on the surface: Two people are walking along side by side. In one case, the two people just happen to be going in the same direction at the same time, as on any busy city street. In the other case, the two people have agreed to go together for a walk.

Gilbert points out an important difference between these two scenarios. Suppose that one of the two walkers speeds up a bit and draws ahead of the other. In the first scenario, this might not be noticed at all. But in the second scenario, the person who is walking ahead might be in for what Gilbert calls a mild rebuke: "You are going to have to slow down, I can't keep up with you!"

By definition, joint action introduces rights and duties.3 As Gilbert says of the two people on a walk together, "each has a right to the other's attention and corrective action."4 Each person has a moral duty to ensure that they are doing their part appropriately so that if, for example, one person draws ahead, the other may hold them to account. The duty to stay involved in joint action and the corresponding right to rebuke those who do not stay involved underlie the ground rules of language. Let us look at some examples.

Questions are a universal feature of human language. The specific grammatical rules for how questions are formed—both of the yes/no type and the who/where/what type—can vary widely from language to language. Here I am focusing not on how questions are grammatically constructed but on the ways in which questions function in social interaction. As with any joint action, questions create commitments and associated moral duties.

Suppose that I say to you "What time is it?" You are suddenly saddled with moral obligations. The first is that you can't just stay silent. Whether or not you know the answer, you should respond. Or, if you do not respond, you will stay silent knowing that I have a right to rebuke you, at least mildly.

In an example from a recorded telephone call, a grandmother concerned for her granddaughter's health is urging the granddaughter to go and see a doctor. The grandmother says, "I can't stand idly by and see you destroy yourself." Here is an extract from the rest of the conversation (GM = grandmother, GD = granddaughter):5

1. GM: Now will you do that for me?

2.      (silence for 2.5 seconds)

3. GM: Honey?

4. GD: What.

5. GM: Will you do that?

6. GD: Well—Grandma it's gonna be so expensive to go talk to some dumb doctor.

After the grandmother asks a direct question, there is no response. But the granddaughter has a duty to respond, so the grandmother is entitled to pursue a response, which she does, until the granddaughter has fulfilled her duty. The grandmother does not explicitly rebuke her granddaughter for failing to respond, but her pursuit can be interpreted as one. When the granddaughter does respond, in line 6, she does not directly answer the question, but she does fulfill her duty as a questionee. This is similar to what happens when we are asked the time: We are not obliged to know the time, but we should state it if we do know, and if we do not know, then we should say so.

In another example,6 Person A is within their rights to follow up on their question not once but twice before getting the answer that they required from Person B:

1. A: Is there something bothering you or not?

2.      (1 second silence)

3. A: Yes or no.

4.      (1.5 seconds silence)

5. A: Eh?

6. B: No.


