Embodied Social Cognition

Feb 23, 2014


Social Cognition
Peter Timmerman
A thesis submitted in partial fulfillment of the requirements
for the Research Master Philosophy
Dr. Fred Keijzer
Prof. Dr. Jeanne Peijnenburg
Prof. Dr. Martin van Hees
Department of eoretical Philosophy
Faculty of Philosophy
University of Groningen
How do we understand each other? In philosophy and developmental psychology
there are at present two dominant answers. One holds that we understand each
other by theorizing (theory theory), the other that we do so by simulating the
other’s mind (simulation theory). Embodied cognition, I argue in this thesis,
provides a promising alternative. Embodied cognition sees cognition first and
foremost as an ability that has evolved in order to enable an organism to cope better
with its environment, not as a capacity to solve abstract problems. Similarly, it
suggests that social cognition is first and foremost an ability to interact with
others rather than to reason about them.
Whereas theory theory and simulation theory both characterize social
cognition as a highly intellectual and reflective enterprise, an embodied approach
points out that we usually understand others without any effort or thought.
Whereas the traditional accounts frame our understanding of others in terms of
explaining and predicting, an embodied approach suggests these activities do not
play a key role in our normal interactions with others. In interacting with others we
usually immediately perceive what the other is doing, intending or feeling, rather
than inferring mental states on the basis of observations. Theory theory and
simulation theory both suppose that minds are hidden behind behaviour, but from
an embodied perspective this doesn’t seem right. 
Embodied cognition thus construes social cognition as a more basic,
sensorimotor and interactive faculty than the traditional approaches do. But it also
has implications for the way in which we explain the advanced social cognitive
activities on which theory theory and simulation theory focus. Reasoning about
others is, just like reasoning in general, rooted in the tuning of basic responses to a
real world that enables an organism to sense, act, and survive. Relying on the basic
capacities that enable a perceptual understanding of others and our sensorimotor
engagement with the world, we gradually develop, in interaction with others and
cultural ‘tools’ like natural language, the ability to reason about others. Social
cognition is embodied, not only in its basic, but also in its more advanced forms.
How do we understand each other? In the philosophy and developmental psychology
of social cognition there are currently two dominant theories. According to one, the
so-called theory theory, we understand each other by means of an internal
theory of human behaviour; according to the other, simulation theory, we do so
by simulating the other in ourselves. In this thesis I argue that an embodied
approach to cognition offers a promising alternative to these two theories.
Embodied cognition holds that cognition evolved first and foremost so that
organisms adapt better to their environment, so that a body can move and act
effectively, and not to solve abstract problems. Similarly, it regards social
cognition as an ability that exists first and foremost to interact with others,
rather than to think about others.

Whereas both theory theory and simulation theory characterise social cognition
as a highly intellectual and reflective activity, an embodied approach emphasises
that we normally understand each other without any effort. Whereas the
traditional approaches describe our understanding of the other in terms of
generating explanations and predictions of the other's behaviour, according to an
embodied approach these activities do not play a central role in our normal
interactions with others. In interacting with others we generally perceive directly
what the other feels, does, or intends to do, rather than deriving mental states
from observations of behaviour and attributing them to the other.
Embodied cognition holds that social cognition is a more basic, sensorimotor
and interactive ability than theory theory and simulation theory have assumed.
But an embodied approach also has implications for the way in which we explain
the advanced social cognitive activities on which these traditional approaches have
focused. Reasoning about others is, like reasoning in general, rooted in
sensorimotor and emotional mechanisms that enable an organism to perceive, act,
and survive in the world. Building on the basic capacities that make
interaction with others possible, we develop, step by step, in interaction
with others and cultural ‘tools’ such as natural language, the ability to think about
others. Social cognition is embodied not only in its basic forms,
but also in its more advanced ones.
This thesis is about embodied cognition and about social cognition, and in
particular about social cognition from the perspective of embodied cognition. As I
shall explain in the introduction, embodied cognition is an approach to the study of
cognition that rejects several widely held assumptions and ideas about the mind and
proposes an alternative, and according to many
, perspective on cognition.
Embodied cognition caught my interest during my first course on philosophy of
mind, when Hans Dooremalen introduced me to Andy Clark’s view that the human
mind is not just inside the human skull but has the potential to extend into the
world. My interest grew in subsequent classes that I took with Fred Keijzer,
who supervises this thesis. The choice to investigate what embodied cognition
means for the study of social cognition has much to do with my interest in what
implications embodied cognition has for ethics and political philosophy, as
assumptions about our capacities to perceive, interact with and think about others
are prevalent in moral theories. The question of what embodied cognition means
for ethics and political philosophy is the topic of my PhD project.
This thesis has been written under somewhat unusual circumstances. My
interest in embodied cognition brought me to the Scottish city of Edinburgh, where
the aforementioned Andy Clark heads a master's specialisation on this topic. When
I was about to leave for Edinburgh, I planned to postpone the second year of my
research master in Groningen by a year and finish it sometime in 2009, after
which I would try to find a PhD position. Jeanne Peijnenburg, however, persuaded
me to change my plans. She informed me about a grant that would
enable me to pursue a PhD project right after finishing my studies, and
convinced me I would be crazy not to try to get it. The catch was, or rather turned
out to be in October 2007, that I had to complete my research master programme
before the end of September 2008, and that I thus had to write two master theses in
one summer.
Looking back, I think I initially agreed to this because there was a big chance I
wouldn’t make it through one of the earlier selection rounds for the grant, which
would mean I wouldn’t have to finish the Groningen thesis in September after all,
while by trying to get the funding I would avoid that annoying feeling of having let
an opportunity slip. But I turned out to be in luck, and spent my summer in
Edinburgh writing. (I should note that for one who wants to get much work
done in the summer, Edinburgh is a great choice, as one will seldom feel the urge to
leave one’s work and go out to enjoy the weather. I should also note I’m very happy
with my PhD position and grateful to Jeanne Peijnenburg for, among many things,
her powers of persuasion.)
Although my thesis for the University of Edinburgh, which is called
Cognition and Simulative Mindreading, and the present one differ in many
respects (this one is more than twice as long, for example), there are clear
commonalities, as they both deal with embodied social cognition. In particular,
sections 2.1 and 2.2 are based on sections 2.1 and 2.2 of my Edinburgh thesis,
chapter 3 is a strongly improved version of the third chapter of my thesis for
Edinburgh, and section 5.1 contains arguments from sections 4.1-4.4 of my
other thesis.
Many people have helped me in one way or another in writing this thesis, some of
whom I would like to mention in particular. First of all I want to thank my
supervisors Fred Keijzer and Jeanne Peijnenburg for their flexible tele-supervision,
continuously providing me with helpful comments and intelligent advice over
phone, email and even Skype. I also want to thank Fred for letting me do a tutorial
on social cognition with him, just before I started working on my theses. I want to
thank Martin van Hees for his interest in my work and his willingness to be my
referee. My gratitude goes to Julian Kiverstein, who supervised my thesis in
Edinburgh, for inspiring conversations about social cognition and embodied
cognition. Lars Marstaller helped me decide on my topic; thanks for that,
and many other things. To Han Thomas Adriaenssen I am indebted for advice,
support, and ‘being in the same boat’, or at least a similar one. A special thank-you
to Trijnie Hekman and Arnold Veenkamp for explaining time and again the rules
and dates that I had to conform to, and for taking care of the administration
concerning my graduation. Thanks to Daan Franken and Barteld Kooi for,
respectively, advising me on what fonts to use and reminding me that a thesis needs
a preface. I want to thank Jan Degenaar for commenting at least a dozen times on my
abstract and my front page, and apologise for keeping him from his work. I am
grateful to Reanne and Niek for providing me with a place to stay in Groningen
whenever I needed it and for making sure that my supervisors and my referee
received hard copies of the semi-final version of this thesis. Finally, I want to thank
my parents and my emos, for everything.
Abstract
1 Introduction
2 Primary intersubjectivity
2.1 Perceiving others and what they see
2.2 Understanding actions and emotions
2.3 Social cognition in infants and nonhuman animals
2.4 On the prevalence of second-person interaction
2.5 To conclude: Some objections to TT and ST
3 Hurley’s Shared Circuits Model
3.1 From instrumental control to simulative mirroring
3.2 Mirroring, action understanding and mindreading
3.3 From simulative mirroring to strategic deliberation
4 Secondary intersubjectivity
4.1 Shared situations
4.2 Embodied cognition and conceptual thought
4.3 Language and reasoning about others
4.4 To conclude: Advanced mindreading is embodied, too
5 Reconciling simulation, theory, and embodied social cognition
5.1 ‘Simulation’ reconsidered
5.2 Theory of mind for embodied social cognition
5.3 ‘ST versus TT’ revisited
6 Conclusions
Literature
What enables us to understand each other? The dominant theory in both
philosophy and developmental psychology holds that we rely on an internal theory
of mental states and behaviour. This theory of mind is taken to be a data structure
consisting of psychological laws that mediates between our observations of
behaviour in particular circumstances and our predictions and explanations,
connecting, for example, what a person does with what she believes and what she
desires (Ravenscroft, 2004). Some argue that this theory is innate; others hold that
children develop it in their first years of life.
Critics of theory theory (TT), as this dominant theory is usually called, have
argued that it is uneconomical. Proponents of simulation theory (ST) hold that we do not
need a special theory of mind to understand others. According to ST, we can make
sense of the mental states and behaviour of another person by imaginatively ‘putting
ourselves in the other’s shoes’. To predict another person’s behaviour, for example, I
pretend to be in the same initial states as the other and then make the decision
myself, as if I were the other person. I then predict that the outcome of my
simulation is the decision that the other is going to make. I can understand the
other’s mind by using my own mental processes of the same type as the other’s,
rather than by theorising about the other’s mental processes.
Proponents of TT and ST have been arguing for a little over two decades about
which of them provides the best account of our ability to understand each other.
Points of contention in the debate have included, among other things, the parsimony
advantage of ST over TT mentioned earlier; which of the two accounts better explains the
evidence on the development of social cognition and on the brain areas involved in
thinking about others; which of the two fits best with the errors that people make
when trying to understand others; how one learns about oneself, and
whether this is prior to understanding others or not; and whether
ST ultimately collapses into TT. The debate leaves one with the impression
that TT and ST are the only games in town. But that’s not the case, I will argue here.
Embodied cognition provides an alternative approach to social cognition. This
approach has economical virtues similar to those of ST, can explain a wider range of social
cognitive phenomena than both ST and TT, and, importantly, clarifies how our
ability to understand each other relates to other cognitive abilities and to adaptive
behaviour in general. Or so I will argue.
Some theorists whom I would place in this approach are Susan Hurley, Shaun Gallagher, Daniel Hutto, Vittorio
Gallese, Dan Zahavi and perhaps Robert Gordon.
As an introduction to this alternative approach to social cognition, it is
instructive to briefly consider some similarities between the supposed rivals TT
and ST.
(1) TT and ST both assume that we do not have direct access to the minds
of others, i.e. that their mental states are hidden away behind behaviour (Gallagher,
2001; 2005). As Shaun Gallagher (2005) points out, they take the problem of
understanding one another to be the problem of other minds: the problem of how we are able to
represent the mental states of another given that these are unobservable. (2) Both
assume that we do this by explaining and predicting others' overt behaviours in terms
of mental states. (3) Both assume that such mindreading, which is the commonly
used term for representing another’s mental states, involves an inference: either a
theoretical inference whereby a mental state is postulated, or an analogical inference
whereby a mental state that’s found in one’s own head is projected onto the other’s
mind. (4) They thus both construe making sense of others as a highly intellectual
affair. Abilities to infer, to theorise, and to frame hypotheses, standard elements of
TT, are quite advanced. Similarly, pretending and imagining, which are essential to
ST, are impressive developmental achievements (Hobson, 2007). Furthermore, TT
and ST agree that mindreading depends on the acquisition of concepts of mental
states. (5) They also both describe making sense of another predominantly as a
third-person activity. (6) And both assume that there is a single, unitary capacity for
mindreading that enables understanding of others; according to TT it is theorising,
according to ST it is simulation (Morton, 2007).
An embodied approach to social cognition rejects all six assumptions. More
precisely, it rejects the conception of mind that underlies them. For a large part of
the twentieth century, cognitive scientists took the mind to be an internal
information processor, mainly concerned with solving abstract problems. As Andy
Clark (1997) writes, “we imagined the mind as a kind of logical reasoning device
coupled with a store of explicit data – a kind of combination logic machine and
filing cabinet” (p. 1). On this model, cognition is distinct and separated from
perception and action, such that perceptual and motor systems might be reasonable
objects of inquiry in their own right but are not considered relevant to
understanding cognition (Wilson, 2002). Hurley (1998a; 2001) has aptly dubbed
this model of the mind the classical sandwich: it takes “perception as input from
world to mind, action as output from mind to world, and cognition as sandwiched
between” (2008, p. 2). Cognition is central, and interfaces between peripheral
devices for perception and action. The sandwich is classical as cognition is
construed as the formal manipulation of representations in an internal domain.
When I talk about ST and TT here and in the remainder of the text, I refer to mainstream versions of both
approaches. It is likely that there are versions of both approaches to which certain criticisms do not apply. In
particular, Robert Gordon’s brand of ST seems to avoid many of the problems that other accounts of ST face.
Embodied cognition takes a different theoretical starting point. Cognition
evolved not to solve abstract problems but, paraphrasing Andy Clark (1997), ‘to
make things happen’: to guide action, to enable more effective coping with the
environment. e mind is first and foremost an organ for controlling the biological
body, rather than a disembodied logical reasoning device (Clark, 1997). Aer
introducing embodied cognition somewhat further, I shall explain why it provides a
different approach to social cognition from TT and ST.
Because perceptual and motor systems are central to adaptive behaviour in the
world, proponents of embodied cognition take issue with the idea that perception
and action are merely peripheral. In contrast, much of our cognitive activity appears
to rely on continuous perception-action coupling with the world. Think about
running after a ball, driving a car, writing on a piece of paper, or working with a
computer – in all cases the task at hand depends on task-relevant input from the
world and task-relevant motor behaviour from the agent, in continuous and
dynamic loops. Because of continuous interactions between perception and action,
through external as well as internal feedback loops, proponents of embodied
cognition also reject the idea that perception and action are independent (Hurley,
1998a; 2001; Clark, 1999).
Embodied cognition relaxes not only internal boundaries (between the
processes involved in perception, cognition, and action) but also boundaries
between organism and environment. Cognition is taken to be essentially
situated. In
contrast to human designers, who will usually build any required functionality
directly into a device for solving a given problem, the evolutionary process is not
constrained by boundaries between an organism and the environment (Clark,
1997). Proponents of embodied cognition hold that, in Hurley’s (to appear) words,
“evolution, development and learning structure mature cognitive capacities in ways
that are not inherently domain-general and content-neutral, but rather depend on
interactions of individuals with relevant natural and social environment, which
‘scaffold’ their internal processes” (p. 11). They thus take the environment to have a
prominent place in explanations of intelligent behaviour.
Critics of embodied cognition tend to respond that even if embodied cognition
is right about the importance of perception and action during some cognitive
activities, and about a tight relation between world and mind for some cognitive
mechanisms, there is a domain of cognition that doesn’t have anything to do with
sensorimotor behaviour or with the ‘outer’ world. Hallmarks of human cognition
like imagination, reflection, planning, calculating in the head, and reasoning are
central cognitive activities that can be carried out without any relevant input or
output, the argument goes. Embodied cognition might be right that sensorimotor
skills are very important for acting in the world, but central cognition, the kind
that cognitive science was about in the first place, is separated from input-output
devices. The classical sandwich, the critic would conclude, is the right model for
advanced cognition.
Proponents of embodied cognition, however, reject the classical sandwich for
advanced cognition as well. As Clark (1997) writes, embodied cognition holds that
“intelligence and understanding are rooted not in the presence and manipulation of
explicit, language-like data structures, but in something more earthy: the tuning of
basic responses to a real world that enables an embodied organism to sense, act, and
survive” (p. 4). Advanced cognitive abilities should be expected to be built on, or to
exploit, more primitive mechanisms and skills for adaptive sensorimotor behaviour.
In Wilson’s (2002) words, “even when decoupled from the environment, the activity
of the mind is grounded in mechanisms that evolved for interaction with the
environment – that is, mechanisms of sensory processing and motor control” (p.
626). This idea has gained much support in recent years. There is strong evidence
that offline abilities like motor and visual imagery, the use of memory, and internal
problem solving are enabled by the (subpersonal) re-use of sensorimotor processes
(cf. Wilson, 2002; Hesslow, 2002; Grush, 2004; Svensson & Ziemke, 2004; Svensson,
Lindblom, & Ziemke, 2007; Barsalou, 2008). I take it to be a methodological claim
of embodied cognition that for a full understanding of higher cognitive faculties we
must understand how these relate to more basic abilities for adaptive behaviour.
What does social cognition without a classical sandwich look like? What does
embodied cognition have to say about social cognition, how does it differ from ST
and TT, and which elements does it retain from these traditional approaches? Those
are the questions this thesis aims to answer. In the remainder of this introduction I
set out the main differences between an embodied approach and traditional
approaches to social cognition and describe the structure of this text.
An important difference between traditional frameworks of social cognition
and an embodied approach to social cognition is that the latter places more
emphasis on basic skills for social cognition. An embodied approach to cognition
expects our brains and bodies to incorporate cheap and efficient solutions to the
problems that organisms have had to deal with, rather than centralised solutions.
Instead of domain-general processors that operate on amodal information
transduced from perception, we should expect to find domain-specific, highly
reactive and environmentally driven heuristics, adapted by evolution, development
and/or learning to specific environments (Clark, 2001b; Hurley, to appear).
Embodied cognition, we saw, holds that higher cognitive functions must be
continuous with mechanisms for adaptive behaviour. This also holds for social
cognition. Advanced social cognition can be better understood when we realise how
it relates to more basic mechanisms. Furthermore, embodied cognition leads us to
expect that basic social cognitive mechanisms could turn out to do much of the
social cognitive work.
Evidence on social cognition of infants and nonhuman animals suggests that
basic social cognitive skills might indeed play an important role in making sense of
others. Infants and nonhuman animals are able to gain a certain understanding of
others, but do not seem to have the cognitive machinery required for theorising or
simulation. ey are able to make sense of others through basic,

social cognitive skills (Gallagher, 2005). I will use the word non-mentalistic for a
skill or an understanding that does not rely on explanation, prediction, inference or
concepts of mental states. If basic social cognitive skills enable infants to understand
others in certain ways before they are able to theorise or to simulate, one would
expect that they do not suddenly stop doing that when infants grow up.
A further difference between traditional approaches and an embodied approach to
social cognition is that the latter emphasises that social cognition usually occurs in
second-person, situated interactions with others. Theorising and simulating might
be what we do when we think about others from a third-person perspective, but it is
less clear whether these capacities underlie the understanding of the other that we
obtain when interacting with him or her. On the basis of phenomenological
arguments, Shaun Gallagher (2001; 2005; Gallagher & Hutto, 2008) has argued that
theory and simulation do not play a primary role in understanding others in
interaction. Instead, in many cases the basic social cognitive capacities that emerged
already in infancy might be sufficient. Gallagher’s claim thus provides another
reason to take basic social cognitive mechanisms seriously.
The next chapter is about the role of basic social cognitive mechanisms in
understanding others. The central claim is that a set of skills already available to the
one-year-old infant, which includes agency detection, eye-tracking, imitation and
the perception of actions and emotions, constitutes what the developmental
psychologist Colwyn Trevarthen has called primary intersubjectivity: a
non-mentalistic and direct understanding of others. Furthermore, it is argued that these
skills remain important for understanding others in adult social cognition. The
shared assumptions of TT and ST are thus called into question. The existence of
primary intersubjectivity and the prevalence of second-person interaction show
that (6) our understanding of others does not depend on a unitary mechanism, (4)
is not an essentially intellectual affair that always involves (2) explanation and
prediction and (3) inferences, and that understanding others does not
predominantly occur from the third-person perspective (5). It will also be argued
that TT and ST are wrong to suppose that (1) the mental states of others are hidden;
due to our primary intersubjectivity, we have to a certain extent direct access to the
minds of others.
As said above, these shared assumptions of TT and ST seem to have their origin
in a shared commitment to the classical sandwich. Both approaches suppose that
mental states are unobservables that the mindreader has to discover through central
cognitive activity. ey thus separate behaviour and mental states, body and mind
(Zahavi, 2007). Perceptions of the behaviour of the other are regarded as inputs that
are fed into theory of mind modules or simulation routines. Whereas TT
characterises the mental operations involved as a kind of rule-based symbolic
inferences, simulation is usually taken to involve the creation of pretend states that
should match those of the other, the inhibition of one’s own mental states, an
introspective act to read one’s own mental state that results aer a simulation, and
an inference from self to other. As Gallagher (2005) writes, on both accounts, “one’s
understanding involves a retreat into a realm of
, into a set of
internal mental operations that come to be expressed (externalized) in speech,
gesture or action” (p. 212). Understanding another is characterised as an internal
intellectual affair, and sensorimotor processes are usually ignored. All this is
reminiscent of the classical sandwich.
An aim of the next chapter is to show that findings on basic social cognitive
skills are not easily accommodated by the classical sandwich, and to put forward
some support for the claim that an embodied approach to social cognition has the
resources to provide a better model. However, an embodied approach to social cognition
rejects the classical sandwich not only for basic social cognition, which has
clear sensorimotor aspects, but also for explanations of advanced social cognition.
To resist the critic’s objection that embodied cognition has nothing to say about
advanced offline cognition, a properly embodied account of advanced social
cognition needs to be given, one that doesn’t presuppose the classical sandwich. An
attempt at this will be made in two chapters. Chapter 3 discusses Susan Hurley’s
Shared Circuits Model, which explains how a basic form of mindreading and
strategic deliberation can be built on resources for perception and action. Chapter 4
goes one step further, and sets out to show how an embodied approach to social
cognition can accommodate advanced mindreading like reasoning about others.
This fourth chapter is called ‘Secondary intersubjectivity’, Trevarthen’s name for the
stage that children enter when they become able to share attention and cooperate
with others, as it shall be argued that these capacities play a fundamental role in
acquiring the ability to reason about others. Let me note that I do not identify
mindreading, defined as representing the mental states of others, with social
cognition; the latter refers to a broader class of processes and capacities involved
with perceiving and understanding others and thus includes mindreading.
Mindreading is a relatively advanced social-cognitive capacity, though I will take it
to come in gradations: for example, action understanding involves a basic
mindreading (see §3.2), and reasoning about others is an advanced form of
mindreading (see chapter 4).
Even though an embodied approach to social cognition criticises elements of
the traditional approaches TT and ST, it by no means has to reject them completely.
Embodied cognition constrains theorising about cognition in order to develop
biologically realistic theories of cognition. For the same reason, it relaxes some of
the boundaries (i.e., perception-action, perception/action-cognition, organism-world)
that traditional theorists have postulated. Just as embodied cognition places
thought in a body that has to evolve, develop and survive in a world in order to
arrive at a better understanding of cognition, embodied social cognition places our
ability to perceive and understand others in a context of adaptive, embodied and
situated behaviour to improve the study of social cognition. As the comments on
the social cognitive powers of infants and the prevalence of second-person
understanding suggest, an embodied approach to cognition reveals that traditional
accounts of social cognition have focused on too narrow a set of social cognitive
phenomena. But even if theorising and simulation are not as pervasive as TT and ST
suppose, they might still be important methods of understanding others. Chapter 5
argues that, if they are interpreted in a certain light, an embodied account of
social cognition can accommodate capacities for simulation and theory.
Interestingly, theory and simulation re-appear in an embodied approach to social
cognition not as rival explanations of our ability to understand another, but as
complementary abilities. An embodied approach to social cognition turns out to
bring a new perspective to the debate between TT and ST.
2 Primary intersubjectivity
What does understanding another person involve? As said, traditional approaches
to social cognition have focussed mainly on predicting and explaining the
behaviour of others in terms of mental states. We observe certain behaviour of the
other, and we explain that behaviour by postulating mental states like desires and
beliefs. Or we predict future behaviour on the basis of postulated mental states.
Although TT and ST disagree about how we come to posit a certain mental state
–TT says it’s through theorising about the other’s mind, ST says it’s through
simulating the other’s mind– they agree that making sense of another is mostly a
question of explanation and prediction.
Predicting and explaining the actions of another is undoubtedly an important
and pervasive social cognitive activity. Nevertheless, TT and ST might have
overstated their importance in understanding others. Human infants and
nonhuman animals are able to make sense of others, even though they do not seem
to have the cognitive instruments that are required for explaining and predicting in
terms of mental states. Furthermore, as Shaun Gallagher has stressed, even after the
required cognitive abilities have developed, our understanding of others usually
appears to occur in second-person interactions in which explanation and prediction
do not play a major role.
The current chapter discusses basic social cognitive capacities and the role they
play in social cognition. The first section is about agency detection and
eye-tracking, the second about emotion and action understanding. It will be argued that
these non-mentalistic capacities are not readily accommodated by the classical
sandwich, whereas an embodied approach to social cognition expects their
existence. The third section argues that these capacities together already enable an
important form of understanding of others, both in infants and nonhuman animals.
In the case of humans, they constitute what Trevarthen has called primary
intersubjectivity, an immediate, non-mentalistic mode of interaction (Gallagher,
2005). The central claim of section 4 is that the capacities that constitute primary
intersubjectivity are not replaced by theorising or simulation in later stages of
development, but instead continue to play an important role in understanding
others in second-person interactions.

2.1 Perceiving others and what they see
The biggest threats that animals face come from other animals. From a designer’s
stance, one would thus expect animals to be equipped with rather basic devices to
quickly detect other agents. This indeed appears to be the case. Many animals,
including humans, are very skilled at distinguishing agents from non-agents.
Heider and Simmel (1944) were the first to study agency detection
experimentally. They created a short movie in which simple geometrical objects
interact, in order to investigate the perception of “apparent behavior”. To get an
idea of what happens in the movie, consider the following description of its opening
scenes, in which T denotes a large triangle, t a small triangle and c a circle:
1. T moves toward the house [a rectangle of which part of one side could open
in a door-like way], opens door, moves into the house and closes door.
2. t and c appear and move around near the door.
3. T moves out of the house toward t.
4. T and t fight, T wins: during the fight, c moves into the house.
Regardless of the instructions participants received before watching the movie,
they used intentional terms to describe the ‘behaviour’ of the simple geometrical
objects, creating stories about love, jealousy and infidelity. The participants
interpreted the objects as agents with intentions. Many studies have replicated the
results of this now classical study. The fact that the geometrical objects in the
2D animations employed in most experiments on agency detection are very simple
suggests that it is not features of the objects themselves, but features of their
movements in relation to one another, that make us perceive agency.
Although the mechanism underpinning agent detection yields impressions of
agency that are usually considered characteristic of higher-level cognitive
processing, theorists have concluded that it has, as Scholl and Tremoulet (2000)
write, the marks of a perceptual mechanism – it is “fairly fast, automatic, irresistible
and highly stimulus-driven” (p. 299). When an object that we observe moves in a
specific way (it changes its speed or its direction, for example), it is presented
to us as an agent; we do not have to infer from what we perceive that we are
dealing with an agent. Stated differently, the attribution of agency to a given object
does not (usually) occur through a theoretical or inferential step on the part of the
observer on the basis of what he perceives; instead, the observer directly perceives
the object as an agent.

I use the word ‘inference’ here and in the remainder of this text as a personal-level term.
One can object that the participant reports from the simple-object studies do
not support such a conclusion, because, as Mar and Macrae (2007) point out,
“individuals may not be perceiving agency, but merely reporting the observation of
intentional behaviour as a result of other factors such as demand characteristics and
calculated inference” (p. 112). This problem has been largely circumvented by
means of infant studies, which do not rely on introspective reports but on the
time that infants spend attending to cues (Mar & Macrae, 2007). Infants are able to
distinguish between inanimate objects and agents on the basis of their movements
(Johnson, 2003; Gallagher, 2001). Indeed, infants, as well as other animals, are
skilled agent detectors even though they are presumably bad at inferences. Brain
imaging studies provide additional support for the perceptual character of agent
detection: Blakemore and colleagues (2003), for example, have shown that the
detection of agency in studies that follow the Heider and Simmel paradigm is
neurally underpinned mainly by parietal networks dedicated to complex
visuo-spatial detection. Furthermore, I think we should take the phenomenology
seriously here: we experience agency; we do not infer it.
The claim that agency detection is perceptual rather than inferential should not
be taken to mean that there are no underlying subpersonal functional and neuronal
processes that enable agency detection. It is not my aim to re-invoke the myth of the
given, that is, the idea that input from the environment is passively received by
the mind. Rather, the claim means that (social) perception is richer than the classical
sandwich allows for. The classical sandwich depicts perception as a passive process
in which the world is translated into percepts that are subsequently passed on to
cognitive mechanisms, where understanding obtains. As Scholl and Tremoulet
(2000) point out, on such a model one would expect that the attribution of agency
occurs through an inferential process after perception has provided the necessary data to
central cognitive mechanisms. The findings on agency detection do not support this
view, as they suggest that the attribution of agency already occurs in perception;
others are presented to us as agents, in a personal-level sense, instead of our
having to infer from what we perceive that they are agents. Scholl and Tremoulet
(2002) suggest that this shows that the visual processing that takes place in
perceptual modules is more elaborate than was previously supposed. Such a
response allows them to uphold a firm distinction between perception, cognition
and action; processing that was thought to occur in cognitive modules turns out to
take place in perceptual modules.
The findings can, however, also be interpreted as suggesting that the strict
separation of perception, cognition and action needs to be relaxed. Proponents of
embodied cognition tend to adhere to a horizontally modular model of the mind,
rather than a vertically modular model like the classical sandwich (see figure 1;
Hurley, 1998a; 2001). On such a view, the mind’s architecture consists of
content-specific subpersonal modules or layers that loop through sensory and motor
processes (as well as through the environment). Horizontal modules are dedicated
to tasks, rather than to broad functions, in contrast to the vertical modules in the
classical sandwich model. Each subpersonal layer is a complete input-output loop
that is essentially continuous and dynamic, in the sense that it involves external as
well as internal feedback.
As Susan Hurley has stressed in many of her writings, a horizontally modular
picture does not, in contrast to the classical sandwich, presuppose isomorphism
between subpersonal and personal levels (Hurley, 1998a; 2001). The distinction
between the personal level and the subpersonal level is a distinction between
descriptions of the contentful actions and mental states of persons and descriptions of
informational and neural processes. The subpersonal level of description is the level
of information processing and of dynamic causal interactions within the organism
and between organism and environment (Hurley, 2008). But, as many philosophers
have stressed, such processes are not correctly attributed to persons.
Figure 1: vertical versus horizontal modularity.
Adapted from Hurley (1998a, p. 407).
Persons see, want, think and act; they do not integrate visual information into objects, extract
semantic information from signs, or execute motor signals. Although subpersonal
descriptions explain how personal-level phenomena become possible or, to use
Hurley’s term for this, how they are enabled, they by no means need to share
structure with personal-level descriptions. Perception and action do not map onto
input and output respectively, but can depend on complex relations between
input and output. This means, among other things, that, with the right feedback
relations in place, motor and extra-sensory processes can play a role in perception.
What does this mean for the detection of agency? On a horizontally modular
model of the mind, agency might be perceived directly by the organism, even
though subpersonal cognitive processing outside of visual areas is involved; the
enabling subpersonal dynamics does not need to respect traditional boundaries
between sensory, cognitive and motor processes. Admittedly, for agency detection
the choice between ‘smart’ perceptual modules and horizontal modules isn’t
straightforward, but other social cognitive mechanisms, we’ll see below, are better
accommodated by a horizontally modular view than by the classical sandwich.
What is important for now is that the findings on agency detection show that
perception is richer than the classical sandwich allows – complex social phenomena
like agency are perceived rather than inferred at a later
processing stage by central cognitive processes.
In the introduction we saw that embodied cognition holds that evolution and
development tend to select cognitive mechanisms that are domain-specific, highly
reactive and environmentally driven. Special purpose routines and heuristics will
cheaply improve an organism’s ability to cope with a real-time environment, as they
enable the mind to make maximal use of the structure of information in the
environment (Clark, 2001b). The mechanism of agent detection discussed above clearly
fits the bill. It responds to specific movements that are characteristic of
biological agents, but not to those characteristic of inanimate objects. It does so
automatically and irresistibly, without requiring any effort from the observer.
Once something has been detected as an agent, it would –for many animals, in
many circumstances– be helpful to learn more about the detected agent. A basic
mechanism that appears to play an important role in gathering information about
the other is eye-tracking. Humans, as well as many other animals, have a strong
disposition to focus on the eyes when observing another. Infants show a preference
for faces and respond in a distinctive way to human faces, different from how they
respond to other objects (Tomasello et al., 2005). Infants as young as 2 months look
almost as long at the other’s eyes as at the other’s whole face, but spend
significantly less time on other parts of the face (Maurer, 1985). Such a face-and-eye
focus makes perfect sense from the current perspective on cognition, as these areas
tend to provide information that one wants to gather as quickly as possible, like
information about the other’s intentions and emotions. Furthermore, following the
gaze of another allows the child to see (1) that the other person is looking in a
certain direction and (2) that the other person sees what she is looking at (Baron-
Cohen, 1995; Gallagher, 2001).
Some researchers (e.g. Baron-Cohen, 1995) suggest that an inference is
required to understand that an observed person actually sees what she is looking at.
On this view, the infant needs to develop a theoretical understanding of the difference between
seeing and not-seeing, on the basis of its own experience, and to generalise this
understanding to other agents by means of analogy. Gallagher (2001) points out
that such an inference is unnecessary: “on the face of it, that is, at a primary
(default) level of experience, there does not seem to be an extra step between
looking at something and seeing it” (p. 89). Before learning that someone can be
looking in a certain direction and not seeing something that is located in that
direction, the child does not differentiate between looking at and seeing. Just as an
infant can distinguish agents from non-agents without any inferential capacities, it
has the non-mentalistic understanding that the other can see what she is looking at.
Eye-tracking is a domain-specific mechanism that works fast, automatically,
and effortlessly, and is thus the kind of special-purpose mechanism that is expected
by an embodied approach to cognition. Because of this character, it could be argued
that it must be enabled by a perceptual module, a specialised part of the human
visual system, just as Scholl and Tremoulet (2000) suggested that intentionality
detection is enabled by a perceptual module. However, eye-tracking seems not only
to involve activity of the visual system, but also activity of the motor system: the
observer detects the eyes of the other, and follows the other’s gaze, by means of eye
movements, to find out what they are directed at. The subpersonal processes that
enable eye-tracking seem thus to include a complete input-output loop, wherein eye
movements are effects of sensory input from the environment (i.e. the eyes of the
other) as well as causes for new sensory input (i.e. what the other is looking at). As
we saw, such distributed mechanisms are expected by a horizontally modular model
of the mind. In contrast, as a vertically modular model separates perception from
action, and posits that they need to be mediated by distinct cognitive processes, it is
unclear how the classical sandwich would account for eye-tracking.
2.2 Understanding actions and emotions
The next step in the development of useful social cognitive mechanisms, after being
able to detect other intentional creatures and being able to see what they are seeing,
would be the ability to perceive meaning in their behaviour. In contrast to the
movements of non-agents, such as trees that move back and forth
in the wind, the movements of humans, as well as those of many other animals,
are intentional and expressive. It would clearly be adaptive to recognise these
aspects in the behaviour of others. An approaching animal that intends to eat with
you should be responded to differently than an animal that intends to eat you.
Recognising the emotions of another can help you to appraise events (I might not
have seen the tiger that you did see, but respond adequately nevertheless because of
the fear that I perceive in your expressions), and enables us to respond better to the
other’s needs (if someone you try to approach looks scared, you know that some
initial comforting is required). As information about the emotions and intentions of
others is required as quickly as possible in a dynamic and potentially hostile
environment, an embodied approach to cognition expects the existence of
domain-specific, automatic, irresistible and environment-driven processing for recognising
these states. There is converging and convincing evidence for the existence of such
processing.
Before discussing this evidence, I should explain how I will use the words
action, intention, and action understanding. Analytic philosophers have stressed
that the difference between a
behaviour and an action is that the latter is goal-
directed. As such, it is a behaviour that is under the control of an intention, which is
oen taken to be a state of the organism that includes a goal as well as a means to
arrive at that goal. Because of this close relation between action and intention, to
understand another’s action is to understand another’s intention. A complication
here is that actions have an hierarchical structure. Say I put the light off. In this case,
my goal is to put the lights off and my means is pressing a switch. But pressing a
switch is also a goal-related action, under control of a lower-level or ‘motor’
intention (Jacob & Jeannerod, 2005). And my action of putting the lights off could
be the means for a higher-level goal, like keeping the mosquitoes away. By pressing
the switch I can perform all three actions.
An observer that sees me doing this does
not necessarily grasp
the goals that I try to satisfy with my movements. But this
doesn’t change the fact that by understanding an action of another one also grasps
an intention.
Following Hurley (2008), I will use the term
action understanding
short for understanding behaviour as goal-directed.
Humans are, from a very early age onwards, skilled at understanding the
actions of others. Infant studies show that by 10 months of age, infants (or rather,
their brains) divide streams of continuous behaviour into units that correspond to
what we would recognise as separate goal-directed acts (Baldwin & Baird, 2001).
Infants between 9 and 18 months old show more impatience when an adult keeps a toy for
himself than when he makes a genuine effort to hand it over but fails to do so
(Tomasello, Carpenter, Call, Behne, & Moll, 2005). Infants of 9 months are able to
distinguish purposeful actions from accidental behaviour, and by 14-15 months
they tend to imitate the former but not the latter (Carpenter, Akhtar & Tomasello,
1998). Infants of that age can also, at least sometimes, discern the intention of
another’s failed action as testified by them going on to re-enact a successful version
of that action (Meltzoff, 2005).
Action understanding has close relations to the mechanisms presented in the
previous section, as well as to other social cognitive mechanisms that are geared
towards the other’s behaviour. Concerning the former, action understanding
appears to build on skills for agent detection and eye-tracking. Understanding an
agent’s goal-related actions presupposes the observer’s understanding of her as an
agent as well as the observer’s understanding of her as seeing things, such as objects towards
which she has goals, obstacles to potential goals, and results of actions (Tomasello et
al., 2005). As Hurley (2008) explains, action understanding is also closely related
to social learning skills like stimulus enhancement, movement priming, goal
emulation and imitation. In stimulus enhancement the action of a conspecific draws
the observer’s attention to a stimulus which triggers a response, either innate or
previously learned. In movement priming the observer copies the movements of the
other, but not as a means to a goal. Neither stimulus enhancement nor movement
priming requires the observer to understand the behaviour of the other as
goal-directed. Goal emulation does seem to require action understanding, as it
involves an observer adopting an observed goal of another. Stimulus enhancement,
movement priming and goal emulation have been observed in nonhuman animals
(Hurley, 2008), suggesting that they are relatively basic and non-mentalistic.
These complications do not arise to the same extent for emotions, as emotions do not have an instrumental
structure (although different descriptions of the same expression might be possible, e.g. mourning and grief are
both forms of sadness).
Although intentions are analytically related to beliefs and desires, it seems to be possible to represent someone’s
intention without representing his beliefs and desires (Jacob & Jeannerod, 2005).
Whether other animals can imitate has been a topic of debate, but the present
consensus seems to be that other great apes
do it, even though their capacity for
imitation is less complex and used less frequently. Together with the fact that young
infants surely do it, this suggests imitation is non-mentalistic as well. It is also
debated whether imitation precedes action understanding or comes after it. As
Hurley (2008) writes, imitation requires “the flexible interplay of copying ends and
copying means; a given movement can be used for different ends and a given end
pursued by various means” (p. 4). It thus seems to presuppose action
understanding, many theorists have urged (cf. Hurley & Chater, 2005; Tomasello et
al., 2005). But at the same time many theorists have the intuition that imitation is
more primitive than understanding the goals that drive another’s behaviour. Hurley
(2008) presents an elegant solution to the issue of what’s first: imitation requires an
understanding of the intentional, means/end structure of observed behaviour, she
suggests, and thus a basic form of action understanding, but can precede more
complex forms of action understanding. This draws attention to the fact that action
understanding is a graded phenomenon.
The posture, movements, (facial) expressions and gestures of others provide
information not only about their actions and intentions but also
about their emotions. Moore, Hobson and Lee (1997), using actors with point-lights
attached to various body joints, report that subjects are able to identify emotions in
a darkened room, apparently only on the basis of the actors’ movements. As with
intentions, it seems we have a grasp of emotions from an early age. Infants
affectively coordinate their gestures and expressions with those of the caregivers
with whom they interact (Gallagher & Hutto, 2008). Gopnik and Meltzoff (1997)
found that infants by their second or third month already “vocalize and gesture in a
way that seems ‘tuned’ [affectively and temporally] to the vocalizations and gestures
of the other person” (p. 131).
Hobson (2007) reports that twelve-month-old infants
that find themselves on a visual cliff tend to spontaneously look at their mother’s
face when they notice the drop-off; if the mother poses a happy face, most infants
will cross to the other side, while a fearful expression on the mother’s face causes
most infants to freeze or even actively retreat.
Cited from Gallagher (2001, p. 90).
In recent years, much research has been done to understand the mechanisms
that enable action and emotion understanding. As the choice to discuss them in one
section suggests, there is an interesting similarity between the two: both seem to be
enabled by
. Neuroscientists Keysers and Gazzola (2006) observe that
oen when we see someone else undergoing an emotion it feels like we share this
state with the other. For example, we oen experiences sadness ourselves when we
see another person cry. is feeling, Keysers and Gazzola (2006) argue, is more or
less correct: when observing an emotion occurring to another, it is re-activated or
mirrored in ourselves. At least to some extent the behavioural and physiological
responses that are peculiar to particular emotions or sensations occur also when
merely observing them in others (Preston & De Waal, 2002, p.14). Moreover, brain
areas involved with the experience of particular emotions become active when we
observe others undergoing emotions of that type. Concerning disgust, it has been
found that the insula is involved both with feeling disgusted by something and
perceiving disgust in someone else. e intensity of the activation of the insular
cortex is proportional to the degree of disgust observed (Rizzolatti & Sinigaglia,
2008). Not only does the insula light up in fMRI studies both when experiencing
disgust and observing someone else being disgusted (Wicker et al., 2003); people
with a damaged insula loose their ability to experience disgust as well as their ability
to recognise it in others (Calder et al., 2000; Adolphs et al., 2003). Similarly findings
have been reported concerning fear and the amygdala: it is active both when
participants are experiencing fear and when recognising fear in others, and both
capacities are impaired when the amygdala is damaged.
A paired deficit in emotion
production and (face-based) recognition has also been found for anger (Goldman,
The evidence on shared circuits for emotions suggests that we understand the
emotions of others through a functional process that is usually called mirroring
(Keysers & Gazzola, 2006; Goldman, 2006; Rizzolatti & Sinigaglia, 2008). When we
observe someone exhibiting the signs related to a particular emotion, the emotional
circuits that are active when we have that type of emotion ourselves are re-activated.
Through such re-activation of our own relevant emotional circuits, visual stimuli
appear to be coded in a particular, meaningful way (Rizzolatti & Sinigaglia, 2008).
This does not mean that we could not learn to identify emotions in others without
such re-activation of our own emotional circuits, Rizzolatti and Sinigaglia (2008)
stress. But, quoting William James, they argue that without it our perception of emotions
would be reduced to a perception “purely cognitive in form, pale, colourless,
destitute of emotional warmth” (p. 189).
I should note, though, that there is currently some controversy about the exact role of the amygdala in the
experience and observation of pain. For discussion, see Keysers and Gazzola (2006) and Goldman (2006,
section 6.2).
Mirroring of emotional processes enriches our perception such that we can directly
perceive the emotions of others. Similarly,
mirroring of motor processes seems to enable us to perceive the actions of others.
In the early nineties, the so-called ‘Parma team’ led by the Italian
neuroscientist Giacomo Rizzolatti discovered a peculiar class of neurons in the
brain of the macaque monkey which became known as ‘mirror neurons’ (MNs).
Although MNs are located in what is usually classified as a motor area of the brain,
they have visual properties besides the expected motor properties. They discharge
when an agent is performing a specific motor action, but respond just as well at the
sight of object-related actions performed by others. Importantly, MNs exhibit
congruency in how they respond to actions of self and other: for example, neurons
that fire when I grasp an object will fire as well when I see someone else grasping an
object, but not (or to a lesser extent) when I throw an object. Interestingly, as
Fadiga et al. (1995) report, this activation coincides with facilitation of the same
muscle groups in the observer as in the acting agent. In normal agents this muscle
activation is covert, and does not result in actual (replicated) movements. But in
patients who show ‘imitation behaviour’ this is different. Without good reason, they
copy the actions of people whom they observe. This appears to be the result of
damage to the prefrontal areas, which govern inhibitory control of, among other
things, motor schemas. Taken together, this evidence suggests that MNs are
involved in copying the motor plans of others, but that these motor plans are
usually inhibited so as not to result in overt action. They appear to mirror the actions
of others, representing them as if they were performed by the observer herself.
Although mirror neurons were discovered through single-cell recording studies on
monkey brains, there currently is rich evidence that a system consisting of mirror
neurons exists in the human brain (according to the ‘Parma team’, it is widespread
and centred on the inferior parietal lobule and the premotor cortex, which includes
Broca’s area (Gallese, 2005; Iacoboni, 2005; Rizzolatti, 2005; Rizzolatti & Sinigaglia,
MNs show different degrees of selectivity and congruence. If we focus on their
visual properties, MNs can be subdivided into ‘grasping-mirror-neurons’,
‘holding-mirror-neurons’, etcetera; that is, MNs respond to the sight of a specific type of
action.
From James (1890, p. 450).
However, some MNs respond less selectively, and discharge during the
observation of two, or even three different types of motor actions. If we now
introduce motor properties, we see different degrees of congruence. For strictly
congruent MNs the correspondence between the activity elicited by observed and
that elicited by executed actions is nearly identical. However, about 70% of the MNs
are merely broadly congruent: acts coded by the neuron in visual and motor terms
are clearly connected, though not identical. The link can present different levels of
generality: there are MNs that respond to two or three observed acts, but to just one
performed act; MNs that respond both when the monkey observes grasping with
precision and with a whole-hand grip, but only to precision grasping when it concerns execution;
and even MNs that respond to one kind of action visually, but to a different (though
correlated) action when performed (Rizzolatti & Sinigaglia, 2008, pp. 83–84). The
observation that most MNs are broadly rather than strictly congruent has led
some researchers (e.g. Csibra, 2004) to object that MNs do not really ‘mirror’ the
actions of others. This criticism seems to be based on a misunderstanding of how
mirror neurons are supposed to represent actions. As Susan Hurley writes in
response to Gergely Csibra,
it is important to see that “for mirror neurons to play
an important role in implementing subpersonal simulation functions or the
capacity to understand observed actions, they don’t need to do so individually –
don’t need to be ‘grandmother cells’ of action understanding”. It is more plausible
that MNs “participate in distributed mirror systems that enable aspects of social
cognition”, like action understanding. If an action is represented by a network of
MNs, the findings on congruency do not pose a problem but are rather to be expected.
Researchers of the Parma Team as well as many other theorists now believe that
MNs are fundamental to action understanding. As said, when I observe an action,
my MNs get activated in a way that is similar to when I would perform that action
myself. at is, a visual event is coded in terms of the corresponding motor event,
thereby instantly related to my own motor repertoire. rough this link with my
own motor system, I have information about the causes of the other’s action.
According to Rizzolatti and Sinigaglia (2008), one thereby “immediately

the meaning of these ‘motor events’ and interprets them in terms of an intentional
act” (p. 98, my emphasis). How much information about the intentions of others
can be perceived in this way, whether only low-level goals closely related to
the observed movements or also higher-level goals to which the observed
movements are a means, is addressed in the next chapter.
On the interdisciplines forum,
The automatic and instantaneous activation of one’s own motor or emotional
areas when one observes another respectively acting or expressing an emotion
provides one reason to hold that actions (plus involved intentions) and emotions
can be directly perceived and thus do not have to be inferred. Another reason is
provided by the fact that infants and some nonhuman animals can grasp these
states, even though they do not have advanced cognitive abilities required for
inferences (see next section). But phenomenological evidence is also an important
source for this claim. Phenomenologists have long stressed that a subject’s affective
and intentional states are not merely internal subjective phenomena, but are given
in the subject’s behaviour, and thereby visible to others. As Wittgenstein (1980) writes:
We do not see facial contortions and make the inference that he is feeling joy,
grief, boredom. We describe a face immediately as sad, radiant, bored, even
when we are unable to give any other description of the features. (Sect. 570).
Similarly, when I see you walking towards the music installation, I
see that you are going to put some music on; I do not have to infer this from
your movements. We do not have to infer these mental states from what we
perceive, because we already perceive them in the other’s behaviour. Mirroring
mechanisms, it seems, inform our perception of others.
More so than the findings on agency detection and eye-tracking, the findings
on MNs present a problem for the classical sandwich. Mirror neuron activity
supports the idea that perception and action share a common code of
representation in the brain (Hurley & Chater, 2005; Preston & De Waal, 2002).
Observation of the behaviour of another appears to automatically activate one’s own
representations for that behaviour. Gallagher (2005) speaks in this respect of “a
common bodily intentionality that is shared between the perceiving subject and the
perceived other” (p. 225). Wolfgang Prinz has supported this idea in many studies
(e.g. Prinz, 1997; 2005), showing that perceiving someone performing a certain
action facilitates one’s own performance of this action. The common-coding
hypothesis explains why newborns are able to imitate the facial expressions of
others without being able to use visual feedback of their facial movements (Hurley
& Chater, 2005). It can also explain the automatic, unconscious chameleon effects
and related robust priming effects that have been extensively reported by John
Bargh and colleagues (Hurley, 2008). However, the idea that actions of self and
other are represented in the same structures conflicts with the classical sandwich’s
assumption that perception and action are independent. If perceiving actions and
executing actions share subpersonal processes, the classical sandwich cannot
accommodate action understanding.
Quotation is from Zahavi (2007, p. 30).
In the next chapter we’ll see that a
horizontally modular model of the mind, on the other hand, can easily
accommodate common-coding and the findings on mirroring.
2.3 Social cognition in infants and nonhuman animals
Findings in the previous two sections suggest that before their first birthday,
children are able to distinguish agents from non-agents, to understand that others
see what they are looking at, and to understand actions and emotions of others. It
seems implausible that theorising or simulating plays a role here, as these abilities
are supposed to develop years later. Postulating or projecting a mental state onto
another requires a subject to possess concepts of mental states and to use inferences,
and it is widely supposed that this goes beyond the powers of the one-year-old.

Instead, these early social cognitive skills constitute a primary intersubjectivity, an
immediate and non-mentalistic understanding of others that emerges long before the
child can predict or explain the behaviour of others (Gallagher, 2005).
In fact, primary intersubjectivity seems to be required for the development of
more advanced social cognition (Gallagher, 2001). According to Shaun Gallagher
(2001; 2005), in order to develop the ability to simulate or to theorise, the child must
already have (a) an understanding of what it means to be an experiencing subject,
(b) an understanding of what it means that certain things are such subjects whereas
others aren’t, and (c) an understanding that these other subjects are in some
respects similar to and in some respects different from oneself. He suggests that the
child acquires this “massively hermeneutic background” through second-person
interactions with others that are made possible by the child’s primary
intersubjectivity.
Note that the classical sandwich already cannot accommodate the idea that motor processes affect perceptual
Proponents of ST hold that an inference on the basis of presumed similarity is needed to project the mental
state that results from a simulation onto the simulated person. Gordon (1995a; 1995b) is a notable exception.
The word understanding should not be interpreted too intellectually here; Gallagher is talking about a
practical understanding.
The claim that a basic understanding of the other does not require theorising or
simulation is also supported by findings on the social cognitive capacities of
nonhuman animals. Capacities for agency detection and eye-tracking appear to be
widespread throughout the animal kingdom, and some nonhuman animals appear to
be able to understand intentional action in terms of goals and perceptions. For
example, Tomasello and colleagues (2005) report a study in which a human began
giving food through a Plexiglas wall to a chimpanzee, but then in some cases
brought out a piece of food and either attempted unsuccessfully to give it or else
refused to give it to the ape. Like 9- to 12-month-olds, chimpanzees reacted with
more agitation in the latter case but tended to wait patiently when the experimenter
appeared to be unable to give the food, suggesting that they recognised the
experimenter’s goal. Chimpanzees also seem to understand that what others can see
affects what they do. In another study reported by Tomasello and colleagues (2005),
a dominant and a subordinate ape were placed into competition with each other
over food. Some pieces of food were visible to both of them, whereas others were
only visible to the subordinate chimpanzee. The subordinate ape tended to pursue
the pieces that were hidden from the dominant ape’s view. The subordinate ape thus
seemed to know not only what the other could see, but also what this perception
meant for how the dominant ape would act.
The fact that infants and nonhuman animals have an understanding of others
even though they presumably lack the cognitive abilities required for
theorising and simulation supports the claim that there is a non-mentalistic
understanding of others. It does not directly follow, however, that adults
can understand others in such a non-mentalistic way. It could be argued that once
we have developed a theory of mind or the ability to simulate the mind of the other,
this new feat becomes our default way of understanding others and replaces the
infant’s primary intersubjectivity. On this view, basic social cognitive capacities are
mere precursors of mature social cognition that do not play a significant role in the
adult’s understanding of others. Such a position might motivate TT’s and ST’s
narrow focus on advanced mindreading.
This argument should be resisted, however. The mechanisms involved in
agency detection, eye-tracking and the mirroring of emotions and intentions are,
as we saw, environment-driven, content-specific, automatic and fast. The information
they process affects how we perceive others. We usually directly perceive another
human or animal as an agent, see what the other can and cannot see, and
perceive intentions and emotions in behaviour. Given these heuristic-like
characteristics, it seems implausible that the acquisition of theory or simulation,
which are taken to involve higher-cognitive processing, can change much about how
these mechanisms function. It is even more implausible that theory or simulation
would simply replace them, as these mechanisms seem to provide a kind of perceptual
information that is different from the information that theorising and simulating
can generate. Instead, the ability to theorise about the other and simulate the other’s
mind appears to complement these basic skills, and possibly to draw on information
provided by them (see chapters 3 and 5). Basic social cognitive capacities, I
suggest, continue to do their job even after the child has developed more advanced
abilities. The next section will make the stronger claim that in many cases they
provide sufficient understanding of others.
2.4 On the prevalence of second-person interaction
Proponents of TT and ST usually describe our understanding of others as
something that occurs from a third-person perspective. For example, in an
influential paper in the TT-ST debate Stich and Nichols (1992) give an illustration
of a ‘typical’ case of interpersonal understanding: “Suppose, for example, you want
to predict what a certain rising young political figure would do if someone in
authority tells him to administer painful electric shocks to a person strapped in a
chair in the next room” (p. 38). TT suggests we predict this person’s behaviour
through a theoretical inference; ST holds that we put ourselves in this situation and then
project our own imaginary behaviour onto him. As Shaun Gallagher (2005) writes,
when applied to a scenario in which we are thinking about another person from a
third-person perspective, like the example case of Stich and Nichols, both accounts
might provide adequate descriptions of what we do. But making sense of others
does not always happen from a third-person perspective. In fact, such third-person
deliberations about others are outnumbered by the second-person interactions that
we have with others. And third-person prediction and explanation are not what we
do when we are making sense of others in second-person interactions, Gallagher
argues.
Gallagher’s objection to the idea that theorising or simulation plays an
important role in second-person interactions is primarily phenomenological.
He writes:
in a second-person conversational situation, although we may indeed tacitly
follow certain rules of conversation, our process of interpretation does not
seem to involve a detached or abstract, third-person quest for causal
explanation. Nor does it seem to be a theory-driven interpretation that takes
the other person’s words as evidence for a mental state standing behind what
he has just said. Even if we are trying to read ‘between the lines’ and we reach
the conclusion that the person we are conversing with believes the wrong thing
concerning the other person, our understanding of this is poorly described as
resulting from formulating a theoretical hypothesis or running a simulation
routine about what he believes. (Gallagher, 2005, p. 211).
Our experience does not contain any evidence for the use of theory or the use of
simulation to understand the other, at least not in most cases. Because of this, there
is no reason to suppose that we do in fact understand others through theorising or
simulating, Gallagher’s argument goes.
Defenders of TT and ST could respond that phenomenology is not relevant
here, as most of the processes involved with theorising and simulating can be
unconscious. Is that right? TT holds that after a behaviour has been observed, the
person goes through a series of inferences and, on that basis, postulates mental
states in the other. ST holds that to predict the behaviour of another, one pretends to
have the same initial states (for example, the same desires and beliefs) and then
makes a decision given those initial pretend states, which is read through
introspection and then projected onto the other. Both accounts describe explaining or
predicting the other’s behaviour as personal-level processes that involve several
stages and several advanced operations on the part of the subject. Does it make
sense to say that one can do all that without being conscious of it? Can one theorise
about the other or put oneself in the shoes of another without noticing it? I tend to
answer both questions negatively. As Gallagher and Hutto (2008) write: “if in fact
such processes are primary, pervasive, and explicit, they should show up in our
experience –in the way that we experience others– and they rarely do” (p. 2).
Maybe theorising and simulation are meant to be subpersonal processes. I do not think that such an
interpretation is plausible, as ‘explanation’, ‘prediction’, ‘theorising’, ‘simulating the other’, ‘inferring’,
‘pretending’ and ‘introspecting’ are personal-level terms, and these are the terms that are pervasive in
traditional accounts of mentalising.
Besides this phenomenological argument, an embodied approach to social
cognition also provides a practical objection against the idea that theorising about
the other’s mind or modelling it on our own mind is our default way of
understanding the other. Social cognition is usually situated. It takes place during
interaction with others, in which the behaviour of the other continuously influences
you and you continuously influence the behaviour of the other. This
occurs in conversation, during a tennis match, or while having sex. In such situated
second-person interactions we are able to understand what others are saying, grasp
their feelings, laugh with them, play with them, cooperate, coordinate our
behaviours, etcetera. What we cannot easily do in those interactions, or at least
not without an effort that could disturb the interaction, is actively reflect on the
other. We can ask ourselves ‘Why did he say that?’ or ‘Why
did she respond with that look on her face?’, but we will often do so only after we’ve
ceased to directly interact with the relevant other. It is doubtful, to say the least,
whether during intensive second-person interaction one has the required time and
resources to create a model of the other’s mind or theorise about it.
Another point that becomes salient when we focus on real-time, real-world
social cognition is that our interactions with others usually take place in
contexts in which we are already involved in other cognitive activities. Think about
when you go to work: you leave your house, buy a coffee at Starbucks, take a bus,
and greet some people in your office before you start working. Although you
probably weren’t actively thinking about any of the persons you met, you were
perfectly able to act in company and in co-ordination with other people. Besides the
phenomenology, the lack of time and cognitive resources suggests that theorising or
simulation does not play a role in such interactions.
But it might be more important to ask whether we need to predict or explain
the other’s behaviour by means of theory or simulation to make sense of the other in
second-person interactions. Second-person interaction is traditionally
characterised as the transmission of thoughts, feelings, intentions, and beliefs
from one mind to the other. The actual behaviour in social interaction is separated
from the beliefs, intentions and other states of the individual agents, which are
taken to be internal to their minds (Lindblom & Ziemke, 2008). It is thus
characterised, to paraphrase Shaun Gallagher (2001), as a passive process between
two “Cartesian minds”. Such a view on social interaction is reinforced by the classical
sandwich, which separates action and perception from the mind, and takes the
latter to be an internal domain to which others do not have any direct access.
Because the other’s mental states are hidden behind her overt behaviour, cognitive
operations in terms of theorising or simulation are required for mindreading, that is,
for representing the mental states of the other.
That what the other does and says affects how we respond consciously to him or her is quite obvious; that
there are also unconscious effects is more surprising. Persons who are interacting with each other tend to
synchronise their postures (Barsalou et al., 2003). Furthermore, during interaction people often copy actions
or behaviour of their interaction partners; for example, Chartrand and Bargh (1999) found that when an
experimenter either rubbed her nose or shook her foot, participants tended to follow the experimenter and
perform the same behaviour. Such contagion effects can also be found on the emotional level, as the
emotions of others tend to influence one’s own (more on this later). As people tend to like people similar to
them more, these effects might be a form of unconscious manipulation of the other. Another possibility is that
they occur because becoming more like the other enhances one’s understanding of him or her. But they could
also be a side effect of the mirroring mechanisms that help us understand others (see §2.2).
But once we drop the classical sandwich, such operations do not seem to be
necessary to learn something about the other’s mind. Due to the skills that
constitute primary intersubjectivity, we do not perceive the behaviour of the other
as mere movements; we can perceive it as intentional and emotional. Once we
relax the distinctions between cognition, perception and action, direct access to the
mind of another becomes a possibility. This opens up a view on social interaction
according to which “communicative interaction [is] accomplished in the very action
of communication, in the expressive movement of speech, gesture, and the
interaction itself ” (Gallagher, 2005, p. 212).
There is nothing mysterious about this direct understanding of others. Just as
we perceive the world in 3D, full-colour and as packed with objects, we can perceive
other persons and their states. And just as complex subpersonal visual and cognitive
processes enable us to perceive the world as such, complex dynamic subpersonal
processes (among others, the processes underlying agency detection, eye-tracking
and mirroring) enable us to perceive others and their mental states. As Shaun
Gallagher (2008) writes in a recent paper:
my perception is already informed by the relevant sub-personal processing.
I don’t first perceive and then add memory in order to recognize my car. My
perception, in this sense, is direct even if the sub-personal sensory processing
that underpins it follows a complex and dynamic route. (p. 537)
Embodied cognition rejects the classical sandwich and its conflation of input with
perception and output with action. We do not need to perceive first, and then use
higher-cognitive processing. Rather, perception is itself rich, because it is informed
by relevant subpersonal processing.
It might be objected that behaviour is often ambiguous, and that others often
hide their mental states. Both claims seem to be right, although the first needs to be
qualified. Concerning the first, it must be stressed that our interactions with others
are always contextualised: they take place in a meaningful context (Gallagher, 2001;
Gallagher & Hutto, 2008). Surrounding physical objects help us understand the
behaviour of others. For example, if you see your partner walking toward the music
installation you will understand that he or she intends to put some music on, or, if
there is already music playing, to turn it off or change it. Furthermore, the context
gives information about norms and rules of behaviour. When you see a person start
singing a happy tune at a Greek wedding, you’d think he wants to show his
happiness for the newlyweds. When you see him singing the same tune at a
funeral, you’d think he’s crazy. It is in the second case that you will reflect on what
he’s doing, whereas in the first case such reflection is unnecessary. The context thus
reduces ambiguity. Again, it seems that contextual information usually affects how
we perceive others and their behaviour, rather than disambiguation taking place
after perception. You perceive someone who’s shouting at you in a busy and noisy
metro station very differently from someone who does this in a church.
Nevertheless, sometimes another’s behaviour is genuinely puzzling, in the sense
that one does not have sufficient information perceptually to understand what the
other is doing, feeling, or thinking. Moreover, as said above, we tend not to reveal a
significant amount of our thoughts, emotions and intentions. Both points suggest
that in some cases we will have to use something similar to theory or simulation to
‘discover’ what’s going on with the other (without guaranteed success). This is
not a problem, however, for the account sketched here. The claim is merely that, due
to the basic social cognitive abilities that constitute primary intersubjectivity,
combined with verbal understanding and implicit knowledge of contextual norms,
our perception of others in many second-person interactions is sufficiently rich
to make sense of them. This implies that prediction and
explanation by theorising or simulating is not necessary to make sense of another
person. Combined with the lack of phenomenological evidence for such activities
and the strain that theorising and simulating appears to put on our mental
resources, there is good reason to think that in many second-person interactions
they in fact do not occur.
2.5 To conclude: Some objections to TT and ST
An embodied approach to social cognition conflicts at some central points with
both TT and ST. is final section sums up the most important claims of this
chapter, poses some objections for TT and ST, and shows what additional work
needs to be done in the remainder chapters.
Let me stress that the basic capacities central to this chapter are not exclusively for second-person interaction;
they of course also enable an understanding of others whom we observe from a third-person perspective.
As said in the introduction, an assumption that is usually implicit but
sometimes explicit in accounts of ST and TT is that the minds of others are hidden
away, behind the behaviour that we can perceive. In Shaun Gallagher’s (2005)
words, “the mind is conceived as an inner realm, in contrast to behavior, which is
external and observable, and which borrows its intentionality from the mental states
that control it” (p. 209). This separation of mind from perception and action is
characteristic of the classical sandwich. As Gallagher (2005) points out, this picture
reinforces a certain idea of what theories of social cognition are about, namely that
“the problem is to explain how we can access the minds of others” (p. 209). Working
under the assumption that we do not have direct access to the minds of others,
theorists have supposed that mindreading requires either postulating mental states
by means of a set of laws, or projecting the outcomes of simulations performed on
one’s own mind to the other’s mind. However, as we saw in §2.3, infants and
nonhuman primates have a substantial understanding of others. From the
perspective of TT or ST this is impressive to say the least, if not inexplicable, as
they cannot theorise or simulate. How can they access the mind of another without
these abilities?
Embodied cognition provides a different view on social cognition. It rejects the
classical sandwich for social cognition, the view that regards perception as input
from world to mind, action as output from mind to world, and cognition as
sandwiched in between. is rejection appears to be supported by findings on basic
social cognitive abilities, we saw above. Mechanisms of agency detection, eye-
tracking and of emotion and action understanding are not readily accommodated
by the classical sandwich. Agency detection and eye-tracking are skills with
cognitive import, even though they do not involve any theoretical inferences or
central cognition. Eye-tracking appears to depend on subpersonal coupling between
sensory and motor processing, rather than only on processes internal to vertical
modules. Emotions and actions of others are processed in the same circuits that
underlie one’s own emotions and actions, which conflicts strongly with the classical
sandwich’s strict separation of perception and action. From the perspective of
embodied cognition, I argued above, the findings do however make perfect sense.
By rejecting the classical sandwich, embodied cognition relaxes the boundaries
of our minds and makes them accessible to others. The behaviour of the other is
more than just output from the mind and our perception is more than just input
from the world; both become mindful. In effect, it becomes possible to perceive the
other’s perceptions, emotions and intentions in his or her behaviour. The findings