BEING ABOUT Chapter 6. Representing
Everything I call a representation is public.
Hacking 1983, 132
In my formulation representation and cognition are not synonymous. Where other writers talk about mental representation I speak instead about cognitive structure or structural aboutness. If we use the same term both for a picture or sentence and for the bodily states by means of which we use them, we have lost our ability to speak clearly about their relation. A theory of representation is implied within a general theory of cognition; a theory of cognition is not implied within a theory of representation.
A cognitive philosophy of representation
Representing practices can be used to organize states in the user that are like the state that would be produced in the presence of something. This amounts to a sort of functional equivalence of thing and representation, but the equivalence depends on the user's response. There is no external relation of environmental thing and representing object. There is no re-presenting of the thing, only a re-evoking of a state. The same representational form can evoke different states in different contexts. Any representational effect is a structural alteration of the user, a physical, dynamical event. It is always a partial effect; it is never the only thing going on for its users, who must continue also to be about other aspects of their physical context. Representation is ontologically complex: it uses presence to evoke simulation. We are in a room, looking at a photograph. We are perceptually entrained by the room. We're seeing it. There is a sheet of photographic paper behind glass on the wall. We are seeing it too. The structured light by which we are seeing is being structured then and there, but the light reflected from that sheet of paper is also being structured (more or less) as it was when it entered a lens years ago.
Looking at the photo, we are being entrained by a real structure of light to a fictional effect: it organizes us as if we were with an object we aren't with. We are not imagining the photograph. We are seeing everything else in the room, and we are seeing the photograph. The light reflected from the photographic surface is everywhere convolved with the light reflected from the rest of the room: we are seeing and seeming to see by means of the same medium. Some of the light is being used for perceptual purposes and some is being used for simulational purposes. The photo's presence as a representational object is not illusory. We do not imagine it but we imagine by means of it, the same way we can imagine by means of a flow of cloud or a line of type. Because there is no representational effect without physical presence of some kind, we always have the option of foregrounding the representing object as such. We can look at the grain on the photographic print. We can examine the leading in a line of type. We can listen to the voice talking to us instead of (or as well as) letting it engage us in simulation. The form and the presence of the representing artifact or event are critical, but they only mediate representational effect, they do not constitute it. In representing, the effective loci are social and cognitive. Representation is inherently social: it is a collection of social practices by which cognitive beings regulate each other's structural aboutness. We can use representations when we are alone, but particular representing practices come into use because they work for communities. Use of representational artifacts and practices to manage other people's cognitive states is cultural and technical. Representational artifacts are part of the culturally modified environment, like paths and pots and other tools. Representational media are systems of cognitively effective devices accumulated over millennia of invention and refinement. 
Every medium is thus a technology. Some, like mask- or statue-making, employ artifacts that are cognitively so effective they can be used (but not made) without training. Others, like linguistic and notational systems, require and create communities of users whose training goes on for many years because the cognitive structures they evoke are both elaborate and very minimally cued by the physical presence of the representational artifact or event. What I have sketched above is a basic reorientation in thinking about representing. It has the advantage of giving us a unified account of representational effect across media; it gives us flexible ways to describe the particularity of any representational instance, and it also allows us to begin to think more generally about representational effects -- metaphor for example -- common to many kinds of instance.
Representing capabilities evolve among preexisting perception/action skills and preexisting forms of social coordination. Common response to a shared environment and developed social attention are two prerequisites for communicational behavior. Conspecifics will be similarly structured in relation to a shared environment simply in virtue of their similar bodies. Through the course of evolution, conspecifics also become increasingly attuned to each other. There can be communicational effect without deliberate communicational action: fish swim in schools by being about the current and simultaneously about the location of other fish. Given their similar conditions and mutual attunement, an unusual motion by one of the fish can alter the operating state of all the others. In primates, environmental and social response are coordinated by these involuntary means to some degree, but they are also coordinated in joint attention. Primates look where another primate is looking. They do so reflexively (Baron-Cohen 1995), by means of a patch of cortex in the ventral stream, an area near the face perception area that is specialized for gaze perception. (This area will respond also to rudimentary pictures of eyes and even to pairs of arrows.) Human infants as young as 2 months will look where a second person is looking (Scaife 1975, 265). Primates will also respond to directional gestures if they are accompanied by eye contact (Krause and Fouts 1997, 335). Attention in infants and primates can also be directed by movements of head and chin, and by the whole posture of a conspecific. These socially directed but automatic orientation responses can be understood as forms of act empathy. Act constant neurons in premotor cortex (described in chapter 4) respond in the process of performing a particular form of grasp, and also when a monkey sees a human or another monkey performing that act. 
If acting and seeing an action evoke the same structure in certain eye-related arm and hand networks, we can guess that pointing and seeing a point may have similar consequences in an expanded wide net. When I see her pointing at that object, it may be (in some ways) as if I am pointing at that object. I begin to be about what she is about. There are similar findings for communicationally significant behaviors of other kinds. Newborn infants will imitate facial expressions (Atkinson 1993). Mirror neurons in facial motion areas presumably activate networks that reach into non-motor as well as motor tissue, so that facial imitation can also effect a broader imitation of structural state. What is implied by act empathy findings is that gaze and point and facial expression and the rest of these orientative motions provide a platform of social effects that can be expanded with developing representation practices. A further precondition for representing as we know it is referential intent. A primate can learn to exploit another primate's reflex response to gaze, arm position and posture in order to show something or to request it. Laboratory macaques (but not wild) use gaze and point to refer attention to out-of-reach objects. Human infants are able to form a pointing gesture by 9 weeks, but they do not use it to refer attention until 14 months (Itakura 1996). They use referential gaze somewhat earlier.
Representing forms are used to build or evoke structural aboutness and to guide its development over time. The category of representation is very diverse, but every representing event is a dynamical event. To have representational effect, the representing event must be perceived; and it can only be perceived if the perceiver is restructured in response to it. Representational response exceeds perceptual response, but depends on it. A representational event can make use of extended and elaborate perceptual change, as when we watch a movie. Perceptual effect in representation can also be relatively slight, as when we read a number. In either case, the person using the representation is varying as a function of the physical form of the representing artifact. The material organization of a representation can affect the user in ways that are not specifically representational. When we hear speech, we hear a speaking body: we hear the gender, the age, the mood, the attitude, and often the identity of the speaker. When we look at a drawing, or even when we read mathematics written on a blackboard, we see a person's energetic quality as registered in the trace of their motion. Perceptual aboutness of these kinds, and the sorts of aboutness that are specifically representational, both occur by means of wide nets of cortical activity. They co-occur. The networks that are their means overlap to various degrees. The physical presence of a representing form may be effective without being consciously perceived. The structure by which we perceive it may not be part of the sentient subnet; we can read without conscious awareness of the letters on the page. But we cannot read without having seen them. Because they are perception-based, representational events can be understood as forms of dynamic entrainment. Speech is a central instance. To understand speech, the listener must take in the microrhythms of the speaker.
Live performances of any of the time arts -- music, theatre, ritual, dance, physical comedy -- are vivid instances of dynamic entrainment. The structural aboutness of audience members is slaved to the timing of the performer -- think of children listening to a storyteller. It is a mediated invasion. We don't get entrained brain to brain, but the representing event -- the spoken sentence, cautious as it may be, the sight gag, reworked as it may be -- sets us up in ways that, if they are not wholly the maker's ways, are still ways made by the maker. Representations are of course also used interactively. Conversations have rapid reversals in the direction of dynamic control. Interactive uses of representing forms are collaborations in the construction of aboutness. This sort of collaboration can use many sorts of representation at once. In its developed form, every medium uses specialized cognitive abilities, but representational media do not evolve separately. The presence of existing forms is the cognitive context for the continuing development and use of others. Chimpanzees who have been taught American Sign Language are found signing the names of objects whose photos they see in magazines (Krause and Fouts 1997, 335). People working on a shared task in a shared space -- the navigation room of a naval cruiser, in Hutchins' example (1995) -- may be using speech, mime, maps, diagrams, and instrumental readings, all at the same time. People playing collaborative cyberspace games organize a complex, geographically distributed, but concurrent common aboutness through media relayed by linked computers. Like many representational processes, this is real dynamical interaction to simulational effect. Representational forms differ along performative-interactive and instantaneous-extended axes as described above, and they also vary in terms of several other continua.
Some representing forms are as transient as a spoken word and others store long potential for reconstructing aboutness. We can look at stone carved in 600 BC and be structured (to some limited extent) as if we were seeing a Greek person of that time. The handwriting in my friend's letter does not freeze the time in which she wrote it, but it retains the form she gave it in that time. Because it does so, I can read it again, and when I do, it can set up structural aboutness like the structure it set up the first time I read it. The wide net it sets up may include structures that constitute remembering when I read it before. There is a sense in which even transient forms endure as social practices rather than as formed materials: nodding yes is an example. Representational media require different degrees of engagement. Mathematical and musical notation can be used only with concentration, but a photo can be used with no effort. Seeing a photo or drawing of something is very different from seeing the real thing, but nonetheless we are seeing in order to seem to see; perceiving and seeming to perceive are using the same modality. Seeing a picture of a thing structures vision from the periphery, bottom-up, the way seeing an object does. In both instances structural response propagates through the mass of primary visual cortex and into the ventral stream for focal vision. When we read mathematical notation, on the other hand, we use ventral vision to make out the mathematical symbols, but the specifically representational net we are setting up uses areas of dorsal cortex related to action.
Representation and simulation
We simulate when we are structured as if we were in circumstances that are not present. We do not simulate the presence of something; we simulate being present with something. We simulate cognitive acts, not things. We do so by means of bodies structured for effective response through eons of evolution. Simulating has itself had to pass evolutionary tests.
So has representing. This is not to say that we can't go wrong by means of both. Compared to the simulational free association we experience in dreaming, the simulating organized in representing is very constrained. It is constrained in many ways and at many scales at once. It's constrained by social context. Its parts are directed by parts of the representational event, and it is organized cumulatively by the whole of that event. Some representing practices are obviously simulation-supporting, and others are not. Novels are an example of the former. Music and mathematics are examples of the latter, but they too can be understood as simulational.
Electroacoustic music is a good example of the basic relation of representational form and representational effect because composers are explicitly aware that they are creating an evocational context. A composer using this medium constructs an acoustic event that is digitally synthesized from mathematical instructions, or that uses manipulated environmental recordings, or both. In generating the piece, composers listen to their materials, assemble them, and listen to them again, testing their effect and making changes. The composer is investigating effects the listener will realize in performance. The listener is the instrument being played. The electronic composer Denis Smalley has described the composed effects he works with as indicative fields and indicative networks (1992). Smalley is primarily interested in the phenomenology of these effects -- what they are like for the listener -- but his manner of speaking is also in keeping with the wide nets by means of which these effects are produced. An indicative field for Smalley is a kind of base level aboutness evoked as we listen. An indicative network is the whole standing mix of the kinds of aboutness set up during a passage. Some of the indicative fields he names are substance and object, gesture, utterance, vision, space, and energy-motion trajectory. Real sounding events have acoustic qualities that can be recorded or synthesized so we can seem to hear pounding, scraping, or brushing gestures. We can seem to hear interactions with materials such as glass, metal, wood. We can seem to hear rain, wind, fire, doors slamming. We can seem to hear human or animal voices. We can easily seem to hear any of these things at particular distances and directions. These are instances of very straightforward simulation-supporting representational hearing. Indicative fields are interrelated; they invariably occur together in the circumstances in which they are learned. Acoustic events evoke non-acoustic events. 
When we seem to hear rain we may seem to see it, and may also seem to feel it. These are commonplace synaesthesias. There are also the synaesthesias we could attribute to parietal and prefrontal gradients: when we seem to hear a hand on a drum we can also seem to see it, while simultaneously seeming to feel our own palms and shoulders in the act. While we are seeming to hear and see someone speaking we can also be seeming to feel muscles tensing in our own mouths and throats. When we listen to an electroacoustic composition, then, there must be cascades of indicative triggering. From primary auditory matrices activity is propagating into unimodal and then into multimodal association cortex, both in ventral and in dorsal streams. We could call the aboutness accomplished by multidimensional matrices in these associational areas indicative subfields. Joined in recurrent wide nets, subfield foci accomplish a global perceptual/simulational aboutness. There will always be large areas of the net active outside the core dynamic subnet of sentient activity. Activity in these areas can also be invoked by the composer, deliberately or not. Part of what happens in a listener will be managed by the continuing events of the composition and part will be an unmanaged result of preexisting net dynamics. A piece of music gradually becomes its own context. New activity propagates into networks already formed. The composition will maintain activity, accumulate it, inflect, emphasize, or suppress it. There will be continuous integration of personal and composition-controlled effects. As a result of network self-organization, people will have idiosyncratic responses unforeseen by the composer, what Smalley calls personal indicative shadings.
Language as simulational scoring
I have given Smalley's description of indicative networks on the way to talking about language because I am going to describe language use as composed wide net evocation of the same kind.
We do not already have a philosophy of electroacoustic music and it is easier to make one up in neural terms. We do already have a philosophy of language, and it has nothing to say about the sorts of representational effects I have outlined above. As we understand language, a wide net forms and reforms and shifts and drifts, and we experience a constantly changing play of indicative mixes -- of aboutness. The indicative fields evoked can include kinds of fields not usually considered when we talk about language. The best reason for describing linguistic representation in terms of structural aboutness is that it grounds the theory of language in the whole of the body. Neural networks are in functional contact with, for instance, the endocrine and immune systems. When a sentence makes us blush, we can count that as representational effect. If reading Anna Karenina and drinking strong tea in a London boarding house makes me happy, that should count as a representational effect too -- organized jointly with straightforward perceptual and directly chemical means. Representational events may organize an entire cognitive mode. The body has no difficulty convolving effects, and our theory should acknowledge the contextual flexibility with which we manage comprehension and articulation, even where we can't account for it in detail. Language is a paradigm representational medium because it can have maximal cognitive effect with minimal physically present form. Along with mathematical notation, it is the most extreme, the most technical, development of representational function. Children are speaking by two, but the full resources of language are learned over many years, and by few. For a nonhuman primate, representational response alone is a sophisticated accomplishment. What has to be learned is triggered structural evocation of a specifically linguistic kind. It is taught as perceptual synecdoche: think of the whole when you see a part.
An aspect of this sort of evoking happens naturally. A chimp will see a bit of a berry and go find the rest. What a linguistic chimp has to learn over months and years is representational context: when these conditions are present (the experimenter in a certain posture, maybe), respond by making a part-whole association. Premack calls it the naming schema and describes it as the first and most difficult stage in teaching chimpanzees language (Premack 1983, 19). There are easier and harder part-whole associations. Almost any chimp can think of the orange when shown the peel, or even just the color of the peel. It's probably something about how an orange is seen, with color as a separate focus in the ventral object stream, and a whole orange object-constancy focus further downstream. The interesting thing is that once it has the hang of triggered associative response in representational context, any chimp can think of the orange in response to the name as easily as it thought of it in response to the peel (Premack 1983, 121). This is so whether the name is given acoustically, as an ASL sign, or as a graphic symbol. At its most developed, in literate use, language is the fleetest and most flexible representational medium in existence, evoking indicative mixes of extraordinary specificity at extraordinary speeds.
Language serves to prompt the cognitive constructions by means of very partial, but contextually very efficient, clues and cues. We solve massive underspecification at lightning speed (Langacker 1983, 189). Linguistic theory has discriminated grammatics and pragmatics because it has wanted to understand languages as formal systems. There has been no way to understand the obviously dynamic, located, performative aspects of language in formal terms. If we are thinking of every linguistic occasion as embodied, however, the grammatics-pragmatics distinction would have to be a distinction between kinds of dynamical effect. When language is used to direct attention in real surroundings, speech is a way of modulating structure predominantly given by the speakers' mutual circumstance. Little need be said. Head it off, says the sister to the brother, where both are chasing a cow through the oat field. When we aren't there, however, sister, brother, cow and field have to be mentioned. That is, when language is being used to coordinate complex collaborative simulation rather than presence, there is more need for all the dynamic guidance afforded by complex grammatical form. Grammatical form is said to include lexicon and syntax, lexicon being the stock of terms, and syntax being the system of distinctions governing the arrangement of elements into strings. The lexicon of names can be understood as evoking indicative fields and subfields, simulational states that may be the farthest thing from an image, since they may be muscular, kinesthetic, vestibular, endocrine. A language with only lexical means is crude, the way a pidgin is crude. Fluent language also has ways of directing dynamical effects among and across indicative subnets evoked -- ways of managing sequences, emphases, inflections, mutual inhibitions, prolongations, integrated accumulations, stops and starts.
Some of its means have been called pragmatic: the syllabic foot, the flights and perchings of intonational contour. Some have been called grammatic: phrase and clause structure, the parts of speech with their customary markers. A long sentence with many subordinate clauses must accumulate a wide net with distinct subnets able to hold their own within it. Both pragmatic and grammatic directives are needed. Both are dynamical in effect: both are technologies for complex integrational control. While it is not true that we have known nothing about syntax and cognitive systems, since speakers and writers have controlled cognitive effect with great and often conscious syntactic precision, we have known, and still know, very little about syntax and the brain. If we assume certain basic principles we can, however, begin to see how a cognitive grammar will have to go. The principles are the principles of a general cognitive theory of representation: representational form organizes cognitive action by organizing wide nets. The wide nets organized by representational means are part of the structural means by which we are related to present environments, or by which we seem to be so related. Any representational effect is a dynamical effect, an alteration of physical structure. Dynamical effect includes energetic propagation, focalization, and inhibition. It includes structural stabilization, accumulation, maintenance, coordination and integration, and rapid or gradual shifts.
The language net
Two important changes in the way we think about language have resulted from functional imaging studies of brain activity during linguistic tasks. The first is the discovery that language is not the product of a local module (how could we ever have thought it was?) but uses networks very widely spread through the whole of the left hemisphere and parts of the right (Damasio 1989).
The second is related to the first: it is that some of the foci involved in language nets are not exclusively linguistic; language also uses connections built for basic situational action and perception. Damasio thinks of the whole of the language net as spreading through three zones centered around the Sylvian fissure: a language implementation zone that includes the traditionally recognized language areas; a mediation zone, used non-linguistically as well as linguistically, adjacent to the implementation zone; and what he calls a conceptual area spread throughout primary sensory and motor cortices outside the implementation zone. Damasio's large-scale distributed system for language (as summarized in Grabowski and Damasio 2000, 429) is in accord also with descriptions by Mesulam (1990). [6-1 Damasio's implementation and mediation areas]
Language implementation areas are language perception and production areas -- left hemisphere perisylvian areas active during perception and construction of word forms. The posterior section of the implementation zone includes primary areas for modalities used to perceive language. Language perception can include words one sees written, spoken, or signed as well as words heard. It includes perception of one's own language production. Perceptual foci in the language net would thus be auditory, visual and somatosensory for spoken language, and visual and somatosensory for signed and written language. Linguistic construction can similarly include oral, written, and signed word production. Like other muscular action, speech articulation, auditory or signed, is instituted from motor cortex. Premotor and prefrontal cortex are important to the complex scheduling required by sentence production, in which articulatory rhythms are begun, suspended, and resumed, sometimes with nested subcomponents several levels deep. The anterior section of the language implementation area thus includes premotor and prefrontal cortex in left inferior frontal cortex, as well as magnificatory mouth, throat and hand areas of motor cortex. [6-2a Damasio's extended language regions] [6-2b Damasio's extended language regions] Taken together, areas in the implementation network include primary sensory areas including left auditory cortex, primary and secondary vision areas and the lower third of left somatic cortex, lower parts of inferior parietal areas 39 and 40, premotor cortex active in initiating and scheduling speech, and motor cortex, as well as Wernicke's and Broca's areas, the two centers that earlier were thought to be the only language centers of the left hemisphere. Mesulam found reciprocal connections between many of these regions (1990). 
The network character of language function is changing the way we understand Wernicke's and Broca's areas: both are now being seen as associative centers that set up 'word meaning' subnets both in response to language forms, and in the process of producing language forms (Mesulam 1990, 602-606). Wernicke's area, near both auditory and visual cortex, is a large region of association cortex active in linguistic function no matter what sensory modality is used. Because lesions in this area result in difficulty understanding language, Wernicke's was formerly thought to be a module for language comprehension. Wernicke's is now known to be intensively interconnected with frontal cortex and active in speech preparation as well as perception. There has been long uncertainty about the exact location and function of Broca's area, the classical language area in frontal cortex. Broca's area has been assigned to Brodmann's 44 or 45; and it has been called either premotor or prefrontal. Broca's aphasia, formerly understood as a linguistic articulation syndrome, results from lesions of inferior frontal cortex. [6-3 Broca's and Wernicke's areas with connections] Like many classical syndromes and their localizations, Broca's aphasia and Broca's area are now being understood to be more complex than we thought. Mesulam now believes the anterior language implementation region includes area 44 at its core but also an adjacent rim of areas 45, 47, 12 and 6. Of these areas, 44 and 6 are premotor ... while areas 45, 47, and 12 are constituents of prefrontal heteromodal cortex. Damage to the motor association component alone seems to elicit a motor deficit confined to language output but not the full clinical syndrome known as Broca's aphasia (1990, 603). Wernicke's and Broca's are connected by a bundle of fibers called the arcuate fasciculus.
The arcuate fasciculus can be understood as a somewhat insulated high-speed phonological through-line from auditory to motor areas; it is a white matter bundle that underlies the ventral, magnificatory end of somatosensory and motor cortices, which are interposed between Wernicke's and Broca's on the cortical surface. Kimura reports gender differences in the relative importance of anterior and posterior areas to motor preparation for speech; male speech is more damaged by posterior lesions near Wernicke's, and female speech is more damaged by anterior lesions near Broca's (1992, 124).
Damasio's language mediation zone surrounds the language implementation areas described above. It includes portions of the left frontal, temporal and parietal lobes. Lesions in these areas result in subtle impairments of linguistic function but do not result in the classical aphasias. Language mediation cortex is association cortex that includes nodes acting as two-way switches between language perception/production subnets and Damasio's conceptual or non-linguistic perception/action subnets whose activity either is triggered by linguistic forms, or else triggers construction of linguistic forms (H Damasio et al 1993, 504). Like language implementation cortex, language mediation cortex is normally left hemisphere cortex. (More on hemispheric lateralization in Chapter 8.) 'Conceptual' structure evoked by linguistic forms via structures in mediation cortex is, however, found in both hemispheres (Hanna Damasio 1996, 505). The Damasios have found that response to a word is similar to response to a thing or to a picture of a thing. Environmental behaviors directed by linguistic communication are coordinated by networks spanning many parts of both hemispheres, and the same seems to be true for linguistically directed simulational states. A name or picture of a hammer, like the hammer itself, would evoke cortical structure relevant to "the typical action of the tool in space, its typical relationship to the hand and to other objects, the typical somatosensory and motor patterns associated with the handling of the tool, the possible sound characteristics associated with the tool's operation, and so on" (1994). Evocation of some part of the potentially large number of relevant responses, "over a brief lapse of time and in varied sensorimotor cortices", would "constitute the conceptual evocation for a given tool" (Tranel, Damasio and Damasio 1997, 1324). 
What Damasio means by a conceptual zone, then, is areas of sensory or motor cortex in which object- or event-relevant structure can be evoked, via mediation cortex, by different means and for different purposes. He thinks of it as conceptual because it is not exclusively linguistic but is common to many kinds of differentiative response. Along with activity in Damasio's three zones, the reentrant wide net engaged when we use linguistic forms will also include subcortical and medial cortical saliency areas that set arousal levels, emotional tone and mnemonic state.
The various kinds of simulation named in Chapter 5 -- delay tasks, mnemonic reconstruction, deliberate imagining, and dreaming -- were described as perception/action structure evoked, maintained and organized through posterior counterflow activity, with or without prefrontal involvement. Hearing or reading language can be thought of as most like a kind of externally-cued dreaming, because when we hear or see speech, as when we dream, response seems to be organized by predominantly posterior means. Convergence/divergence regions described by Hanna and Antonio Damasio as important in triggering simulational aboutness are central to the Damasios' account of language function as well. As described in Chapter 5, convergence nodes within convergence regions are neuronal ensembles from which widespread patterns of activity may readily be reconstructed. Activity may traverse these zones bidirectionally. There are many linguistically specialized nodes in higher-order cortices of the left temporal lobe. By means of a series of convergence/divergence steps, linguistically triggered counterflow activity may initiate and synchronize increasingly complex trains of response to linguistic form, which may occur at various categorical levels, and may extend all the way into primary sensory and motor cortices. Convergence zones form near and between association areas important to aspects of perception and action. Systems that mediate access to concrete nouns are, for instance, "close to systems that support concepts for concrete entities" (Hanna Damasio et al 1996, 504-5). At least some convergent connectivity is idiosyncratic and cultural, modified by learning within an individual lifetime.
Linguistic convergence zones would in fact be a paradigmatic instance of cortical plasticity, as every language organizes different sorts of sensory-motor circuits for perception and production, and must, therefore, also set up different mediational structures between these specifically linguistic, and other more generally semantic, circuits.
We can speak and understand language without prefrontal cortex (Deacon 1997), but imaging studies nonetheless find activity in a number of prefrontal areas during language tasks. Along with coordinating the complex nested cycles of speech production, parts of Broca's area in the left prefrontal are found to be involved in verb generation and classificatory naming of objects, as opposed to phonemic aspects of speech decision and action (Grabowski and Damasio 2000, 450-51). Our understanding of the linguistic role of both Broca's and non-Broca's prefrontal areas has been under revision:

the functional imaging evidence points to a more complex neural arrangement for word production than was ever predicted by lesion studies. Stimulus-driven speech (e.g., verbatim reading or repetition) may not engage Broca's area to any extent, whereas effortful selection of words will involve not only Broca's area but also the middle frontal gyrus.
Grabowski and Damasio 2000, 458

Recall that prefrontal areas are important to deliberate action. Because there is more activity here during experiments in which there are many competing response alternatives, Damasio and others are guessing left inferior prefrontal activity may be part of a process of selection among alternative linguistic acts (Grabowski and Damasio 2000, 453). Linguistic selection may be an aspect of "a general mechanism of selection ... in a wide range of both semantic and nonsemantic tasks", or else different regions of prefrontal cortex may perform similar sorts of decisional procedure for different sorts of task (Thompson-Schill et al 1997, 14796-7). We have seen that left prefrontal activity is important to delayed sensory response -- that is, to maintenance or reactivation of posterior sensory activity (Courtney et al 1997, 610). In monkeys lesions centered in the principal sulcus impair ability to maintain visual structure in spatial and other delay tasks (as described in Chapter 5).
Imaging studies for humans find the homologous area of middle frontal gyrus, Brodmann's 46, consistently linked to delay capabilities. Prefrontal sensory maintenance may be important to language in several ways. We have to be able to remember linguistic forms while they do their cognitive work, or while we are in the midst of producing them. Delayed sensory response organized from prefrontal cortex has been found to include specifically verbal working memory -- maintenance or reactivation of structure relevant to the sight or sound or feel of verbal perception or action (Grabowski and Damasio 2000, 450). Areas near the precentral gyrus and different from those involved in phonetic processing are found active in short-term verbal maintenance (Papathanassiou 2000, 352-3). Goldman-Rakic believes human prefrontal cortex may contain multiple working memory centers, as does the monkey's, and that verbal tasks activate Brodmann's 45 and/or 44 but not 46. We also need to be able to remember or imagine what we want to talk about; Goldman-Rakic believes linguistic activity found in area 46 may be generally simulational rather than specifically verbal (1987, 378).

Linguistic timing and spacing

The syntax of a language is a many-leveled system of significant contrasts at the level of the phoneme, the morpheme, the phrase, the clause or sentence, and the text. What is the cognitive work done by these elements, and how, and where, is it done? Deacon (1997) suggests that one of the functions of syntactic form is to distribute activity into regions of cortex responding at different time scales -- phoneme, word, sentence, and text requiring separate integration at different speeds. At the same time, perceptual distinctions made at the various syntactic time scales must constrain and disambiguate one another in simultaneously multilevel, multifunctional ways (Halliday 1994).
Word comprehension has to wait for phoneme discrimination, but phoneme discrimination has to wait for word discrimination too. And word comprehension often has to wait for the rest of the sentence. How does syntactic form set up a neural context in which these segregations and integrations can occur? Here, as elsewhere, we can imagine a wide net being formed at the same time that its smaller subnets are forming, the whole recurrently interconnected. Wide net dynamics can account for the level of the text as well as the level of the sentence, since the effect of any sentence of a series arrives into a cortical context already active. At each of the syntactic time scales a set of categorical contrasts comes into play: subject and predicate at the level of the phrase; noun, verb, pronoun, preposition, and the rest, at the level of the word or morpheme. At least some aspects of grammatical category correlate with spatial as well as temporal differences in linguistic response. As will be seen below, syntactic form in these instances seems to effect a sort of parsing into cortical subregions.
In current linguistics the most general descriptional contrast is the contrast between open and closed class linguistic forms:

Open-class forms are categories of forms that are large and easily augmented, consisting primarily of roots of nouns, verbs, and adjectives. Closed-class forms are categories of forms that are relatively small and difficult to augment. Included among them are bound forms like inflectional and derivational affixes; free forms like prepositions, conjunctions, and determiners; abstract forms like grammatical categories (e.g., "nounhood" and "verbhood" per se), grammatical relations (e.g., subject and direct object), and word order patterns; and complexes like grammatical constructions and syntactic structures.
Talmy 1996, 273

At the morpheme level, open class forms are, approximately, names, and closed class forms are, approximately, function words. Talmy suggests that open class forms evoke topics and closed class forms evoke relational order (1996, 265). Open class forms direct us to perceive or imagine objects, events and acts, while closed class forms direct us in how to perceive or imagine them relative to each other. If we change the closed class forms in a sentence but retain the open class forms, the sentence is still felt as being about the same thing. But if we keep the closed class structure and change the open-class forms, it is felt to be about something else. Lesion and imaging studies of open class function show clear evidence of regional specialization. People can lose various kinds of open class terms selectively, and PET studies of word retrieval tasks find that different categories of names set up activity in different parts of the cortex. Lesion studies find double dissociations for nouns and verbs, for instance. Among nouns, there is relative segregation for common nouns and proper names, and among concrete nouns there are dissociations of names for animate and inanimate things.
The ventral object recognition and memory stream seems to be relevant particularly to open class grammatical forms such as nouns. PET and fMRI imaging studies find multiple small areas in the ventral stream responding selectively when people are shown pictures of objects, or when they read or hear their names. Generating concrete nouns also activates areas of the temporal lobe, with different sorts of object activating different areas of the temporal. Kinds of objects tested have included faces, animals, houses, tools, and local environments. Names and pictures of different object types are found to activate relatively broad, overlapping regions of ventral temporal cortex, but with differently located peaks of activity correlating with object category (Martin, Ungerleider and Haxby 2000, 1032). Identifying and naming pictures of animals sets up activity in three areas known to be responsive to form, visual detail, and biological motion. Areas responding to animals, fruit and vegetables also include "a region involved in the earliest stages of visual processing" (Martin et al 1996). Identifying and naming pictures of tools sets up activity in sites that respond to observed nonbiological motion and that are included in motor response networks associated with tool use (Martin, Ungerleider and Haxby 2000, 1032). Proper names of known persons activate areas of the right and left temporal pole. Response to concrete objects includes multiple categorical levels "required to specify the multiple sensory-motor interactions that a concrete entity engages with a perceiver" (A and H Damasio 1994). We have seen that object recognition response in the ventral stream includes feature and superordinate levels. Response to object and attribute names shows aspects of this fanned-out hierarchical structure of distributed nodes within recursively interconnected parallel streams directed toward anterior temporal regions. 
Response to feature names occurs early in these streams, close to primary sensory cortex, while response to names of whole or unique entities occurs in areas near the anterior pole of temporal cortex (Damasio and Tranel 1993, 4959-60). Names of object features activate nodes close to regions responsive during perception of those features. Generating color words selectively activates bilateral areas of ventral temporal cortex "approximately 2-3 cm anterior to regions known to be active during color perception"; this area was also found to be active when synaesthetes reported color associations to non-color words (Martin et al 1995, 102). Object identity networks in the left hemisphere have been found to involve less cortex than on the right, as if object memory during language use may have access to a more rapid, sketchier sort of sensory evocation on the left (Kosslyn et al 1992, Kosslyn et al 1989). Damasio and Tranel describe response to verbs, or response in the process of generating verbs, as "less stratified" than noun response (1993, 4960). Rather than evoking hierarchized categorical response, verbs evoke "manners of action of an entity and trajectories of an entity in space-time". Generating verbs activates interconnected areas at both sensory and motor ends of dorsal through-streams. Sensory response is found in left occipital-temporal-parietal areas approximately 1-2 cm anterior to the regions active when we observe moving objects (Martin, Ungerleider and Haxby 2000, 1025). Frontal response is found in left hemisphere prefrontal and premotor areas (near Broca's area) which are also active when we perceive or imagine action (Damasio and Tranel 1993, 4959-60). I have discussed the way the brain distributes activity differently in response to nouns and verbs. It must do so also in response to other categories of open class terms. 
Adverbs and adjectives can easily be thought of as evoking neuronal groups in feature constant areas such as color and object motion areas.
Open class lexical evocation is of course only a part of linguistic function. What about closed class terms -- articles, pronouns, prepositions, conjunctions -- and form variations that pluralize, inflect verbs, effect grammatical gender, and all the rest of the purely syntactic resources every language has? Network response to grammatical function words is less well understood than network response to names. If closed class effects are global dynamical effects rather than local intensifications of activity it will be difficult to find fMRI/PET evidence. One approach by cognitive linguists such as Landau and Jackendoff has been to look to purely linguistic effects for evidence of cortical function. Base level sensory-motor aboutness organizes itself in terms of whole circumstances with objects, self, actions, events, and backgrounds. Simulation is often similarly integrated. The particular magic of sentences and clauses is that they can set up some variant of these contextual wholes. Our first sentences invariably take the subject-verb-object form, Doggie gots ball, a form evoking basic perception/action structure. Linguistic effects cited by Landau and Jackendoff include differences in the way grammatical roles, considered part of the closed class system, direct us in imagining figure objects and background objects. Any name calls up structure relevant to the thing named, but the amount and kind of structure varies with the grammatical role of that name in a sentence. Landau and Jackendoff call something evoked as subject of a verb a figure object, and something evoked as object of either the verb or preposition a reference object. What they notice is that both children and adults attend to details of an object's shape when it is named as an object, but ignore the same object's shape when what is at issue is its role as a figure in a locational expression (1993, 227). 
A name in the subject position causes us to treat that thing as foreground object, and to call up what we know about it. A name in the object position causes us to imagine that thing as a background object, that is, to know about it almost only its spatial relation to the subject. It isn't clear just what the subject/object difference might mean in neural terms, but it must have something to do with amount and location of structure evoked in response to a name. Where it is subject of the verb, in The ball ran over the sand, the ball must evoke more sensory structure than it does in He caught the ball. Similarly, sand, in There was sand stuck to the side of the ball, would evoke visual structure relevant to individual grains of sand in focus against a vague rounded surface, whereas, in The ball ran over the sand, sand evokes an undifferentiated sand-coloured surface. Landau and Jackendoff suggest these differences may be differences in degree of ventral and dorsal visual participation.

Language from the other side

We must consider speech before it is spoken.
Merleau-Ponty 1964, 46

There is another reason for not using the term 'syntax'. This word suggests proceeding in a particular direction, such that a language is interpreted as a system of forms, to which meanings are then attached ... In a functional grammar, on the other hand, the direction is reversed. A language is interpreted as a system of meanings, accompanied by forms through which the meanings can be realized.
Halliday 1994, xvii

Speech and writing are kinds of action that, like running or eye motion, are peripherally gated from centrally organized structure. The way we sometimes imagine a word before we speak it can obscure the fact that a word comes into existence only when we speak or write it; it is made when it is sounded. It does not flow from brain to lips like a packet ejected with its meaning inside it. Like a gesture, it is not transferred but enabled.
Like a gesture, it is performed by means of a wide net that is still there while the word is being pronounced. The word is produced in the standing context of everything we are as we speak it: the fine-grained multiple weave that is our meaning. We don't speak about our experience, we speak from it. We don't refer to it, we refer from it. The system sorts toward words (Goodman 1968); we organize a wording (Halliday 1994) by means of it. Because language is produced from a wide net that includes aspects of aboutness we are not intending to word, we give ourselves away. Haskell (1987), for instance, found that conversations in small groups were always concerned with power relations in the group, as well as with the ostensible topics. The poet Paul Valéry describes language-making as being like walking in the way it works off motor patterns but must adjust to instantaneous conditions, "which combine in a novel way each time" (1958). We are speaker and listener at the same time; like a walker recovering balance, we speak amid what we have heard ourselves saying. Or what we have heard ourselves being heard as saying.
Behind every utterance there is a person. It is not simply the words that mean; it is a person who means; and what the person means, intends to convey or declare or conceal and for what reason, is physically imprinted into the structure and texture of his language ... To the perceptive ear an utterance becomes not only a declaration by the writer but also a disclosure of the writer.
Whalley 1985, 82

Having seen a little of the restructuring effects installed by linguistic forms and gating them, we are in a somewhat better position to think about traditional questions of representational effect. We have for instance tended to think of representational style as something about an object, the well-wroughtness of the urn; but style is more usefully understood as something about the makers and users of language. There are genres of aboutness, cognitive styles. We notice cultural differences of energy, rhythm, speed, and tone. Within cultural styles there are personal styles, and within personal styles there are moods -- all are dynamical modes, ways of organizing bodies. Manfred Clynes (1978) calls the study of dynamic styles in music sentics, because we experience tensional organizations of our own bodies as we listen. Music by Mozart, for example, has a characteristic feel. Clynes relates sentic organization to the dynamics of emotional states, each of which (anger, hate, grief, love, sex, joy, reverence) has a rhythm and speed so particular that the emotion can be evoked by their means. Style in music is emotional style by being dynamical style. There is tensional and emotional evocation in language too. Think of the dynamical differences between the circumspect calm evoked by Jane Austen, the elation evoked by Martin Luther King, and the speedy paranoia evoked by French literary theorists. Language is produced dynamically, so it invariably registers the energetic character of its maker. There are also studied stylistic effects.
Phonetic repetition sets up the ear to hear more consciously. Short lines in poetry make us more conscious in vision (Collins 1991). Paratactic series and hypotactic subordination have a very different feel: the first is like tapping or pounding a table, repeating a gesture. The second is like using both hands to hold nested groups of things (or perhaps more visual than tactile, more like seeing nested groups of things). The evocative power of language is more comprehensive than a term like mental image can suggest. We evoke entire states. We set up a loom of fancy. Reading Whitman's lists sets us up to generate lists. We can wake from dreams in which we have been writing pages of Victorian prose. We recognize Mozart by his line, but what happens is larger than that. We become Mozart when we're entrained by his particular organization of tension and release. There are people we like being: we could say that's what liking their work means. Hearing a spontaneous sentence is a relief even when we don't like what is said; we like being the dynamics. Someone whose writing or speaking state is energetic and coherent sets us up temporarily to think strong thoughts beyond our usual capacity.

How to talk about representing

content, intentionality, meaning and the whole pudding
PS Churchland 1995, 24

The syntax/pragmatics contrast in linguistics has been a way of distinguishing presence effects from representational effects. Tone of voice in this way of thinking would be pragmatic, while sentence form would be syntactic. The syntax/pragmatics distinction has obscured the fact that both tone of voice and sentence form are ways of structuring persons within contexts. More, the effect of tone of voice and the effect of sentence structure are integrated in practice. An advantage of thinking in terms of wide net effects is that we can think this integration in physical terms.
All representing activity is pragmatic, in the sense that its effect depends upon embodied presence and particular context. The notion of pragmatic function has been a way to handle aspects of communication that could not be thought within the metaphor that has the representing object as the effective locus. We have had difficulty talking about the mediational effectiveness of the representing event because we foreground the representing object and ignore its structuring effect. The ways we talk about representation indicate our blind spot.
We call the thing we are about as an effect of the representing object its content, but the term is a symptom of our misunderstanding of the way we use representations. There is strictly speaking nothing contained in a representing object. A written sentence, which goes on existing on the page when no one is reading it, seems to contain whatever I or any stranger find when we come to read it, the way a closed box may contain a rock that anyone who opens it will find there. But representational objects have socially-correlated effects which come into existence only as structures of individual bodies. The containment metaphor sets us up to think we exchange representations like packages, and take them into ourselves with their contents, as if we are swallowing capsules. But representation is more like magic: we manage each other's cognitive states by technical skill. Representations organize people to be related to things or circumstances; they cannot contain that relation, they evoke it.
We commonly talk as if representing objects and events have meanings. Thus we talk about finding the meaning of a difficult sentence and we wonder whether we should describe music as having meaning at all. There is a good reason why we make this kind of attempt: representational events and artifacts are public. People are perceptually affected by them in similar ways. At times the more complex simulative structures that result from perceptual entrainment may be similar too. Since we seem to share a representational effect, we tend to say it belongs to the object. Representational objects are social objects; they are used in similar ways and we can not-incorrectly call these socially correlated uses their meaning. We could say meaning and structural aboutness are the same thing. We could go on to say meaning is something we are, not something we grasp or find. This makes meaning rather global: it would be the whole of the bodily structure by means of which we are being our cognitive selves. There is a lot to include: besides the neural configurations by means of which we are perceiving or imagining the world, there are the interoceptive structures by means of which the body perceives and imagines its own tissue states, muscle tension, endocrine concentrations. And maybe the neuroceptive states by which the net senses its own condition. At any time, only some of the active cortical net will be part of the integrated subnet by means of which we are sentient. If Edelman is right (see also Kinsbourne 1995, 1324) conscious attention is hyperactivation: whatever we are sentient in will also, as a dynamic consequence, be more finely and more widely connected and readier to organize action. It will also be the means by which we are our felt sense of something -- the whole co-present weave of perceptual and simulational structure. That will be meaning as far as I can mean it. 
In the context of this totality it is difficult and unnecessary to decide how much or what parts of that felt sense to call the meaning given by a particular sentence or image. There will be no fact of the matter. This is even more obvious when we include as meaning the structural response of the non-sentient net which surrounds and is interfused with the sentient net.
Many of these difficulties can be avoided, however, by treating reference as a process or set of expectancies that arises between individuals as a function of past experience, rather than as a mental entity residing within a single individual. On this view, reference is primarily a social function, rather than a mental entity, and it is more appropriate to ask whether an individual uses this function than to ask whether it has some specific mental structure.
Savage-Rumbaugh 1993, 459

Primary structural aboutness is basic to communicational referring, and communicational referring is basic to representing. Being in reference to something is being structurally about it. The paradigm of social reference is a situation where I direct you to look at something we both can see. I am in reference to it and I refer you to it. I can do it by looking at you and then looking at it; I can do it by pointing; I can do it by speaking. Or I can make something that will refer you to it in my absence: an arrow pointing at it, in the simplest instance. A painting of a bison. A sentence or a line of computer code or a painting has no intrinsic aboutness. In itself, it does not refer or denote or allude or designate. Saying a sentence or photo refers is metaphoric, a manner of saying that we refer by means of it and can rely on our use of it to have the effect we want. If a representing object or artifact does not have a content or a meaning and does not refer, it cannot correspond, it cannot be true or false, and it cannot resemble. Questions about the meaning, reference, truth or similitude of representing forms cannot be answered in the terms in which they are asked.
We talk about representational practices as if they exist without us. While it is true that 'mathematics' or 'English' do not depend on individual users, and while it is certainly true that there are systematicities to be found in the ways they are used, we should notice that we are being metaphoric when we describe them as having structure independent of our use of them, the way the solar system does. The metaphor that describes notational practices as systems is a variant of the metaphor that describes a word as having a meaning. In both instances the representing form is being thought of as the effective locus. But we are about something by means that include, and are not limited to, the word, and we are systematic by means that include but are not limited to the notational practice. We do need to be able to talk about representing practices as if they are wholes; we need to be able to name representational media so we can think about differences in the ways they work. But these differences need to be described as differences in cognitive means and cognitive effect. Walter Ong's (1982) account of the differences between oral and literate uses of language is an example of this sort of description. By thinking about representational media in terms of their cognitive effect, we gain a way to talk about different classes of representing practices as evoking the same and different kinds of cognitive structure. We can begin to specify differences in the ways they do it. We can talk about a single integrated effect when media are, as often, combined. When we can understand the development of media as an accumulation of ways of working with embodied aboutness, we are more able to appreciate the sophistication of representational and cognitive practices we use routinely. We are more able to see how these sophistications are learned and taught: we can notice, for instance, the training in linguistic-graphic coordination provided by children's picture books.
The aboutness evoked during a live performance of music, for example, can be very complex. We can be 'listening to' the soloist, the composer, the conductor, the instrument (as a class or as that particular instrument). We can be 'listening to' the instrumental balance of the whole. We can be listening to the hall. To the nearest loudspeaker. To our own 'feelings' as we listen. More problematically we can be listening to 'the music'. The structure of the music. The meaning of the music. These are manners of listening. In every instance their cortical means is a wide net of activity, partially the same, partially different, as the listener attends differently. What listening to means can be quite different in the different manners of listening. We listen to a woman singing, or we listen to the oboe: these are instances of base level perceptual presence. So is listening to the hall. But listening to the music is something else. The logical grammar of the sentence suggests base level presence, but what we are doing is more complicated. We are really listening, but we are also using a metaphor to think this listening. We think and speak as if we are listening to a sounding object, 'the music'. Music is social use of perceivable form. Is it being used to set up simulational aboutness? Usually not in an obvious way. It is being used to manage aboutness. The aboutness is not base level; it is not a relation to objects and actions in background environments. But it is experienced as if it were, whenever we hear 'the music' instead of the singer. This is like seeming to see the photographed object rather than actually seeing a sheet of marked photographic paper. There is a difference, though, and it is subtle. With the photo, we seem to see something. With the music, we are really hearing, but it is the sense of hearing a something that is simulational. We elaborate this simulating by talking about the structure of the music. 
Critical writing about music or film or any of the other media needs this sort of locution. We want to talk about individual works, and we end up talking as if there were such things apart from the cognitive effects of arrangements of materials. A bird cry exists apart from its cognitive effect; it is the broadcast effect of the sounding state of a bird's body. It can have cognitive effect or not. A 'piece of' music, however, exists as such only in so far as it does have cognitive effect. To talk about music, we have to allow ourselves to be organized by it, and, immediately or later, we have to take note of being so organized. We have to determine how the effect was managed. Doing these things occurs by means of an integrated net. To say what we think, we have to allow this complex state to organize our speech. We can take this complicated attitude toward nonrepresentational objects and events too. We can look at a mark on the wall and say what we are brought to see or feel or imagine by looking at it. We can do this with the bird's cry as well, but when we do, we are thinking of it metaphorically, the way we think of 'the music'. We really hear it but we simulate hearing an object.