

BEING ABOUT  Chapter 3. Perceiving

Orienting

Responsive restructuring

Audition: an example

About vision

How to talk about perceiving

An epistemic base


In this chapter I will talk first about the two aspects of perception that together enable basic immediate aboutness. The first is orientation -- action the organism takes to begin to be in touch with where it is. The second is the sensor-driven responsive restructuring that occurs as a result of this action. In the second half of this chapter I will go into some detail about the recent neuroethology of auditory presence. I'll conclude with general implications for a philosophy of perception.

Orienting

Perceiving and acting

no line can be drawn between a sensory side and a motor side in the organization of the brain Nauta and Feirtag 1990, 106

Pictorial and other sorts of representational metaphor for perception set up an unworkable separation between perceiving and acting, which are essentially interconnected.

We cannot for instance see an object until we are looking at it. We have to have turned our heads, directed our gaze in some particular direction, and focussed at some particular distance. If something is moving we have to be able to track it.

Action, like object perception, occurs in the context of sensing our own action -- seeing the arm moving toward the object, hearing the hand on the table, feeling muscles and joints. And sensing our own action is part of perceiving other things. When we are tracking a moving object, feeling the muscles accomplishing vergence and lens accommodation is an intrinsic part of knowing that the object is moving.

Sensor and effector systems are separated in the spinal cord and brainstem, and the cortex maintains a degree of back-to-front segregation between sensory cortex and motor cortex, but given sensor-effector through-function, there must be progressive integration between the two. Sensory response is found deep into frontal cortex; there are cells in premotor cortex that respond to light (Rizzolatti et al 1996). Motor response is also found early in sensory streams; direct stimulation in an area of secondary auditory cortex can cause a reflex head movement (Nauta and Feirtag 1990, 106). As will be described below, many matrices throughout associative areas are sensor-effector covariant.

Task axis

"the situation of the body in face of its tasks" Merleau-Ponty 1962, 100

Referredness to a location is the common factor in perception of an object and further active engagement with it; it is the first requirement for both.

Source localization is an ancient ability, faster and coarser than conscious perception. This first aspect of object-related spatial function, which has been called attentional capture or attentional individuation, is thought to be performed by predominantly subcortical sensory-motor connections. In animals without a fovea, these ancient circuits turn the animal, not the eye, turning head and body so that nose, mouth and paws are lined up toward the object. Oriented in this way the animal is prepared to move toward it and engage it with nose or whiskers. When primates are aligned so they are focussed on something at a particular distance, the slower thalamic path into visual cortex can begin to respond, and within the thalamocortical path the even slower high-resolution capabilities of foveal vision can come into play.

Spatial directedness prepares perception and action simultaneously. In primates, being keyed to a particular distance and direction is being keyed to the object found there. Being keyed to the object is being keyed to the object's spatial qualities -- its three-dimensional form, and its motion and orientation -- but it is also being keyed to what it is and what it is good for. Being keyed to object qualities is being keyed to action possibilities. Being keyed to the object's location relative to the body is being keyed to the kinds of motion needed to attain it. And so on. Spatial referredness gives us object and action in an already integrated way.

Dana Ballard, who emphasizes this directed engagement in his work in robotics, calls it deictic reference, or a deictic axis (1997). Deixis is a term taken from linguistics, and I want to retain it for representational contexts described in later chapters, so in the context of perception and action I will use another of Ballard's terms, task axis.

Acting to perceive

In the cortex, perceiving an object and acting in relation to that object both occur in the neural context of acting in order to perceive it. There is evidence that acting in order to perceive is what organizes a task axis (Ballard et al 1997). We focus on the thing we are targeting both to see it and to aim ourselves at it.

The structures that coordinate acting in order to perceive also coordinate perception of the same object by different modalities. For most purposes limb coordination and all the senses seem to make calibrational use of gaze. Sound response fields, for instance, are found keyed to eye position at many locations from midbrain to forebrain.

A task axis centers the animal, and its nervous system, on an object. Centeredness means several things here. Because eyes, ears, nostrils, and whiskers have fixed positions on the head and are bilaterally symmetrical, centering one system will center the rest; so will contact with the always-centered tongue. Visual fixation on a located object directs the fovea, the center of the eye, toward it. Centered foveal vision has greater resolution both at the retina and in many matrices upstream from the retina; Kosslyn points out that a number of retinotopic early vision areas have their foveal centers in register across visual cortex in the occipital lobe (1993, 279). When the eyes are tracking a moving thing, the fovea remains stably centered on the object. Equally important, fixation at a location in depth means there is zero optical disparity between the two eyes for whatever is seen at that location.
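The zero-disparity point can be put in standard small-angle form: for interocular separation a, fixation distance D_f, and an object at distance D, horizontal disparity is approximately

$$\delta \;\approx\; a\left(\frac{1}{D} - \frac{1}{D_f}\right),$$

which vanishes exactly when D = D_f. Whatever sits at the fixated depth is seen without disparity; everything nearer or farther carries a disparity graded by its departure from that depth.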

When the animal is responding to a fixated object, there will thus be more detailed, more unified, more closely calibrated and more sustained response to that object at many cortical areas on through-routes anchored at the retina. The recurrently connected net set up by detailed and sustained response at these many areas throughout the cortex will concurrently effect object recognition, object evaluation, and motion toward that object at that location. The independent motions of an arm and a hand are organized at different cortical sites, but if both these sites are working off gaze fixation, they can be coordinated automatically by being directed toward the same location.

Rapid orienting motion in sensors and effectors happens in about 200-300 milliseconds, and Ballard suggests that the timing of saccadic eye motion between fixations, and the time of the fixation itself, may be an important time scale for action segmentation in humans (1997). Edelman's basic unit of sensor-effector through-function, a perceptual-motor categorization, is a motivated response triggered by a perceptual sample; not coincidentally, Edelman also suggests 200-300 ms as the time minimum for a sentient or core dynamic subnetwork, since it "would lead to the functional closure of longer reentrant loops and thereby support reentrant interactions among more distant regions" (Tononi and Edelman 1998, 1848).

The attention subnet

Perceiving is as much a motor phenomenon as it is sensory, and this is nowhere more understandable than in the process of selective attention. Mesulam 1981, 315

PET studies find that we set up an attentional axis by means of a distributed net with many foci. Mesulam (1981, 1990) describes a wide net that must include at least four nodes:

A review of unilateral neglect syndromes in monkeys and humans suggests that four cerebral regions provide an integrated network for the modulation of directed attention within extrapersonal space. Each component region has a unique functional role that reflects its profile of anatomical connectivity, and each gives rise to a different clinical type of unilateral neglect when damaged. Mesulam 1981, 309

[3-1 Mesulam's attentional network]

A subcortical (reticular) subnet provides the underlying level of arousal and vigilance. A motor net coordinates orientational and exploratory motion. Multimodal sensory areas correlate responses from different sensors. A motivational net heavily connected to subcortical saliency nuclei resets synaptic properties in other nets.

The three cortical foci, Mesulam says,

... are probably engaged simultaneously and interactively by attentional tasks, and it is unlikely that there is a temporal or processing-level hierarchy among them. The resultant phenomenon of directed attention is not the sequential sum of perception plus motivation plus exploration but an emergent (i.e., relational) quality of the network as a whole. Mesulam 1990, 600

By means of this large-scale net for attentional directedness other sensory and motor areas are invoked. These include primary and unimodal association cortex important to any or all of the senses. The subcortical thalamus, which gates all sensory modalities into the cortex, may synchronize cortical response for common action.

The resultant network sustains multifocal convergence and some degree of serialization but is especially well suited to the temporally coherent and reentrant sampling of disjunctive informational sets and also for parallel distributed processing. Mesulam 1990, 601

Since acts like reach and grasp can be guided by alternative senses, or by several senses at once, the attentional network must manage mutual facilitation among sensory systems directed toward the same location, and inhibition of sensory systems irrelevant to the task. It must maintain an axis for as long as it takes to find out about a location, or to deal with something at that location, or both. With complex tasks, the net may have to maintain a structure long enough to complete an action, even when an object has gone out of sight. It must be able to disengage and then reengage at another location, or scan for something that hasn't been located yet. Disengaging, searching, and reengaging are aspects of attention thought to involve partially different areas, both cortical and subcortical.

Attention (like imagining, understanding and feeling) has historically been taken as an agent or faculty, and there have been attempts to find the locus of attention somewhere in the cortex. Mesulam's description of an attentional network parses the notion of attention into aspects -- general arousal, neurotransmitter priming for sensitivity to certain kinds of object, and orientation toward locations. As Mesulam understands it, attention is the organization of the creature engaged with some thing some where. It thus has much in common with the notion of a task axis.

Responsive restructuring

Order is established by the fact that this complicated reflex apparatus becomes active, and is kept active, through the total stimulation of the environment. Goldstein 1995, 89

Orientation and sensor response are complementary aspects of perceptual skill. I have emphasized orientation activity in the section above, and in what follows I will emphasize environment-originated wide-net structure. I will make several very general points first, and then illustrate them with a quite detailed account of auditory perception.

If we understand perception to be the creation of object-relevant structure in the perceiver, it is apparent that the object itself must have a role in creating perceptual structure. Order in the world can be a strong source of order for the nervous system. By means of deep loops -- from sensors through midbrain to primary sensory areas and then through association cortex to motor areas -- the organism's dynamical coupling with its environment can predominate in setting up the wide net organization of the moment. This will be so both for the global net and its core dynamic subnet. Even at foci deep into sensory-motor through-streams, the most important organizing factor at a given moment may be what is happening at sensory sheets.

Sensory contact with the world is energetic; active sensors are sources of energetic flow into the cortex (Tomatis 1987, 1991). When they are entrained by strongly oscillatory events in the environment, sensors respond with strongly patterned activity. Propagated downstream to many foci and integrated iteratively through reentrant connections, incoming pattern can be a source of wide net coherence. The stability of environments can stabilize a wide net when sensors go on being driven by the same scene or object.

Proximal fields and deconvolution

When we talk about perception we usually start by talking about the senses, but we should instead start by talking about the universe. Perception is possible only in worlds whose materials have already sorted themselves into systems segregated and integrated at many spatial scales, interacting over many temporal periods, vastly and minutely coordinated so that nothing happens without altering many other things in systematic ways.

The simplest forms of proximal perception are possible because something about an animal's body changes when it is in contact with something about the world. Distal perceiving -- sonar, vision and audition -- also requires contact, but in these instances it is contact with mediating acoustic and electromagnetic fields.

A mediating field integrates effects of environments that may be extraordinarily complex: the environmental facts that contribute structure to a field may include the mass geometries of stable and moving shapes, the surface organization and material character of many substances, and the energetic quality and temporality of events at many temporal scales. The current state of transmission media (atmosphere, water, illumination) and these many environmental facts jointly determine the field structure available to a perceiver.

How perceivers contact these fields depends on their own position and motion, and on the external structure, the state, and the position and motion of their scanning surfaces. What happens as a result of contact is a covariance of perceiver-internal structure with proximal field structure, but it is also -- because the proximal field is itself varying with so much else in the surrounding world -- a covariance with distal facts.

Because proximal field structure is an integrated global result, perceivers needing to know some particular distal fact will have to be able to tune their own structure to respond primarily to the particular contributions made to the field by that fact. That is, they will have to deconvolve that contribution -- which is to say that some aspect of their own structure will have to covary with a particular distal fact rather than with other distal facts whose effects are also being summed into the proximal field.
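In the simplest possible case the idea can be fixed with a toy sketch, all signals and numbers invented for illustration: two distal sources sum into one proximal field, and a receiver tuned to one source's periodicity covaries with that source alone, even though it only ever contacts the sum.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(4000) / 4000.0   # one second, sampled at 4 kHz

# Two invented distal facts, each contributing its own periodicity
# to a single proximal field that also carries noise.
source_a = 1.0 * np.sin(2 * np.pi * 40 * t)    # distal fact A
source_b = 0.7 * np.sin(2 * np.pi * 90 * t)    # distal fact B
field = source_a + source_b + 0.3 * rng.standard_normal(t.size)

def tuned_response(field, freq):
    """A perceiver-internal structure 'tuned' to one periodicity: its
    activity is the field's correlation with a probe at that frequency,
    so it covaries with one distal fact and not the other."""
    probe = np.exp(-2j * np.pi * freq * t)
    return 2 * abs(probe @ field) / t.size     # ~ amplitude of that component

print(f"tuned to A (40 Hz): {tuned_response(field, 40):.2f}")   # ~1.0
print(f"tuned to B (90 Hz): {tuned_response(field, 90):.2f}")   # ~0.7
```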

How this is done needn't be simple. Advanced distal perceivers have many scanning surfaces -- some of them tuned to different kinds of field, and some of them scanning the same field from slightly different positions. Responses at any of these surfaces are propagated deep into the observer's body, always by routes that allow interaction of responses from different transducers.

So there are in principle many ways deconvolution could happen. Response from any single transducing surface could vary with a particular distal fact. A single transducing surface could be organized into zones or interdigitated fields each of which are varying with different distal facts, these responses being propagated by different routes or multiplexed on the same route.

Or there may be nothing that could be called covariance with distal facts before there has been an interaction of responses from two or more transducing surfaces. The interaction required may have to be staged, iterative, and protracted. Many of these interactions may be happening at the same time, so that different aspects of perceiver-internal structure can be varying with different distal facts, all at the same time.

Many of the details of observer-internal response organization are still unknown, and the multidimensionality of internal response variables involved is hard to imagine, but at the most general level what we know about distal perception is this: it is possible because there is complex covariance within environments, within perceiving bodies, and, in precisely gated ways, across perceiver-environment boundaries.

Many sources, no sink

...the neural pathways from perception to action are high-bandwidth all the way through. If anything, the bandwidth increases toward the center, rather than narrowing down. Haugeland 1998, 228

When we talk about individual senses such as vision or hearing, we tend to think in terms of small scanning surfaces tuned to different ranges of the continuum of frequencies present in ambient fields. This picture changes if we consider that, in the more complicated animals, the whole external surface is sensitive. Certain patches of surface are specialized for narrow frequency windows, but taken together, the animal's boundary is responding across a very wide range of frequencies.

Response at various frequency scales converges toward different entry-points into the cortex, but from those entry points it fans out and propagates through, so there is cross-modal contact of many kinds at many points. The same object or event may alter ambient fields at many scales at once. Since alterations at different frequency scales are often systematically related (those caused by speech sound and visible lip movements, for instance), convergent multisensory contact can be necessary, rather than supplementary, to successful perception (Gibson 1966, Stoffregen and Bardy 2001).

Sensor-driven response is broadband because all the sense modalities together cover a broad region of field frequencies. It is broadband all the way through because sensor-driven response in any modality fans through the cortex, diverging and converging at matrices covarying for many purposes. It is also broadband all the way through because there are many termini for these streams, many effector systems being organized by means of many foci in many streams.

Locational facilitation

In the current framework, attention ... is fundamentally location-based. Farah 1990, 153

Slotnick and others have found a cortex-internal aspect of spatial orientation that modulates sensory effect: they have found that sensory response is facilitated for the spatial location being fixated. Attention to a location in space produces facilitation that is maximal for that location and gradually falls off with increasing distance (Slotnick et al 1998).

These results suggest a sensory aspect of an established task axis: automatic facilitation of sensor effect in relation to whatever is being targeted by the oriented creature. If it occurs in primary sensory cortex, effects of this facilitation will presumably extend upstream to matrices working off primary response, probably all the way to motor cortex.
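A minimal sketch of such a gradient, with invented numbers (Slotnick et al report a graded falloff, not this particular shape): sensory response is multiplied by a gain that peaks at the attended location.

```python
import numpy as np

def facilitated_response(raw, location, attended, sigma=2.0):
    """Multiplicative gain on sensory response, maximal at the attended
    location and falling off with distance. The Gaussian shape and all
    parameters here are placeholders for illustration."""
    d = np.linalg.norm(np.asarray(location) - np.asarray(attended))
    gain = 1.0 + np.exp(-d ** 2 / (2 * sigma ** 2))
    return gain * raw

attended = (0.0, 0.0)                                # the fixated location
print(facilitated_response(1.0, (0, 0), attended))   # ~2.0 at fixation
print(facilitated_response(1.0, (4, 0), attended))   # ~1.14 in the periphery
```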

Maintaining and accumulating

Perceiving takes time. Perceiving more takes more time (Gibson says it can take years).

The accumulation, stabilization and clarification of structures of neural activity over time may account for certain observed cognitive effects. There are well-established temporal minima for perception: it takes 60-70 ms to see even a very simple experimental stimulus. With phenomena presented more briefly than that minimum, we do not experience true onset or correctly estimate duration (Crick and Koch 1992, 157).

Networks that establish a task axis must also maintain orientation. When we go on directing ourselves at something, sensors go on responding to it. Driven by continued sampling of a proximal field, the net being organized through the sensors stabilizes, recruits, builds and rebuilds itself. Neuronal groups in the many covariant foci upstream from sensory surfaces have time to settle. Reentrant networks have time to form. Relevant structure accumulates. Directed from more -- or more precisely discriminatory -- structure, action can be more precise; embodied in more extensive, more stable, and more integrated structure, experienced perception can be more detailed or more comprehensive. It can be richer.

Object subnet segregation

The dynamical engagement of a body with proximal fields is temporal on macroscopic and microscopic scales. All the spatial senses use temporal characteristics of the proximal field (or of their interaction with it) to find and orient to an object, and also, once found, to determine what it is and to perceive it as an integrated whole.

There are temporal aspects of the perceptual situation, or of the environment itself, that will tend to segregate structure relevant to different objects, although each is perceived over extended periods of time, and all are being perceived at the same time.

Events have their own macrotiming. Gross cross-modal correlations in response to (relatively nearby) objects and events will begin and end at something like the same time at sensor surfaces of different kinds. This mutuality of timing may be important where sensory streams converge in multimodal association areas.

Entrainment with sensory microtiming is important to both object localization and object perception. Since objects of different kinds organize ambient fields to oscillate at different rates, differential entrainment may organize perception of object identity. Audition relies on temporal characteristics of ambient pressure waves, and, as will be seen in the next section, it uses subtle timing correlations and phase offsets to find auditory sources. Vision is also based on wave phenomena, and so is also a microtemporal function; observer entrainment with object microtiming is thus important to object and scene vision as well.

A research group working with a connectionist model of voice recognition found that temporal microstructure may be used to segregate specific, object-related subnets:

... from an initially totally interconnected set of tonotopic neurons two distinct groups of neurons -- corresponding to the two distinct voices -- arise. The mechanism of this segmentation is the temporal synchronization of simultaneously active cells using a postulated fast Hebbian synaptic modulation mechanism (on the tens to hundreds of milliseconds scale). von der Malsburg and Schneider 1986, 204

Edelman suggests that fine-scale oscillatory characteristics of objects may be used not only to recognize objects, but also to organize entire subnets as object-relevant configurations.

Short-term correlations (i.e., correlations with narrow peaks on the order of a few to tens of milliseconds) may be particularly important for neural function, because the membrane time constants make them highly responsive to the degree of synchronization of their synaptic inputs. Tononi, Sporns and Edelman 1992, 311

Through successive cycles in recurrent loops, resonance or coherence may thus be achieved over the whole of a subnet, which is thereby differentiated from other subnets being organized in relation to other objects at other locations. What is being suggested is that we see objects (and act toward locations) in unified ways because the networks by means of which we are seeing and acting are synchronized through object timing and network recurrence.

Microelectrode recordings have revealed the presence of such short-term correlations within and between several areas of the cat visual cortex, in the form of fast, coherent oscillations. In a series of simulations based on these experimental results, we showed that coherent oscillations can arise within neuronal groups on the basis of excitatory-inhibitory interactions, and that short-term correlations among different groups may signal the linking of similar stimulus features. Recently, we have indicated how the establishment of such correlations among neuronal groups within a primary visual area may provide a neural basis for perceptual grouping and figure-ground segregation as well as for some of the Gestalt laws. Tononi, Sporns and Edelman 1992, 311

Subnets distinguished by a particular phase are called phase coherent, or phase cohorts. Since frequencies of oscillation organized by similar objects at different locations will arrive at an observer's position with different relative phase, location can be used to discriminate objects of otherwise identical form. Phase differences of these kinds could be used when we see that things in different places are both red, or are both red and both apples.

If Edelman's thesis turns out to be right, we will have resolved what has been called the binding problem. Like many other long-standing problems of cognitive theory, it will have been solved by understanding the dynamical self-organization of environmental and neural systems, and the dynamical embedding of the latter in the former.

As with other long-standing problems, our metaphor will have been at fault. Talking about binding seems to imply that 'features' or parts or aspects are perceived separately and must then be bound together. Object features, as perceived, are of course results rather than causes of object perception. They are perceived because objects at locations are being perceived.

As will be seen later in this chapter, the perceptual situation itself also has aspects that tend to organize feature and categorical response, since particular neuronal subgroups within an object-relevant subnet may be called on in relation to many other objects (edibility, for instance), or objects of many kinds (color, for instance). Along with objects themselves, we perceive object features and object categories as a result of something about the object, something about ambient media, and something about the perceiver.
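The von der Malsburg and Schneider mechanism quoted above can be sketched in a few lines, with all rates and constants invented for illustration: units driven by the same source spike in synchrony, and a fast Hebbian rule strengthens connections among co-active units, so a fully interconnected net falls into two weakly coupled subnets, one per voice.

```python
import numpy as np

rng = np.random.default_rng(1)
steps, n = 400, 8

# Units 0-3 are driven by one voice, units 4-7 by the other: two
# invented spike trains with independent temporal microstructure.
drive_a = rng.random(steps) < 0.2
drive_b = rng.random(steps) < 0.2
spikes = np.zeros((steps, n), dtype=bool)
spikes[:, :4] = drive_a[:, None]
spikes[:, 4:] = drive_b[:, None]

# Fast Hebbian modulation: connections grow with coincident spiking
# and decay otherwise, starting from full interconnection.
w = np.full((n, n), 0.5)
for s in spikes:
    w += 0.05 * np.outer(s, s) - 0.01 * w
np.fill_diagonal(w, 0.0)

# Within-voice weights settle near 1.0, between-voice weights near 0.2:
# the net has segmented itself into two subnets, one per voice.
print(w[:4, 4:].mean(), w[:4, :4].sum() / 12)   # between vs within (off-diagonal)
```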

Audition: an example

This chapter has been considering how perceptual theory should change if we think in terms of whole body structural adaptation and cortical wide nets. The main points so far have been these:

1) Perception has two aspects always -- what is done to be oriented to something somewhere in an environment, and what occurs throughout a cortex as a result.

2) Cortical alteration as a result of sensor engagement with ambient fields is co-caused by things in the world and the existing structure of the organism.

3) Perceptual structure is distributed across many nodes along streams traversing most of the cortex. What happens at these nodes coordinates many kinds of aboutness simultaneously.

I have said that to be able to think better about perceiving and other sorts of cognition we need to be able to imagine how it is done. In this section, I will go into some of the technicalities of auditory aboutness and its cortical facilitation.

Audition is not simple, but it is simpler than vision, and it is the sensory mode we understand best. My description draws on recent neuroethology and its interpretation in connectionist models. Like much of what I have to say, it owes much to James and Eleanor Gibson. I see the neuroethology reported below as confirming and extending their vision, and the connectionism as explicating their notion of tuning.

Mediated distal perception

The atmosphere can mediate acoustical contact because it is elastic: it takes and propagates structural alteration. A vibrating object broadcasts every tremor. When one blade of grass is blown against another, every detail of the temporal structure of that tiny event travels away from it in all directions. Anything happening on a scale that starts a pressure wave train patterns the air.

Near its source, a wave train is precisely correlated with object, event and location: its component periodicities, their relative energies and timing, are exact consequences of what happened and what or who it happened to. Because pressure wave trains travel outward like expanding concentric spheres, they also are exact consequences of where it happened.

A complication is that all these crossed and interwoven wave trains reflect, or are absorbed, or partly reflect, or partly are absorbed, wherever they meet an obstacle or change of medium. The perceiving creature, standing in the atmospheric sea, will intercept many wave train versions of the same event, or will intercept a wave train when its energy is unevenly attenuated.

But order lost can here be seen as order gained. Wave trains converging on the perceiving creature are changed in ways precisely correlated also with where they have been -- through miles of desert air, back and forth inside a canyon, under a door. Pressure wave trains by the time they reach the perceiver also are exact consequences of the atmospheric conditions and the reflective and absorptive environment. In consequence, many things about the surrounding world are specified at any point in an acoustic field; particular objects or events or directions or atmospheric facts will be specified by different covariances present in the array. Creatures who hear at all hear successfully, so we know auditory systems are able to do this very complex thing -- comb out the different covariances that specify object, location and environment.

We do not hear with our ears. We hear with the help of our ears, but properly speaking we hear by means of the entire auditory system -- all the streams, loops, matrices and multiplexing through-nets of the auditory nervous system, and all the motor states by which we search for acoustic pattern. Senses are not end organs: they are systems that reach all the way up into the brain (where they intersect and interpenetrate other senses) and then down again into muscles that turn the head or twitch in the pinna.

Hearing is knowing by means of the auditory body. Auditory knowledge is the spatiotemporal structuredness by which the listener is managing to hear, and which is occurring so that the listener's auditory system can pick out one set of higher-order patterns rather than another. Auditory competence is the listener's pre-structure, the auditory system's inherent and experienced order. The listening moment is that pre-structure selectively, responsively active, picking out, "resonating to", higher-order pattern in the ambient array. It is a listener physically configured in a way that is specific to something about the world, to what is being heard.

A listener comes to be so configured by being in contact with the world. Pressure wave trains are patterns of impact. The world never stops touching us. It can touch the inside as well as the outside of our bodies: when a helicopter rises over our heads, we feel ourselves touched in all our tissues. Some voices are felt as very pleasant touches in the solar plexus. Other kinds of acoustic touch are so fast and sharp we don't feel them as touches, and yet we hear by means of them because and only because they are touches. They literally communicate with us: they communicate energy and pattern. The tympanum is the window by which the smaller-scale patterns can get into us: a bottleneck. From that point on it is the wider nervous system that finds the patterns in the patterns.

We are temporally entrained by what we are hearing: when we hear a truck shift down at the lights, we hear it not quite when it happens, but in the order that it happens. And we are spatially reorganized in hearing it, though in very minute ways. Temporal entrainment and responsive spatial reorganization are what Gibson means by resonance. What Gibson's approach gives is a way of understanding the acoustic and neural processes of perception not as signal processing but as comprehensive contact.

Generally speaking we hear the sorts of things we have evolved to hear -- the sorts of things that are relevant to the kind of creature we are. We hear what we can attend to. We don't hear the fluttering passage of a raised dot on a moth's wing, although a bat does. We do hear our friend's pleasure or the wetness of the street.

Hearing a seagull fly over the roof we are also hearing the open air that allows its passage. In the particular balance of the seagull's cry with traffic noise we hear that it is early morning. When we hear a train at the crossing seven blocks away we are also hearing the presence of that reach of space around us.

Neural audition

Auditory response in any animal cortex originates at the cochlea and invariably retains a tonotopic or cochleotopic organization as it propagates to matrices further up.

The micromechanics of the basilar membrane of the cochlea -- membrane stiffness, membrane mass, and density of cochlear fluid -- cause its displacement in response to pressure waves to vary systematically with frequency from base to apex. Subsurface neurons are able to propagate this differential response.

In effect, the cochlea sorts wave trains into their component periodicities, and so responds specifically to the presence (or sometimes absence) of any particular frequency. It does so by activity at a position on a spatial scale corresponding to the spatial continuum that is the basilar membrane. This spatial scale can be thought of as projected in parallel right up through all the auditory nuclei in the brainstem to the surface of the cortex, and then even further, to the forebrain and hippocampus.
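In signal-processing terms, then, the cochlea behaves like a bank of bandpass filters laid out along a spatial axis. A crude sketch of such a decomposition, pooling spectral energy into log-spaced channels rather than modeling real cochlear mechanics:

```python
import numpy as np

def tonotopic_response(signal, sr, n_channels=32, fmin=100.0, fmax=8000.0):
    """Crude cochlea-like decomposition: spectral energy pooled into a
    bank of log-spaced channels, one per 'place' along the membrane."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sr)
    edges = np.geomspace(fmin, fmax, n_channels + 1)
    energy = [spectrum[(freqs >= lo) & (freqs < hi)].sum()
              for lo, hi in zip(edges[:-1], edges[1:])]
    centers = np.sqrt(edges[:-1] * edges[1:])    # geometric channel centers
    return centers, np.asarray(energy)

sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
centers, resp = tonotopic_response(sig, sr)
print(f"strongest channel centered near {centers[resp.argmax()]:.0f} Hz")
# the channel containing the 440 Hz component responds most strongly
```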

Speaking very generally, a hearer begins to respond when a sounding event begins, and stops responding when the sounding event stops. On a finer scale, the auditory nervous system may phase-lock the response timing in its scaled bank of frequency filters so that frequency responses are synchronized with each other as they project up through the brainstem.

Frequencies in an acoustic field also have phase and amplitude properties. These set up temporal rather than spatial patterns. The amplitude or intensity of a frequency component may determine axonal spike count per second, for instance. It can also alter response phase relative to some reference oscillation.

Because time patterns result from both event timing and amplitude variation, the auditory system is thought to segregate event timing and intensity response for at least parts of the march of frequency response up to, and through, the cortex. In other words, the frequency response pattern is sent up two different pathways, one responding primarily to stimulus time and one responding primarily to intensity.

The split between time and intensity response is prepared at auditory nerve cells with different properties. In bats, for instance, the auditory subsystem that differentiates target velocities needs initial frequency response that is very precise. But the auditory subsystem that differentiates target distance needs less sharply tuned frequency response and more precise time response (Konishi 1988). These differences originate at the first auditory nucleus.

The essential thing about propagated cochlear response is that it is selective alteration of cortical structure by spatial and temporal means. Scattered through the cortex there are matrices structurally changed as a result. Neuronal groups in these matrices can be thought of as responding to complex combinations of frequency, amplitude and phase response, and, in some kind of approximate sense, these units can be thought of as responding to spectral envelope, formants, and onset or offset transients -- the staples of psychophysical acoustics.

Auditory matrices

There are matrices at all levels of the auditory nervous system, from the auditory nerve just behind the cochlea, through primary and secondary auditory areas and on into multimodal association cortex.

Some auditory streams fan in from more cells to fewer, but in humans the auditory projection fans out, from 90,000 cells at the cochlear nucleus in the brainstem, to 400,000 cells in the inferior colliculus of the midbrain, to ten million cells in primary auditory cortex (Edelman et al eds 1988). Some projections pass activation patterns without significantly changing them, but more often a pattern of activity is transformed as it propagates through a matrix. Different kinds of transformation are possible. The simplest is pattern clarification: noise is eliminated by cooperative settling among interconnected cells at the same level, or by weighted connections with a next layer (Mead 1995, 256).

Response patterns from the two ears are convolved early, at the cochlear nucleus. In the superior colliculus in the midbrain, auditory response is convolved with visual response. At various levels all the way up, auditory response makes motor connections. As always in the nervous system, back-connections are also present.

In humans and other mammals an area called the primary auditory area or A1 receives its essential projection almost exclusively from the medial geniculate nucleus or MGN, which is the last and most important switching station between ear and cortex. A1 is arranged as a continuum; columns of cells that respond maximally to a particular frequency are found adjacent to columns responding maximally to a slightly higher or lower frequency. Motion of particular regions of the basilar membrane will activate corresponding neuronal groups in A1.

In humans A1 is located on the superior temporal plane, that is, on the flat upper surface of the temporal cortex, hidden inside the Sylvian fissure that separates temporal cortex (adjacent to the ear) from parietal cortex (above and behind the ear). The exact function of A1 is not known, but PET studies have shown A1 to be bilaterally activated by any auditory stimulus (Zatorre 1985, 38).

Secondary auditory areas (one of which is called A2 in mammals) also receive direct projection from auditory centers in the MGN, but they receive propagated activity as well from other parts of the cortex, and are therefore called auditory association cortex. Careful microelectrode mapping of auditory cortex in macaque monkeys (Merzenich 1973, 292) has shown at least five secondary auditory areas, which are thought to be distinct regions both for cytoarchitectonic reasons and because several of them show tonotopic response. Four are distributed around A1 on the superior temporal plane, and the fifth lies just adjacent to one of these. It is thought that secondary auditory cortex may also extend a short distance into the upper, parietal surface of the Sylvian fissure.

[3-3 Macaque primary and secondary auditory areas]

Areas of the temporal lobe adjacent to regions identified as primary or secondary auditory cortex are called periauditory. In some of these areas, visual and auditory through-lines make contact on their way to object recognition areas in the temporal pole (Nauta 1990, 105).

Bats, owls and cats

We know much more about auditory function in animals than we did fifteen years ago, by reason of a convergence of work in ethology, neuroscience and connectionist computer modeling.

Ethologists study the behavior of animals in their natural habitat; neuroethologists work with animals in laboratories, but they attempt to study behaviors or abilities that would be important to the animal in the wild. Neuroethologists studying animal audition will thus study neural response to biologically important sound, often in animals with auditory specializations, such as bats, owls and cats. Auditory neuroethology uses the findings and some of the technologies of auditory neurophysiology: action potentials are recorded from microelectrodes while recorded sounds are played to the animal through either headphones or loudspeakers. Bat studies use recordings of the bat's own pulse emissions with computed simulations of echoes at various rates of delay and frequency shift.

Source localization

Barn owls hunt at night and are able to locate their prey by hearing alone. It is thought that their auditory systems use inter-ear time differences to find the azimuthal (right-left) position of a sound source, and that they use inter-ear intensity differences to find its elevation. The two coordinates are differentiated separately and then azimuth and elevation response is convolved at the superior colliculus, which is the higher of several midbrain auditory nuclei, and which also has motor connections setting up reflex head and eye orientation toward the located sound source (Konishi 1988).

Spike discharges from frequency-sensitive neurons in the earliest of auditory nuclei are phase-locked to the pressure events whose arrival they are reporting: in other words, there is a common delay between pressure wave arrival at the cochlea and spike discharge, so all frequencies are reported synchronously. This phase-locked common response is propagated up through the brainstem in parallel. At the cochlear nucleus, the phase-locked array is propagated both up to the next nucleus on the same side of the brainstem, and across to the corresponding nucleus on the contralateral side. The nucleus at which phase-locked spike trains converge from the owl's right and left ears is the nucleus laminaris.

The nucleus laminaris is a time-difference matrix, a 3D matrix organized so that phase-locked parallel frequency response arrives from the ipsilateral ear at the edge of the matrix nearest to the back of the owl's head, and response from the contralateral ear enters the matrix from the edge facing the front of the owl's head. Fibers from both sides of the matrix, and thus from both ears, interdigitate across the matrix.

The temporal organization of the matrix involves delay lines. Activity arriving at either edge is propagated into the matrix by a range of delays that corresponds to the range of interaural time disparities the owl encounters. In every frequency band, depth into the matrix is correlated to a specific temporal offset from arrival time. Phase-locked spikes from the ipsilateral ear thus arrive at the back (dorsal) edge of the matrix with no delay, but they arrive at the ventral edge with maximal delay. For phase-locked spikes arriving from the contralateral ear, relative delays are the obverse.

Neurons in the nucleus laminaris time-difference matrix are coincidence detectors, which may respond weakly when activated monaurally but will respond maximally only when activated simultaneously from both sides of the matrix. Although phase-locked response arriving from either ear is broadcast into the matrix at all the delays afforded by the range of delay lines, it will have a critical effect only at the one position in the matrix where it coincides with activity transmitted from the other side. Activity at this position picks out a particular interaural delay ratio. A neuron maximally active at this position will begin to direct the owl's gaze, head motion, or flight toward the azimuthal position of a sound source.
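This delay-lines-plus-coincidence scheme (in essence the arrangement Jeffress proposed) is easy to sketch computationally; the sound, sampling rate and delays below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
sr = 100_000                       # samples per second
sound = rng.standard_normal(1000)  # 10 ms of broadband sound

itd = 25                           # true interaural delay: 250 microseconds
left = np.concatenate([sound, np.zeros(itd)])
right = np.concatenate([np.zeros(itd), sound])   # right ear hears it later

# One 'coincidence detector' per candidate delay: each responds to the
# coincidence of delayed left-ear input with undelayed right-ear input,
# as in the interdigitated delay lines of the nucleus laminaris.
candidates = np.arange(60)
coincidence = np.array([np.dot(np.roll(left, d), right) for d in candidates])
best = candidates[coincidence.argmax()]
print(f"peak coincidence at {best} samples = {best / sr * 1e6:.0f} microseconds")
```

The detector whose built-in delay exactly cancels the interaural delay is the one driven maximally, and its position in the matrix picks out the source's azimuth.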

The nucleus laminaris projects to several nuclei at the midbrain, where, in sequence, a) neuronal selectivity for interaural time difference is sharpened; b) response to different frequencies converges on single neurons that respond more generally; and c) time and intensity streams converge to form a bicoordinate azimuth-plus-elevation response to source location (Konishi 1990, 3245).

The optic tectum of the barn owl is also thought to have, in register with it, a matrix whose units respond to activation from the eyes (Brainard and Knudsen 1993). A more recent study (Brainard and Knudsen 1995) reports neurons in this region of the optic tectum whose response is bimodal: they will respond to a particular interaural time difference, or to activity in a particular visual region, or to a coincidence of both.

The fact that sense modalities are already integrated at the midbrain, and the discovery of bimodal neurons, require us to revise our tendency to think of sense modalities as physiologically segregated. But the most striking aspect of Brainard and Knudsen's work has been their finding that the auditory response properties of these bimodal neurons are developmentally calibrated to their visual response properties.

Baby barn owls raised wearing prismatic spectacles that shifted their visual fields 23 degrees to the right were found to have visual receptive fields in their optic tectum systematically shifted by the same amount, and their auditory receptive fields shifted along with them. Bimodal neurons that would normally have responded to a sound source or a visual event at x were now responding to events shifted 23 degrees.

But it was also found that this tuning dependence of auditory response on visual response was not automatic. In a separate experiment, baby owls were not fitted with prismatic lenses until they were 60 to 80 days old. By this time their bimodal optic tectum neurons had established normal response properties. When they were then fitted with prismatic lenses their bimodal neurons' visual receptive fields shifted (of course) immediately, but their auditory receptive fields took several weeks to shift. In the intermediate states, the neurons' tuning to interaural time differences was found to be very broad. Neurons took longer to respond to interaural time differences, and when they did respond their responses lasted longer. These intermediate response characteristics are reminiscent of the behaviors of connectionist hidden units while they are being trained on a new data set.

The story of auditory source location in the barn owl illustrates the most important principle of sensory matrix function, that is, the way response from different sensors, when convolved, becomes response to higher-order facts about the animal's environment. The following story illustrates another of the many functional possibilities offered by sensory matrices.

Sensory magnification

The auditory neurophysiology of bats is specialized in interesting ways and as a consequence bat audition has become important to neuroethologists as well as to bats.

A bat is a very small mammal: its entire brain is "about the size of a large pearl" (Suga 1990). Nevertheless, auditory functioning of the bat central nervous system has been mapped in unusual detail. A number of auditory regions have been found both in the bat's cortex and in lower nuclei, and we have recently come to know quite a bit about how they work and what they accomplish.

Bat audition is unusual in the importance it gives to wave trains emitted by the animal itself, and thus to reflected wave trains, but the general principles of acoustic perception, as we are beginning to understand them in bats, generalize well: what we see in operation are through-lines, matrices, and networks.

Eight tonotopic matrices have been discovered in auditory cortex of the mustached bat (Suga 1994). I will describe two of them below.

[3-5 Mustached bat auditory cortex]

One of the auditory matrices discovered in secondary auditory cortex of the mustached bat is called the CF-CF area because it correlates responses to constant frequency pulses the bat emits when it is hunting flying insects. The pulses emitted consist of a fundamental (about 30.5 kHz) and three harmonics.

[3-6 Bat CF/CF area]

The bat is able to control the energy of each harmonic. The fundamental is always very low in intensity: it is the bat's reference frequency, and also the frequency by which the bat knows its own pulse from those of its companions. The prominence given other harmonics varies according to conditions. Low frequencies will be less attenuated by distance, but high frequencies are useful when closing on small, fast, near objects.

The primary function of the CF-CF area is to prepare the animal to avoid obstacles and to respond to target velocities; it is in effect an array of Doppler shift detectors. It is tonotopic along several axes at once. Its elements respond to specific combinations of pulse fundamental and echo harmonic. Pulse fundamental frequency varies along one axis, and the second and third echo harmonics increase along the axis orthogonal to it, segregated in strips adjacent to each other (Suga 1990, 65). These strips show a disproportionate number of neurons responsive to velocities the animal encounters in important maneuvers such as closing on prey or docking at a roost.
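The arithmetic behind such Doppler-shift detection is simple. A bat closing on a target at relative speed v hears its constant-frequency pulse returned shifted twice over, once on the way out and once on the way back, so that, with c the speed of sound,

$$f_{\text{echo}} \;\approx\; f_{\text{pulse}}\left(1 + \frac{2v}{c}\right), \qquad v \;\approx\; \frac{c}{2}\,\frac{f_{\text{echo}} - f_{\text{pulse}}}{f_{\text{pulse}}}.$$

As a rough check on the figures given below: at a 61 kHz second harmonic, a 2 kHz compensation corresponds to a closing speed of about (344/2) x (2/61), or roughly 5.6 m/s, a plausible flight speed.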

In the mustached bat the range of best frequency response across the tonotopic spread of primary auditory cortex is approximately 10-100 kHz. Around 61-61.5 kHz this array bulges into a specialized subregion that takes up 30% of A1 and is given its own name. It is the DSCF -- Doppler shift compensated constant frequency -- area. Columns in this matrix are 40-50 neurons deep. Each column responds to a particular combination of frequency and amplitude of the second harmonic of a constant frequency pulse echo.

[3-7 Bat DSCF area]

The DSCF area is radially organized: moving out from the center, the best frequency of the neurons increases; moving circularly, the best amplitude changes (Suga 1990, 63). The DSCF region's specialization is prepared at the cochlea, which has a similar expansion of tuning sharpness around 61-61.5 kHz, the frequency corresponding to the normal second harmonic of the bat's resting pulse fundamental.

In resting conditions the echo second harmonic will also fall into this area. But when the bat is using pulse-echo frequency shift to detect target velocities, Doppler shift can increase echo second harmonic frequency to a point where it falls outside the cochlea's area of increased sensitivity. The result would be a relative audio blindness if the bat did not have recourse to Doppler shift compensation. By lowering the constant frequency pulse it emits by about 2 kHz, it brings the Doppler shifted echo into its sensitive region. A further specialization of the cochlea is that it is particularly insensitive to frequencies around 59.5 kHz (the exact frequency depends on the individual animal's resting pulse frequency) which would be the bat's pulse second harmonic frequency. This prevents masking effects.

The CF-CF region of bat secondary auditory cortex is used to detect velocities by means of pulse-echo frequency shifts, as described above. Why does a bat need the much finer discriminations of frequency shift that are made possible by the specialization of the DSCF area? Illustrations of the mustached bat primary auditory cortex look as if a magnifying glass had been laid over it at 61 kHz. The visual illusion corresponds to a functional truth. The mustached bat is so sensitive to frequency differences in this range that it can pick out frequency shifts an order of magnitude smaller than those useful in detecting wing motion in flying insects (Suga 1994, 189). What is implied is that the DSCF region can be used to pick out detail on the fluttering wings of the insect.

Response in a column of the DSCF matrix is response to a conjunction of frequency and amplitude. Echo amplitude varies with the surface material of the object reflecting the biosonar pulse. Combined with the DSCF region's very acute frequency resolution, amplitude resolution would seem to be response to the details of a flying object's substance as well as to the relative velocity of these details as the wings flutter. Taken together, DSCF response would give the mustached bat a finely tuned ability to identify prey or conspecifics.

Localizing them is the task of other tonotopic matrices connecting specific delays between pulse fundamental and echo second harmonic to motor patterns for swerving or accelerating. Pulse-echo delay is a measure of target distance: a one-millisecond delay will correspond to a distance of 17.3 cm at an air temperature of 25 C (Suga 1990, 65). Distance matrices are organized so that iso-delay bands -- bands with neuronal groups responding to the same target distance -- are orthogonal to an amplitude axis. As in the bat's DSCF area, amplitude is correlated with fine size and texture characteristics of a target, so maximal activation at some position in this matrix will set up response to a particular kind of object at a particular distance.
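The 17.3 cm figure is easy to verify: sound at 25 C travels at about 346 m/s, and a pulse-echo delay covers the target distance twice, out and back:

$$d \;=\; \frac{c\,\Delta t}{2} \;=\; \frac{346\ \text{m/s} \times 0.001\ \text{s}}{2} \;\approx\; 17.3\ \text{cm}.$$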

Matrix multifunctionality

Nobuo Suga, who has been responsible for the bat studies that give us our most detailed picture of nervous system auditory function, says he has begun to believe that our functional descriptions of auditory matrices have been too simple (1994).

We know that the response of a single neuron in a pulse-echo delay matrix, for instance, is affected by several acoustic parameters other than echo delay. What this means in practice is that such a neuron would show its sharpest tuning to echo delay, but there would be a systematic relation also to echo amplitude, echo frequency, and Doppler shift.

The same neuron will also respond weakly to high amplitude communicational calls. It is possible that the function of a bat's auditory matrix will depend on whether the bat is in echolocation mode or in communication mode. "This clearly indicates that the characterizations of single neurons in our past experiments are partial", Suga says (1994, 136).

I propose a 'multifunction' theory which states that each specialized area has a primary function for processing a particular aspect of an acoustic signal and non-primary functions for processing others. The spatial distribution of neural activity of this area depends upon its primary function, but the magnitude of neural activity is influenced by its non-primary functions. (1994, 141)

The discovery or acknowledgment of this sort of multifunctionality is the most challenging aspect of recent auditory neuroscience. It is difficult to picture multifunctional activity, because units must be imagined to be doing different things not only at the same time but in the same space: in the same matrix. It becomes difficult to know what to call what it's doing: is it 'representing' hundreds of kinds of 'information' at the same time? Theodore Bullock of the Neurosciences Institute at UCSD talks in terms of "multiple, simultaneous codes among the many parallel channels":

A given neuron, or even a given axon, probably conveys information at the same time in two or more distinct codes for different recipient cells or arrays. One post synaptic cell, for example, may read the number of spikes in the last 0.5s, with a certain weighting for the most recent intervals, while another post synaptic cell integrates over several seconds, and a third is sensitive to so-called trophic factors or to irregularity of intervals or to collaterals where impulses do not reach the terminals and only decremented, graded, slow potentials carry information from this input channel. Bullock 1993, 6

With this sort of multifunctionality, talk of 'codes' or even of 'channels' seems obsolete. More suggestive is the approach being developed among connectionist modelers, who say that, when units can modify themselves to strengthen just those connections that work for the animal as a whole, it is not useful to talk in representational terms: it is in principle impossible to say exactly what 'features' hidden units are 'representing' or 'detecting.'

A connectionist model

Neti and Young (1992) have constructed a connectionist model of groups of neurons that could determine sound source azimuth and elevation using monaural frequency alone. Their net had 128 input nodes, 4 to 10 hidden unit nodes, and was output-mapped onto a two-dimensional array reporting azimuth and elevation coordinates. It was fully connected and used an error back-propagation algorithm for weighting adjustments.

The net was trained on data extracted from microelectrode studies of cats. Input nodes were given values corresponding to the spectra of tones after reflection within the cat's ear. Values at output nodes corresponded to sound-source decisions as they occur in superior colliculus neurons. The model was also trained to be level invariant by randomizing levels of training inputs.
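A minimal sketch with the model's proportions -- 128 spectral inputs, a small hidden layer, outputs standing for an azimuth-elevation map, trained by error back-propagation -- though the training data, output dimensions and all parameters below are placeholders, not Neti and Young's published model:

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_hidden, n_out = 128, 8, 25        # 25 = a toy 5x5 azimuth-elevation map

w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
w2 = rng.normal(0.0, 0.1, (n_hidden, n_out))

def forward(x):
    h = np.tanh(x @ w1)                   # hidden-unit activity
    return h, np.tanh(h @ w2)             # output map activity

# Placeholder data: random 'reflection spectra' paired with random one-hot
# source locations, standing in for the cat-derived training set.
x_train = rng.normal(size=(200, n_in))
y_train = np.eye(n_out)[rng.integers(0, n_out, 200)]

lr = 0.05
for _ in range(500):                      # plain error back-propagation
    h, y = forward(x_train)
    err = y - y_train
    g_out = err * (1.0 - y ** 2)          # gradient through output tanh
    g_hid = (g_out @ w2.T) * (1.0 - h ** 2)
    w2 -= lr * h.T @ g_out / len(x_train)
    w1 -= lr * x_train.T @ g_hid / len(x_train)

print("final mse:", float((err ** 2).mean()))
```

Analysis of hidden-unit activity in a trained net of this kind is what allowed Neti and Young to identify which input cues the net had come to rely on.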

[3-8 Connectionist hidden units]

Analysis of activity at hidden units showed that the cue most useful to the net was the frequency of the first spectral notch, presumably created by interference effects on pinna-reflected lower frequencies. For the model, and possibly for the cat, the frequency of the first spectral notch specifies elevation and azimuth uniquely (Neti and Young 1992, 3141).

Neti and Young summarize what they take to be the wider implications of their model in this way:

It is important to notice that the first notch cue is an emergent property of the network solutions. There are no constraints on the way in which models combine information across frequency regions, except that combinations of information across frequency is a key aspect of the transformations performed by the models. This sort of information processing may be an important aspect of complex auditory pattern recognition tasks in general, and the use of spectral cues for sound localization may provide a convenient, straightforward, and easily interpreted paradigm for studying complex stimulus processing in the auditory system (1992, 3154).

The most interesting suggestion I would draw from Neti and Young's model is that we may be misled by thinking of neural elements of any auditory matrix in too simple a way. Some and perhaps all of them may be working like hidden units in a connectionist net; their function may be just to learn to respond in whatever way works for the global sensor-effector through-line concerned. They may not have functions that can be named in terms of the elementary acoustic variables of time, frequency and intensity. It may not even be possible to name their individual or group functions in terms of combinations or ratios of these variables.

The second interesting suggestion is that the auditory system may be doing things more than one way at a time; it may be partially redundant. And this redundancy may extend across sensory modes. Response at auditory matrices further downstream may at times be formed solely on the basis of sideways connections to other sense modalities, without any response patterns propagated from auditory sensors. That is, association area matrices might be able to derive higher-order acoustic covariances by unusual routes.

About vision

Ecological optics is less concerned with seeing light than with the seeing of things by means of light Gibson 1982, 75

This chapter's remarks about vision are intended to establish general guidelines for a discussion to be continued in more detail in Chapter 4.

The visual system is so important to primates that, in the macaque, visual cortex takes up about half the 100 cm² extent of each hemisphere (Van Essen 1992, 419). It is very intensively studied, but it is not well understood. This is so partly because its neuroanatomy is complex, but it is a result also of conceptual difficulties that are Cartesian in origin. Gibson is helpful with these difficulties, and the following suggestions are Gibsonian in spirit.

1) We should not think of the eyes as separate sensors. The visual system is a single binocular system, which has two peripheral scanners directed toward an object from slightly different angles. We also should not think of ourselves as seeing with our eyes. We do not see what the eyes would see if they could see: we see by means of our entire visual system.

2) The classical notion of the retinal image is unworkable. There are many subsystems originating at the retina and using different aspects of retinal response, so there would have to be many retinal images. Moreover, imagining a retinal image makes us think of retinal response as punctate, the way a photographic image is composed of grain. There is point-wise response at the retinal surface; rods and cones are very small points of differential response -- to the broad spectrum of visible frequencies of light, or to narrower bandwidths. But response of neural elements connected to rods and cones is already comparative; it is already response to field properties of the array contacted by retinal sheets -- to contrasts and changes in contrast.
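Comparative response of this kind is standardly modeled as a center-surround operator, often a difference of Gaussians. The following sketch, with invented kernel sizes and widths, shows why such an element is silent over a uniform field and responds only where there is contrast:

    import numpy as np

    def dog_kernel(size=9, sigma_c=1.0, sigma_s=3.0):
        """Difference of Gaussians: excitatory center minus inhibitory
        surround, each normalized so the kernel sums to zero."""
        ax = np.arange(size) - size // 2
        xx, yy = np.meshgrid(ax, ax)
        r2 = xx**2 + yy**2
        center = np.exp(-r2 / (2 * sigma_c**2))
        surround = np.exp(-r2 / (2 * sigma_s**2))
        return center / center.sum() - surround / surround.sum()

    k = dog_kernel()

    uniform = np.full((9, 9), 5.0)                               # even illumination
    edge = np.hstack([np.zeros((9, 5)), 5.0 * np.ones((9, 4))])  # a contrast edge

    print("uniform field:", np.sum(k * uniform))  # ~0: no response
    print("contrast edge:", np.sum(k * edge))     # nonzero (sign depends on polarity)

The point of the zero-sum kernel is that absolute illumination drops out; only relations among neighboring points -- field properties -- drive the response.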

Further, the neurons correlating points of retinal response are already of different kinds, so they are responding to different kinds of field properties of the optical array. Responses to these field properties are propagated through many matrices, where they are convolved with responses from other matrices.

Trying to think of these matrices as 'extracting' 'features' of an image works against our ability to understand that vision involves many networks recurrently connected and working in parallel. Rather than imagining a retinal image propagated from the eye to the brain, it is probably more useful to think of the retina as transparent or blind. Then we are better able to imagine the many systems of differential response as response not to an image but to objects, to backgrounds, and to the perceiver's own location and behavior.

3) Sensory action is an essential part of our ability to see, and proprioceptive response, response to the body's own perceptive action, is correlated with other sensor response at many points, especially in the parietal cortex and forebrain. What is happening in the rest of the body can codetermine what we see and where we see it: whole-body orientation, head position, gaze direction, gaze motion, and gaze slippage during saccades are all important. Eye convergence when we focus on nearby objects, and lens accommodation during focus at any distance, are especially critical.

Saccades, fixation, and foveal magnification work together. Human eyes make a hundred saccades a minute; the fixation period between saccades is about 300 ms (Ballard 1996, 116). When we are looking, in other words, our eyes are moving about half the time: a hundred 300 ms fixations account for only thirty of every sixty seconds. Saccades controlled from the superior colliculus in the midbrain often are not experienced as such. Another, more intended kind of saccade is controlled from the frontal eye fields in the forebrain. The superior colliculus, the parietal cortex, and the FEF are reciprocally interconnected, and the current position of the eyes is registered throughout spatial function areas, a constantly updated aspect of operational context that factors into seeing as it is accomplished.

Gibson describes saccadic eye motion as being like the sweep of a palm across a texture again and again. An orderedness of illumination discontinuities and gradients is present in the array as a standing structure the eyes' motion is sampling "like a blind man feeling an object on different sides in succession" (Gibson 1982, 154).

4) Vision is not instantaneous, although it seems to be. Like other, slower senses, vision is essentially integrative over time as well as space. This is so as early as the retina, where rods and cones have refractory periods and respond as a function of photons absorbed within some specific time period. That visual structure is also cumulative in the cortical areas that are the means of sentient vision is shown by the fact that we see video images all at once, although they are scanned onto a screen pixel by pixel. We see consciously by means of a subnet of activity in the cortex which accumulates structure across time periods that include many saccades and many fixations.

5) Foveal and peripheral vision seem to be seeing different things by different means. About half the 1.07 million retinal ganglion cells serve the central 16 degrees of the retina. Foveal vision, whose resolution is the combined result of central focus and of the greater density of receptors at the center of the retina, seems to be essentially involved in object vision. It enables color vision, stereoptic depth vision, and our ability to track a moving object and not just intercept it. The contrasting form of vision should probably be called non-foveal rather than peripheral, since there is at least one system of retinal neurons that responds to events over the whole of the retina, but without added resolution at the center (Zeki 1993). Non-foveal vision seems to be vision not so much of motion as for the purpose of motion. As such it is particularly relevant to locomotion and to perception of backgrounds.

6) The classical notion of a hierarchy in cortical visual response is metaphorical and inexact.

V1 and V2 are visual receiving areas at the occipital pole. There are many secondary visual areas; Van Essen (1992) reports 32 areas with some retinotopic organization in the macaque, 25 thought to be primarily visual and the remaining 7 thought of as polymodal. Secondary visual areas are much smaller than V1 and V2. Most are less than one tenth their size.

Until recently, the relation of primary and secondary visual cortex has been thought of as hierarchical, with differentiations made at V1 and V2 taken to be early stages of 'processing' finalized further on. At the same time, these areas are thought to be the areas most important to conscious vision, because conscious vision can survive lesions to secondary areas but not to these primary areas. It has not been obvious how the supposed lowliness of V1/V2 in a hierarchy should be reconciled with the centrality of these areas to conscious vision.

The fact that forward connections are usually reciprocated, along with Edelman's notion of synchronous recurrent reentry, can help with this puzzle. We should probably think of V1/V2 as organized simultaneously from above and below, by a convergence of response propagated from sensors and from the many contextualizing matrices in secondary and multimodal association cortex (Farah 1990, Pollen 1999).

7) Finally, the visual system is robust but tunable. Sensitivities of neurons as early as the retinal ganglion can be altered via recurrent connections with more central areas. The more important tuning, however, is the tuning set up by the environment. When we look at different things, and when our visual circumstances change (illumination for example), differential response propagated from the retina will automatically activate different through-paths, customizing the system to existing conditions.

Looking at something with central focus will, for instance, activate focal subsystems right through the brain, and these subsystems will automatically activate other subsystems at many levels. Similarly, patterns of response in peripheral areas will automatically activate other subsystems at many depths. These subsystems can function coherently in parallel because the visual system has evolved and developed in surroundings where many kinds of spatial fact -- facts about body, world, and their relation in action -- are consistently correlated. This consistency allows cortical function to be coherent overall.

How to talk about perceiving

The discussion of perception in this chapter has elaborated three emphases that are recent in perceptual theory: it has tried to show how perception will be understood if we avoid representation metaphors; if we understand it as part of environmentally embedded organic aboutness; and if we envisage its cortical means as a widely distributed dynamical network self-organized in response to environmental perturbations. In this section I will spell out very briefly some of the implications of these emphases for inherited questions in the philosophy of perception.

Sensory push

The history of perceptual theory shows an ongoing anxiety about activity and passivity, which has been expressed in two ways. One group of theorists has insisted that perception is an activity, not a passive being-structured by the world; another has used a sensation/perception distinction to get around the evident fact that what happens at sensory surfaces must to some extent be controlled by what is happening in an environment. Sensation is passive reception of structure from outside, they say, but perception is activity operating on data received -- it is interpretation, conceptualization, computation, or the like.

In fact perception is thoroughly and simultaneously both active and passive. Perception is active in as much as the perceiver always has to work for contact with environmental fact -- has to orient and focus and preorganize. It is passive in as much as, having oriented and focussed and tuned, the organism has to allow itself to be restructured, not only at sensor surfaces but all the way through to effectors. There can't be relevant activity without responsive differentiation. Presence structure is always co-caused; to be adaptive, it must be jointly organized by world structure and the existing structure of the organism.

Theoretical worries about activity and passivity have a gender undercurrent, and to those who have not been able to sort the question, there is this to say: a perceiver can be understood as fortunately hermaphroditic, like the snail. Perceiving requires that we poke ourselves into the world while being penetrated by it.

Perception and readiness

If there is any objectively demonstrable fact about perception that indicates the nature of the neural process involved, it is the following: in so far as an organism perceives a given object, it is prepared to respond with reference to it. This preparation-to-respond is absent in an organism that has failed to perceive (Sperry 1952, 301, cited in Coren 1986).

One of the reasons perceiving and acting have been contrasted (and maybe one of the reasons perceiving has been thought of as passive) is that, although we cannot perceive without acting, we can perceive without acting on what we perceive. We can sit and stare for sheer entertainment.

We cannot, however, perceive without being prepared to act. We perceive because some part of our physical structure changes in the presence of something in our surroundings. The change that occurs has the effect of setting us up to act in relation to that thing, in that environment. When we see a mountain in the distance, we are more ready to travel toward it than we were before we saw it, even if we are not going to do so.

We never perceive everything around us, and we are never prepared to act in relation to everything around us. Not everything is relevant to the kind of being we are, and not everything is relevant to our state at the moment. Neither do we perceive isolated things. Being prepared to act in relation to something involves being able to act in some context, which includes not only the organization of things around us, but also the body by means of which we act.

Because it is a part of readiness, occurrent perception is a subcategory of environmental adaptation. We are massively preadapted, structurally related to the world through events occurring at many times, many of which may be brought to bear in the perceiving moment. (Long-term changes that adapt the animal to background constants such as gravity and the specific density of water are part of the perceiving moment in the sense that they are the structural contexts within which changes brought about under control of present environments can be useful.) Species form, structures evolved to be modifiable in somatic time, structures as actually modified in the moment -- and among the latter, the immediate response which is perceptual structure -- all participate in an organism's moment-to-moment effective presence. As the last of these modifications, the perceiving moment rounds out or points a preparedness already existing.

Perceiving is not internal

... the brain is not as it were the subject of vision Noe 2000, 41

The classical distinction between perception and adaptive physical structure is a remnant of mentalist ways of thinking about perception as if it were the taking-in of pictures or of data from which internal pictures are constructed.

Like the passive-active distinction, the external-internal distinction is misapplied when we are trying to talk about perceiving. A perceiving organism is about things that may be internal or external (a tooth or a street), by means that are likely to be both internal and external. (Means of hearing the street include the air as well as the auditory system; means of seeing the tooth can include a mirror.)

There are, of course, internal structures essential to perceptual relation. The means of every kind of perceiving include neural structure inside the skin and within the brain: that much is indisputable. But perception theory based on a representation metaphor thinks of (some subset of) this structure as the loci of aboutness, the way paintings are thought to be a locus of aboutness. Contemporary perceptual computationalism is a new variant of this old manner of speaking; Hume's version of representationalism spoke of images "in the mind", and a 1980s version spoke of symbols or sentences in the brain. Both have a brain-in-a-vat feel, as if perception is a purely internal transaction.

What is at issue is not a fact but a manner of speaking and an emphasis. Certain sorts of talk are rightly applied at the level of the creature or the person, and are misapplied when attributed metaphorically to anything inner, whether brain or 'mind'. Any perceiving is the relatedness of an oriented creature to a perceivable situation. Direct perception does not mean there is no internal structure, it means nothing internal should be said to perceive. There is no internal seeing.

Similarly we shouldn't talk about internal inference or computation. This does not mean that a perceptual moment isn't enabled by structure deep in cortical streams, or by structures built at other times -- that it is not made possible by training, experience, naming practices, and so on. It does mean that concepts like inference are not properly applied to these expanded structural means. Persons infer and compute, and when they do, they often use representations, which are objects and events outside rather than inside their bodies.

Features and categories

Representation-based theories have needed to posit inference because they have thought of perception as a copying of the particular thing in front of us when we perceive. Category has then been thought of as something added to particular response, a 'concept' 'applied to' particulars. A picture of a particular cannot (it is true) be a picture of a category, and so, they reason, some inferential process is needed to contribute categories and other kinds of abstraction. Kant, in the Critique of Pure Reason, said, for instance, that categories have to be applied -- by the faculty of Understanding -- to materials furnished by the senses (B181).

If we do not think of perception as internal representation, and if, instead, we think of it as complex structural adaptation, we do not need the notion of internal inference. When we understand perception in terms of wide nets, generic response seems instead to be a part of particular response, but a part evokable in many different circumstances (and one that can sometimes be evoked separately -- see Chapters 6 and 7).

Eleanor Rosch, a perceptual psychologist who was first writing in the 1970s, discovered that, at a level most relevant to the survival of animals and humans, perception is automatically categorical. The world builds the brain so we see things immediately as kinds of things. When we are engaged with an object, many kinds of response in many cortical areas, some close to primary sensory cortex, some much further away, are facilitating aspect aboutness. Seeing an apple by means of a web of covariant foci, we are also seeing it as an eatable and a round thing. We are seeing it as these things in the sense that, seeing it, we're ready to eat it, to shape our hand to it.

Areas in the human ventral temporal lobe (near the middle of the underside of the cortex) have been associated with categorical perception of, for instance, faces or animals.

[3-10 Face recognition areas in ventral temporal cortex]

Researchers probing part of this area, the fusiform gyrus, think response in local neuronal groups in this area is probably not to classes of objects as such:

Rather, each object type activated a relatively broad region of the fusiform gyrus ... but the peaks of these activations were centered on different parts of the fusiform gyrus. Therefore, rather than being organized by object category, per se, this pattern of results was more consistent with the idea that this cortex is tuned to different object features that members of a category have in common (Martin, Ungerleider and Haxby 2000, 1032).

If there are neuronal groups that respond whenever there is something round, or whenever something is moving at a given speed, they could be called feature constant. Some, like neuronal groups in color areas, would be primarily unimodal. Others, like shape or texture nodes, could be activated by vision, touch, or both.

As we saw in Chapter 2, cortical regionalization is codetermined by the organization of the world and the existing organization of the organism. Sensory subabilities like timbre perception or color perception have their own areas because they are independently varying parts of object- or act-categorical response; differentiating a particular timbre is part of recognizing objects for different purposes.

Not all the subfields potentially responsive to an object are active in every instance: we are able to recognize something in the dark, or even just from a whiff of scent. In these instances, there is segregated use of the connections of a few foci: we are able to act on aspects.

There may also be neuronal groups active only when combinations of nodes are active -- neuronal groups whose response is category-constant or superordinate-category-constant. The wide net of response to an individual object could include these category constancies too.

If object response can include many neuronal groups -- which are distributed but linked in recurrent networks when active -- different subnets (active for different purposes) will be parts of our response to the object -- but that does not necessarily make them response to parts of the object. It can be difficult to characterize what is happening at some of these nodes. The parcellation of response in association cortex will be a function of physical differences and similarities among the objects we deal with, but it will also be a function of our senses, our needs, our contexts, and later in the evolutionary trajectory, our cultural training.

Local cortical destruction damages cognitive abilities in unexpected ways. People can lose their ability to recognize, name, or remember facts about living things, while keeping these abilities in relation to tools. Because these losses are not clear cut -- they are relative losses of speed and accuracy -- researchers have worked with principal components analysis to try to discover a categorical basis for the dissociations found (Martin, Ungerleider and Haxby 2000).

Objects were evaluated in relation to a list of variables: similarity of form, value to perceiver, manipulability, familiarity, characteristic motion, characteristic sensory modality, age of acquisition, and perceptual distinctiveness. The dissociation of object agnosia deficits proved to be a dissociation across clusters of values on these axes. One category, as determined by skill dissociation, turned out to be the category of practically useful, touchable objects, learned young.
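A sketch of this sort of analysis, with invented ratings, may make the procedure clearer. The objects, the numbers, and the clustering below are hypothetical; only the list of variables comes from the study:

    import numpy as np

    variables = ["form similarity", "value to perceiver", "manipulability",
                 "familiarity", "characteristic motion", "sensory modality",
                 "age of acquisition", "distinctiveness"]
    objects = ["hammer", "cup", "spoon", "dog", "elephant", "tiger"]
    X = np.array([                    # invented ratings, one row per object
        [0.2, 0.8, 0.9, 0.9, 0.1, 0.7, 0.2, 0.3],   # hammer
        [0.3, 0.9, 0.9, 1.0, 0.0, 0.6, 0.1, 0.2],   # cup
        [0.3, 0.8, 1.0, 1.0, 0.0, 0.5, 0.1, 0.2],   # spoon
        [0.7, 0.6, 0.1, 0.9, 0.8, 0.8, 0.2, 0.6],   # dog
        [0.8, 0.2, 0.0, 0.4, 0.7, 0.5, 0.6, 0.8],   # elephant
        [0.8, 0.3, 0.0, 0.5, 0.9, 0.5, 0.5, 0.8],   # tiger
    ])

    # Principal components: center the ratings, then take the SVD.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                # object coordinates on the components

    for name, sc in zip(objects, scores[:, 0]):
        print(f"{name:10s} PC1 = {sc:+.2f}")
    for v, w in zip(variables, Vt[0]):
        print(f"  loading on {v}: {w:+.2f}")

If deficits dissociate along such a component -- one separating the manipulable, familiar, early-learned objects from the animals -- the 'category' recovered is the cluster just described: practically useful, touchable objects, learned young.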

Are there embedded assemblies, a core subset of connections, that are active when we name or use or think of any practically useful, touchable object learned young? There could be; assemblies could specialize in that way just because they are located at junctions where they are active in many contexts, with many kinds of objects, as part of many kinds of action. They would be high-level generic filters, responding to what Gibson called high-level invariants in situations that include observer, object and background:

I have suggested that the environment consists basically of substances, surfaces, planes, objects, and events. These perceivables have a special kind of meaning that I call affordances. They are 'nested'; they are not discrete entities or denumerable units, and they cannot be inventoried...[They] do not differ from one another along single dimensions. They have meaningful combinations of dimensions, i.e., features, that constitute their affordances ... we cannot study the development of our ability to discriminate them by isolating and controlling one variable at a time in the orthodox way. The theory of sets does not apply to them; they are not categorized except insofar as names are applied to them. (Gibson 1982, 292)

This sort of high-level generic discrimination may even be built before finer discriminations. Developmental evidence says babies have vehicle response, for instance, before they have airplane response, and that they discriminate at the level of animal or utensil before they can think dog or cup (Mandler, personal communication).

Rather than being sophisticated rational additions to object perception, categories seem in fact to be the easier part of perceiving. As we gain structural differentiation we become able to see differences at the same time as similarities. As we get to know someone, we begin to be able to see that they look different every day. The more time we give something -- the more structure we have built in relation to that thing -- the more we are able to see its particularity. This implies that the higher function, the more experienced or the more educated or the more evolved function, is to see particulars as more particular, to see more of them, to be more particularly about them.

Correspondence

Representational theories of perception must posit correspondence between perceptual structure and the things being perceived. Since correspondence is a one-one relation between discrete entities or their parts, it is more than implausible in the perception-action constellation of the moment.

The structure of proximal fields is complex, jointly determined at every point by many things about the surrounding environment. The event of perception is complex, jointly determined by the structure of the field and the present structure of the perceiver. The perceiver's adaptive structural response is temporally complex, occurring in evolutionary and somatic time as well as in the present. And the perceiver's immediately responsive, differentiative structure is complex, including, as it does, sensory, visceral and motor systems.

If we consider only sensory response, that, too, is complex, since it involves many recurrently connected matrices, each of which can be primarily and secondarily involved in high order covariance detections wanted for different purposes.

Keeping these complexities in mind, it is easier to notice that an organism's sentient aboutness, as well as its global structural aboutness, can be complex.

We are codifferentiative: we are about one thing by many means as well as being about many things by the same means.

We could start with the way we are about many things by the same means. Because distal perception makes use of medial fields, and because the effects of many distal facts are convolved at any point in those fields, and because perception is codetermined by world and perceiver, and because sensory cortex is a congeries of interconnected matrices responding to different covariances, we can be about many things as a result of the same instantaneous scan. We experience this multiplicity in the way we can look at the horizon to see the hill, to see the shape of the hill, to see the moisture or dust content of the air, to see the relative distance of a particular hill, to see the time of day, or to see whether our eyesight is good.

Now look at how we are about one thing by many means. Say we are catching a chicken. The senses aren't functionally separate from each other on the way up: vision, hearing, touch and proprioception are collaborating as early as the midbrain, and they go on feeding back onto each other all the way to the cortex and beyond. Our sense of where the chicken is is heavily visual, but it also has converging participation from auditory, somatic and motor cortex. The evidence is that this coordination reaches even to senses we don't seem to be using: PET imaging during speech comprehension tasks has, for instance, found activation in auditory cortex even when we are only watching someone's lips move (Zatorre 1992, 846).

When we are perceiving something in particular we are also inevitably about what we are having to do to perceive it. When the chicken is just out of reach we are feeling ourselves here as part of seeing it there. Feeling ourselves here and it there may include postural tensions that are a sort of imitation of the chicken's crouch (more about that in Chapter 4). It also includes the muscle tensions of our readiness to dart toward exactly that spot the chicken hasn't reached yet. We may be anticipating having caught it, as if feeling a feathered leg in the hand. We may startle as if it had already pecked us, an emotional synaesthesia Damasio calls a value marker (Damasio 1994, 198). Some of the wide net that is 'perceiving the chicken' will be included in the core dynamic subnet of conscious perceiving, and some of it will not, but all of it will be part of the structural knowing that is engagement with the chicken.

Objectivity

The auditory system has evolved for detection and processing of biologically important sounds and localization of the sources of these sounds. Suga 1994, 175

Questions about the objectivity or reliability of perception cannot be answered in any general way. It depends. That it depends on conditions is obvious. We cannot perceive with damaged sensory systems, and we cannot perceive when proximal fields do not offer appropriate structure, when it is too noisy, or when it is dark. Certain conditions, rare when not humanly engineered, will always set up perceptual illusions.

If we do not know how perception works, physically, it is unclear what to make of limitations and differences in the perceptual aboutness creatures can achieve. We make more, or fewer, or different perceptual distinctions than other species, because we are differently structured. We see holly berries and leaves as differently colored, where a dichromat will see them as the same color.

The fact that a dichromat does not see red berries does not mean that trichromats see what does not really exist -- that if we see a red object we must be seeing a red internal picture. Redness as such neither exists nor does not exist. Red things do exist. We are not able to hear the velocity of a raised dot on a moth's wing; the bat is. Trichromats are able to see red things; dichromats are not. It is a matter of sensory resolution. Seeing green leaves and red berries is seeing differences in surface microtextures.

Like any kind of perceiving, this kind of perceiving requires complex means that include the object, the proximal field, and perceiver structure. We see colored objects because light waves have different frequencies, because plant surfaces reflect these frequencies differently, and because we have the sorts of retinas and deeper matrices that can respond differentially to the effects of these frequencies in the structure of ambient light.

There is no such thing as a pure perception of an object, Damasio suggests. If by purity we were to mean that only the object, and not the proximal field or perceiver structure, has a part in the perceiving, of course perception is not pure. But purity would be an odd demand.

The question of perceptual objectivity comes up as if we expect perception to happen without means (Kelly 1986). The tendency to call perception subjective because it requires complex material covariances seems to be a remnant of the centuries that imagined an absolute Perceiver who could see everything about everything, absolutely without mediating fields or physical restructuring. We are the sort of perceivers we are, material and unabsolute, but this does not mean we cannot perceive what really exists, or that we 'only perceive our own representations'. Perception, of course, is both subjective and objective. What else could it be?

Different people have made this point in different ways. Pribram says perception supplies "a definition of reality that would be animal-relative, but no less real for being so" (1991, xxiv). Gibson says the real world is detected "within the limits of the invariants to which the system is attuned" (1979, 261). Varela describes vision as "an occasion for multiple modes of dimensioning internal regularities across the animal worlds, all of them viable within certain broad constraints of animal bodies, light" (1984, 218). Susan Oyama says

... the impact of sensory stimuli is a joint function of the stimuli and state of the organism; the 'effective stimulus' is defined by the organism that is affected by it ... that the same stimulus may have different effects on the same organism at different times, does not render stimulation causally irrelevant or merely permissive (as opposed to formative). Oyama 1993, 26

An epistemic base

Representation-based epistemologies have thought of certain kinds of verbal belief as foundational, with non-foundational sorts of claim derivable from them. Rationalists like Descartes think of knowledge in terms of formal systems, built up from a set of axioms that serve as their foundation.

In an epistemology that thinks of organisms rather than representational artifacts as the loci of aboutness, basic level presence is foundational; perception and action abilities in relation to particular kinds of things, known at a particular physical scale and categorical level, have had evolutionary priority and so are epistemically basic for the kinds of knowers we are (Maddy 1980, Lakoff and Johnson 1999). Cultural elaborations can be understood as built on this base, less reliable as they are more remote from it. This view is in contrast with Cartesian epistemologies that take perception and action as less certain than representation-based reasoning abilities.

The notion of epistemic base that I intend has two origins. One is the early 20th century logical empiricist's notion of base-clause epistemic foundations (Irvine 1989). Russell and other logical empiricists suggested, as a corrective to Cartesian rationalism, that we should maintain a clear distinction between logical foundations and epistemological foundations. Axiomatic foundations are attempted when we formalize knowledge; epistemic foundations are about coming to know, rather than securing or guaranteeing certainty. A base-clause foundational empiricist believes that we have reliable and immediate knowledge of middle-sized observable objects, and that we arrive at nonobservational or theoretical understanding through reliable inference from these observational foundations.

The logical empiricist's distinction between logical and epistemic foundations is an important clarification, but it only begins the correction needed. Both rationalists and logical empiricists are representationalist about knowing or aboutness. Even for a logical empiricist such as Russell, it is observational statements that are foundational.

For both rationalists and empiricists, in other words, knowledge is an edifice built in public space. A rationalist edifice is built in one direction, upward from its foundations, like a cathedral or a shopping mall. Empiricist observation statements would be sky-hung foundations, and the empiricist edifice, like a space lab, built out in all directions from an original core.

If there is no aboutness without structured organisms, the edifice of knowledge is the knowing body embedded in a world in which it is competent. The foundation of that competence is the orderedness of that world, the orderedness of the body evolved within it, and the ongoing dynamical relatedness of the two. The empiricists were right in saying there must be epistemic foundations before there can be formal systems, but they overlooked the bedrock materiality of the foundation there is.

The second origin of my notion of an epistemic base is Eleanor Rosch's idea of a categorical base level, a domain of function which is a base level for humans. It is a base level in the sense that it is the scale at which our capabilities are most reliable because it is the scale at which they have been sharpened by selective pressure. There has been debate among cognitive scientists about the center or bounds of basic capabilities, but the notion that evolved organisms will be most competent in relation to what is most relevant to their survival seems self-evident.

The reliability of perception varies, therefore, with what is being perceived. We are most readily effective, competent, successful, in relation to things (like other people) that have mattered most. We are tuned to a particular physical scale and categorical level; we are apt with objects and events at the scale of edibles and negotiable surfaces, and at the categorical level of the genus (dog as opposed to mammal or collie) (Rosch 1978). These kinds of objects and events are recognized more rapidly, perceived more vividly, and remembered more readily. Children learn their names earlier. The same sorts of objects and events are named earlier in the development of a language, and given shorter names (Lakoff and Johnson 1999).

So the notion of basic level function can be operationalized by noticing what is rock solid in daily life, what is fastest and easiest, first to develop, last to deteriorate, least vulnerable to derangement. But at the same time, the notion of an epistemic base cannot be very exact. There cannot be necessary and sufficient conditions for being a base level competence.

Basic is being used here in several senses, related to the two origins described above. A competence may be basic because it is essential. Essential kinds of competence or aboutness are often those we share with animals and children. An ability or competence may also be basic in the other, foundation-like sense that it is a core that non-basic abilities may be built around.

The two senses are related in that an evolving species builds new abilities around a basis of old abilities that must not be lost. The structural means of basic abilities will be somewhat reorganized as new abilities are worked out -- the way food search alters when there is an opposable thumb -- but, no matter what else is elaborated, basic competences must be retained in some form. So our evolved base competencies, which are epistemic foundations to the species we are, will include cellular nutrition and self-repair, adaptations we share with yeasts and paramecia. They will include locomotor competence we share with invertebrates, and kinds of place knowledge we share with mammals like rats. They will also include social and environmental capabilities, like infant care and medicinal use of plants, that we share with other primates.

More complicated animals have a broader epistemic base: they are able to be competent about more kinds of things, in more kinds of circumstance; they are able to be competent about more things at once. For animals in the wild, any competence could probably be called basic. A human epistemic base is also very broad, but there are many forms of human aboutness that are not basic. We need the basic-nonbasic distinction mainly in relation to humans, in fact, because humans are uniquely able to bypass or overbuild basic function, and to live in elaborate simulational states and in massively artifactual surroundings.

In Part III I will describe instances of non-basic uses of basic perception and action capabilities.

Base level and organism perspective

Basic does not mean simple or unitary. We are basically, but simultaneously and complicatedly, about situational wholes. This chapter has described some of the multifocal complexity of base level perceiving. In the next chapter I will describe another complexity of basic function, the multifocal coordination of motor behaviors directed toward located objects. Understood in wide net terms, base level aboutness is structural organization constellating perception of foreground and background objects, events, actions, and places, and preparing us for, or engaging us in, action relevant to any of them.

Basic function must be complex self-organization in relation to whole situations because any behavior occurs in a complex environment. When most intent in stalking or fleeing, an animal must be focally about the object of interest, while also being more generally, less consciously perhaps, about conditions taken as background. It must be about the terrain in which it moves. It must also be keyed to other sorts of danger and opportunity. Foreground-background organization within a task axis is not a matter of ignoring background, but of managing priorities within simultaneous constraints.

Damasio calls the situated, integrated wholeness of basic located function organism perspective, "continually and irrevocably built" (1999, 146-7) by the interaction of body and world, the way optical perspective on a scene is built as a function of both eye and optical geometry.

Organism perspective, Damasio says, is also built by the vestibular system and by adjustments in the muscles that control the lens and the pupil, the position of the eyeball, and the head, the neck, and the trunk. These are familiar task axis points, but Damasio adds that felt muscle tensions are essential to a sensed or perceived I-it organization in bodily involvement with located things. Feeling oneself here is part of seeing it there.

Emotional response, which he describes as endocrine adjustments and changes in the smooth musculature of the viscera, is part of organism self-regulation for Damasio, but like muscle tension it is part also of the perception both of the object and of perceiver state, and so of the sensed relatedness that is organism perspective -- referentiality and subjectivity reciprocally present.

Organism perspective, for Damasio, includes but is not limited to structure included in sentient function. Effective segregation and integration of sentient and nonsentient function, of core dynamics and global networks, must itself be part of basic wide net function.

Chapter 4. What, where and how