theory index



We know much more about auditory perception than we did ten years ago, by reason of a convergence of work in ethology, neuroscience and connectionist computer modeling.

Ethologists study the behavior of animals in their natural habitat; neuroethologists work with animals in laboratories, but they attempt to study behaviors or abilities that would be important to the animal in the wild. Neuroethologists studying animal audition will thus study neural response to biologically important sound, often in animals with auditory specializations, like bats, owls and cats. Auditory neuroethology uses the findings and some of the technologies of auditory neurophysiology: action potentials are recorded from microelectrodes while recorded sounds are played to the animal through either headphones or loudspeakers. Bat studies, for instance, use recordings of the bat's own pulse emissions with simulations of echoes at various rates of delay and frequency shift.

Along with microelectrode studies, auditory neuroscience proper has made progress in the embryology and physiology of the auditory nervous system: much more is known about the fine detail of its development, synaptic chemistry and connectivity. On a larger scale, improvement in PET/MRI imaging has surprised us with information about ways different regions of the brain work together when we hear or produce sound.

Connectionists modeling auditory neurophysiological or neuroethological work will train nets to replicate neural response given the original stimulus conditions as input. Or they will offer a connectionist simulation of some aspect of auditory function as a hypothesis about neural circuit design.

Net models are models of pattern recognition accomplished by spatiotemporal means: spatial simultaneity and temporal sequencing in complex interaction. As such they bring auditory epistemology into contact with a new range of concepts (maps, arrays, matrices, map transformations, projections and so on), concepts that enable us to imagine more clearly the logistic aspects of what it is for an animal to know by acoustic means. The existence of connectionist models leads us to ask more finely focused questions about perception's embodiment. Just how does the nervous system schedule connections; what does the immense complexity of its connective topology enable?

Complexity is not easy to imagine. Spatial complexity is complexity of connections among elements that exist simultaneously. Temporal complexity is a similar complexity of causal relation, whereby events occurring at the same instant can have effects which occur at different times, and events occurring at different instants can have effects occurring at the same time.

The sense of paradigm shift in progress is very lively in some of the work I will describe below. A paradigm shift is a shift not only in what we know but in how we know. Our nets will have to reconfigure. We will have to be more detailed about the space and time of perception's fine working structure, and at the same time we will have to keep in mind the instantaneous wholeness of the perceiving animal. Going even farther, since perception is after all contact with the world, we will have to learn to imagine a wholeness of the perceiving person or animal IN an environment, wholeness constantly reconfigured, and reconfigured perhaps globally, as the environment changes.

The notions that seem currently to be the most useful, and that are under interesting revision, are the notions of map, staged parallel map transformation, filter, temporal filter, and multiple function. These ideas are mutually dependent in ways I will try to begin to demonstrate in what follows.


Certain regions of nervous systems are thought of as maps because when the response properties of the neurons included in them are tested, they are found to vary systematically with the neuron's position in that region. They may vary systematically in one dimension, in two, or in more than two. Their axes of variability may be orthogonal or nonorthogonal. Or the maps may be irregular in various ways: there may be functional bulges whereby one coordinate position has more cells, or cells that are more sharply tuned, or cells with a lower firing threshold.

A map region is often distinctive in other ways. Its cytoarchitecture (the look and arrangement of its cells, columns, bands, layers and axonal interconnections) may differ from that in surrounding areas. It may have distinctive lines of in- or out-connections to other regions.

Maps are located at all levels of the auditory nervous system, from the auditory nerve just behind the cochlea, to the surface of the cortex, and they project onto one another in register, all the way up and down the line. Some of these projections fan in from more cells to fewer, but overall the auditory projection fans out, from 90,000 cells at the cochlear nucleus in the brainstem, to 400,000 cells in the inferior colliculus of the midbrain, to ten million cells in auditory cortex. Some projections can be thought of as through lines, or labeled lines, because they pass activation patterns without significantly transforming them. In these instances we can say the map projects directly. But more often a map is transformed as it projects through a processing nucleus.

Different kinds of map transformation are possible. The simplest is pattern clarification: noise is eliminated by cooperative settling among interconnected cells at the same level, or by weighted connections with a next layer (Mead 1995, 256).
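The weighted-connection kind of pattern clarification can be pictured with a toy sketch (all sizes, weights and thresholds here are invented for illustration, not taken from Mead): a noisy one-dimensional activation pattern projects through centre-surround weights to a next layer, and only the genuine peak survives.

```python
import numpy as np

# Toy sketch (all sizes and weights invented): a noisy activation pattern is
# clarified by projecting through centre-surround weights to a next layer,
# with rectification as the threshold. Only the genuine peak survives.
rng = np.random.default_rng(0)

signal = np.zeros(20)
signal[8] = 1.0                         # one genuinely active map position
noisy = signal + 0.2 * rng.random(20)   # low-level noise everywhere

# Centre-surround weights: self-excitation plus flanking inhibition
weights = np.zeros((20, 20))
for i in range(20):
    weights[i, i] = 1.0
    if i > 0:
        weights[i, i - 1] = -0.4
    if i < 19:
        weights[i, i + 1] = -0.4

clarified = np.maximum(weights @ noisy, 0.0)  # negative sums are cut off
peak = int(np.argmax(clarified))
```

After the projection, the noise positions fall below threshold and the single genuinely active position stands alone.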

Maps may gain dimensions as they project to a next level. Maps from the two ears are convolved at the cochlear nucleus, for example. In the superior colliculus auditory maps are projected onto visual maps. At various levels all the way up, auditory maps are convolved with motor response maps. And maps may integrate feedback projections.

A map may project into several maps with fewer dimensions; it may split. Or it may gain dimensions by integrating more sorts of variable. It may also effect a derivation: it may extract higher-order patterns, either spatial or temporal or both. It may also be doing all of these things at the same time, by means of elements and connections whose multifunctional capabilities we are just beginning to suspect.

I will be giving examples of auditory map transformations in sections III and IV, but first I want to say something about the conceptual relations among maps, map transformations, and filters.

Before electronics, a filter was a strainer that allowed one kind of thing to pass while retaining something else; a housewife would filter milk by passing it through a cloth (L. feltrum, a felt cloth). In a time when filters are often also transducers, however, our sense of filtering has expanded to include different kinds of selective response. For instance we speak of auditory neurons in the cochlea, semi-metaphorically, as centre-surround band-pass frequency filters. They do not of course transmit or retain anything: they respond selectively.

In the extended, metaphorical sense of the term, any neuron, any column of neurons, any circuit or layer of neurons, any neural map, may be thought of as a filter, as long as it is responding selectively. We can go farther and say the entire nervous system is acting as a global filter, since it is an overall configuration of selective response. I will come back to this point, but what I want to emphasize here is that neural connections are not pipes. They do not transmit some fluid, or some symbol, or some packet of 'information'. When they are connections among sensory neurons, they respond selectively in ways that are putting the animal in touch with its environment.

The concept of filter is related to the concept of map in two ways. We can think of the map as being made up of filters, as being an array or matrix of unit filters, and we can think of the map as being a filter itself. The difference is a difference of logical type: the map's overall response is a spatiotemporal pattern made up of the responses of its elements. A neural map transformation is a process by which one map's spatiotemporal pattern contributes causally to the spatiotemporal pattern selected at the next.

It is difficult to speak clearly about any complex four-dimensional process, but it might be approximately correct to say that a map transformation is an even higher-order filter, because, seen in temporal cross-section, if there is such a thing, it is a patterning of patterns.

Neurons are not simple: their function is to produce an appropriate pattern of output spikes in response to the state of their "approximately 10⁴ input synapses" by a dynamic process that "arises out of the interaction of the many species of active and passive conductances that populate the dendritic membrane of the neuron and that serve to sense, amplify, sink and pipe analogue synaptic currents en route to the spike generation mechanism in the soma" (Mead 1995, 268).

Another way nervous system filtering is not simple is that it is highly tunable. Neurons are selective in several ways. They can have different valences: they can be excitatory or inhibitory. They can have different activation thresholds: they can fire only when strongly activated, or they can fire to any old input. They can be sharply tuned or they can have a wide response window. But all of these response characteristics can change.
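What tunability amounts to can be caricatured with a minimal sketch (all parameters invented): a unit's selectivity modeled as a Gaussian window around a best frequency with a firing threshold, so that "retuning" is nothing more than a change in the parameters the real nervous system modulates chemically.

```python
import math

# Illustrative tunable filter (all parameters invented): selectivity modeled
# as a Gaussian window around a best frequency with a firing threshold.
# Retuning is just a change of these parameters, which is what transmitter
# and synaptic modulation amount to in this caricature.
def response(freq_hz, best_hz=8000.0, bandwidth_hz=500.0, threshold=0.2):
    r = math.exp(-((freq_hz - best_hz) ** 2) / (2 * bandwidth_hz ** 2))
    return r if r >= threshold else 0.0

sharp = response(9200.0)                        # outside the narrow window: silent
broad = response(9200.0, bandwidth_hz=2000.0)   # widened window: responds
```

The same stimulus that a sharply tuned unit ignores can drive the same unit strongly once its response window has been widened.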

They can alter as a consequence of field effects: diffusion of neurotransmitters through intercellular fluid, for instance, or the presence of slow-wave potentials throughout an area (Bullock 1993, 8). By these means a neuron's selective response properties could make large shifts in operating mode. And synapse properties are being modified all the time: synaptic change is what allows an animal to learn.

The tunability of wetware filters makes them strikingly different from the sorts of hardware filters we know. A second large difference is how multidimensional a wetware filter can be. A neural filter's response decision must emerge from a constellation of factors all operating simultaneously and all contributing with different relative weights. Seen at a very fine scale, ANY of these factors may be systematically relevant to the animal's perceptual circumstance, and thus may count as a response parameter.

On a scale closer to the scale of sound events as we usually think of them, we find neurons that are combination-sensitive to different degrees. A neuron may respond to a particular coincidence of a signal from the left ear and a signal from the right ear, but only where both are reports of some specific frequency. Other neurons are tuned to complex frequencies: they respond to the simultaneous presence of several or many frequencies, but only at particular amplitudes. And so on. Neurons may even be bimodal: audio-visual, or audio-visual-motor. Still others could be called temporal filters because they respond only to particular sequences. I will describe various kinds of time-filter in IV.


1. the cochlear map

A map is a map in virtue of the fact that it scales on (at least) two axes. In the auditory nervous system, one of these axes is invariably the frequency axis. Auditory maps are tonotopic. Another way to say this is to say they are cochleotopic. Auditory maps in animal brains are all prepared at the cochlea.

The micromechanics of the basilar membrane (membrane stiffness, membrane mass, and density of cochlear fluid) cause its response to pressure waves to vary systematically with displacement from base to apex. Subsurface neurons are able to transduce this differential response into action potential spikes. In effect, the cochlea sorts wave trains into their component periodicities, and reports the presence (or sometimes absence) of any particular frequency by activity at a position on a scale. This scale can be thought of as projecting right up through all the auditory nuclei in the brainstem to the surface of the cortex, and then even further, into the forebrain and hippocampus. There are even return projections from cortex back to subcortical nuclei.
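The sorting of a wave train into component periodicities can be pictured with a toy decomposition (a discrete Fourier transform standing in, very loosely, for the basilar membrane's mechanics; all the numbers are invented):

```python
import numpy as np

# Toy decomposition: a pressure wave with two components is sorted so that
# each component shows up as activity at a position on a frequency scale.
fs = 8000                     # sampling rate, Hz (invented)
t = np.arange(fs) / fs        # one second of samples
wave = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), d=1 / fs)

# "Active map positions": frequencies whose activity clears a threshold
active = freqs[spectrum > 0.25 * spectrum.max()]
```

The compound wave comes out as activity at exactly two positions on the scale, one for each component, with the weaker component reported at lower amplitude.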

The frequency make-up of a pressure wave train is thus reported by spatial means, but pressure waves vary along more than one dimension. Individual frequency components also have temporal and amplitude properties. These are not transduced spatially but temporally. The larger scale timing of acoustic events just propagates up through the auditory system: the animal is perceptually entrained so that, speaking generally, it begins to respond when a sound event begins, and stops responding when the sound event stops. On a finer scale, the auditory nervous system may phase-lock the response timing in its scaled bank of frequency filters so that frequency reports are synchronized as they project up through the brainstem.

The auditory nervous system can also use temporal scheduling the way it uses spatial ordering, to report variation in dimensions that are not temporal. The amplitude of a frequency component may be transduced as spike count, for instance. It can also be transduced by phase relative to some reference oscillation.
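These two temporal codes can be caricatured as follows (rates, windows and periods all invented): amplitude reported either as spike count in a fixed window, or as firing time relative to a reference cycle.

```python
# Caricature of temporal (rather than spatial) transduction: amplitude is
# reported either as spike count in a fixed window (rate code) or as firing
# time relative to a reference oscillation (phase code). All parameters here
# are invented for illustration.
def spike_count(amplitude, window_ms=10, max_rate_hz=500):
    """Rate code: more spikes in the window for a stronger component."""
    return round(amplitude * max_rate_hz * window_ms / 1000)

def phase_delay(amplitude, period_ms=4.0):
    """Phase code: a stronger component fires earlier in the reference cycle."""
    return (1.0 - amplitude) * period_ms

loud, soft = spike_count(0.9), spike_count(0.2)
early, late = phase_delay(0.9), phase_delay(0.2)
```

Either way, a dimension that is not itself temporal is carried by temporal scheduling: the louder component gives more spikes, or an earlier spike, than the softer one.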

Because time-ordering is used to report both event timing and amplitude variation, the auditory system is thought to segregate time and intensity reports for at least parts of the march of frequency reports up to the cortex. In other words, the frequency scale is sent up two different pathways, one specialized to report stimulus time and one specialized to report intensity. This segregation of function resembles the separation of visual reports into separate streams for object shape and spatial motion (Kosslyn 1995, 23).

The split between time reports and intensity reports is prepared at the auditory nerve by cells with different properties. In bats, for instance, the auditory subsystem that determines target velocities needs initial frequency reports that are very precise. But the auditory subsystem that determines target distance needs less sharply tuned frequency reports and more precise time reports (Konishi 1988). These differences originate at the first auditory nucleus.

The essential things to notice about cochlear projections are first, that they are PATTERNS, both spatial and temporal, and second, that as soon as we have spatial and temporal patterns we also and immediately have patterns of patterns: higher-order patterns available to any specialized subsystem that wants them. Thus, at some higher level the auditory nervous system has available to it the higher-order invariants that specify what we think of as sound characteristics (spectral envelope, formants, transient onset/offset, etc.). And at some presumably highest level it will have available to it the higher-order invariants that specify the sound events (the sounding objects) in the animal's environment.

2. cortical maps

In humans and other mammals there is an area called the primary auditory area or A1. It is also called the auditory receiving area, because it receives its essential projection almost exclusively from the medial geniculate nucleus, which is the last and most important switching station between ear and cortex.

In cortical field A1 neurons with very similar best frequency are arranged vertically with respect to one another and within a narrow region of cortex can be found in all cortical cell laminae. Similarly, within each lamina neurons with similar best frequency can be found distributed horizontally ... At threshold sound pressure levels the resultant motion of a small region on the basilar membrane ultimately leads to excitation of a relatively small population of neurons that can be viewed as being arranged in a band having length, depth and width. (Merzenich and Brugge 1973, 293).

A1 is located on the superior temporal plane, that is, on the flat upper surface of the temporal cortex, hidden inside the Sylvian fissure, which separates temporal cortex (adjacent to the ear) from parietal cortex (above and behind the ear). The exact function of A1 is not known, but PET studies have shown A1 to be bilaterally activated by ANY auditory stimulusnoise, music, speech or environmental sound (Zatorre 1985, 38).

Secondary auditory areas (one of which is called AII in mammals) also receive direct projection from auditory centers in the MGB, but they receive input as well from other parts of the cortex, and are therefore called auditory association cortex. They are thought to be necessary to "higher order auditory processing of complex signals" (Zatorre 1992, 847). Careful microelectrode mappings of secondary auditory cortex in macaque monkeys (Merzenich 1973, 292) have shown at least five secondary auditory areas, which are thought to be distinct regions both for cytoarchitectonic reasons and because several of them show tonotopic response. Four are distributed around A1 on the superior temporal plane, and the fifth lies just adjacent to one of these. It is thought that secondary auditory cortex may also extend a short distance into the upper, parietal surface of the Sylvian fissure. Bat secondary auditory areas have been well mapped as well (Suga 1994, 1993, 1990, 1985). Eight functionally distinct maps have been discovered in auditory cortex of the mustached bat. I will describe three of these in sections III and IV.

Areas of the temporal lobe adjacent to regions identified as primary or secondary auditory cortex are areas called periauditory:

Out from these primary sensory fields come fibers that synaptically affect adjoining areas that cannot unreservedly be called sensory, ... and out from these areas come fibers that terminate in areas still farther away from the primary sensory fields. The areas of the neocortex at various removes from the primary fields are called association areas ... more advanced stages of processing presumably are embodied in association cortex. For example there are places where the auditory and the visual converge (Nauta 1990, 105).


1. how a map works: the mustached bat's CF-CF area

Audition is particularly important to bats. Their auditory neurophysiology is specialized in interesting ways and as a consequence bat audition has become important also to neuroethologists. A bat is a very small mammal: its entire brain is "about the size of a large pearl" (Suga 1990). Nevertheless auditory functioning of the bat central nervous system has been mapped in unusual detail. A number of auditory regions have been found both in the bat's cortex and in lower nuclei, and we have recently come to know quite a bit about how they work and what they accomplish.

Bat audition is unusual in the importance it gives to reflected sound and to sounds emitted by the animal itself, but the general principles of acoustic perception as we are beginning to understand them in bats generalize well: what we see in operation are filters (complex and simple), maps, and staged parallel projections which effect map transformations.

One of the auditory maps discovered in secondary auditory cortex of the mustached bat is called the CF-CF area because it correlates responses to constant frequency pulses the bat emits when it is hunting flying insects. The pulses emitted consist of a fundamental (about 30.5 kHz) and three harmonics. The bat is able to control the energy of each harmonic. The fundamental is always very low in intensity: it is the bat's reference frequency, and also the frequency by which the bat knows its own pulse from those of its companions. The prominence given other harmonics varies according to conditions. Low frequencies will be less attenuated by distance, but high frequencies are more useful when closing in on small, fast, near objects.

The primary function of the CF-CF area is to prepare the animal to avoid obstacles and to respond to target velocities; it is in effect an array of Doppler shift detectors. As a map it can be said to be tonotopic, but along several axes at once. It is in fact a matrix whose elements respond to specific combinations of pulse fundamental and echo harmonic. Pulse fundamental frequency varies along one axis, and the second and third echo harmonics increase along the axis orthogonal to it, segregated in strips adjacent to each other (Suga 1990, 65). Both strips show a disproportionate number of neurons responsive to velocities the animal encounters in important maneuvers such as closing on prey or docking at a roost.

2. map magnification: the mustached bat's DSCF area

In the mustached bat the range of best frequency response across the tonotopic spread of the primary auditory map is approximately 10-100 kHz. Around 61-61.5 kHz this map bulges into a specialized subregion that takes up 30% of primary auditory cortex and is given its own name. It is the DSCF (Doppler shift compensated constant frequency) map. Columns in this map are 40-50 neurons deep. Each column responds to a particular combination of frequency and amplitude of the second harmonic of a constant frequency pulse echo.

The map is radially organized:

One can (crudely) picture the area as a bicycle wheel: as one moves outward along a spoke, the best frequency of the neurons increases; as one moves circularly from one spoke to the next, the best amplitude changes. (Suga 1990, 63)

The DSCF region's specialization is prepared at the cochlea, which has a similar expansion of tuning sharpness around 61-61.5 kHz, the frequency corresponding to the normal second harmonic of the bat's resting pulse fundamental. Under resting conditions the echo second harmonic will also fall into this area. But when the bat is using pulse-echo frequency shift to detect target velocities, Doppler shift can increase echo second harmonic frequency to a point where it falls outside the cochlea's area of increased sensitivity. The result would be a relative audio blindness if the bat did not have recourse to Doppler shift-compensation. By lowering the constant frequency pulse it emits by about 2 kHz, it brings the Doppler shifted echo into its sensitive region. A further specialization of the cochlea is that it is particularly insensitive to frequencies around 59.5 kHz (the exact frequency depends on the individual animal's resting pulse frequency) which would be the bat's pulse second harmonic frequency. This prevents masking effects.
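The arithmetic of the compensation can be sketched as follows. The 61 kHz resting harmonic and the roughly 2 kHz compensation are from the text; the speed of sound and the bat's closing speed are assumptions chosen for illustration.

```python
# Back-of-envelope sketch of Doppler-shift compensation. The 61 kHz figure is
# from the text; the speed of sound and closing speed are assumed.
C = 343.0  # speed of sound in air, m/s (assumed)

def echo_frequency(emitted_hz, closing_speed):
    """Two-way Doppler shift for a bat closing on a stationary target."""
    return emitted_hz * (C + closing_speed) / (C - closing_speed)

rest_h2 = 61_000.0   # resting second-harmonic frequency, Hz (from the text)
v = 5.5              # assumed closing speed, m/s

shifted = echo_frequency(rest_h2, v)   # echo pushed above the sensitive region
excess = shifted - rest_h2             # roughly 2 kHz at this speed

# Compensation: emit a lowered harmonic so the shifted echo lands back at 61 kHz
compensated_emit = rest_h2 * (C - v) / (C + v)
recovered = echo_frequency(compensated_emit, v)
```

At this assumed flight speed the uncompensated echo overshoots the sensitive region by about 2 kHz; lowering the emitted harmonic by the same factor brings the Doppler-shifted echo back to 61 kHz exactly.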

The CF-CF region of bat secondary auditory cortex is able to detect velocities by means of pulse-echo frequency shifts, as described above. Why does a bat need the much finer discriminations of frequency shift that are made possible by the specialization of the DSCF area? Illustrations of the mustached bat primary auditory cortex look as if a magnifying glass had been laid over it at 61 kHz. In this instance, the visual illusion corresponds to a functional truth. The mustached bat is so sensitive to frequency differences in this range that it can pick out frequency shifts an order of magnitude smaller than those useful in detecting wing motion in flying insects (Suga 1990, 62; 1994, 189). What is implied is that the DSCF region can be used to pick out DETAIL in the fluttering wings of the insect.

But columns of the DSCF map are complex filters that further sort frequency reports into joint reports of frequency and amplitude. How does the bat use maps that pick out a frequency at a particular amplitude? Echo amplitude varies with the surface properties of the object reflecting the biosonar pulse. Combined with the DSCF region's very acute frequency resolution, amplitude resolution would seem to be able to report the details of a flying object's textures as well as the relative velocity of these details as the wings flutter. Taken together the DSCF map would give the mustached bat a very finely tuned ability to identify friends or food.

3. map plasticity: the optic tectum of the barn owl

The optic tectum of the barn owl (which corresponds to the higher of several midbrain nuclei in mammals) has been thought to have auditory maps in which units respond to binaural time differences and thus detect the azimuthal location of a sound source. It is also thought to have, in register with it, a visual space map with units responding to activation from the eyes (Brainard and Knudsen 1993). A more recent study (Brainard and Knudsen 1995) reports the presence in this region of the optic tectum of neurons whose response is bimodal: they will respond to a particular interaural time difference OR to activity in a particular visual region OR to a coincidence of both. A sheet of such neurons would constitute an auditory-visual map of spatial location relative to the owl's head.

The fact that sense modalities are already integrated at the midbrain and the discovery of bimodal neurons revise our tendency to think of sense modalities as physiologically segregated. But the most striking aspect of Brainard and Knudsen's work has been their finding that the auditory response properties of these bimodal neurons are developmentally calibrated to their visual response properties.

Baby barn owls raised wearing prismatic spectacles which shifted their visual fields 23° to the right were found to have their optic tectum visual receptive fields systematically shifted by the same amount, and their auditory sound source location sensitivity systematically shifted along with them. Bimodal neurons that would normally have responded to a sound source OR a visual event at x were now responding to events shifted 23°.

But it was also found that this tuning dependence of auditory response on visual response was not automatic. In a separate experiment, baby owls were not fitted with prismatic lenses until they were 60 to 80 days old. By this time their bimodal optic tectum neurons had established normal response properties. When they were then fitted with prismatic lenses their bimodal neurons' visual receptive fields shifted (of course) immediately, but their auditory receptive fields took several weeks to shift. In the intermediate states, the neurons' tuning to interaural time differences was found to be very broad. Neurons took longer to respond to interaural time differences, and when they did respond their responses lasted longer. These intermediate response characteristics are reminiscent of the behaviors of connectionist hidden units while they are being trained on a new data set.


1. delay lines: barn owl nucleus laminaris maps

Barn owls hunt at night and are able to locate their prey by hearing alone. It is thought that their auditory systems use interaural time differences to find the azimuthal (right-left) position of a sound source, and that they use interaural intensity differences to find its elevation.

Spike discharges from frequency-sensitive neurons in the auditory nerve's tonotopic array are phase-locked to the stimulus whose arrival they are reporting: in other words, there is a common latency between stimulus arrival and spike discharge, so all frequencies will be reported synchronously. This phase-locked common report will be propagated up through the brainstem in labeled-line spatial parallel as well. At the cochlear nucleus, the phase-locked array is propagated both on up to the next brainstem nucleus on the ipsilateral side and across to that nucleus on the contralateral side. The brain stem nucleus at which phase-locked spike trains converge from the owl's right and left ears is the nucleus laminaris.

The nucleus laminaris is a 3-d array which includes a tonotopic map that functions as a time-to-space converter. It does so by two means, one spatial and one temporal. The spatial organization of the map is this: phase-locked parallel frequency-reporting neural signals arrive from the ipsilateral ear at the side of the map facing the back of the owl's head. Signals from the contralateral ear enter the map from the side facing the FRONT of the owl's head. Fibers from both sides of the map and thus from both ears interdigitate across the map.

The temporal organization of the map involves delay lines by which a signal arriving at either edge of the map will be transmitted into the map by a range of delays that corresponds to the range of interaural time disparities the owl encounters. Within every isofrequency band, depth into the map will be correlated to a specific temporal offset from arrival time. Phase-locked spikes from the ipsilateral ear will thus arrive at the back (dorsal) edge of the map with no delay, but they will arrive at the far edge, the ventral edge, with maximal delay. For phase-locked spikes arriving from the contralateral ear, relative delays will be the reverse. In this way, a point at any specific depth into the map will correspond to a RATIO of ipsilateral ear delay and contralateral ear delay.

Neurons in the nucleus laminaris binaural time difference map are coincidence detectors, which may fire weakly when activated by a monaural signal but will fire maximally only when activated simultaneously by signals from both sides of the map. A phase-locked signal arriving from either ear will be broadcast into the map at all the delays afforded by the range of delay lines, but it will have a critical effect only at the one position in the map where it coincides with a signal transmitted from the other side of the map. This position corresponds to a particular interaural delay ratio, and thus neural activation at this position specifies the azimuthal position of a sound source in relation to the owl's head. A neuron firing maximally at this position will begin to direct the owl's gaze, head motion, or flight.
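The delay-line logic above can be caricatured in a few lines (map size and delay step invented): each position delays the ipsilateral signal a little more and the contralateral signal a little less, and the position of best coincidence shifts with the interaural time difference.

```python
# Caricature of the delay-line / coincidence scheme (map size and delay step
# invented): the unit where the two internal delays cancel the interaural
# arrival difference is the one that fires maximally.
def best_position(itd_steps, n_positions=9):
    """Map position selected by an interaural time difference.

    itd_steps: contralateral arrival lag, in delay-line steps
    (0 = source straight ahead; the middle of the map wins).
    """
    best, best_mismatch = 0, None
    for pos in range(n_positions):
        ipsi_delay = pos                      # delay grows from the ipsi edge
        contra_delay = n_positions - 1 - pos  # and from the contra edge
        # Coincidence: internal delay difference cancels the arrival lag
        mismatch = abs((ipsi_delay - contra_delay) - itd_steps)
        if best_mismatch is None or mismatch < best_mismatch:
            best, best_mismatch = pos, mismatch
    return best
```

A zero interaural difference selects the middle of the map; a four-step contralateral lag moves the winning unit toward one edge, a four-step ipsilateral lag toward the other.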

From the nucleus laminaris the frequency x time delay map projects to several nuclei at the midbrain, where, in sequence, a) neuronal selectivity for interaural time difference is sharpened, b) signals from different frequencies converge on single neurons to create a simplified map, and c) time and intensity pathways converge to form a map of auditory space that is bicoordinate (Konishi 1990, 3245).

2. delay lines: mustached bat FM-FM area

The mustached bat makes similar use of delay lines in its target range map. There is a tonotopic region in bat secondary auditory cortex called the FM-FM region because it responds exclusively to the frequency-modulated component of the mustached bat's biosonar pulse emission, a downward sweep of about one octave. Like the constant frequency portion of its pulse emission, the mustached bat's frequency sweep has four harmonics. Positions in the FM-FM array respond to specific delays between pulse fundamental and echo second harmonic: two other areas in secondary auditory cortex have been found to compare pulse fundamental with echo third and fourth harmonics.

Pulse-echo delay is a measure of target distance: a one-millisecond delay will correspond to a distance of 17.3 cm at an air temperature of 25 °C (Suga 1990, 65). The FM-FM map is organized so that iso-delay bands (which signify the same target distance across the map) are orthogonal to an amplitude axis. As in the bat's CF-CF map, amplitude is correlated with fine size and texture characteristics of a target, so maximal activation at some position in the map will indicate a particular kind of object at a particular distance.
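The range computation behind the iso-delay bands is simple to make explicit: pulse-echo delay measures round-trip travel time, so target distance is half of delay times the speed of sound, which at 25 °C (about 346 m/s) gives the 17.3 cm per millisecond figure cited above.

```python
# Making the range arithmetic explicit: distance = (delay x speed of sound) / 2.
# Speed of sound at 25 degrees C is about 346 m/s, which yields the 17.3 cm
# per millisecond of delay cited in the text.
SPEED_OF_SOUND_25C = 346.0  # m/s

def target_distance_cm(delay_ms):
    return SPEED_OF_SOUND_25C * (delay_ms / 1000.0) / 2.0 * 100.0

d = target_distance_cm(1.0)  # distance for a one-millisecond pulse-echo delay
```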

The iso-delay bands in this map are prepared at several lower levels in the bat's brain. At the midbrain level there are neurons responding to pulse fundamental and to echo second harmonic individually. When these responses propagate upward to the next processing stage, echo responses will naturally lag behind pulse responses. But here delays are created in the axons delivering pulse fundamental spikes: echo responses will be delivered quickly, and pulse responses will be transmitted over a range of delays corresponding to the range of delays the bat uses for range-finding.

Echo reports and pulse reports are brought together in a map in the medial geniculate body of the thalamus. The MGB is the nucleus which projects exclusively to auditory cortex; it therefore contains and projects a number of auditory maps serving different purposes. The map which projects upward to the bat's cortical FM-FM area is specialized to correlate pulse-echo timing. As in the owl's nucleus laminaris map, the pulse fundamental signal is broadcast across the map at a range of delays. (But here only the pulse signal is delayed; the echo signal is constant across the board.) Coincidence detecting neurons within the map will respond maximally to the coincidence of an echo signal with a pulse signal delayed by the amount that corresponds to a particular pulse-echo delay. The map's activation at this point is then passed up to the cortical FM-FM map with which it is in topographic register.

3. delay lines for auto-correlation: Mead's silicon match filter

Carver Mead has demonstrated yet another sort of use for delay lines. His (1995) design for a silicon auditory nerve complex to be implemented as an analog chip and employed with his (1992) silicon cochlea uses delay lines along with phase-responsive signals to extract frequency peaks from noise.

It works this way: the signal corresponding to an auditory nerve impulse is propagated along two lines, one of which is delayed by an interval equal to the period of the frequency to be detected at that neuron's position in the tonotopic array. The signal is then auto-correlated: the delayed signal is fed to a coincidence detector which by now is receiving the next impulse on the direct line from the auditory neuron. Delayed spikes that are true reports of the presence of a particular periodicity at the cochlea will coincide with the undelayed report of the next cycle of that periodicity. Responses that coincide will report a frequency peak, but coincidence detectors will fail to respond to non-periodic spikes; they will be functioning as match filters.
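The auto-correlation scheme can be illustrated with a small sketch (the spike times and tolerance below are invented for clarity, not drawn from Mead's chip): a spike train is compared with a copy of itself delayed by one period of the target frequency, and only a periodic train drives the coincidence detector.

```python
def match_filter(spike_times_ms, period_ms, tol_ms=0.5):
    """Count coincidences between each spike, delayed by one period,
    and any undelayed spike in the same train (auto-correlation)."""
    hits = 0
    for t in spike_times_ms:
        delayed = t + period_ms
        # A delayed spike coincides only if the train really contains
        # another spike one period later.
        if any(abs(delayed - u) <= tol_ms for u in spike_times_ms):
            hits += 1
    return hits

periodic  = [0, 4, 8, 12, 16]    # phase-locked spikes, 4 ms period (250 Hz)
aperiodic = [0, 3, 9, 10, 17]    # spikes with no 4 ms periodicity

periodic_hits  = match_filter(periodic, period_ms=4)
aperiodic_hits = match_filter(aperiodic, period_ms=4)
```

The periodic train produces a coincidence at almost every spike; the aperiodic train produces none, which is exactly the filtering behavior Mead's detectors rely on.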

Mead intends his analog model to be neuromorphic; he considers it to be an implementation of a plausible hypothesis about auditory nerve frequency peak extraction. Auditory neurons reporting frequencies below 5kHz do tend to discharge no more than once per cycle and they are phase-responsive. Where stimulus cycles are too short for an auditory neuron to be able to respond once per cycle, Mead says, match filtering could be done by groups of neurons (Mead 1995, 265).

4. Hopfield's phase-time coding

The two sorts of delay lines mentioned so far have used delays sequenced to neural spikes phase-locked to stimulus timing. These delays have been used to map temporal ratios onto a spatial array. Hopfield has suggested a scheme of neuroarchitecture by which intensity could be mapped onto a temporal pattern, which could perhaps be mapped onto spatial position further up by means of delay lines as already described. Hopfield calls his scheme action potential phase-time coding (1995, 33).

Intensities of the frequencies reported at the auditory nerve have usually been thought to be rate-coded: a frequency at low energy will generate a few spikes and a frequency at higher energy will generate more. But Hopfield suggests that an initial detector cell such as the auditory neurons responding to inner hair cells at the basilar membrane may have a subthreshold oscillatory membrane potential generated by "intrinsic cellular effects or multi-neural circuitry" (Hopfield 1995, 33). Since neural spiking occurs when incoming current reaches threshold strength, high-intensity events (a relatively large displacement of the basilar membrane) will generate more current, and so will cause this sort of detector neuron to spike earlier in its oscillatory cycle. Thus the PHASE of an auditory neuron's spiking could indicate signal intensity. If cells generate more than one spike, the phase of the first could indicate intensity and later spikes could indicate something else: a sort of signal multiplexing.
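A minimal simulation can make the idea concrete. In this sketch the threshold, oscillation amplitude and units are all invented: a subthreshold sinusoidal oscillation is summed with a steady input current, and the stronger the input, the earlier in the cycle the sum crosses threshold, so spike phase encodes intensity.

```python
import math

def first_spike_phase(input_current, threshold=1.0, steps=1000):
    """Phase (0..2*pi) at which the summed potential first crosses
    threshold, or None if the input never lifts it over threshold."""
    for i in range(steps):
        phase = 2 * math.pi * i / steps
        oscillation = 0.5 * math.sin(phase)   # subthreshold oscillation
        if oscillation + input_current >= threshold:
            return phase
    return None

# A stronger input crosses threshold earlier in the oscillatory cycle,
# so the spike's phase reports the stimulus intensity:
weak_phase   = first_spike_phase(input_current=0.6)
strong_phase = first_spike_phase(input_current=0.9)
```

Inputs too weak to crest the oscillation produce no spike at all, which is consistent with the scheme's subthreshold premise.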

5. delay used to create simultaneous scenes

Dear et al. (1993) have suggested a use of delays to solve yet another problem of neural coordination, again in the bat's FM-FM pulse-echo distance map (this time in the big brown bat). Dear and his colleagues have suggested that a delay mechanism could explain the creation of a global `scene' in which objects existing at different distances could be sensed simultaneously.

The delays suggested are not accomplished by means of delay lines between maps, but by means of temporal response properties of the neurons themselves. It was observed that in a nontonotopic area of the FM-FM map, a neuron's response latency (the latency of its response to an echo-delay pair) is proportional to its best delay. That is, a brief delay is reported quickly; a long delay naturally takes longer to report. But in an adjacent area which IS tonotopic, many neurons tuned to short delays in fact have long response latencies. In this way neurons at different positions, which correspond to pulse-echo delay times and thus to sound source distance, can all be reporting simultaneously. In the intervals between pulse emissions the bat can neurally ACCUMULATE a simultaneous sense of objects present at different distances, and moreover will have it in a physical form that can map directly onto motor coordinates effecting swerves or accelerations.
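The arithmetic of this latency compensation is simple enough to sketch. The 20 ms figure below is an invented placeholder, not a measured value: if each neuron's response latency is set so that best delay plus latency is constant, then reports about near and far objects arrive at the same moment.

```python
SCENE_TIME_MS = 20.0   # invented common report time after pulse emission

def response_latency(best_delay_ms):
    """Latency chosen so that best delay + latency is constant: neurons
    tuned to SHORT echo delays get LONG latencies, and vice versa."""
    return SCENE_TIME_MS - best_delay_ms

# Neurons tuned to three different target distances (pulse-echo delays):
best_delays = [2.0, 8.0, 14.0]
report_times = [d + response_latency(d) for d in best_delays]
# All three reports arrive at the same moment, composing one `scene'.
```

The simultaneity is what matters: the three neurons report sequentially-sensed distances as one momentary pattern of activation.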


1. connectionist mapping: monaural sound location in the cat's superior colliculus

The auditory localization map-transformation processes described so far have used binaural time cues and binaural level cues, respectively, to detect sound source azimuth and elevation, the two coordinates being determined separately and then convolved at the superior colliculus, which is the higher of several midbrain auditory nuclei.

Neti and Young (1992) have constructed a connectionist model of groups of neurons that could determine sound source azimuth AND elevation by monaural frequency cues alone. Their net had 128 input nodes, 4 to 10 hidden unit nodes, and was output-mapped onto a two-dimensional array representing azimuth and elevation coordinates. It was fully connected and used an error back-propagation algorithm for weighting adjustments.

The net was trained on data extracted from microelectrode studies of cats. Input nodes were given values corresponding to the spectra of tones as they would be after reflection within the cat's outer ear. Values at output nodes corresponded to sound-source decisions as they occurred in superior colliculus neurons. The model was also trained to be level invariant by randomizing levels of training inputs.
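A toy-scale version of such a net can be sketched as follows. This is not Neti and Young's model: it has 16 input bins rather than 128, the spectra and the notch-to-location rule are synthetic, and the sizes are chosen only so that the sketch trains quickly. What it shares with their model is the architecture, a fully connected feedforward net trained by error back-propagation to map a notched spectrum to a pair of location coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_example():
    """A flat 16-bin spectrum with one notch; the notch's position
    determines the (toy) azimuth and elevation targets."""
    notch = rng.integers(2, 14)
    x = np.ones(16)
    x[notch] = 0.0
    y = np.array([notch / 16.0, 1.0 - notch / 16.0])
    return x, y

# One hidden layer of 8 tanh units, fully connected, trained online
# by plain error back-propagation.
W1 = rng.normal(0.0, 0.3, (16, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.3, (8, 2));  b2 = np.zeros(2)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
for _ in range(3000):
    x, y = make_example()
    h, out = forward(x)
    err = out - y                       # gradient of squared error
    gh = (W2 @ err) * (1.0 - h ** 2)    # back-propagate through tanh
    W2 -= lr * np.outer(h, err); b2 -= lr * err
    W1 -= lr * np.outer(x, gh);  b1 -= lr * gh

# Mean absolute error on fresh examples after training:
errs = []
for _ in range(50):
    x, y = make_example()
    errs.append(float(np.abs(forward(x)[1] - y).mean()))
test_err = sum(errs) / len(errs)
```

Nothing in the training procedure tells the net to attend to the notch; if it comes to rely on notch position, that reliance is, as in Neti and Young's model, an emergent property of the solution.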

Analysis of activity at hidden units showed that the cue most useful to the net was the frequency of the first spectral notch, presumably created by interference effects on pinna-reflected lower frequencies. For the net, and presumably for the cat, the frequency of the first spectral notch specifies elevation and azimuth uniquely (Neti and Young 1992, 3141).

Neti and Young summarize what they take to be the wider implications of their model in this way:

It is important to notice that the first notch cue is an emergent property of the network solutions. There are no constraints on the way in which models combine information across frequency regions, except that combinations of information across frequency is a key aspect of the transformations performed by the models. This sort of information processing may be an important aspect of complex auditory pattern recognition tasks in general, and the use of spectral cues for sound localization may provide a convenient, straightforward, and easily interpreted paradigm for studying complex stimulus processing in the auditory system (1992, 3154).

The most interesting suggestion I would draw from Neti and Young's model is that we may be misled by thinking of neural elements of any auditory map in too simple a way. Some and perhaps all of them may be working like hidden units in a connectionist net; their function may be just to learn to respond in whatever way works for the global input-output transform being performed. They may not have functions that can be named in terms of the elementary acoustic variables of time, frequency and intensity. It may not even be possible to name their individual or group functions in terms of combinations or ratios of these variables.

The second interesting suggestion is of course that the auditory system may be doing things more than one way at a time; it may be partially redundant. Or maps higher in the auditory chain may be able, once trained, to configure themselves independently, without input patterns extracted at brainstem or midbrain maps. High-level maps might be able to use hidden units to extract or create higher-order patterns solely on the basis of sideways connections at their own level. (Auditory hallucinations come to mind.)

2. a connectionist delay line

Hopfield (1991) has demonstrated a connectionist net implementation of a procedure for recognizing sequences: his particular net was a word recognition processor that was able to correctly identify spoken names from short-time spectral characteristics. His net had no clocked periods; the circuit was data-driven. Recognition was at word-level, not phoneme-level.

Input to the net was from 32 centre-surround bandpass filters. Each input node was connected to each of 10 hidden units by means of ten delay lines, each with a different delay. Weights on each delay line were made proportional to just how much the presence of that frequency at that position in a sequence would contribute to word recognition. The net was trained on data that included noise, and it proved to be noise resistant.

Delay lines worked to translate a sequence of spectral features into a simultaneous description by means of coincidence detectors at the hidden unit level. Say an input node reporting the presence of a frequency x simultaneously activates delay lines with delay values of 1 to 10. Then another input node reporting the presence of frequency y simultaneously activates its own set of delay lines with values 1 to 10. The x activation in delay line 2 would coincide, at the detector, with the y activation arriving via delay line 1. And so on.
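The fan-out just described can be sketched directly (the features and timings here are invented for illustration): each feature report is sent down delay lines of length 1 to 10, and a detector looks for moments when reports of different features arrive together.

```python
def detector_input(events, delays):
    """events: list of (feature, time). Each report fans out over every
    delay line; returns {arrival_time: set of features arriving then}."""
    arrivals = {}
    for feature, t in events:
        for d in delays:
            arrivals.setdefault(t + d, set()).add(feature)
    return arrivals

delays = range(1, 11)            # ten delay lines with delays 1..10
events = [("x", 0), ("y", 1)]    # feature x occurs one step before y

arrivals = detector_input(events, delays)
# At time 2, x (via delay line 2) coincides with y (via delay line 1):
coincident = arrivals[2]
```

A weighted sum over such coincident arrivals is what lets a hidden unit respond to a temporal SEQUENCE of features as if it were a simultaneous pattern.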

Hopfield's net (which he claims could be implemented on analog chips) successfully recognizes words in its output vocabulary with nearly 100% success, and it does so with connected speech, i.e. with sound sequences that do not mark the beginnings and ends of words. Hopfield has suggested further that the net's recognition capabilities could be extended by the addition of units that respond to abrupt onset/offset of spectral energies and that could detect the local direction of formant energies.

Hopfield's speech recognition mechanism is a frequency x time position matrix. Even more generally, it is a sequence detector: a time pattern detector. Implemented as neural circuitry it could sort time patterns for various purposes and on various scales.

3. within-neuron processes: connectionist model of bat FM-FM neurons

There has been a recent attempt (Chittajallu and Wong 1994) to model the behavior of single delay-sensitive neurons in a bat's FM-FM area by means of a connectionist net. The study used empirical data from little brown bats whose FM pulse emissions are somewhat different from those of the mustached bats.

The network configuration that was found most successful in modeling the biological FM-FM neurons' overall behavior had two input nodes, 15 hidden unit nodes and one output node. It was a fully connected feedforward net with initially suboptimal connection weights adjusted by means of an error back-propagation rule. Input values were of two kinds, pulse repetition rate and pulse-echo delay times. The net was trained on a data set taken from microelectrode studies of actual FM-FM neurons; the intended output was the number of neural spikes actually observed under various input conditions.

Biological neurons were observed to have alternative operating states. When the bat was in search mode and its pulse repetition rate was low, a neuron responded to some extent to a fairly wide range of echo delays: its delay-width profile was quite widely spread around the best-delay value at which it responds maximally. But when the bat was in approach mode and its pulse repetition rate was high, the neuron's delay-width window became narrower: it became much more finely tuned to its best-delay value. The connectionist net was able to replicate this shift in response with operating modality.
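One way to picture the shift is as a tuning curve whose width depends on pulse repetition rate. The sketch below is not Chittajallu and Wong's net; it is a hand-written Gaussian stand-in with invented widths and an invented rate threshold, meant only to show the qualitative behavior.

```python
import math

def fm_fm_response(echo_delay_ms, best_delay_ms=5.0, pulse_rate_hz=10.0):
    """Gaussian response around the best delay; a high pulse repetition
    rate (approach mode) narrows the tuning width."""
    width_ms = 0.5 if pulse_rate_hz > 50 else 2.0   # invented widths
    return math.exp(-((echo_delay_ms - best_delay_ms) / width_ms) ** 2)

# An echo 1 ms off the best delay still drives the neuron in search
# mode, but barely does so in approach mode:
search_resp   = fm_fm_response(6.0, pulse_rate_hz=10.0)
approach_resp = fm_fm_response(6.0, pulse_rate_hz=100.0)
```

The point of interest is that a change in ONE input parameter (pulse rate) resets the neuron's selectivity for the OTHER (echo delay), while the best-delay value itself is unchanged.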

This connectionist model is interesting mainly as a picture of a kind of complex filtering that 1) could be LEARNED by a neuron purely in response to correlated patterns of input; and that 2) can shift some but not all aspects of its behavior in relation to large or mode shifts in just one of its input parameters.

In other words, it shows how a neuron given particular combinations of input can show responses different from others of similar kinds, purely on the basis of the very particular combinations of input it receives. But neurons in a connected array receive systematically different input simply in virtue of their position in that array.

It also shows how an overall difference to one or more of the map's input patterns can reset the operating mode of all the neurons in the array, so that the array is doing something slightly different, or doing it in a different way. An array of neurons similar to the one modeled above could have an ability to adapt its behavior to search and approach conditions. But we could imagine cortical maps able to switch their operating modes in other ways tooas between perceiving and imagining conditions, for instance, or in different sorts of context. Chittajallu and Wong's connectionist model can be seen as a model of a neuron's, or a circuit's, or a map's, necessary plasticity.

Certain kinds of sound cue have values that will have to change as an animal matures. Pinna-filtered first-notch frequency values as used by the cat for monaural sound source localization will shift as the shape of the cat's ear changes, for example. The precise organization of the cat's superior colliculus sound source map will presumably have to reorganize itself as the ear grows, in the same gradual way that the owl's azimuth map adjusted to visual shift.

There is evidence also that auditory neurons in auditory maps will change their response characteristics as a result of increased or specialized practice. Merzenich et al. (1993) report that the tonotopic map in adult owl monkey primary auditory cortex changes its organization significantly after the animal has been trained to make fine frequency discriminations. The surface area of the map responding to frequencies used in training increases; response latencies to these frequencies decrease; and individual neurons are more sharply tuned.


In a recent paper (1994) Nobuo Suga, who has been responsible for the bat studies that give us our most detailed picture of nervous system auditory function, says he has begun to believe that our functional descriptions of auditory maps have been too simple-minded:

A revision of the description of auditory processing in the bat's brain is necessary with respect to whether each specialized cortical area is devoted to processing just one aspect of an acoustic signal, because we know that the response of a single cortical neuron, e.g., an FM-FM neuron, is affected by several acoustic parameters other than echo delay, such as echo frequency, Doppler shift, echo amplitude, relative target size, target direction azimuth and elevation. Furthermore, the FM-FM neuron does respond to communication sounds to some extent. Therefore, a single cortical neuron must be involved in processing different types of auditory information. That is, it must have multiple functions. If the neuron has multiple functions in signal processing, some function may be a primary function and others may be secondary functions. (1994, 137)

What this means in practice is that FM-FM neurons show their sharpest tuning to echo delay but their response also shows systematic relation to echo amplitude, echo source direction, and relative target velocity. The same neuron will also respond weakly to high amplitude communicational calls. It is possible, Suga says, that the function of a bat's auditory maps will depend on whether the bat is in echolocation mode or in communication mode. "This clearly indicates that the characterizations of single neurons in our past experiments are partial" (1994, 136, 142).

I propose a `multifunction' theory which states that each specialized area has a primary function for processing a particular aspect of an acoustic signal and non-primary functions for processing others. The spatial distribution of neural activity of this area depends upon its primary function, but the magnitude of neural activity is influenced by its non-primary functions. (1994, 141)

The discovery or acknowledgment of this sort of multifunctionality is the most challenging aspect of recent auditory neuroscience. It is difficult to picture a multifunctional processor's activity, because its units must be imagined to be doing different things not only at the same time but in the same space: in the same map. It becomes difficult to know what to call what it's doing: is it `representing' hundreds of KINDS of `information' at the same time?

Theodore Bullock of the Neurosciences Institute at UCSD talks in terms of "multiple, simultaneous codes among the many parallel channels":

A given neuron, or even a given axon, probably conveys information at the same time in two or more distinct codes for different recipient cells or arrays. One post synaptic cell, for example, may read the number of spikes in the last 0.5s, with a certain weighting for the most recent intervals, while another post synaptic cell integrates over several seconds, and a third is sensitive to so-called trophic factors or to irregularity of intervals or to collaterals where impulses do not reach the terminals and only decremented, graded, slow potentials carry information from this input channel. (Bullock 1993, 6)

But I am not sure that talk of `codes' or even of `channels' is very useful. What I do find useful is the approach being developed among connectionist modelers, who say it is not necessary or even possible to say exactly what `features' hidden units are `representing' or `detecting.' What is more helpful is to know that such units can modify themselves to strengthen just those connections that work for the animal. Perhaps a neuron in a bat's cortical FM-FM area participatesto some degreein discriminating EVERYTHING the animal is sensing at any given moment.