MAPS, MAP TRANSFORMATIONS AND FILTERS IN
THE AUDITORY NERVOUS SYSTEM
We know much more about auditory perception than we did ten
years ago, by reason of a convergence of work in ethology, neuroscience
and connectionist computer modeling.
Ethologists study the behavior of animals in their natural habitat; neuroethologists
work with animals in laboratories, but they attempt to study behaviors or
abilities that would be important to the animal in the wild. Neuroethologists
studying animal audition will thus study neural response to biologically
important sound, often in animals with auditory specializations, like bats,
owls and cats. Auditory neuroethology uses the findings and some of the
technologies of auditory neurophysiology: action potentials are recorded
from microelectrodes while recorded sounds are played to the animal through
either headphones or loudspeakers. Bat studies for instance use recordings
of the bat's own pulse emissions with simulations of echoes at various rates
of delay and frequency shift.
Along with microelectrode studies, auditory neuroscience proper has made
progress in the embryology and physiology of the auditory nervous system:
much more is known about the fine detail of its development, synaptic chemistry
and connectivity. On a larger scale, improvement in PET/MRI imaging has
surprised us with information about ways different regions of the brain
work together when we hear or produce sound.
Connectionists modeling auditory neurophysiological or neuroethological
work will train nets to replicate neural response given the original stimulus
conditions as input. Or they will offer a connectionist simulation of some
aspect of auditory function as a hypothesis about neural circuit design.
Net models are models of pattern recognition accomplished by spatiotemporal
means: spatial simultaneity and temporal sequencing in complex interaction.
As such they bring auditory epistemology into contact with a new range of
concepts (maps, arrays, matrices, map transformations, projections and so
on), concepts that enable us to imagine more clearly the logistic aspects of
what it is for an animal to know by acoustic means. The existence of connectionist
models leads us to ask more finely focussed questions about perception's
embodiment. Just how does the nervous system schedule connections; what
does the immense complexity of its connective topology enable?
Complexity is not easy to imagine. Spatial complexity is complexity of connections
among elements that exist simultaneously. Temporal complexity is a similar
complexity of causal relation, whereby events occurring at the same instant
can have effects which occur at different times, and events occurring at
different instants can have effects occurring at the same time.
The sense of paradigm shift in progress is very lively in some of the work
I will describe below. A paradigm shift is a shift not only in what we know
but in how we know. Our nets will have to reconfigure. We will have to be
more detailed about the space and time of perception's fine working structure,
and at the same time we will have to keep in mind the instantaneous wholeness
of the perceiving animal. Going even farther, since perception is after
all contact with the world, we will have to learn to imagine a wholeness
of the perceiving person or animal IN an environment, wholeness constantly
reconfigured, and reconfigured perhaps globally, as the environment changes.
The notions that seem currently to be the most useful, and that are under
interesting revision, are the notions of map, staged parallel map transformation,
filter, temporal filter, and multiple function. These ideas are mutually
dependent in ways I will try to begin to demonstrate in what follows.
Certain regions of nervous systems are thought of as maps
because when the response properties of the neurons included in them are
tested, they are found to vary systematically with the neuron's position
in that region. They may vary systematically in one dimension, in two, or
in more than two. Their axes of variability may be orthogonal or nonorthogonal.
Or the maps may be irregular in various ways: there may be functional bulges
whereby one coordinate position has more cells, or cells that are more sharply
tuned, or cells with a lower firing threshold.
A map region is often distinctive in other ways. Its cytoarchitecture (the
look and arrangement of its cells, columns, bands, layers and axonal interconnections) may
differ from that in surrounding areas. It may have distinctive lines of
in- or out-connections to other regions.
Maps are located at all levels of the auditory nervous system, from the
auditory nerve just behind the cochlea, to the surface of the cortex, and
they project onto one another in register, all the way up and down the line.
Some of these projections fan in from more cells to fewer, but overall the
auditory projection fans out, from 90,000 cells at the cochlear nucleus
in the brainstem, to 400,000 cells in the inferior colliculus of the midbrain,
to ten million cells in auditory cortex. Some projections can be thought
of as through or labeled lines, because they pass activation patterns without
significantly transforming them. In these instances we can say the map projects
directly. But more often a map is transformed as it projects through a processing
stage.
Different kinds of map transformation are possible. The simplest is pattern
clarification: noise is eliminated by cooperative settling among interconnected
cells at the same level, or by weighted connections with a next layer (Mead 1995).
Maps may gain dimensions as they project to a next level. Maps from the
two ears are convolved at the cochlear nucleus, for example. In the superior
colliculus auditory maps are projected onto visual maps. At various levels
all the way up, auditory maps are convolved with motor response maps. And
maps may integrate feedback projections.
A map may project into several maps with fewer dimensions; it may split.
Or it may gain dimensions by integrating more sorts of variable. It may
also effect a derivation: it may extract higher-order patterns, either spatial
or temporal or both. It may also be doing all of these things at the same
time, by means of elements and connections whose multifunctional capabilities
we are just beginning to suspect.
I will be giving examples of auditory map transformations in sections III
and IV, but first I want to say something about the conceptual relations
among maps, map transformations, and filters.
Before electronics, a filter was a strainer that allowed one kind of thing
to pass while retaining something else; a housewife would filter milk by
passing it through a cloth (L. feltrum, a felt cloth). In a time when filters
are often also transducers, however, our sense of filtering has expanded
to include different kinds of selective response. For instance we speak
of auditory neurons in the cochlea, semi-metaphorically, as centre-surround
band-pass frequency filters. They do not of course transmit or retain anything:
they respond selectively.
In the extended, metaphorical sense of the term, any neuron, any column
of neurons, any circuit or layer of neurons, any neural map, may be thought
of as a filter, as long as it is responding selectively. We can go farther
and say the entire nervous system is acting as a global filter, since it
is an overall configuration of selective response. I will come back to this
point, but what I want to emphasize here is that neural connections are
not pipes. They do not transmit some fluid, or some symbol, or some packet
of `information'. When they are connections among sensory neurons, they
respond selectively in ways that are putting the animal in touch with its
environment.
The concept of filter is related to the concept of map in
two ways. We can think of the map as being made up of filters, as being
an array or matrix of unit filters, and we can think of the map as being
a filter itself. The difference is a difference of logical type: the map's
overall response is a spatiotemporal pattern made up of the responses of
its elements. A neural map transformation is a process by which one
map's spatiotemporal pattern contributes causally to the spatiotemporal
pattern selected at the next.
It is difficult to speak clearly about any complex four-dimensional process,
but it might be approximately correct to say that a map transformation is
an even higher-order filter, because, seen in temporal cross-section, if
there is such a thing, it is a patterning of patterns.
Neurons are not simple: their function is to produce an appropriate pattern
of output spikes in response to the state of their "approximately 10^4
input synapses" by a dynamic process that "arises out of the interaction
of the many species of active and passive conductances that populate the
dendritic membrane of the neuron and that serve to sense, amplify, sink
and pipe analogue synaptic currents en route to the spike generation mechanism
in the soma" (Mead 1995, 268).
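The dendritic dynamics Mead describes are far richer than any point model, but a leaky integrate-and-fire unit is a minimal sketch of the bare skeleton: synaptic currents are summed, leakily integrated, and piped to a threshold spike-generation mechanism. All parameter values here are illustrative, not physiological.

```python
import numpy as np

def lif_spikes(input_current, dt=1e-4, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire: a drastically simplified stand-in for the
    dendritic dynamics described in the text. Returns spike times (step indices)."""
    v = 0.0
    spikes = []
    for t, i_in in enumerate(input_current):
        # leaky integration of the summed synaptic current
        v += dt * (-v / tau + i_in)
        if v >= v_thresh:           # spike generation "in the soma"
            spikes.append(t)
            v = v_reset
    return spikes

# a constant drive produces a regular spike train
spikes = lif_spikes(np.full(1000, 100.0))
```

With constant input the model fires at perfectly regular intervals, which is exactly what real neurons, with their many interacting conductances, do not do; the sketch only locates where the complexity has been left out.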
Another way nervous system filtering is not simple is that it is highly
tunable. Neurons are selective in several ways. They can have different
valences: they can be excitatory or inhibitory. They can have different
activation thresholds: they can fire only when strongly activated, or they
can fire to any old input. They can be sharply tuned or they can have a
wide response window. But all of these response characteristics can change.
They can alter as a consequence of field effects (diffusion of neurotransmitters
through intercellular fluid, for instance, or the presence of slow-wave
potentials throughout an area) (Bullock 1993, 8). By these means a neuron's
selective response properties could make large shifts in operating mode.
And synapse properties are being modified all the time: synaptic change
is what allows an animal to learn.
The tunability of wetware filters makes them strikingly different from the
sorts of hardware filters we know. A second large difference is how multidimensional
a wetware filter can be. A neural filter's response decision must emerge
from a constellation of factors all operating simultaneously and all contributing
with different relative weights. Seen at a very fine scale, ANY of these
factors may be systematically relevant to the animal's perceptual circumstance,
and thus may count as a response parameter.
On a scale closer to the scale of sound events as we usually think of them,
we find neurons that are combination-sensitive to different degrees. A neuron
may respond to a particular coincidence of a signal from the left ear and
a signal from the right ear, but only where both are reports of some specific
frequency. Other neurons are tuned to complex frequencies: they respond
to the simultaneous presence of several or many frequencies, but only at
particular amplitudes. And so on. Neurons may even be bimodal: audio-visual,
or audio-visual-motor. Still others could be called temporal filters because
they respond only to particular sequences. I will describe various kinds
of time-filter in IV.
II. THE AUDITORY NERVOUS SYSTEM
1. the cochlear map
A map is a map in virtue of the fact that it scales on (at
least) two axes. In the auditory nervous system, one of these axes is invariably
the frequency axis. Auditory maps are tonotopic. Another way to say this
is to say they are cochleotopic. Auditory maps in animal brains are all
prepared at the cochlea.
The micromechanics of the basilar membrane (membrane stiffness, membrane mass,
and density of cochlear fluid) cause its response to pressure waves to vary
systematically with displacement from base to apex. Subsurface neurons are
able to transduce this differential response into action potential spikes.
In effect, the cochlea sorts wave trains into their component periodicities,
and reports the presence ( or sometimes absence) of any particular frequency
by activity at a position on a scale. This scale can be thought of as projecting
right up through all the auditory nuclei in the brainstem to the surface
of the cortex, and then even further, into the forebrain and hippocampus.
There are even return projections from cortex back to subcortical nuclei.
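The cochlea's sorting of a wave train into component periodicities, each reported "by activity at a position on a scale," can be caricatured with a bank of narrow frequency filters. The sketch below is hypothetical and uses an FFT as the filter bank, so that activity at a bin stands in for activity at a cochlear position.

```python
import numpy as np

def tonotopic_report(signal, sample_rate):
    """Toy 'cochlea': report each component frequency of a wave train by
    activity at a position on a scale (here, magnitude in an FFT bin)."""
    activity = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, activity

# a wave train with two component periodicities, 440 Hz and 880 Hz
sr = 8000
t = np.arange(sr) / sr                      # one second of samples
wave = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
freqs, activity = tonotopic_report(wave, sr)
peak = freqs[np.argmax(activity)]           # strongest position on the scale
```

The two components show up as activity at two positions, with the stronger component producing the larger response, which is all the spatial frequency report amounts to at this level of caricature.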
The frequency make-up of a pressure wave train is thus reported by spatial
means, but pressure waves vary along more than one dimension. Individual
frequency components also have temporal and amplitude properties. These
are not transduced spatially but temporally. The larger scale timing of
acoustic events just propagates up through the auditory system: the animal
is perceptually entrained so that, speaking generally, it begins to respond
when a sound event begins, and stops responding when the sound event stops.
On a finer scale, the auditory nervous system may phase-lock the response
timing in its scaled bank of frequency filters so that frequency reports
are synchronized as they project up through the brainstem.
The auditory nervous system can also use temporal scheduling the way it
uses spatial ordering, to report variation in dimensions that are not temporal.
The amplitude of a frequency component may be transduced as spike count,
for instance. It can also be transduced by phase relative to some reference.
Because time-ordering is used to report both event timing and amplitude
variation, the auditory system is thought to segregate time and intensity
reports for at least parts of the march of frequency reports up to the cortex.
In other words, the frequency scale is sent up two different pathways, one
specialized to report stimulus time and one specialized to report intensity.
This segregation of function resembles the separation of visual reports
into separate streams for object shape and spatial motion (Kosslyn 1995).
The split between time reports and intensity reports is prepared at the
auditory nerve cells with different properties. In bats, for instance, the
auditory subsystem that determines target velocities needs initial frequency
reports that are very precise. But the auditory subsystem that determines
target distance needs less sharply tuned frequency reports and more precise
time reports (Konishi 1988). These differences originate at the first auditory
neurons.
The essential things to notice about cochlear projections are first, that
they are PATTERNS, both spatial and temporal, and second, that as soon as
we have spatial and temporal patterns we also and immediately have patterns
of patterns: higher-order patterns available to any specialized subsystem
that wants them. Thus, at some higher level the auditory nervous system
has available to it the higher-order invariants that specify what we think
of as sound characteristics: spectral envelope, formants, transient onset/offset,
etc. And at some presumably highest level it will have available to it the
higher-order invariants that specify the sound events (the sounding objects) in
the animal's environment.
2. cortical maps
In humans and other mammals there is an area called the primary
auditory area or A1. It is also called the auditory receiving area, because
it receives almost exclusive essential projection from the medial geniculate
nucleus, which is the last and most important switching station between
ear and cortex.
In cortical field A1 neurons with very similar best frequency are arranged
vertically with respect to one another and within a narrow region of cortex
can be found in all cortical cell laminae. Similarly, within each lamina
neurons with similar best frequency can be found distributed horizontally
... At threshold sound pressure levels the resultant motion of a small region
on the basilar membrane ultimately leads to excitation of a relatively small
population of neurons that can be viewed as being arranged in a band having
length, depth and width. (Merzenich and Brugge 1973, 293).
A1 is located on the superior temporal plane, that is, on the flat upper
surface of the temporal cortex, hidden inside the Sylvian fissure, which
separates temporal cortex (adjacent to the ear) from parietal cortex (above
and behind the ear). The exact function of A1 is not known, but PET studies
have shown A1 to be bilaterally activated by ANY auditory stimulus: noise,
music, speech or environmental sound (Zatorre 1985, 38).
Secondary auditory areas (one of which is called AII in mammals) also receive
direct projection from auditory centers in the MGB, but they receive input
as well from other parts of the cortex, and are therefore called auditory
association cortex. They are thought to be necessary to "higher order
auditory processing of complex signals" (Zatorre 1992, 847). Careful
microelectrode mappings of secondary auditory cortex in macaque monkeys
(Merzenich 1973, 292) have shown at least five secondary auditory areas,
which are thought to be distinct regions both for cytoarchitectonic reasons
and because several of them show tonotopic response. Four are distributed
around A1 on the superior temporal plane, and the fifth lies just adjacent
to one of these. It is thought that secondary auditory cortex may also extend
a short distance into the upper, parietal surface of the Sylvian fissure.
Bat secondary auditory areas have been well mapped as well (Suga 1994, 1993,
1990, 1985). Eight functionally distinct maps have been discovered in auditory
cortex of the mustached bat. I will describe three of these in sections
III and IV.
Areas of the temporal lobe adjacent to regions identified as primary or
secondary auditory cortex are areas called periauditory:
Out from these primary sensory fields come fibers that synaptically affect
adjoining areas that cannot unreservedly be called sensory, ... and out
from these areas come fibers that terminate in areas still farther away
from the primary sensory fields. The areas of the neocortex at various removes
from the primary fields are called association areas ... more advanced stages
of processing presumably are embodied in association cortex. For example
there are places where the auditory and the visual converge (Nauta 1990).
III. EXAMPLES OF AUDITORY MAPS
1. how a map works: the mustached bat's CF-CF area
Audition is particularly important to bats. Their auditory
neurophysiology is specialized in interesting ways and as a consequence
bat audition has become important also to neuroethologists. A bat is a very
small mammal: its entire brain is "about the size of a large pearl"
(Suga 1990). Nevertheless auditory functioning of the bat central nervous
system has been mapped in unusual detail. A number of auditory regions have
been found both in the bat's cortex and in lower nuclei, and we have recently
come to know quite a bit about how they work and what they accomplish.
Bat audition is unusual in the importance it gives to reflected sound and
to sounds emitted by the animal itself, but the general principles of acoustic
perception as we are beginning to understand them in bats generalize well:
what we see in operation are filters (complex and simple), maps, and staged
parallel projections which effect map transformations.
One of the auditory maps discovered in secondary auditory cortex of the
mustached bat is called the CF-CF area because it correlates responses to
constant frequency pulses the bat emits when it is hunting flying insects.
The pulses emitted consist of a fundamental (about 30.5 kHz) and three harmonics.
The bat is able to control the energy of each harmonic. The fundamental
is always very low in intensity: it is the bat's reference frequency, and
also the frequency by which the bat knows its own pulse from those of its
companions. The prominence given other harmonics varies according to conditions.
Low frequencies will be less attenuated by distance, but high frequencies
are more useful when closing in on small, fast, near objects.
The primary function of the CF-CF area is to prepare the animal to avoid
obstacles and to respond to target velocities; it is in effect an array
of Doppler shift detectors. As a map it can be said to be tonotopic, but
along several axes at once. It is in fact a matrix whose elements respond
to specific combinations of pulse fundamental and echo harmonic. Pulse fundamental
frequency varies along one axis, and the second and third echo harmonics
increase along the axis orthogonal to it, segregated in strips adjacent
to each other (Suga 1990, 65). Both strips show a disproportionate number
of neurons responsive to velocities the animal encounters in important maneuvers
such as closing on prey or docking at a roost.
2. map magnification: the mustached bat's DSCF area
In the mustached bat the range of best frequency response
across the tonotopic spread of the primary auditory map is approximately
10-100 kHz. Around 61-61.5 kHz this map bulges into a specialized subregion
that takes up 30% of primary auditory cortex and is given its own name.
It is the DSCF (Doppler-shift-compensated constant frequency) map. Columns in
this map are 40-50 neurons deep. Each column responds to a particular combination
of frequency and amplitude of the second harmonic of a constant frequency
signal. The map is radially organized:
One can (crudely) picture the area as a bicycle wheel: as one moves outward
along a spoke, the best frequency of the neurons increases; as one moves
circularly from one spoke to the next, the best amplitude changes. (Suga 1990)
The DSCF region's specialization is prepared at the cochlea, which has a
similar expansion of tuning sharpness around 61-61.5 kHz, the frequency
corresponding to the normal second harmonic of the bat's resting pulse fundamental.
Under resting conditions the echo second harmonic will also fall into this
area. But when the bat is using pulse-echo frequency shift to detect target
velocities, Doppler shift can increase echo second harmonic frequency to
a point where it falls outside the cochlea's area of increased sensitivity.
The result would be a relative audio blindness if the bat did not have recourse
to Doppler shift-compensation. By lowering the constant frequency pulse
it emits by about 2 kHz, it brings the Doppler shifted echo into its sensitive
region. A further specialization of the cochlea is that it is particularly
insensitive to frequencies around 59.5 kHz (the exact frequency depends
on the individual animal's resting pulse frequency) which would be the bat's
pulse second harmonic frequency. This prevents masking effects.
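The roughly 2 kHz compensation can be checked against the two-way Doppler formula for an emitter flying directly toward a stationary reflector. The flight speed and speed of sound below are assumed round numbers for illustration, not measured values.

```python
def echo_frequency(f_emitted, v_bat, c=340.0):
    """Two-way Doppler shift for a bat flying at v_bat m/s directly toward
    a stationary target; c is the speed of sound in m/s."""
    return f_emitted * (c + v_bat) / (c - v_bat)

f2 = 61_000.0                          # resting second harmonic, Hz
v = 5.0                                # assumed flight speed, m/s
shift = echo_frequency(f2, v) - f2     # upward Doppler shift of the echo, Hz

# the bat compensates by lowering its emitted pulse by about this amount,
# bringing the echo second harmonic back near the 61 kHz sensitive region
compensated = echo_frequency(f2 - shift, v)
```

At a few metres per second the shift comes out a little under 2 kHz on a 61 kHz second harmonic, consistent with the approximately 2 kHz compensation described in the text.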
The CF-CF region of bat secondary auditory cortex is able to detect velocities
by means of pulse-echo frequency shifts, as described above. Why does a
bat need the much finer discriminations of frequency shift that are made
possible by the specialization of the DSCF area? Illustrations of the mustached
bat primary auditory cortex look as if a magnifying glass had been laid
over it at 61 kHz. In this instance, the visual illusion corresponds to
a functional truth. The mustached bat is so sensitive to frequency differences
in this range that it can pick out frequency shifts an order of magnitude
smaller than those useful in detecting wing motion in flying insects (Suga
1990, 62; 1994, 189). What is implied is that the DSCF region can be used
to pick out DETAIL in the fluttering wings of the insect.
But columns of the DSCF map are complex filters that further sort frequency
reports into joint reports of frequency and amplitude. How does the bat
use maps that pick out a frequency at a particular amplitude? Echo amplitude
varies with the surface properties of the object reflecting the biosonar
pulse. Combined with the DSCF region's very acute frequency resolution,
amplitude resolution would seem to be able to report the details of a flying
object's textures as well as the relative velocity of these details as the
wings flutter. Taken together the DSCF map would give the mustached bat
a very finely tuned ability to identify friends or food.
3. map plasticity: the optic tectum of the barn owl
The optic tectum of the barn owl (which corresponds to the
higher of several midbrain nuclei in mammals) has been thought to have auditory
maps in which units respond to binaural time differences and thus detect
the azimuthal location of a sound source. It is also thought to have, in
register with it, a visual space map with units responding to activation
from the eyes (Brainard and Knudsen 1993). A more recent study (Brainard
and Knudsen 1995) reports the presence in this region of the optic tectum
of neurons whose response is bimodal: they will respond to a particular
interaural time difference OR to activity in a particular visual region
OR to a coincidence of both. A sheet of such neurons would constitute an
auditory-visual map of spatial location relative to the owl's head.
The fact that sense modalities are already integrated at the midbrain and
the discovery of bimodal neurons revises our tendency to think of sense
modalities as physiologically segregated. But the most striking aspect of
Brainard and Knudsen's work has been their finding that the auditory response
properties of these bimodal neurons are developmentally calibrated to their
visual response properties.
Baby barn owls raised wearing prismatic spectacles which shifted their visual
fields 23° to the right were found to have their optic tectum visual receptive
fields systematically shifted by the same amount, and their auditory sound
source location sensitivity systematically shifted along with them. Bimodal
neurons that would normally have responded to a sound source OR a visual
event at x were now responding to events shifted 23°.
But it was also found that this tuning dependence of auditory response on
visual response was not automatic. In a separate experiment, baby owls were
not fitted with prismatic lenses until they were 60 to 80 days old. By this
time their bimodal optic tectum neurons had established normal response
properties. When they were then fitted with prismatic lenses their bimodal
neurons' visual receptive fields shifted (of course) immediately, but their
auditory receptive fields took several weeks to shift. In the intermediate
states, the neurons' tuning to interaural time differences was found to be
very broad. Neurons took longer to respond to interaural time differences,
and when they did respond their responses lasted longer. These intermediate
response characteristics are reminiscent of the behavior of connectionist
hidden units while they are training up on a new data set.
IV. TEMPORAL FILTERS
1. delay lines: barn owl nucleus laminaris maps
Barn owls hunt at night and are able to locate their prey
by hearing alone. It is thought that their auditory systems use interaural
time differences to find the azimuthal (right-left) position of a sound
source, and that they use interaural intensity differences to find its elevation.
Spike discharges from frequency-sensitive neurons in the auditory nerve's
tonotopic array are phase-locked to the stimulus whose arrival they are
reporting: in other words, there is a common latency between stimulus arrival
and spike discharge, so all frequencies will be reported synchronously.
This phase-locked common report will be propagated up through the brainstem
in labeled-line spatial parallel as well. At the cochlear nucleus, the phase-locked
array is propagated both on up to the next brainstem nucleus on the ipsilateral
side and across to that nucleus on the contralateral side. The brain stem
nucleus at which phase-locked spike trains converge from the owl's right
and left ears is the nucleus laminaris.
The nucleus laminaris is a 3-d array which includes a tonotopic map that
functions as a time-to-space converter. It does so by two means, one spatial
and one temporal. The spatial organization of the map is this: phase-locked
parallel frequency-reporting neural signals arrive from the ipsilateral
ear at the side of the map facing the back of the owl's head. Signals from
the contralateral ear enter the map from the side facing the FRONT of the
owl's head. Fibers from both sides of the map and thus from both ears interdigitate
across the map.
The temporal organization of the map involves delay lines by which a signal
arriving at either edge of the map will be transmitted into the map by a
range of delays that corresponds to the range of interaural time disparities
the owl encounters. Within every isofrequency band, depth into the map will
be correlated to a specific temporal offset from arrival time. Phase-locked
spikes from the ipsilateral ear will thus arrive at the back (dorsal) edge
of the map with no delay, but they will arrive at the far edge (the ventral
edge) with maximal delay. For phase-locked spikes arriving from the contralateral
ear, relative delays will be the reverse. In this way, a point at any specific
depth into the map will correspond to a RATIO of ipsilateral ear delay and
contralateral ear delay.
Neurons in the nucleus laminaris binaural time difference map are coincidence
detectors, which may fire weakly when activated by a monaural signal but
will fire maximally only when activated simultaneously by signals from both
sides of the map. A phase-locked signal arriving from either ear will be
broadcast into the map at all the delays afforded by the range of delay
lines, but it will have a critical effect only at the one position in the
map where it coincides with a signal transmitted from the other side of
the map. This position corresponds to a particular interaural delay ratio,
and thus neural activation at this position specifies the azimuthal position
of a sound source in relation to the owl's head. A neuron firing maximally
at this position will begin to direct the owl's gaze, head motion, or flight.
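The delay-line-plus-coincidence-detector scheme just described (the Jeffress model) can be sketched in a few lines. The spike trains, delay range, and circular delay lines below are all illustrative simplifications, not owl physiology.

```python
import numpy as np

def laminaris_map(left_spikes, right_spikes, max_delay):
    """Toy nucleus laminaris: the coincidence detector tuned to interaural
    delay d sees the left train delayed by d steps relative to the right
    train and counts coincident spikes (circular delays, for simplicity)."""
    return {d: int(np.sum(np.roll(left_spikes, d) & right_spikes))
            for d in range(-max_delay, max_delay + 1)}

# the same spike train reaches the right ear 3 time steps after the left ear
rng = np.random.default_rng(0)
left = (rng.random(200) < 0.2).astype(int)
right = np.roll(left, 3)
activity = laminaris_map(left, right, max_delay=5)
best = max(activity, key=activity.get)   # detector matching the interaural delay
```

Only the detector whose internal delay exactly offsets the interaural delay sees the two trains in register, so maximal activity at that one position converts a time disparity into a place in the map.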
From the nucleus laminaris the frequency x time delay map projects to several
nuclei at the midbrain, where, in sequence, a) neuronal selectivity for
interaural time difference is sharpened, b) signals from different frequencies
converge on single neurons to create a simplified map, and c) time and intensity
pathways converge to form a map of auditory space that is bicoordinate (Konishi 1988).
2. delay lines: mustached bat FM-FM area
The mustached bat makes similar use of delay lines in its
target range map. There is a tonotopic region in bat secondary auditory
cortex called the FM-FM region because it responds exclusively to the frequency-modulated
component of the mustached bat's biosonar pulse emission, a downward sweep
of about one octave. Like the constant frequency portion of its pulse emission,
the mustached bat's frequency sweep has four harmonics. Positions in the
FM-FM array respond to specific delays between pulse fundamental and echo
second harmonic: two other areas in secondary auditory cortex have been
found to compare pulse fundamental with echo third and fourth harmonics.
Pulse-echo delay is a measure of target distance: a one-millisecond delay
will correspond to a distance of 17.3 cm at an air temperature of 25° C
(Suga 1990, 65). The FM-FM map is organized so that iso-delay bands (which
signify the same target distance across the map) are orthogonal to an amplitude
axis. As in the bat's CF-CF map, amplitude is correlated with fine size
and texture characteristics of a target, so maximal activation at some position
in the map will indicate a particular kind of object at a particular distance.
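The 17.3 cm figure follows directly from the speed of sound in air at 25° C (about 346 m/s) and the fact that the pulse covers the round trip to the target and back:

```python
def target_distance_cm(delay_ms, c=346.3):
    """Distance implied by a pulse-echo delay: sound travels out and back,
    so distance = c * t / 2. c is the speed of sound in m/s at about 25 C."""
    t = delay_ms / 1000.0           # delay in seconds
    return 100.0 * c * t / 2.0      # one-way distance in cm

d = target_distance_cm(1.0)         # about 17.3 cm, as in the text
```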
The iso-delay bands in this map are prepared at several lower levels in
the bat's brain. At the midbrain level there are neurons responding to pulse
fundamental and to echo second harmonic individually. When these responses
propagate upward to the next processing stage, echo responses will naturally
lag behind pulse responses. But here delays are created in the axons delivering
pulse fundamental spikes: echo responses will be delivered quickly, and
pulse responses will be transmitted over a range of delays corresponding
to the range of delays the bat uses for range-finding.
Echo reports and pulse reports are brought together in a map in the medial
geniculate body of the thalamus. The MGB is the nucleus which projects exclusively
to auditory cortex; it therefore contains and projects a number of auditory
maps serving different purposes. The map which projects upward to the bat's
cortical FM-FM area is specialized to correlate pulse-echo timing. As in
the owl's nucleus laminaris map, the pulse fundamental signal is broadcast
across the map at a range of delays. (But here only the pulse signal is
delayed; the echo signal is constant across the board.) Coincidence detecting
neurons within the map will respond maximally to the coincidence of an echo
signal with a pulse signal delayed by the amount that corresponds to a particular
pulse-echo delay. The map's activation at this point is then passed up to
the cortical FM-FM map with which it is in topographic register.
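The delay-and-coincide scheme just described can be sketched abstractly: the pulse spike is rebroadcast over a bank of axonal delays while the echo spike arrives undelayed, and the detector whose delay matches the actual pulse-echo interval responds maximally. The delay values and Gaussian coincidence window below are illustrative assumptions, not measured bat parameters.

```python
import math

def delay_map_response(pulse_t, echo_t, delay_bank_ms, window_ms=0.2):
    """One pulse spike is rebroadcast over a bank of axonal delays;
    the echo spike arrives undelayed. Each coincidence detector
    responds most strongly when its delayed pulse copy lands on the
    echo, so the position of the peak encodes pulse-echo delay."""
    delayed = [pulse_t + d for d in delay_bank_ms]
    # Gaussian coincidence window centred on the echo arrival time
    return [math.exp(-((t - echo_t) ** 2) / (2 * window_ms ** 2)) for t in delayed]

delays = [d * 0.5 for d in range(20)]  # candidate delay lines, 0-9.5 ms
resp = delay_map_response(pulse_t=0.0, echo_t=4.0, delay_bank_ms=delays)
print(delays[resp.index(max(resp))])   # 4.0: the matching delay line wins
```

Reading off the position of the peak rather than its magnitude is what makes this a map: the same activation pattern, passed up in topographic register, preserves the range information spatially.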
3. delay lines for auto-correlation: Mead's silicon match filter
Carver Mead has demonstrated yet another sort of use for delay
lines. His (1995) design for a silicon auditory nerve complex to be implemented
as an analog chip and employed with his (1992) silicon cochlea uses delay
lines along with phase-responsive signals to extract frequency peaks from
the cochlear output.
It works this way: the signal corresponding to an auditory nerve impulse
is propagated along two lines, one of which is delayed by an interval equal
to the period of the frequency to be detected at that neuron's position
in the tonotopic array. The signal is then auto-correlated: the delayed signal
is fed to a coincidence detector which by now is receiving the next impulse
on the direct line from the auditory neuron. Delayed spikes that are true
reports of the presence of a particular periodicity at the cochlea will
coincide with undelayed report of the next cycle of that periodicity. Responses
that coincide will report a frequency peak, but coincidence detectors will
fail to respond to non-periodic spikes; they will be functioning as match
filters.
Mead intends his analog model to be neuromorphic; he considers it to be
an implementation of a plausible hypothesis about auditory nerve frequency
peak extraction. Auditory neurons reporting frequencies below 5kHz do tend
to discharge no more than once per cycle and they are phase-responsive.
Where stimulus cycles are too short for an auditory neuron to be able to
respond once per cycle, Mead says, match filtering could be done by groups
of neurons (Mead 1995, 265).
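Mead's autocorrelation idea can be caricatured in a few lines: delay a spike train by one period of the target frequency and count coincidences with the undelayed train. The spike times and tolerance below are made-up values for illustration, not parameters from Mead's chip.

```python
def match_filter(spike_times, period_ms, tol_ms=0.1):
    """Count spikes that are followed, one period later, by another
    spike. A train phase-locked to the target period scores high;
    aperiodic spikes mostly fail the coincidence test, so the count
    behaves as a match-filter response for that frequency channel."""
    hits = 0
    for t in spike_times:
        # does some later spike fall one period after this one?
        if any(abs((t + period_ms) - s) <= tol_ms for s in spike_times):
            hits += 1
    return hits

periodic  = [0.0, 2.0, 4.0, 6.0, 8.0]   # 500 Hz train: one spike per 2 ms cycle
aperiodic = [0.0, 1.3, 4.1, 5.0, 8.8]   # no consistent 2 ms periodicity
print(match_filter(periodic, 2.0), match_filter(aperiodic, 2.0))  # 4 0
```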
4. Hopfield's phase-time coding
The two sorts of delay-lines mentioned so far have used delays sequenced
to neural spikes phase-locked to stimulus timing. These delays have been
used to map temporal ratios onto a spatial array. Hopfield has suggested
a scheme of neuroarchitecture by which intensity could be mapped onto a
temporal pattern, which could perhaps be mapped onto spatial position further
up by means of delay lines as already described. Hopfield calls his scheme
action potential phase-time coding (1995, 33).
Intensities of the frequencies reported at the auditory nerve have usually
been thought to be rate-coded: a frequency at low energy will generate a
few spikes and a frequency at higher energy will generate more. But Hopfield
suggests that an initial detector cell such as the auditory neurons responding
to inner hair cells at the basilar membrane may have a subthreshold oscillatory
membrane potential generated by "intrinsic cellular effects or multi-neural
circuitry" (Hopfield 1995, 33). Since neural spiking occurs when incoming
current reaches threshold strength, high-intensity events (a relatively
large displacement of the basilar membrane) will generate more current,
and so will cause this sort of detector neuron to spike earlier in its oscillatory
cycle. Thus the PHASE of an auditory neuron's spiking could indicate signal
intensity. If cells generate more than one spike, the phase of the first
could indicate intensity and later spikes could indicate something else, a
sort of signal multiplexing.
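Hopfield's proposal can be sketched with a toy membrane model; the threshold, oscillation amplitude and period below are arbitrary assumptions. The cell fires at the first moment that stimulus drive plus the subthreshold oscillation crosses threshold, so stronger drive crosses earlier in the cycle.

```python
import math

def spike_phase(intensity, theta=1.0, osc_amp=0.5, period_ms=5.0, dt=0.01):
    """Toy phase-time coding: membrane potential = subthreshold
    oscillation + stimulus drive. The cell fires when the sum first
    crosses threshold, so stronger inputs fire EARLIER in the cycle;
    the spike's phase, not its rate, carries the intensity."""
    t = 0.0
    while t < period_ms:
        osc = osc_amp * math.sin(2 * math.pi * t / period_ms)
        if osc + intensity >= theta:
            return t / period_ms  # phase within the cycle, 0..1
        t += dt
    return None  # drive too weak to fire this cycle

weak, strong = spike_phase(0.6), spike_phase(0.9)
print(weak > strong)  # True: the stronger input fires at an earlier phase
```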
5. delay used to create simultaneous scenes
Dear et al. (1993) have suggested a use of delays to solve
yet another problem of neural coordination, again in the bat's FM-FM pulse-echo
distance map (this time in the big brown bat). Dear and his colleagues have
suggested that a delay mechanism could explain the creation of a global
`scene' in which objects existing at different distances could be sensed simultaneously.
The delays suggested are not accomplished by means of delay lines between
maps, but by means of temporal response properties of neurons themselves.
It was observed that in a nontonotopic area of the FM-FM map, a neuron's
response latency (the latency of its response to an echo-delay pair) is proportional
to its best delay. That is, a brief delay is reported quickly; a long delay
naturally takes longer to report. But in an adjacent area which IS tonotopic,
many neurons tuned to short delays in fact have long response latencies.
In this way neurons at different positions, which correspond to pulse-echo
delay times and thus to sound source distance, can all be reporting simultaneously.
In the intervals between pulse emissions the bat can neurally ACCUMULATE
a simultaneous sense of objects present at different distances, and moreover
will have it in a physical form that can map directly onto motor coordinates
effecting swerves or accelerations.
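The compensation idea is simple arithmetic: if a neuron's response latency is set to a common accumulation time minus its best delay, then a nearby object (short echo delay, long latency) and a distant one (long delay, short latency) are reported at the same moment. The 30 ms accumulation window below is a hypothetical value for illustration.

```python
def report_time(pulse_echo_delay_ms, accumulation_ms=30.0):
    """Latency compensation in the spirit of Dear et al.: report time
    = echo delay + response latency, with response latency set to
    (accumulation_ms - echo delay). Every object's report then lands
    at the same moment, assembling a simultaneous scene.
    accumulation_ms is a hypothetical common report time."""
    response_latency_ms = accumulation_ms - pulse_echo_delay_ms
    return pulse_echo_delay_ms + response_latency_ms

# objects at several ranges all reach the scene together
print([report_time(d) for d in (2.0, 8.0, 15.0)])  # [30.0, 30.0, 30.0]
```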
V. CONNECTIONIST MODELS
1. connectionist mapping: monaural sound location in the
cat's superior colliculus
The auditory localization map-transformation processes described
so far have used binaural time cues and binaural level cues, respectively,
to detect sound source azimuth and elevation, the two coordinates being
determined separately and then convolved at the superior colliculus, which
is the higher of several midbrain auditory nuclei.
Neti and Young (1992) have constructed a connectionist model of groups of
neurons that could determine sound source azimuth AND elevation by monaural
frequency cues alone. Their net had 128 input nodes, 4 to 10 hidden unit
nodes, and was output-mapped onto a two-dimensional array representing azimuth
and elevation coordinates. It was fully connected and used an error back-propagation
algorithm for weighting adjustments.
The net was trained on data extracted from microelectrode studies of cats.
Input nodes were given values corresponding to the spectra of tones as they
would be after reflection within the cat's outer ear. Values at output nodes
corresponded to sound-source decisions as they occurred in superior colliculus
neurons. The model was also trained to be level invariant by randomizing
levels of training inputs.
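A bare untrained skeleton of this architecture might look as follows. The hidden-layer size and the dimensions of the output grid are assumptions for illustration (Neti and Young report 4 to 10 hidden units but the text does not give the grid shape), and a real replication would fit the weights by back-propagation on level-randomized training spectra, as they did.

```python
import math
import random

random.seed(1)

# Shapes follow Neti and Young's description: 128 spectral inputs, a
# small hidden layer, and outputs read as a 2-D azimuth x elevation
# grid. Hidden size and grid dimensions here are assumed values.
N_IN, N_HID, N_AZ, N_EL = 128, 8, 7, 5

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

W1 = rand_matrix(N_HID, N_IN)        # untrained placeholder weights
W2 = rand_matrix(N_AZ * N_EL, N_HID)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def localize(spectrum):
    """Forward pass: spectrum -> hidden features -> 2-D position map.
    The most active output node is read as the sound-source direction,
    mirroring the superior-colliculus-style map."""
    hidden = [sigmoid(sum(w * s for w, s in zip(row, spectrum))) for row in W1]
    out = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in W2]
    best = out.index(max(out))
    return best // N_EL, best % N_EL  # (azimuth index, elevation index)

spectrum = [random.random() for _ in range(N_IN)]  # stand-in pinna-filtered spectrum
az, el = localize(spectrum)
print(0 <= az < N_AZ and 0 <= el < N_EL)  # True
```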
Analysis of activity at hidden units showed that the cue most useful to
the net was the frequency of the first spectral notch, presumably created
by interference effects on pinna-reflected lower frequencies. For the net,
and presumably for the cat, the frequency of the first spectral notch specifies
elevation and azimuth uniquely (Neti and Young 1992, 3141).
Neti and Young summarize what they take to be the wider implications of
their model in this way:
It is important to notice that the first notch cue is an emergent property
of the network solutions. There are no constraints on the way in which models
combine information across frequency regions, except that combinations of
information across frequency is a key aspect of the transformations performed
by the models. This sort of information processing may be an important aspect
of complex auditory pattern recognition tasks in general, and the use of
spectral cues for sound localization may provide a convenient, straightforward,
and easily interpreted paradigm for studying complex stimulus processing
in the auditory system (1992, 3154).
The most interesting suggestion I would draw from Neti and Young's model
is that we may be misled by thinking of neural elements of any auditory
map in too simple a way. Some and perhaps all of them may be working like
hidden units in a connectionist net; their function may be just to learn
to respond in whatever way works for the global input-output transform being
performed. They may not have functions that can be named in terms of the
elementary acoustic variables of time, frequency and intensity. It may not
even be possible to name their individual or group functions in terms of
combinations or ratios of these variables.
The second interesting suggestion is of course that the auditory system
may be doing things more than one way at a time; it may be partially redundant.
Or maps higher in the auditory chain may be able, once trained, to configure
themselves independently, without input patterns extracted at brainstem
or midbrain maps. High-level maps might be able to use hidden units to extract
or create higher-order patterns solely on the basis of sideways connections
at their own level. (Auditory hallucinations come to mind.)
2. a connectionist delay line
Hopfield (1991) has demonstrated a connectionist net implementation
of a procedure for recognizing sequences: his particular net was a word
recognition processor that was able to correctly identify spoken names from
short-time spectral characteristics. His net had no clocked periods; the
circuit was data-driven. Recognition was at word-level, not phoneme-level.
Input to the net was from 32 centre-surround bandpass filters. Each input
node was connected to each of 10 hidden units by means of ten delay lines,
each with a different delay. Weights on each delay line were made proportional
to just how much the presence of that frequency at that position in a sequence
would contribute to word recognition. The net was trained on data that included
noise, and it proved to be noise resistant.
Delay lines worked to translate a sequence of spectral features into a simultaneous
description by means of coincidence detectors at the hidden unit level. Say
an input node reporting the presence of a frequency x simultaneously
activates delay lines with delay values of 1 to 10. Then another input node
reporting the presence of frequency y simultaneously activates its
own set of delay lines with values 1 to 10. The x activation arriving on
delay line 2 would then coincide, at a hidden-unit detector, with the y
activation arriving on delay line 1. And so on.
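The x/y example can be made concrete: fan each input event out over delay lines 1 to 10 and collect arrival times, and events one time-step apart then meet at a common detector. This sketch only tabulates the coincidences; Hopfield's net additionally weights each delay line by how much that feature-at-that-position contributes to recognizing a word.

```python
def coincidences(events, delays=range(1, 11)):
    """Each input node fans its spike out over delay lines with delays
    1..10, so a symbol seen at time t reaches detectors at t+1..t+10.
    Symbols one step apart therefore arrive together: x via delay 2
    meets y via delay 1, turning a SEQUENCE into a simultaneous pattern."""
    arrivals = {}  # arrival time -> set of (symbol, delay-line) copies
    for t, sym in events:
        for d in delays:
            arrivals.setdefault(t + d, set()).add((sym, d))
    return arrivals

arr = coincidences([(0, "x"), (1, "y")])
print(("x", 2) in arr[2] and ("y", 1) in arr[2])  # True: they meet at t = 2
```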
Hopfield's net (which he claims could be implemented on analog chips) successfully
recognizes words in its output vocabulary with nearly 100% success, and
it does so with connected speech, i.e. with sound sequences that do not mark
the beginnings and ends of words. Hopfield has suggested further that the
net's recognition capabilities could be extended by the addition of units
that respond to abrupt onset/offset of spectral energies and that could
detect the local direction of formant energies.
Hopfield's speech recognition mechanism is a frequency × time position matrix.
Even more generally, it is a sequence detector: a time pattern detector.
Implemented as neural circuitry it could sort time patterns for various
purposes and on various scales.
3. within-neuron processes: connectionist model of bat FM-FM neurons
There has been a recent attempt (Chittajallu and Wong 1994)
to model the behavior of single delay-sensitive neurons in a bat's FM-FM
area by means of a connectionist net. The study used empirical data from
little brown bats whose FM pulse emissions are somewhat different from those
of the mustached bats.
The network configuration that was found most successful in modeling the
biological FM-FM neurons' overall behavior had two input nodes, 15 hidden
unit nodes and one output node. It was a fully connected feedforward net
with initially suboptimal connection weights adjusted by means of an error back-propagation
rule. Input values were of two kinds, pulse repetition rate and pulse-echo
delay times. The net was trained on a data set taken from microelectrode
studies of actual FM-FM neurons; output intended was the number of neural
spikes actually observed under various input conditions.
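A skeleton of the 2-15-1 net might look like this. The weights below are random stand-ins where Chittajallu and Wong fit theirs to microelectrode data by back-propagation, and the spike-count scaling is an assumed choice.

```python
import math
import random

random.seed(0)

# Skeleton of the 2-15-1 feedforward net described in the text: inputs
# are pulse repetition rate and pulse-echo delay; the output stands in
# for the spike count of a biological FM-FM neuron. The weights below
# are random placeholders; the actual model fit them to microelectrode
# data with an error back-propagation rule.
N_HID = 15
W1 = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(N_HID)]
b1 = [random.gauss(0, 0.5) for _ in range(N_HID)]
W2 = [random.gauss(0, 0.5) for _ in range(N_HID)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predicted_spikes(rep_rate, echo_delay_ms, max_spikes=10.0):
    """Forward pass; max_spikes is an assumed output scaling."""
    hidden = [sigmoid(w0 * rep_rate + w1 * echo_delay_ms + b)
              for (w0, w1), b in zip(W1, b1)]
    return max_spikes * sigmoid(sum(w * h for w, h in zip(W2, hidden)))

print(0.0 <= predicted_spikes(10.0, 4.0) <= 10.0)  # True
```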
Biological neurons were observed to have alternative operating states. When
the bat was in search mode and its pulse repetition rate was low, it responded
to some extent to a fairly wide range of echo delays; its delay-width profile
was quite widely spread around the best-delay value at which it responds
maximally. But when the bat was in approach mode and its pulse repetition
rate was high, its delay-width window became narrower; it became much more
finely tuned to its best-delay value. The connectionist net was able to
replicate this shift in response with operating modality.
This connectionist model is interesting mainly as a picture of a kind of
complex filtering that 1) could be LEARNED by a neuron purely in response
to correlated patterns of input; and that 2) can shift some but not all
aspects of its behavior in relation to large or mode shifts in just one
of its input parameters.
In other words, it shows how a neuron given particular combinations of input
can show responses different from others of similar kinds, purely on the
basis of the very particular combinations of input it receives. But neurons
in a connected array receive systematically different input simply in virtue
of their position in that array.
It also shows how an overall difference to one or more of the map's input
patterns can reset the operating mode of all the neurons in the array, so
that the array is doing something slightly different, or doing it in a different
way. An array of neurons similar to the one modeled above could have an
ability to adapt its behavior to search and approach conditions. But we
could imagine cortical maps able to switch their operating modes in other
ways too, as between perceiving and imagining conditions, for instance, or
in different sorts of context. Chittajallu and Wong's connectionist model
can be seen as a model of a neuron's, or a circuit's, or a map's, necessary
plasticity.
Certain kinds of sound cue have values that will have to change as an animal
matures. Pinna filtered first notch frequency values as used by the cat
for monaural sound source localization will shift as the shape of the cat's
ear changes, for example. The precise organization of the cat's superior
colliculus sound source map will presumably have to reorganize itself as
the ear grows, in the same gradual way that the owl's azimuth map adjusted
to visual shift.
There is evidence also that auditory neurons in auditory maps will change
their response characteristics as a result of increased or specialized practice.
Merzenich et al. (1993) report that the tonotopic map in adult owl monkey
primary auditory cortex changes its organization significantly after the
animal has been trained to make fine frequency discriminations. The surface
area of the map responding to frequencies used in training increases; response
latencies to these frequencies decrease; and individual neurons become more
sharply tuned to them.
VI. SUGA'S MULTIFUNCTIONAL ELEMENTS
In a recent paper (1994) Nobuo Suga, who has been responsible
for the bat studies that give us our most detailed picture of nervous system
auditory function, says he has begun to believe that our functional descriptions
of auditory maps have been too simple minded:
A revision of the description of auditory processing in the bat's brain
is necessary with respect to whether each specialized cortical area is devoted
to processing just one aspect of an acoustic signal, because we know that
the response of a single cortical neuron, e.g., an FM-FM neuron, is affected
by several acoustic parameters other than echo delay, such as echo frequency,
Doppler shift, echo amplitude, relative target size, and target direction
(azimuth and elevation). Furthermore, the FM-FM neuron does respond to communication
sounds to some extent. Therefore, a single cortical neuron must be involved
in processing different types of auditory information. That is, it must
have multiple functions. If the neuron has multiple functions in signal
processing, some function may be a primary function and others may be secondary
functions. (1994, 137)
What this means in practice is that FM-FM neurons show their sharpest tuning
to echo delay but their response also shows systematic relation to echo
amplitude, echo source direction, and relative target velocity. The same
neuron will also respond weakly to high amplitude communicational calls.
It is possible, Suga says, that the function of a bat's auditory maps will
depend on whether the bat is in echolocation mode or in communication mode.
"This clearly indicates that the characterizations of single neurons
in our past experiments are partial" (1994, 136, 142).
I propose a `multifunction' theory which states that each specialized area
has a primary function for processing a particular aspect of an acoustic
signal and non-primary functions for processing others. The spatial distribution
of neural activity of this area depends upon its primary function, but the
magnitude of neural activity is influenced by its non-primary functions.
The discovery or acknowledgment of this sort of multifunctionality is the
most challenging aspect of recent auditory neuroscience. It is difficult
to picture a multifunctional processor's activity, because its units must
be imagined to be doing different things not only at the same time but in
the same space: in the same map. It becomes difficult to know what to call
what it's doing: is it `representing' hundreds of KINDS of `information'
at the same time?
Theodore Bullock of the Neurosciences Institute at UCSD talks in terms of
"multiple, simultaneous codes among the many parallel channels":
A given neuron, or even a given axon, probably conveys information at the
same time in two or more distinct codes for different recipient cells or
arrays. One post synaptic cell, for example, may read the number of spikes
in the last 0.5s, with a certain weighting for the most recent intervals,
while another post synaptic cell integrates over several seconds, and a
third is sensitive to so-called trophic factors or to irregularity of intervals
or to collaterals where impulses do not reach the terminals and only decremented,
graded, slow potentials carry information from this input channel. (Bullock)
But I am not sure that talk of `codes' or even of `channels' is very useful.
What I do find useful is the approach being developed among connectionist
modelers, who say it is not necessary or even possible to say exactly what
`features' hidden units are `representing' or `detecting.' What is more
helpful is to know that such units can modify themselves to strengthen just
those connections that work for the animal. Perhaps a neuron in a bat's
cortical FM-FM area participates, to some degree, in discriminating EVERYTHING
the animal is sensing at any given moment.