THE ANALOG/DIGITAL DISTINCTION IN THE PHILOSOPHY OF MIND
IV. Thermodynamic Functionalism
Connectionist computation has historical and technological links with both analog and digital computers. There are connectionist technologies which are called analog by their designers, and others called digital by their designers. But I will argue that connectionist computation is a new, third, kind of computation, one whose links with neuroscience make it more than a synthesis of its two forebears. Its most important carryover from analog computation is the form taken by its cultures of description, which are not logical/linguistic but mathematical/physical. This is so even where the connectionist processes described are discrete; and this interesting turn of events may in the end allow us to look at digital computers, and their codes, in new ways.
In what follows I will give connectionist technologies a brief introduction, describe one neuroscientist's vision of the brain as a connection machine, and outline some of the epistemological possibilities of this sort of vision. I will argue that the representational states of brains (and of connection machines to a lesser degree) may be thought of as having a pre- or sub- or non-linguistic semantics; that the mathematical/physical languages of description allow this sort of content to be conceived of in psychologically interesting ways; and that non-linguistic content becomes plausible when representational states have intrinsic content.
IV.1 Intrinsic content
Analog computers, like digital computers, are devices whose computational states represent because we have designed them to do so. Just what is taken as being represented at any time will depend on an interpretation function we are applying to the terms of an uninterpreted formalism, often by means of automated output interface devices. It may be anything we like, as long as the machine's inferential systematicities and our encoding/decoding systematicities preserve representational relevance over physical changes in the machine. In other words, the representational states of analog and digital machines have extrinsic content--content which is supplied by the machine's users, in the same way it is supplied to pencil marks or morse clicks. That it is extrinsic is obvious to the extent that it is arbitrary.
Creature representation, on the other hand, is neither arbitrary nor extrinsic. The computational states of a cat watching a bird are what they are precisely because they are the computational states of a cat watching a bird. They are the states of a cat's brain; a seal's brain will not have just this sort of connectivity between hypothalamus and visual cortex and motor cortex. There could be other sorts of brains than there are, but they would have to be part of other sorts of bodies than there are. Brains co-evolve with bodies, they are the brains of precisely the kind of body they organize: they are, in some encompassing way, the brains of bodies evolved on precisely this kind of planet, with this gravitational force and this rotational period; and they are, individually, in some entirely specific way, the brain of some individual creature with a particular genetic make-up, a particular developmental and experiential history, and a particular momentary situation. They are organs which provide their creature with flexibility of behavior, but they are not general-purpose machines.
If we were perverse enough, we could, of course, use them as general purpose machines. Given a detailed enough map of the causal interrelations of parts of a cat's brain, say, and given microelectrodes accurately enough implanted, we could use the causal systematicities of the animal brain to compute mathematical functions or to provide a coarse-grained functional analog of some other physical system. Like any other causally systematic physical processes, brain processes could be supplied with extrinsic representational content - content read into physical processes.
Intrinsic content would be the kind of content a cat's representational states have as a consequence of functioning in and for the cat. Both conscious and nonconscious cat-states can have this sort of content, and they will have it in virtue both of the materials and of the organization of materials of the physical cat-brain. I am assuming an evolution-based causal semantics of some sort here - cat functional architecture will be something like truth-discovering and truth-preserving for evolutionary reasons. This will apply both to hard-wired representational capacities and to flexible capacities modifiable with experience.
I have said that the representational states of neither analog nor digital machines have intrinsic content of this sort. Our ability to see them as representing at all is derived from our own primary, fully intrinsic representational ability, which allows us to design interpretation functions for formal systems or to set up analog components in such a way that their computational results will interest us. So both analog and digital computers have a crucial inadequacy as pictures of creature computation. They are both pictures of physical systems whose states are subject to systematic transformations that may be read as having representational relevance, but neither of them - if we rule out programmer-gods - provides any sense of what makes a brain-state a representational state in the first place. The most important thing about connectionist computational architectures is that they provide, not yet a picture, but a hint of a possibility of a picture, of how the states of a creature's central nervous system might be able to have intrinsic content.
IV.2 Connection machines
Sometimes called synthetic neural systems or neuromorphic computers, connection machines, like ordinary analog or digital machines, are given mathematical descriptions which can then be implemented in different ways. Hardware development of physically parallel machines is complicated, and a number of the more famous connectionist designs have been simulated on von Neumann architectures. Parallel machines also may resemble digital machines in having bistable switching elements whose operation is modeled in discrete mathematics. Or they may resemble analog machines in having continuously varying computational nodes. Both sorts of circuit have been implemented in very large scale integrated circuits on chips or wafers.
Connectionist nets with bistable switches at the nodes may, like McCulloch and Pitts' 1943 design, execute boolean functions. Their operation may be deterministic, as in certain kinds of Hopfield nets, or it may be stochastic, as in Boltzmann machines. Processing operations may be synchronous or asynchronous. There are designs in which each processing element functions as a miniature digital computer, with router, memory, control unit, accumulator and synchronization by an external clock. In chip implementations, the bistable elements are called LTFs, linear threshold functions, which are step functions, and which are identified with Turing machines. Hinton and Sejnowski's 1983 Boltzmann machine is one of the connectionist designs simulated in digital hardware while a chip realization is being developed.
A Boltzmann machine is a system of stochastic binary units in which the computational processes are iterative and local (that is, they concern the repeated mutual adjustments of neighbouring units). The behavior of an individual unit is partly stochastic and partly determined by its weighted connections with its neighbours ... in the basic Boltzmann machine, the excitatory/inhibitory weights on the various connections are fixed. An optimal global equilibrium-state (more accurately: an optimal probability-distribution, made up of the probabilities of co-excitation of many pairs of neighbouring units) is obtained by a process comparable to that of minimizing the global energy of a physical system. (Boden, 1988, 218)
The settling process Boden describes takes place in response to some input across an array of input nodes which are designed to respond to the presence or absence of specific input properties. The computational task of the machine is to arrive at an output decision which from our point of view is a correct response to the input. It can only do so by responding to systematicities among input features, and these systematicities can only take causal effect if there is some mechanism which responds differentially to various distributions of feature values. The mechanism provided by a Boltzmann machine is a system of weighted connections among middle-layer nodes: the machine will be wired in such a way that the presence or absence of a feature along with the presence or absence of the other features will determine processing output. No feature can determine an outcome by itself; its computational effect will always depend on the simultaneous copresence of other features. Excitatory and inhibitory weightings of connections will ensure that the copresence of certain features has more computational effect than that of others. Since each node is a threshold device, and since it is connected to a number of others, there will be a period of uncertainty as excitation propagates through the net. The node will respond when incoming excitation values surpass its threshold value, and fail to respond when they do not. Eventually the whole net will sort itself out, with some of the bistable switches remaining on, and some remaining off. This 'settling' of the net will be the machine's computational result. The process of settling into a stable state is likened to a thermodynamic process because Hinton and Sejnowski were able to describe it by means of the Boltzmann equations of thermodynamics:
A volume of gas is a self-equilibrating system, and the net's arrival at a stable configuration of hidden unit switches is similarly seen as a self-equilibration of a physical system by means of energy transfer. As long as the Boltzmann machine is being simulated on a serial digital architecture, though, thermodynamic description is merely a mathematical structure implemented in a code which is implemented in programs running on a digital computer. We have seen that digital computers do simulate physical dynamical systems, and in this case the digital machine is simulating a dynamical system which is thought of as itself computing the result also arrived at by the digital computation.
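The settling process can be made concrete with a short sketch. This is a toy illustration, not Hinton and Sejnowski's design: the weights, thresholds, temperature, update schedule, and random seed below are all assumptions chosen to make the behavior visible.

```python
import math
import random

def settle(weights, thresholds, clamped, steps=2000, temp=0.1, seed=0):
    """Settle a net of stochastic binary units by repeated local updates.

    weights: dict mapping sorted index pairs (i, j) to symmetric weights.
    clamped: dict fixing the values of input units.
    A free unit goes on with the Boltzmann probability
    p(on) = 1 / (1 + exp(-net / T)), where net is its summed weighted
    input minus its threshold.
    """
    rng = random.Random(seed)
    n = len(thresholds)
    state = [clamped.get(i, rng.choice([0, 1])) for i in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)          # pick one unit: updates are local
        if i in clamped:
            continue                  # input units stay fixed
        net = sum(weights.get((min(i, j), max(i, j)), 0.0) * state[j]
                  for j in range(n) if j != i) - thresholds[i]
        state[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-net / temp)) else 0
    return state

# Toy net: unit 0 (clamped input) excites unit 1; both inhibit unit 2.
w = {(0, 1): 2.0, (0, 2): -2.0, (1, 2): -2.0}
final = settle(w, thresholds=[0.5, 0.5, 0.5], clamped={0: 1})
```

At low temperature the net all but deterministically settles with unit 1 on and unit 2 off, whatever the free units' starting values: the stable configuration reached, not any single threshold decision, is the computational result.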
Connection machines implemented using continuous dynamics have processing elements that can be considered operational amplifiers, and their connection weights are implemented by means of resistors. In chip implementations they will be SAFs, sigmoidal semilinear activation functions. Processing elements will demonstrate certain gain ratios, and they may be stochastic or deterministic, synchronous or asynchronous in operation.
Hopfield is credited with inspiring the first analog chip implementations of Hopfield nets around 1982 (Akers et al., 1989, 142). An analog chip version of a Hopfield net has continuous and deterministic response functions, and its circuit equations are thus coupled differential equations. The processing operations of the chip may also be given a further, dynamical, description. Its connection matrix is symmetrical, and "[t]his symmetry, together with the asynchronous execution protocol, allows the computational dynamics to be described as a relaxation process in which Lyapunov (or 'energy') function E is minimized" (Akers et al., 1989, 153).
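The Lyapunov description can be illustrated with a small discrete sketch (the chip version is continuous, but the energy argument is the same). The stored pattern, the Hebbian outer-product weights, and the corrupted probe below are assumptions for illustration, not any published design.

```python
import random

def energy(w, s):
    """Lyapunov 'energy' E = -1/2 * sum_ij w[i][j] * s[i] * s[j]."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def relax(w, s, sweeps=10, seed=1):
    """Asynchronous threshold updates; with symmetric w and zero diagonal,
    each flip can only lower (or keep) E, so the net relaxes to a minimum."""
    rng = random.Random(seed)
    n = len(s)
    s = list(s)
    for _ in range(sweeps):
        for i in rng.sample(range(n), n):   # asynchronous: one unit at a time
            s[i] = 1 if sum(w[i][j] * s[j] for j in range(n)) >= 0 else -1
    return s

# Store one pattern in Hebbian weights, then probe with a corrupted copy:
# relaxation descends the energy surface back to the stored pattern.
p = [1, -1, 1, -1]
w = [[0 if i == j else p[i] * p[j] / 4 for j in range(4)] for i in range(4)]
settled = relax(w, [1, 1, 1, -1])
```

Minimizing E here just is the computation: the settled state has lower energy than the probe, and it reproduces the stored pattern.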
Analog chips have certain computational advantages over chips with bistable transistors. Exploiting the functions, like summation, which are one-step processes in analog circuits, reduces die area requirements and power consumption, because there are fewer computational steps overall. Analog circuits offer lower resolution, but with greater speed and density. Where hardware is used to implement learning networks, that is, where connection weights are modified as a result of error feedback, 'analog depth' accommodates incremental adjustments, which makes the system more sensitive and allows it to settle into a more global, stable solution. This latter characteristic has made analog chips efficient at generating good, if not perfect, solutions to computationally complex problems such as the traveling salesman problem. Commercial uses have included voice recognition and vision systems. Like analog computers, analog connection machines have no trouble with nonlinearities, and they can be used to model the dynamics of nonlinear systems such as fluid flow. The connectionist machine is itself thought to be a nonlinear dynamical system, which evolves toward solution states that may be called attractors.
Connection machines, then, have a historical dependency on the digital technologies we use to explore their capabilities. Their overall complexity is more like the complexity of digital hardware than it is like the relative simplicity of analog computers. Some connection machines incorporate digital systems of memory and control; some connection machines involve binary states satisfying discrete forms of mathematical equations. And connection machines are like analog computers in being parallel processors which evolve toward solutions, and sometimes in having processing components which implement continuous functions. The more important similarity, as I have said, is the sort of description given processing states of the machine - descriptions given not in terms of syntactic elements and rules, but in terms of system state space vectors, thermodynamic equilibria and the like.
How are connection machines also unlike analog and digital machines? What could justify Smolensky's guess that they may lead to "a new theory" of computation?
(1) 'Settling', 'relaxation', 'cooperation' - the connectionist style of computation as typified by the Boltzmann machine - allows weighted consideration of many features at once. Where multiple layers are present, the cascading of activation from one layer to the next provides for something like a registration of higher-order relations among features as well: nodes of the first hidden unit layer can register relations among relations of first-order features. After five or six such mappings from layer to layer, activations arriving at output nodes may reflect responses to input features which are very abstract indeed. This logical depth of relativized complexity is given in what can be seen as a single step, with no need for an external control function to schedule the individual transitions.
(2) The scale of parallelism of connection machines is envisaged by ultra large scale integrated chip designers as heading toward a billion connections (Akers et al., 1989, 149).
(3) The processing effects of any component of analog and digital machines are understood in principle--we are able to say what contribution it is making to a computational result. With connection machines we are not certain what contribution is being made by any node. There are empirical ways to try to discover its effect after the event, but we do not design every detail of the computational process. And the failure of a component is not critical in connectionist computing, whereas failures of single components in analog or digital machines may result in a failure of the whole process.
(4) Thresholds in connectionist circuits are thought of differently than they are in digital machines. This is true especially where overall processing is stochastic - where the effect of the individual node's particular binary state determines nothing by itself, and decisions downstream are reached by a kind of overall averaging process. Thresholds, in other words, have statistical not logical effect, and thus we are not as tempted to think of them as assigning truth values to symbols in a calculus. Quantized signal-packets are discrete, but their effect is not differentiated with respect to the computational outcome.
(5) Standard programming, with its top-down chain of command through intermediate programming languages, is not possible for connection machines. The notion of programming is replaced by several sorts of hardware organization. The first sort is the design of connections, which may be local or global, which may be one- or many-layered, which may allow for symmetrical or asymmetrical activation, which may be uniformly or differentially weighted, which may allow for feedback as well as feed-forward transmission, and which may have fixed or modifiable gain ratios. Where gain ratios or weights between nodes are modifiable a second sort of hardware organization is effected: the organization of the machine, over time, by means of sample input-and-feedback pairings. The fabric of the machine itself changes with what we're tempted to call experience - a result which is not self-programming, but a sort of self-structuring in concert with given environmental structures. We have a hardware which becomes more dedicated over time--customized by, and hence more attuned to, its conditions.
(6) Connection machines have, from their beginnings (which coincided with the beginnings of analog and digital computers in the 1940s and 50s), been built with some eye to neural plausibility. David Rumelhart of the PDP Research Group makes this motivation explicit:
IV.3 Holonomic brain theory
Can a new theory of connectionist processing lead us to a "brain metaphor of mind"? In von Neumann's day it was thought that the computationally relevant behavior of neurons was their thresholded propagation of spike trains. More recent theories, such as that of Poggio et al., give an analog account of neurons as circuit elements. In any case, the brain does consist of massively interconnected nodes in multilayered arrays. If certain key features of connectionist processing operations are also present - features such as cooperative effect, abstractive cascading, and modifiable connection weights - then connectionist theory may be a good base for the new disciplines of computational neuroscience.
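The "abstractive cascading" just mentioned can be sketched with threshold units: a hidden layer registers first-order relations among the inputs, and the output layer registers a relation among those relations. The weights and thresholds below are hand-chosen assumptions, not a trained net.

```python
def step(x, theta):
    """A bistable (linear threshold) unit."""
    return 1 if x >= theta else 0

def layer(inputs, weights, thetas):
    """One cascade stage: each unit thresholds a weighted sum of the
    layer below."""
    return [step(sum(w * v for w, v in zip(ws, inputs)), t)
            for ws, t in zip(weights, thetas)]

def xor(a, b):
    # Hidden layer registers two first-order relations: (a OR b), (a AND b).
    h = layer([a, b], [[1, 1], [1, 1]], [1, 2])
    # Output registers a relation among relations: OR but not AND.
    (out,) = layer(h, [[1, -1]], [1])
    return out
```

XOR, notoriously beyond any single threshold unit, falls out of a two-stage cascade; deeper stacks register correspondingly more abstract relations, all in one feed-forward pass with no external scheduling.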
Karl Pribram, the bold and eminent neuroscientist who published his controversial theory of holographic memory in the early 70s, has placed his latest bets on connectionism. In his (1991) he supports what he calls holonomic brain theory by means of notions drawn from connectionist sources. His account claims to be an account of figural perception in the first instance, but wider applications are obviously intended. I am going to recount his story of the computational brain in some detail, both because it is a very energized vision of connectionist possibilities, and because his forms of description - which are entirely nonsymbolic, nonlogical - support my thesis that connectionist computation refines the sense of a cognitive alternative formerly offered by analog computation.
In the preface of his (1991) Pribram says explicitly that one of his motivations for the present work was a desire to update his holographic brain thesis in the light of the emergence in AI of parallel distributed processing architectures.
These 'neural networks' or 'connectionist' models are similar to OCCAM, a content-addressable computational model that we (Pribram, 1971; Spinelli, 1970) developed in the late 1960s, and stem directly from the content-addressable procedures that characterize optical information processing such as holography (see e.g., Hinton, 1979; Willshaw, 1981). (Pribram, 1991, xvi)
That connectionist models "stem directly" from the content-addressable procedures characterizing holographic optical processing is a claim based on mathematical grounds. The representing structures employed by optical versions of connectionist architectures are, in fact, holograms. But the larger vision underlying Pribram's equation of connectionist and holographic processing is a vision of the two processes as forms of pattern matching in which superposed waveforms result in interference patterns that are both computational results and representational structures.
Pribram sees neural events as involving a complex interplay of discrete and continuous processes. The axon hillock is a thresholding device which provides all-or-none propagation of action potentials down axonal fibres which are a nerve cell's output device. Axon fibres tend to be relatively long and they may communicate with fairly remote regions of the nervous system. These impulse discharges or spike trains spatially and temporally segment the results of the dendritic microprocess into discrete packets for communication and control of other levels of processing. These packets are more resistant to degradation and interference than the graded microprocess. They constitute the channels of communication of the processing element.
So Pribram makes a distinction between neural processing and a communication of the results of processing which also acts to organize computational activity over various levels and locations in the brain. Parallelism, convergence and divergence of axon fibres result in the propagation of spatially and temporally patterned trains of discrete bursts of activation, and Pribram, like von Neumann, thinks of these packets as digital, and as code elements which "communicate orders." Still, the system as a whole cannot be seen as digital because there is no through transmission of digital signals. At every synapse the arriving pulse will decrement into the local slow potential. The digital message is received by being convolved with analog processes. On the far side of the receiving neuron other discrete impulses will be propagated. Incoming pulses will influence the frequency of outgoing pulses, but it cannot be said that incoming pulses are reconstituted beyond the soma--only that their message has been taken under consideration.
When Pribram speaks of 'code' here, what he has in mind is not something primarily language-like. In his (1971) he mentions a number of different characteristics of 'impulse coding' which may have computational effect. They involve both spatial and temporal variations, and may include duration of bursts, overall probability of firing, variations in this probability, incrementing or decrementing of this probability, rate of change of this probability, shifts in latency, the spatial distribution of trains across arrays of fibres, and differences of timing among neighbouring ensembles. 'Codes' can be read into the computational effectiveness of such patterns of activation, just because their elements are disjoint--we can say "that pattern of bursts is telling these neurons that X" - or it is telling them to do X - but this is a manner of speaking and it is redundant; whether or not we think of burst patterns as coded information is irrelevant to their causal effect. The temptation to call them code comes from the facts that they are patterns of discrete elements, and that their patterned nature has systematic representational import (of some kind). Their largest difference from codes as usually conceived is that they are not limited to one-dimensional strings but involve both space and time variations as well as their derivatives. And there is some sort of proportionality, or multidimensional simultaneous proportionalities, between the input and output of neural transduction. Another way to say this is that neural coding - if we want to call it that - involves an encoding function intrinsic to the machine that implements it, and that it may therefore operate outside of the considerations of uniformity, simplicity and seriality that determine our code designs.
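Several of the coding dimensions listed above can be read directly off a spike train. The sketch below assumes a train given as sorted spike times in seconds, and an arbitrary gap criterion for segmenting bursts; it extracts a few of the dimensions without thereby making the train a code in the linguistic sense.

```python
def spike_features(spike_times, window, burst_gap=0.01):
    """Extract some 'impulse coding' dimensions from a spike train:
    overall firing rate, first-spike latency, and burst onsets/durations.
    burst_gap is an assumed cutoff: interspike intervals longer than it
    are taken to separate bursts."""
    if not spike_times:
        return {"rate": 0.0, "latency": None, "bursts": []}
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    bursts, start = [], spike_times[0]
    for k, isi in enumerate(isis):
        if isi > burst_gap:                      # a long gap ends the burst
            bursts.append((start, spike_times[k]))
            start = spike_times[k + 1]
    bursts.append((start, spike_times[-1]))
    return {
        "rate": len(spike_times) / window,       # overall probability of firing
        "latency": spike_times[0],               # shift in latency
        "bursts": [(round(a, 3), round(b - a, 3)) for a, b in bursts],
    }
```

The point of the sketch is only that such patterns carry systematically extractable structure along many simultaneous dimensions, whether or not we choose to call that structure code.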
Pribram also sees the packeted nature of axonal discharge as providing a necessary linearity to processes taking place at the synapses. Wave to pulse conversion at trigger zones in the axon is thought to be nonlinear, that is, it is thought to be the result of nonlinear threshold functions. In contrast, pulse to wave conversion - the incoming effect of pulses arriving at junctions - is thought to be a linear function, multiplication by a constant. And the linearity provided to junctional microprocesses by discrete incoming activation packets allows them to be described by means of Huygens' principles of wave propagation, the linear principles which underlie holographic theory. Then operations of filtering, integration and transmission can be described by linear differential equations and the "locus of entry of nonlinearities can be identified without jeopardizing the advantages that accrue to the overall linearity of the operation of the brain systems involved in configuring percepts" (Pribram, 1991, 19).
Dendrites are the nerve cell's input devices, and Pribram takes the electrical and chemical microprocesses surrounding dendritic junctures with axons or other dendrites as constituting the local circuits which effect actual neuronal processing. These microprocesses are the analog processes described by Poggio et al., processes which involve the release and absorption of thirty-odd sorts of neurotransmitters with different sorts and rates and ranges of chemical effect, the mitigating influence of various enzymes, and various degrees of unmyelinated dendrodendritic contact within a feltwork of dendritic fibres. The overall electrical effect of these processes is the creation of patterns of charge density distribution in the tissue matrix within and between postsynaptic dendritic membranes. These charge densities are temporary microstructures, steady states of polarization amplitude which do not propagate and which Pribram calls slow potentials. Slow potential distributions across ensembles of synapses are, for Pribram, the actual neural locus of computational processing.
Pribram calls these "volumes of isopotential contours" holoscapes, which may be thought of as standing waves of activation, and which constitute the nervous system's representational medium. So representation, for Pribram, is inherently spatial: it consists of topological, configural, temporarily self-maintaining electrical structures, which are induced by a combination of genetic and experiential factors. Synaptic characteristics, like connectionist weights, are thought to be modifiable as a result of practice, but they may also be modified as a result of non-informational events such as vitamin deficiency or a full moon. There is no hint here of Pylyshyn's wedge between biological and informational causes.
Pribram's guess is that holoscapes - isopotential contours of slow potential microstructure - are the "physical correlates of a percept", and that cortical interference patterns are "coordinate with awareness". His guess has received interesting support from Freeman's connectionist-inspired work on olfaction, which has found that the "identity of an odorant is reliably discernible only in the bulbwide spatial pattern of the carrier-wave amplitude", and that "as long as we do not alter the animal's training, the same map emerges each time an animal sniffs a particular odorant, even though the carrier wave differs with each sniff" (Freeman, 1989, 80).
I will add that, where perceptual processing is involved, Pribram thinks of neuronal states as presentations rather than representations. The nervous system is presenting a perceptual situation to the organism. Processing hierarchies may then re-present processing results to other computational stages, and this re-presentation will involve some form of systematic transformation. Pribram's notion of perceptual presentation is compatible with Gibson's theory of direct perception, and Pribram's description of neuronal processing as frequency filtering can serve to fill out Gibson's notion that brain structures "resonate with" the higher-order properties of perceptual stimuli. (It is also interesting to notice a historical relation between connectionist and direct theories: Dreyfus tells us (1988, 330) that Gibson and Rosenblatt coauthored a paper in 1955.)
Pribram thinks of all neural processing as pattern matching, because it is the outcome of a superposition of two patterns:
When dendritic microprocesses are generated there will be horizontal cooperativity in
These reciprocal interactions will be pattern matching because they convolve incoming activation patterns with resident microstructure. Microstructure generated by nerve impulse arrival interacts with what is present in virtue of pacing and previous configuration; the resulting interference patterns "act as analog cross-correlation devices to produce new figures from which the patterns of departure of nerve impulses are generated" (1971, 105). In other words, slow potential structure computes "both the spatial neighbourhood interactions among neural elements, and, to some extent, the temporal interactions over a range of sites" (1971, 18). What is passed on with axonal firing patterns will thus be the effects of reinforcement and occlusion at intersections of resident and newly evoked wavefronts. Mathematically, this sort of transformation is - like holographic processes - a filtering operation, implemented by means of parallel connections among cooperating analog circuit elements.
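The cross-correlation Pribram invokes can be sketched in one function: slide an incoming activation pattern across a resident one and sum the pointwise products at each offset. The patterns below are made-up numbers; peaks in the result mark offsets where the two 'wavefronts' reinforce, troughs where they occlude.

```python
def cross_correlate(incoming, resident):
    """Unnormalized cross-correlation of a short incoming pattern with a
    longer resident one, at every offset where the two fully overlap."""
    n, m = len(incoming), len(resident)
    return [sum(incoming[i] * resident[i + lag] for i in range(n))
            for lag in range(m - n + 1)]

# The incoming pattern matches the resident waveform at offsets 1 and 4.
scores = cross_correlate([1, -1], [0, 1, -1, 0, 1, -1, 0])
```

The output is itself a pattern - a rough analogue of the "new figures from which the patterns of departure of nerve impulses are generated."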
Along with a wish to develop a connectionist version of holographic neural processing Pribram expresses a
As a consequence, holonomic brain theory offers a set of mathematical models of various aspects of neural function. Pribram says his theory is a form of probabilistic functionalism, and by this he means what, in relation to connection machines, I have called thermodynamic functionalism. Pribram makes it clear that thermodynamic minimization is a metaphor for another sort of global self-stabilization which he calls entropy, rather than energy, minimization. Hamiltonians, principles of least action, define paths in a Hilbert Space. Applied to statistical mechanics, Hamiltonians become operations which achieve minimization of uncertainty. The system settles into a state in which energy is maximally ordered - redundancies and correlations extracted, structure distinguished from noise. The resulting state will embody a maximum number of informational constraints; it will be maximally coherent. I will say more about constraint satisfaction and representational coherence in the next section. Here I only want to add that the phase space Pribram has adopted to describe the overall configuration of neural microprocesses is a 4-dimensional space providing for two coordinates in the spectral domain (this involves Fourier transforms of wave properties) and two spacetime coordinates required by the inherent spatiality of brain processing.
In concluding this description of holonomic brain theory I want to point out that computational neuroscience, as outlined by Pribram, does not envision an incommensurability between physical and cognitive languages of description.
Because the computations envisioned are mathematical functions over physical states, and not, in the first instance, logical computations over task domain concepts, it is thought that physical reduction of cognitive terms may be effected bottom-up. This belief is based on a naturalist assumption that all cognitive states have intrinsic content as a function of the physical microprocesses that construct and re-construct them.
IV.4 Epistemological suggestions
Logical functionalism thinks of high-level computations over internal representations as being like the sorts of formal operations we perform on external representations - sequential reasoning by means of deductive chains, categorization by means of definitions which are sets of necessary and sufficient conditions. Any cognitive performance - any actual cognitive achievement - is thought to be effected by competence-structures which are procedures or rules having the same logical/linguistic form as instructions or rules articulated by people communicating to each other or to computers. These ways of conceiving of cognitive behavior perpetuate a form of mind/body division: languages of description based on the objects and procedures of our task domains seem to be largely incommensurable with languages developed to describe the physical behaviors of biological bodies. Classical cognitive science, inasmuch as it wants to consider cognitive behavior as symbol-using behavior, is motivated to try to drive a wedge between cognitive and non-cognitive functions in the brain. It isn't possible to keep this sort of wedge in place unless we want to think of human bodies as the residences or vehicles of angels. If we are biological beings, then cognition just is a physical function. In this section I will outline briefly some of the epistemological suggestions a thermodynamic functionalism can offer. In chapter V I will go on to discuss the relations of connectionism to codes and to languages of description.
We have seen how, in a connection machine, a representational state is a stabilized pattern of node values, and how, in holonomic brain theory, a representational state is a temporary stability in the holoscape of dendritic microprocesses. We have also seen that both sorts of pattern can be given a geometrical representation as positions in some sort of phase space whose (orthogonal or nonorthogonal) coordinates are the dimensions along which node values may vary. Paul Churchland takes the additional step of simply identifying creature representation with its geometrical description: "Our principle form of representation is the high-dimensional activation vector" (1989, ---). This identification allows him to make connectionist suggestions about the nature of categorization, generalization, and abstraction - about the inheritance by classes of properties of their superclasses, and the generalization to superclasses of the properties of classes.
Where a connectionist net is being trained to respond differentially to sample inputs, classification is automatic. A certain range of combinations of features will result in one output decision, and another range of combinations will result in another output decision. It can be said that training has partitioned the net's weight space. Within any partition, every combination of features will be represented as a single point and, moreover, the geometrical relation between point locations in state space will embody a similarity metric (P.M. Churchland, 1989, 102). A similarity space of this kind may eventually be identified with some phenomenal domain: olfactory space, color space, tonal space, motor control space. Using this geometrical formalism allows us to think of categorization as multi-dimensional and scalar.
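The geometry here is concrete enough to sketch. The following toy fragment - with invented values and class names, and nothing of Churchland's own apparatus - treats trained responses as points in activation space and reads classification off the similarity metric: an input belongs to the partition whose representative point it lies nearest.

```python
import math

def activation_distance(u, v):
    """Euclidean distance between two activation vectors:
    the geometrical similarity metric on state space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Two hypothetical partitions of a trained net's state space,
# each summarized by a representative point (illustrative values).
partitions = {"rose": (0.9, 0.1, 0.8), "lily": (0.2, 0.9, 0.3)}

def classify(vector):
    """An input falls within the partition whose representative
    point it lies closest to."""
    return min(partitions, key=lambda k: activation_distance(vector, partitions[k]))

classify((0.8, 0.2, 0.7))  # a point near the "rose" region
```

Nothing turns on the particular metric: what the sketch shows is that classification is scalar and multi-dimensional - a matter of distances along many axes at once, not of checking necessary and sufficient conditions.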
Categorization is a form of generalization: somewhat different stimuli are given a similar response, whether this response is a name or an action or an internal processing decision. If we have layered nets we can think of partitions as being subsumed within larger partitions: where the net is trained to respond differentially to the more inclusive hypervolume, the subvolume will inherit the properties - i.e. the downstream decisions - of the superclass. Connectionist geometry also suggests how successive identifications might work. A blindfolded gardener is offered something to smell. First sniff: "It's a rose". Second sniff: "It's an old rugosa of some kind". Third sniff: "I think it's Blanc Double de Coubert". Here the net would be fine-tuning itself, invoking a finer sub-partition with each trial. And a prototype of a class can be seen as a central subvolume in a partitioned space, the volume that is most similar - on the largest number of axes - to the largest number of training instances. Donald Hebb speaks about categorization in something like these terms: he is guessing that the activity of a trained cell-assembly is "representative of that combination of events that individually organized the first-order assemblies" (1980, --) and that abstraction has to do with the practice-based organization of downstream cell-assemblies responding selectively to commonalities of first-order assemblies.
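A prototype, on this geometrical reading, is just the central point of a class's training instances - the point most similar, axis by axis, to the largest number of them. A minimal sketch, with invented instance values:

```python
def prototype(instances):
    """The prototype of a class, read geometrically: the centroid
    of the training instances in activation space - the point
    most similar, on each axis, to the set as a whole."""
    n = len(instances)
    dims = len(instances[0])
    return tuple(sum(inst[d] for inst in instances) / n for d in range(dims))

# Four hypothetical training instances of one class (illustrative values);
# their prototype is the centre of the subvolume they occupy.
prototype([(1, 0), (0, 1), (1, 1), (0, 0)])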
So the construction of weight distributions in a net is already a cognitive activity: it contributes to present experience an order achieved over previous experience. A trained net rapidly configures itself into the simplest, most coherent pattern consistent with input features. This response can be seen as a form of abduction - rapid inference to a best explanation in the light of species interests and individual experience. This sort of inference can be seen as semantics-driven, where calculus-plus-proof-procedural schemes of the symbolic paradigm are syntax-driven. We do not need to posit symbols because we have representational structures with intrinsic semantics - structures that satisfy logical constraints in virtue of what and where they are. The construction of a representation of an individual or a class will automatically make explicit a simultaneous configuration of relations to other individuals or classes. If we think of node values as embodying representational microfeatures, net-training will result in feature-representation that gives us ready-made the interdependencies among input features. These interdependencies are the systems of constraints a representation will have to satisfy. Units that are on together in a trained response to input will define consistency for that input. Units that are on together over larger regions, or that activate one another in trained cascades, will define consistency for larger cognitive territories. Trained sequences can be seen as giving us a form of induction: if X, then Y, for most or all of the instances encountered. Inference by activation cascades through a structured net gives us a sense of reasoning as a skill like other sorts of practical skill - a sensitive, multidimensional equilibration in the midst of complex inner and outer conditions. A matter not of flawless formal sequencing, but of considering many factors at once, hanging out in the centre of a possibility space until a solution achieves itself. 
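One standard formal picture of this settling - not the author's own formalism, and with weights and patterns that are purely illustrative - is a small Hopfield-style relaxation, in which a degraded input slides into the stored pattern that best "explains" it:

```python
def settle(state, weights, sweeps=5):
    """Each unit repeatedly flips to agree with the weighted 'vote'
    of the others, so the net relaxes toward the pattern that
    satisfies the most constraints at once."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            vote = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            state[i] = 1 if vote >= 0 else -1
    return state

# Weights storing the pattern (1, 1, -1): units that are on together
# excite one another; units that disagree inhibit one another.
W = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]

# A degraded cue relaxes to the stored pattern - a toy instance of
# inference to the best explanation of the input.
settle([1, -1, -1], W)
```

Units that are on together in the settled state define consistency for that input, just as the text describes; the "inference" is nothing over and above the relaxation itself.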
George Lakoff puts it this way:
Classical cognitivists will reply that if philosophy does change it will not be for the better: the intuitive arts may do for basketball or needlework, noncognitive stuff having to do with bodies, but the banner capacities of human rationality - the ones that make us better than women and children and animals - must be explained in terms of deductions from axioms, recursive operations on symbols.
Well, it is often observed that axiomatization comes late in a game. We know how to know things long before we know how to say how we know how to know things. But it is true that connectionism is better at explaining simultaneous capacities like perception than it is at explaining recursive serial capacities like sentence generation. And connectionism does owe us an explanation of language. We may not be very good at handling the sort of recursion that occurs in the second sentence of this paragraph, but we do often 'think in sentences' and hear ourselves doing so.
(Classical cognitivists argue for a linguistic description of cognition as symbol-using on the basis that the systematicity and productivity evinced in cognitive processes can only be explained if brains are code-using in the ways digital computers may be seen as code-using. A connectionist can reply that the systematicity and productivity displayed by human cognition are seen to be very limited when we compare them to the recursive abilities of digital computers, and that this is why we seek their help for tasks requiring unlimited numbers of recursive steps. Secondly, elementary systematicity and productivity are normal features of social and sensory-motor behavior, and are not special to linguistic skills. And thirdly, the more advanced sorts of recursion we do display seem to have to be supported by internalization of external codes and other cognitive prostheses such as diagrams.)
If we think of creature brains as representing and computing in virtue of the organization of their materials, and if we think of this organization as having semantic content intrinsically, in virtue of its causal relations with a larger world, then we have no reason to think that only linguistic states have semantic content. We can imagine language functions as requiring an output transducer which maps certain intrinsically representational states onto sets of output decisions which result in the utterances of words and sentences. This is not a transduction from a physical domain to a nonphysical domain, of course - it is a transduction from brain behavior to muscle behavior. Patricia Churchland suggests that we can think of output transduction into language as the convenient provision of a precis of idiosyncratic neural complexity. She quotes C.A. Hooker to the effect that "Language will surely be seen as a surface abstraction of much richer, more generalized processes in the cortex, a convenient condensation fed to the tongue and hand for social purposes" (Hooker, cited in P.S. Churchland, 1986, 396). The implication is that our sentences need not be seen as expressing the whole of our thought - there is 'more behind them'. I will raise this possibility again when I reopen the discussion of codes in chapter V.
If language is a convenient abstraction from "richer, more generalized processes in the cortex", what neural regularities can account for the regularities observed in sentences? Piaget speaks of repertoires of sensory-motor schemes, schemes that bind object-schemes to action-schemes. The noun-verb-noun form ubiquitous in human sentences does resemble ubiquitous forms of practical action: X does Y to Z. It is also not unusual for the order of elements in a sentence to reflect the order of some event, or the order in which it has come to be known. Piaget further posits schemes of assimilation by which typical cognitive sequences will be applied to new items or domains. Hebb's physiological suggestion here is that organized activity in fields and cascades of cell-assemblies will recruit other cell-assemblies active at the same time, and that this sort of recruitment results in a progressive binding-together of consistent representational content (1980, --). So linguistic output can reflect the structures of characteristic non-linguistic sequences.
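Hebb's recruitment mechanism can be given in its simplest textbook form - units active together have the connection between them strengthened, so co-active assemblies gradually come to evoke one another. The learning rate and activity values below are illustrative, and the sketch abstracts entirely from Hebb's physiological detail:

```python
def hebbian_update(weights, activity, rate=0.1):
    """Hebb's rule in its barest form: for every pair of units that
    are active together, strengthen the connection between them.
    Repeated co-activation thus 'recruits' one assembly to another."""
    n = len(activity)
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] += rate * activity[i] * activity[j]
    return weights

# Two units repeatedly co-active: their mutual connections grow,
# so activating one will increasingly tend to activate the other.
w = [[0.0, 0.0], [0.0, 0.0]]
hebbian_update(w, [1, 1])
```

The point of the sketch is only that "binding-together of consistent representational content" needs no symbolic machinery: it falls out of a local rule applied over co-activations.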
Still, given that our own and other people's language also arrives as input, linguistic structure will also organize cognitive sequences. There will be a two-way interaction. Vygotsky (1962) offers a suggestive account of internalized speech which gradually, as we become more adept, becomes less social and more idiosyncratic, so that in the end the subjects of sentences are often dropped in favor of some sort of non-linguistic wave in the direction of an 'image' or other perceptual reactivation. When we 'catch ourselves thinking' this is the sort of thing we are not surprised to observe. In any case, the idea is that the convenient precis offered by language can be made available to one's own processes as well as those of other people, by being internalized as simulated inputs. These precis may also include internalizations of our uses of other cognitive prostheses such as diagrams, maps, written arithmetical calculations, or spoken instructions.
I will assume here a theory of imagining that takes it to be "brain activity in the absence of the thing thought about" (Hebb, 1980, --). There will be important cognitive advantages to be gained by being able to evoke an activation pattern in the absence of its usual cause. Pribram (1971, 370) speaks of a "repeated return to the configural form of a representation" as serving a rehearsal function and allowing the occurrence of additional distributions in memory - that is, of restimulating a representational configuration in order to make use of new connections with other representations. Hebb suggests that centrally activated simulations will allow us to hold or rework inputs, to set up new interactions among inputs, and to activate outputs in the absence of external stimuli - to give us cognitive flexibility, in short. Some restimulations may involve cell-assemblies quite close to the sensory periphery: "Ordinarily the cognitive operations operate back onto those that are sensory driven" (Pribram, 1991, xviii). Hebb suggests that some perceptual restimulation may be less inclusive "of lower-order representations" and then "the image of the whole, as a higher-order cell assembly activity, lacks detail but is still an image of the whole" (1980, 127). In other words, there may be different degrees of inclusion of the perceptual base in cognitive activity.
Piaget (in Piattelli-Palmarini, 1980, 165-67) relates simulation abilities to naming abilities by means of the development of what he calls the semiotic function. He posits a cognitive sequence which begins with imitation: a child imitates the sound of a name pronounced in the presence of the object named. The next step is deferred imitation: the child speaks the name when no one else is pronouncing it, but still in the presence of the object. Then comes symbolizing play: the child speaks the name in the absence of the object. And then full internalization of both the speech act and the perception of the object: the child mentally rehearses the name of the object together with a perceptual simulation of the object. Piaget stops there, but we could go on to a short-circuited 'abstract' version where the physical instantiation of the mental rehearsal of the name cascades directly to the usual cognitive consequences of the activation of the object-assemblies.
So linguistic input will sometimes evoke and sequence - organize - non-linguistic representation and computation. And distinctive linguistic regularities - as well as distinctive cultural and social and practical regularities - can become the cascading habits of neural activation. It may be that regularities of these sorts, if centrally instantiated, can operate top-down by means of feedback which in effect assigns subroutines, or pre-tunes perceptual capacities, by providing a 'set' that tells the system how input should be taken. These centrally-imposed constraints need not be seen as linguistically imposed, but they may be.
This discussion of connectionism and language capabilities has sketched some of the ways connectionists can begin to try to account for abilities taken as central by classical cognitivists. They are hints, not theories, and all that needs to be drawn from them is the suggestion that we are not forced to explain sentential capabilities by means of internal representations that are themselves sentences.
A classical cognitivist may, at this point, say that it is those activation cascades that are organized into language-like sequences and centrally evoked that we are calling cognition proper; the rest is functional architecture. When we supply a story about connectionist language, do we open the door to cooption by logical functionalists who want to assign us to everlasting labour as implementers? Not if we can show that brain connectionism is cognitive from the bottom up - that even its microprocesses are semantic - and that language itself is a function that presupposes a deep reservoir of intrinsic creature content.
IV.5 Intrinsic content again
We have come a long way from 'analog' and 'digital'. I said in the introduction to chapter I that what makes nonsymbolic computation and representation possible in analog computers is not what makes it possible in creatures that construct their own representations in concert with a structured environment. Analog computers may operate as functional analogs to other physical systems if the causal systematicities present in their materials and organization can be modeled in a formalism which also models that other system. If the formalism concerned provides a detailed model of the causal interrelations of the working parts of the two physical systems, then the analog computer will be what I have called a working analog. In either case its nonsymbolic transformations will preserve representational relevance because causal systematicities are mapped onto inferential systematicities. So we can read the states of one physical system into the states of the other. But these representing states have only extrinsic content, content that depends on our provision of the semantic function that maps physical states onto our mediating formalism.
What is it that makes nonsymbolic computation possible in connection machines? Physically, connection machines can be seen as very complicated quasi-analog machines, whose computational systematicity stems directly from the physical organization of the machine, even when their component switches are bistable. So what makes representation and computation possible in connection machines is basically what makes it possible in ordinary analog machines. Connection machines are a step closer to intrinsic content in just one way--their physical causal systematicity is in some small degree provided by their self-organization in response to input samples. To this small extent they behave like something that is calibrated, not just to our purposes, but to their environment. This is also the key to nonsymbolic computation in brains, which are environment calibrated from top to bottom - or, I should say, from bottom to top.
Environment-calibration is not analogy: it is not a relation between physical systems both of which are modeled in some formal structure. Environment-calibration could be defined as computational efficacy constructed, phylogenetically and ontogenetically, by means of the structured nature of interactions with a world. Creature calibration is the cumulative self-organization which supplies and is built by interactive competence. It results in a structured creature, an environmentally configured animal. A creature structured in such a way may be seen as a global input-output transducer, because the whole nervous system can be seen as one very complicated circuit for delivering viable responses. Pylyshyn accuses both Gibson and the connectionists of concentrating on mere transduction. But we do not need to see the animal as either passive or non-cognitive in relation to the computation of its overall response. Intermediate stages in response computation may provide great stimulus-independence, great sensitivity to small variations, and they may well involve language areas of the brain. They can thus be seen as fully cognitive by those who care to highlight that distinction.
Direct theorists (see Michaels and Carello, 1981) point out that the concept of representation is logically eliminable from an account that gives us representations as the changed structure of a creature. Physically seen, a representation is that only inasmuch as it changes the flow of selections of alternatives in a brain. We do not really have to talk about the representation of features by nodes, but only about the response of nodes to features. This allows us to leave our psychological predicates to what I would think is their proper application as descriptions of whole persons--who do know, remember, compute, infer, classify, solve problems, test hypotheses, and represent. But we do not have a separate vocabulary for central processes and so we continue to apply a person-metaphor to brain processing. The largely unexamined presence of this metaphor may set up the classical cognitivist's greased slide from outer symbols to inner symbols.
Calibrational content is Patricia Churchland's term (1980, 8) for what I have been calling intrinsic content. She contrasts calibrational content with translational content, which is the content we assign to the signals of another organism on the strength of some systematic fit between their terms and ours. Translational content is made possible by the prior existence of calibrational content. Words and symbols can trigger structured responses because those structures are already there and are inherently semantic - whether by this we mean that they are certain sorts of conscious experience, or only that they have systematic causal relations with other structures. Pylyshyn claims (1986, 215) that "structural or topological constraints" do not "show through" in "representation-governed regularities" important to such activities as "solving cross-word puzzles, doing science, appreciating a novel". But this cannot be so, since structural and topological constraints are what give us the ability to respond to words and sentences with activation reconstructions that bind those words to perceptual experience and thus to environmental events - events which supply the pages of a novel with content, in other words.
There is a last question I should ask here. If ordinary analog and digital computers were able to modify their hardware as a result of regularities in input, would we want to ascribe calibrational content to their representational states? (I will keep the term, with due caution.) It is hard to draw a bead on the answer to this question, because the computational efficacy of both types of computers has depended on the fact that we can count on them not to change their hardware in response to regularities in the data. If they spontaneously modified their connections or binary states we would think they were malfunctioning. If an analog computer modified its resistances or capacitances as a result of a series of inputs (inputs we feed it), then it would no longer be an analog of whatever domain we want to investigate. It would be describable by some other formalism - by some equation we would have to try to extrapolate from our data. What if the analog computer gets its input settings via sensors that link it directly to the domain we want to investigate? What if its input comes from measuring devices at various points of our bridge and flood system? The answer remains the same: unless the physical laws by which we have described the dynamics of our physical system are changing, input-induced hardware alteration will constitute failure of the machine. If we want states of the machine to remain representationally relevant, our hardware must not flex.
What about digital machines? Chips are being developed which have some limited measure of ability to modify themselves. (This involves many-layered wiring with silicon interlayers that allow burn-through in selected locations; when current is present in both upper and lower wires, a short-circuit results, and this short serves to wire in an item of read-only memory store.) Will we want to grant their states intrinsic content? This sort of structural change is permanent, and to the extent that it occurs, it will create a dedicated rather than a general purpose machine. But this is not the point. The point here is that this digital machine is still running on top-down semantics, with certain slots left open for 'environmental' input. The machine is not importantly reconfigured by its input.
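The burn-through mechanism just described amounts to a write-once memory: a location, once burned, is fixed for good. A few lines of illustrative code (a sketch of the logical behavior only, not of any actual chip's interface) make the point that this sort of "self-modification" is permanent and local, leaving the machine's top-down semantics untouched:

```python
class BurnThroughROM:
    """Toy model of a write-once memory: burning a cell wires in a
    bit permanently; a second burn attempt changes nothing."""

    def __init__(self, size):
        self.cells = [None] * size  # None = not yet burned

    def burn(self, addr, bit):
        if self.cells[addr] is None:
            self.cells[addr] = bit  # the short-circuit is irreversible

    def read(self, addr):
        return self.cells[addr]
```

The machine gains a few environment-filled slots, but the slots themselves, and what their contents mean, are fixed in advance by the design.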
What if there are more general rewirings? There are two possibilities. The rewirings may result in what the compiler takes to be syntactic errors--errors in the combining and sequencing of what it is taking as syntactic elements. In this case the compiler will report the error and stop the machine, or it will enter some default state that works around the error and continues. The other possibility is that the 'error' is not a syntactic error but a semantic error - an error that does not violate the rules of a programming language, but that does enter as data some value that throws our deductions out of whack. Again, we will say the machine is not working properly.
What if our digital machine reorders itself comprehensively, in a globally systematic way? In this case we would say it has been recompiled. It is now implementing a language other than the one we had originally compiled it to run. And we will have no way to interpret this language. A digital computer with intrinsic content would be a maverick, a runaway, absolutely unusable.
There are two suggestions to be extracted from these speculations about environment-calibrated content in analog and digital machines. One is that creatures must be environment-calibrated in a thorough-going way if they are environment-calibrated at all. Their whole machine, with its combination of rigidities and flexibilities, must work as an ensemble. The other suggestion is that, where computation is code-mediated, content must be assigned in a top-down fashion and then compiled right down to the floor - otherwise we are left without the key that allows us to decode our results.