Language, Spatial Cognition, and Vision

  • Annette Herskovits


One essential function of language is to refer to objects and situations in the world. This process is mediated by nonlinguistic mental representations, most prominently by perceptual representations in different modalities. Human minds have the ability to establish systematic relationships between linguistic forms and perceptually based knowledge. This grounding of linguistic symbols in perceptual representations (Harnad, 1990), though often overlooked in linguistics and artificial intelligence, is essential to understanding linguistic abilities and linguistic structure. A good way to examine it is to investigate our ability to talk about space: the spatial world seems amenable to precise and objective description (unlike, say, the world of smells and feelings), and much is known about visual and spatial perception.






  1. Activities are one of the “aspectual classes”; the others are states, achievements, and accomplishments (Vendler, 1967). An activity extends over time (contrary to achievements, such as reach, which are punctual events), but does not specify a completion point (in contrast with accomplishments, such as cross in Jo crossed the road).
  2. In Jackendoff’s conceptual structure (Jackendoff, 1983; 1990), phrases such as from the bridge are said to refer to trajectories. The motion prepositions are Path-functions that map a reference object onto a trajectory. It is actually impossible to define such functions; given a preposition, there is no way to assign a unique trajectory to every given reference object.
  3. Examples preceded by # are acceptable, but do not have the intended interpretation. So The snake lay up the tree trunk is acceptable but with the whole snake located toward the top of the tree trunk.
  4. The sentence is acceptable with the entire snake located past the stone. These examples are adapted from Talmy (1983).
  5. The difference in acceptability may be due to Figure and Ground being body parts, which frequently are treated differently from other kinds of objects in spatial sentences (Herskovits, 1986).
  6. Again, Talmy (1983) uses very similar examples. See also Talmy (1996) on fictive paths.
  7. Sadalla et al. (1980) found that some locations in cognitive maps are anchors; other places are seen in relation to them. One of the facts associated with the role of anchor is that subjective distance is asymmetric; subjects judge the distance from A to B longer than the distance from B to A, if A is an anchor and B is not.
  8. Inness is certainly often inferred rather than directly assessed; the location of every point of the Figure need not always be checked. So ascertaining that an object is in a room often only requires making sure it is visible.
  9. Rock (1972, p. 671) defines perception “to mean what is ‘noted,’ ‘described,’ attended to, or apprehended about a figure, albeit unconsciously and nonverbally.” There can be awareness without perception. Experiments show that perception, so defined, is necessary for memory formation. It must also be a condition for the formation of conceptual categories dividing the range of shapes (or motions) considered.
  10. One sense of at entails a canonical interaction between Figure and Ground: Jane is at the desk. This sense can be extended to chairs but not to objects playing no role in the canonical interaction: The chair/*vase is at the desk.
  11. “A large-scale environment is one whose structure is revealed by integrating local observations over time” (Kuipers, 1983, p. 347).
  12. Marr’s 3-D model (1982) is hierarchical: the whole shape is divided into its “immediate” parts, which are in turn divided into parts, and so on down. The location of parts is represented only with respect to the entity immediately above in the hierarchy, which allows for stability in the representation of articulated objects: the representation of a finger will be with respect to a frame of reference attached to the hand, not to the whole body. The entire shape and each part have a model axis, which gives coarse information about length and orientation.
  13. The list here is a revised and augmented version of a similar list in Herskovits (1986), where examples of application of the functions not considered in this chapter can be found. See also Section 6.5 for additional illustrations.
  14. Hays (1987) uses the term coercion, from programming language theory, to indicate the “forced” matching of the argument(s) of a linguistic predicate to its selection restrictions. It is always associated with metonymy (Herskovits, 1986), since the actual arguments of the predicate are geometric constructs distinct from the primary referents of the complement noun phrases.
  15. I will, for conciseness, talk about Figure and Ground in what follows, when actually meaning “coerced Figure” and “coerced Ground”, that is, the values of the applicable geometry selection functions (the actual arguments of the relation across).
  16. Levinson (1994) makes a similar point, using examples from Tzeltal.
  17. Niyogi (1995) proposes a model of the computation of spatial relations in which the location of the focus of attention itself serves as input to the “daemons” carrying out the computation; configuring, then, would involve moving the focus from one object to the other.
  18. Egocentric relations are stable under projective transformations; so, from a given vantage point, right and left in three-dimensional space always correspond to right and left in the plane of view. We can easily judge whether two objects are to the right of another; they both appear on the same (right) side of it. By contrast, two objects to the intrinsic right of a TV could project in the plane of view right and left of the TV, given that the Figures are not required to be exactly on the axes.
  19. But Levinson’s study of Guugu Yimidhirr (1993) shows that not all spatial thought supporting language use is without consequences for spatial thought outside language.
  20. Chapman (1991) and Niyogi (1995) use visual routines in artificial intelligence models of linguistic abilities; they argue against vision models that involve processing an entire retinal image, and assume that relational knowledge is computed only when needed by higher-level cognitive processes.
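
The hierarchical scheme described in note 12 can be illustrated with a small sketch. This is not Marr's implementation or anything given in the chapter; it is a simplified two-dimensional toy, and the class `Part`, its fields, and all the numbers are hypothetical names and values chosen for illustration. It shows only the point made in the note: because each part is located relative to its immediate parent, moving the arm changes where the finger is in the world without changing the stored description of the finger.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Part:
    """One node in a hypothetical, simplified (2-D) part hierarchy."""
    name: str
    offset: Tuple[float, float]  # origin of this part's frame, in the parent's frame
    angle: float                 # model-axis orientation relative to the parent (radians)
    length: float                # coarse length along the model axis
    parent: Optional["Part"] = None

    def world_pose(self) -> Tuple[float, float, float]:
        """Compose frames up the hierarchy; returns (x, y, angle) in the root frame."""
        if self.parent is None:
            return (self.offset[0], self.offset[1], self.angle)
        px, py, pa = self.parent.world_pose()
        ox, oy = self.offset
        # Rotate the local offset by the parent's orientation, then translate.
        x = px + ox * math.cos(pa) - oy * math.sin(pa)
        y = py + ox * math.sin(pa) + oy * math.cos(pa)
        return (x, y, pa + self.angle)


# Illustrative values only (assumed, not taken from the source).
body = Part("body", (0.0, 0.0), 0.0, 1.7)
arm = Part("arm", (0.2, 1.4), 0.0, 0.6, parent=body)
hand = Part("hand", (0.6, 0.0), 0.0, 0.1, parent=arm)
finger = Part("finger", (0.1, 0.0), 0.0, 0.08, parent=hand)

stored_before = (finger.offset, finger.angle)
pose_before = finger.world_pose()

arm.angle = math.pi / 2  # raise the arm: only the arm's own entry is edited

stored_after = (finger.offset, finger.angle)
pose_after = finger.world_pose()
```

Here `stored_before` and `stored_after` are identical, while `pose_before` and `pose_after` differ: the finger's description relative to the hand is stable under articulation, and its position relative to the whole body is recomputed only when needed.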

Copyright information

© Springer Science+Business Media Dordrecht 1997

