Encyclopedia of Animal Cognition and Behavior

Living Edition
| Editors: Jennifer Vonk, Todd Shackelford

Depth Perception

  • Emel Gençer
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-47829-6_1397-1


Auditory perception; Binocular vision; Depth cues; Depth information; Gibsonian approach to depth; Haptic perception; Motion parallax; Motion perception; Perception for action; Shape perception; Space perception; Tactile perception; Three-dimensional (3D) perception; Two-dimensional (2D) retinal images

Edwin Abbott Abbott takes us on a journey into a two-dimensional world in his 1884 novella, “Flatland: A Romance of Many Dimensions.” The third dimension, commonly thought of as depth, is missing physically as well as perceptually in Flatland; moreover, even the second dimension is barely perceptible to the dwellers of Flatland. In fact, this situation can be generalized: it is impracticable to perceive all the dimensions of a physical environment precisely unless you step out of the dimensions that compose the environment and look at it from a new dimension. For instance, imagine that a piece of paper is Abbott’s Flatland. Looking from the third dimension, all dimensions of a geometrical shape (e.g., a triangle, rectangle, or circle) drawn on the paper, that is, of the objects and residents that belong to Flatland, are evident to us. Thus we can comfortably identify the shape drawn on the paper and tell its location and direction. Unfortunately, it is not as easy for Square, a resident of Flatland who is stuck in two dimensions, as it is for us. Square, the narrator of the book, makes the point as follows:

You, who are blessed with shade as well as light, you, who are gifted with two eyes, endowed with a knowledge of perspective, and charmed with the enjoyment of various colours, you, who can actually see an angle, and contemplate the complete circumference of a circle in the happy region of the Three Dimensions – how shall I make clear to you the extreme difficulty which we in Flatland experience in recognizing one another's configuration? … All beings in Flatland, animate or inanimate, no matter what their form, present to our view the same, or nearly the same, appearance, viz, that of a straight Line. How then can one be distinguished from another, where all appear the same? (Abbott 1884, p. 19)

Returning from such a journey to our homeland, we live in a physical environment (or world) that consists of three dimensions, commonly called width, height, and depth. The spatial locations of all the objects in our environment, their positions relative to each other, and their shapes are defined in terms of a three-dimensional structure. Perceiving our environment in this way allows us to accurately recognize the shapes of the objects and surfaces that surround us, as well as to successfully plan and coordinate the various actions we perform in everyday life. If the third dimension, depth, were not available to our perception, tasks as simple as walking on the ground, grasping a coffee cup, handling a knife while chopping or a fork while eating spaghetti, driving a car, catching a ball, threading a needle, or reaching out to hug or kiss a loved one would be extremely difficult. Lack of depth perception would even have vital consequences. Without depth perception, you could not discern whether you were about to step on a manhole cover or into a dangerously deep hole, because the exact shapes of surfaces and objects would not be available. Without depth perception, you could not accurately estimate whether an approaching car would hit you before you had safely crossed the road; in fact, you could not cross a road at all, because the information needed to guide navigation would not be available. Without depth perception, a hungry frog could not catch flies with its tongue, because the planning and coordination of actions would not be possible.

Unlike the occupants of Flatland, the animals of our homeland are equipped with perceptual systems such as the tasting, smelling, haptic, auditory, and visual systems (Gibson 1979), and they live in harmony with their environment. Such harmony saves animals from having to travel to a fourth dimension in order to perceive the three-dimensional structure of their physical environment as it is.

There are three different perceptual systems that are directly related to depth perception. The first is the haptic system, which requires information to be gathered directly from the environment by touching. Touching, along with reaching, is especially critical for infants as they learn the three-dimensional structure of objects and develop depth perception. It is not surprising that infants love to touch every object and surface with great curiosity, a habit that fades as they grow up. Infants’ poor sight may be one reason for this. They cannot visually focus on objects that are more than 8–10 inches (20–25 cm) from their face. Their eyes are not well coordinated, and they have neither control of their eye movements nor eye-body coordination skills. It takes 2 years for infants to develop normal vision and 7 years to develop fully skilled vision. On the other hand, infants start touching while in the womb, and they are born with highly sensitive skin; around 3 months of age they also start reaching. In this sense, touching favors depth perception and an understanding of the three-dimensional structure of surrounding objects and surfaces during childhood. Unfortunately, one limitation of haptic perception is that touching is only available within arm’s length. Because of this boundary, haptic perception is not practical for evaluating long distances or the three-dimensional shape of very large objects. Touching takes a back seat in adults’ depth perception, although it still contributes to some extent. There are indeed some exceptional cases in which touching remains as important for adults as it is for children, for example, in the case of blind people. Eşref Armağan, a congenitally blind painter who lives in Turkey, may be the best example. His art challenged what scientists had believed about depth perception until they encountered it. He uses pictorial information in his paintings to convey depth, information that until then had been thought to be available only through vision and not through touch. Eşref Armağan has never had vision with which to pick up such information; he “sees” all details through his fingers.

The illusions for sighted people due to depth information in Esref’s drawings are striking. Inaccuracy in touch may be less to do with depth information than vision, and more to do with grouping areas in 2D [two-dimensional]. Heller (2003) reports many tactile illusions for the blind, but no effects that can with certainty be ascribed to pictorial depth information. In contrast, studying grouping, Heller et al. (2003) found flat forms embedded in larger forms hard to discern in touch; they were grouped with the embedding form. (Kennedy and Juricevic 2019, p. 11)

The auditory system is the second perceptual system that contributes considerably to depth perception. Unlike the haptic system, the auditory system is not well suited to discovering the three-dimensional shapes of objects and surfaces; this is not impossible, but it requires developing a specialized skill, as bats and dolphins do. Yet the auditory system has an edge over the haptic system in detecting the spatial location of objects that are beyond arm’s length. Hearing can certainly be a reliable guide to the direction of objects relative to the perceiver, along with their distance. Otherwise, a bell on the collar would not help you find your playful kitten while it is hiding from you. Objects that attract our attention do not always fall within our visual field or within arm’s length. In that situation, the auditory system takes full responsibility for evaluating the relative position of the object. For example, imagine you are camping in a jungle, and while resting in your tent at midnight, you hear a strange sound that could be made by a hostile person or a wild animal. You have nothing with which to defend yourself and you are about 3 miles (5 km) away from your car. You now have two options. You may stay quietly in your tent, which is the shrewd choice if the beast is right beside the tent. Alternatively, if you decide the beast is not close, you may choose to sneak off to your car. In that case, you have to make sure that the beast is not in the direction in which you run; otherwise, you will end up running straight into it. The sound you hear is the only source of information you have for making such a life-or-death decision; if you listen carefully, your auditory system will reveal the details you need, such as how far away the beast is, in which direction it is moving, and how fast.

So far, the contributions of the haptic and auditory systems to depth perception have been discussed. Each has advantages for different aspects of depth perception. They are complementary: each compensates for the deficiencies of the other. In general, neither is better than the other. However, the third system, the visual system, is capable of supporting depth perception in all respects on its own, even in the absence of both the haptic and auditory systems. The visual system is the perceptual system we rely on most and the one we are most afraid of losing.

Recently, I asked a group of friends which sense they would miss the least if they had to lose one. Most people chose smell. No one chose vision.

If you lose vision, you lose your most important sense for knowing where things are. Vision allows us to perceive the shape of objects and their arrangement in the world around us, an ability that impacts nearly everything we do … Among all our sensory systems, vision’s contribution is uniquely important for the brain’s assembly of a sense of space. (Groh 2014, p. 7)

The visual system is important to us because it plays a major role in understanding and identifying the environment and in interacting with it. Given the essential role of vision in perception, it is understandable that vision is not a simple process. Studies dedicated to understanding visual perception date back to the philosophers of ancient Greece. Vision has been of interest not only to neuroscientists and psychologists but also to philosophers, mathematicians, artists, and astronomers. Democritus, Plato, Euclid, Alhazen, Leonardo da Vinci, René Descartes, and Johannes Kepler are only a few of the famous historical figures who sought to explain vision. People from all over the world (e.g., India, China, Arabia, Anatolia, and Europe) working in different disciplines have cumulatively formed today’s understanding of vision (Groh 2014; Howard 2012).

In the early stages, even the source of the information that enables vision, and the direction in which it travels between the eye and the perceived object or target point, was a mystery. In the eleventh century CE, Alhazen was the first to point out that the light reflected from an object to the eye contains all the information needed for vision. Although he recognized the need for a one-to-one relation between the eye and the perceived object or target point, he claimed that the percept must be formed in the lens; otherwise, we would perceive the world upside down. It was not until the seventeenth century that Kepler’s work revealed the optical basis of visual perception, namely the retinal image (Gilchrist 2014; Groh 2014; Howard 2012).

Today, it is common knowledge that vision starts when ambient light hits the retina of the eye. The retina captures only two-dimensional images of the three-dimensional world. Yet we clearly perceive the world as three-dimensional. What allows the visual system to reconstruct the three-dimensional world from those two-dimensional retinal images? Although no comprehensive explanation is available yet, some partial answers have been proposed. Before turning to them, the heart of the problem with the two-dimensional retinal image needs to be clarified further.

It is quite problematic to perceive the three-dimensional world through two-dimensional retinal images because such images are inherently ambiguous. The ambiguity arises from the missing third dimension (depth): objects of different sizes, at different positions or distances, can project very similar retinal images. In other words, although the objects differ in the real world, they may be conflated in the retinal images. Such ambiguity is called the inverse projection problem (Palmer 1999; Pizlo 2001).
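
The ambiguity can be illustrated numerically. The following is only a minimal sketch and is not taken from the cited sources: it assumes a simplified model in which a flat object of physical size s, viewed frontally at distance d, subtends a visual angle of 2·atan(s / 2d). The function name and the three example objects are hypothetical.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Visual angle (degrees) subtended by a frontal object of given size and distance."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

# Three hypothetical objects that share the same size-to-distance ratio.
objects = {
    "coin (0.02 m wide at 0.5 m)": (0.02, 0.5),
    "plate (0.2 m wide at 5 m)": (0.2, 5.0),
    "billboard (4 m wide at 100 m)": (4.0, 100.0),
}

for name, (size, dist) in objects.items():
    # All three lines report the same angle: the projection alone
    # cannot tell these objects apart.
    print(f"{name}: {visual_angle_deg(size, dist):.3f} degrees")
```

Because the visual angle depends only on the ratio of size to distance in this model, any number of size-distance pairs map onto the same projected extent, which is the ambiguity the depth cues described in the rest of this entry help to resolve.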

The inverse projection problem can cause failures in depth perception at the level of the retinal images. Luckily, our visual system is capable of preventing, or at least reducing, such failures by resolving the ambiguity of the retinal images. It relies on different types of depth cues to make up for the loss of depth caused by the inverse projection problem. In the following sections, those cues are briefly described in the order in which they are generally classified. Depth cues are divided into two broad categories: oculomotor and visual.

Oculomotor Cues

The term “oculomotor cues” refers to the information the brain receives from the eye muscles while the eyes fixate on a target point, usually an object. Such information is practically usable only within arm’s length. For example, if you fixate on an object you are holding at eye height within arm’s length, your visual system will use oculomotor information to determine the position of the object in depth. If you slowly move the object toward and away from you, you can feel the change through oculomotor feedback. There are two types of oculomotor cues:


Accommodation

The ciliary muscles, which are located in the ciliary body behind the iris and are innervated by the oculomotor nerves, adjust the shape of the lens in order to focus on objects at different distances within about arm’s length. When an object is nearby, the ciliary muscles contract and make the lens rounder. In contrast, when the object is more distant, the ciliary muscles relax and the lens becomes flatter. In doing so, the ciliary muscles allow us to focus on objects for clear vision and also provide the brain with feedback that is useful for interpreting the depth of the object.


Vergence

Movements of the two eyes in opposite directions (both inward, toward each other, or both outward, away from each other) are called vergence eye movements. To keep our eyes fixated on an object moving toward our face, we rotate the eyes inward; this is called convergence. Conversely, the eyes rotate outward as an object gets farther away from our face; this is called divergence. Vergence eye movements help to maintain a single image in binocular vision of an object moving in the depth dimension. Moreover, the feedback from the muscles that control vergence eye movements provides depth information about the object (Mon-Williams and Tresilian 1999; Mon-Williams et al. 2000). Accommodation by the ciliary muscles (as described above) is coupled with vergence eye movements: the two usually occur together when an object is moving in depth. Yet vergence provides more accurate estimates of depth than accommodation does.
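
The geometry behind the vergence cue can be sketched with a little trigonometry. The example below is an illustration under stated assumptions, not a model from the cited studies: it assumes symmetric fixation on a point straight ahead and an interpupillary distance of 6.3 cm, and the function names are hypothetical.

```python
import math

IPD_M = 0.063  # assumed adult interpupillary distance (about 6.3 cm)

def vergence_angle_deg(distance_m, ipd_m=IPD_M):
    """Angle between the two lines of sight when fixating a point straight ahead."""
    return math.degrees(2 * math.atan(ipd_m / (2 * distance_m)))

def distance_from_vergence_m(angle_deg, ipd_m=IPD_M):
    """Invert the relation: fixation distance implied by a given vergence angle."""
    return ipd_m / (2 * math.tan(math.radians(angle_deg) / 2))

for d in (0.25, 0.5, 1.0, 2.0, 10.0):
    print(f"fixation at {d:5.2f} m -> vergence angle {vergence_angle_deg(d):6.3f} degrees")
```

In this sketch the angle changes by several degrees between 0.25 m and 0.5 m but by little more than one degree between 2 m and 10 m, which is one way to see why oculomotor information is mainly useful at near distances.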

Visual Cues

While oculomotor cues are considered to be internal cues, visual cues have a more external source: usually physical, psychological, and pictorial information cues. Visual cues can be categorized into two groups as binocular and monocular.

Binocular Cues

Binocular cues are taken to be the most essential cues for depth perception. Because they are the most powerful of the depth cues, our visual system relies primarily on this information when planning and performing actions (for a review, see Goodale 2011).

When we fixate both eyes on one object (or on a single point), the two eyes see the same object from slightly different perspectives because of the space between them, known as the interpupillary distance (IPD), which averages about 2.5 inches (6.3 cm) for an adult human. As a result, we have two retinal images, which differ slightly from one another. Although we have two different retinal images, we experience only one visual image (under normal conditions); the brain forms a single, three-dimensional visual image from two different two-dimensional retinal images. The differences between the two retinal images serve as a source of information for depth perception (Bishop 1970; Julesz 1971). For a better understanding of this source of information, it is necessary first to understand the geometry of the “horopter.”

When we fixate both eyes on a single point, the fixation point and the centers of the lenses of both eyes all lie on an imaginary circle. The centers of the two lenses divide this circle into two arcs; the arc that contains the fixation point midway along its length is called the horopter. All points along the horopter are projected onto corresponding retinal points in the two eyes; thus the horopter is also known as the set of points with zero disparity (Von Noorden 1996). All points in the visual field that lie off the horopter fall on disparate (noncorresponding) retinal points in the two eyes. Moreover, as the distance from the horopter increases, the difference between the noncorresponding points in the two retinae also increases. This difference between the two retinal images is called retinal disparity. Disparity, and the diplopia (double vision) it can produce, comes in two types: crossed and uncrossed. Crossed disparity is created by objects located between the horopter and the observer; their images are usually projected onto the temporal hemiretinae (the outer halves of the retinae). Uncrossed disparity, on the other hand, is created by objects located beyond the horopter; their images are usually projected onto the nasal hemiretinae (the inner halves of the retinae) (Von Noorden 1996).
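
The relation between disparity and position relative to the horopter can be sketched for the simplest case. The example below is an assumption-laden illustration rather than a method from Von Noorden (1996): it considers only points on the midline straight ahead, takes an interpupillary distance of 6.3 cm, and adopts the sign convention that positive values mean crossed disparity. The function names are hypothetical.

```python
import math

def binocular_subtense_rad(distance_m, ipd_m=0.063):
    """Angle subtended at a midline point by the two eyes (radians)."""
    return 2 * math.atan(ipd_m / (2 * distance_m))

def classify_disparity(point_m, fixation_m, ipd_m=0.063):
    """Return (disparity in degrees, label) for a midline point, given the fixation distance."""
    disparity = math.degrees(
        binocular_subtense_rad(point_m, ipd_m) - binocular_subtense_rad(fixation_m, ipd_m)
    )
    if math.isclose(disparity, 0.0, abs_tol=1e-12):
        label = "zero (on the horopter)"
    elif disparity > 0:
        label = "crossed (nearer than fixation)"
    else:
        label = "uncrossed (farther than fixation)"
    return disparity, label

for point in (0.3, 0.4, 0.5, 0.8, 2.0):
    value, label = classify_disparity(point, fixation_m=0.5)
    print(f"point at {point:.1f} m: {value:+.3f} degrees, {label}")
```

With fixation at 0.5 m, the nearer points come out as crossed and the farther points as uncrossed, and the magnitude grows as a point moves away from the fixation distance, matching the description above.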

Retinal disparity causes diplopia. Yet we usually do not experience this diplopia, so it is called physiological diplopia: it happens on the retinae but not in our minds. The reason we do not experience it is that our brains fuse the disparate images within Panum’s fusional area, a region around the horopter within which fusion is possible. This area is narrowest at the fixation point and becomes wider toward the periphery. The process of fusion is called stereopsis (Von Noorden 1996).

Monocular Information Cues

Although binocular vision is very powerful for depth perception, we can perceive depth even in the absence of binocular vision by relying on monocular information cues.

Dynamical Cues

Motion provides depth information that is available even to monocular vision. Goodale (2011) described studies which revealed that one-eyed people seemed to use retinal motions resulting from head movements during reaching actions. This suggests that dynamical cues are the second most reliable depth information cues after binocular cues. Consistent with Goodale, Boring (1942) found motion parallax, which is a kind of dynamical cue, to be the most important cue after binocular cues. On the other hand, Nakayama and Loomis (1974) considered both to be primary information cues for depth perception in their comparison of binocular cues and optical velocity information, which is a specific kind of dynamical cue. They claimed that binocular cues might be more useful for actions, such as reaching, that take place in the near environment whereas optical velocity information might be more useful for more distantly directed actions such as throwing or observing.

Motion Parallax

Motion parallax is the term used to describe the optical displacement of an object in the visual field while the observer is in motion. Helmholtz’s (1925) account of motion parallax is twofold. First, he asserted that motion parallax allows us to judge the absolute distance of an object from our own position. While an observer moves in one direction, stationary points in the visual field appear to move in the opposite direction, and their apparent angular velocity is inversely proportional to their real distance from the observer. Thus, their real distance can be inferred from their angular velocity. Subsequently, Gibson and his colleagues (Gibson et al. 1955) showed that this is true only if the speed and direction of the observer are known. The second part of Helmholtz’s account claims that motion parallax allows us to discriminate objects located at different positions in depth. Helmholtz (1925) used the example of foliage and branches: it is almost impossible to tell apart the elements of a tangle of foliage and branches when it is viewed while standing still. Yet as soon as there is motion, we can appreciate the separation of each leaf and branch and become aware of their relations to each other in space. Gibson (Gibson et al. 1959) suggested that these two functions are independent and need to be studied separately.
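
The inverse relation between angular velocity and distance, and the need to know the observer's speed, can be made concrete. This is a hedged sketch rather than a formula quoted from Helmholtz or Gibson: it assumes an observer translating in a straight line at constant speed v past a stationary point at distance d whose direction makes an angle β with the direction of travel, so that the point's angular velocity is v·sin(β)/d. The function names and numerical values are hypothetical.

```python
import math

def angular_velocity_rad_s(speed_m_s, distance_m, bearing_deg=90.0):
    """Angular velocity of a stationary point seen by a translating observer."""
    return speed_m_s * math.sin(math.radians(bearing_deg)) / distance_m

def distance_from_parallax_m(angular_velocity, speed_m_s, bearing_deg=90.0):
    """Absolute distance recovered from angular velocity; requires the observer's speed."""
    return speed_m_s * math.sin(math.radians(bearing_deg)) / angular_velocity

speed = 1.5  # hypothetical walking speed in m/s
near = angular_velocity_rad_s(speed, distance_m=2.0)
far = angular_velocity_rad_s(speed, distance_m=20.0)

# The ratio of angular velocities gives relative distance without knowing the speed.
print(f"the far point is {near / far:.1f} times as distant as the near point")
# Absolute distance needs the true speed; a wrong guess gives a wrong distance.
print(f"with the true speed: {distance_from_parallax_m(near, speed):.2f} m")
print(f"with a wrong speed of 3.0 m/s: {distance_from_parallax_m(near, 3.0):.2f} m")
```

The last two lines of the sketch show why absolute distance is recoverable only when the observer's own motion is known, while the ratio of two angular velocities still gives relative depth.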

Later on Nakayama and Loomis (1974) explained how to extract depth information from optic flow based on a formula that describes the pattern in the optic flow. Optic flow is a related but broader notion than motion parallax. Nakayama and Loomis emphasized the distinction between motion parallax and optical flow information: motion parallax refers to only a small number of focal objects; optical flow information, however, is the totality of optical motions (Gibson 1950). The relative motion between an observer and the environment produces optic flow information. Because optical flow refers to all points in the visual field rather than a single focal point, it is represented by what is called a velocity field. Each vector, in the velocity field, is associated with optical motion of a point and the magnitude of the vectors is inversely correlated with the distance of the point. In other words, closer points move with greater velocity. Variation in distance generates motion parallax gradients. Hence, motion parallax provides information about the relative depth of the points.
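
A velocity field of this kind can be sketched for the simplest case of a purely translating observer. The code below is not the formula of Nakayama and Loomis (1974); it assumes a pinhole projection with focal length f, a static scene, and no eye or head rotation, in which case a point at (X, Y, Z) with image position x = fX/Z, y = fY/Z has image velocity u = (−f·Tx + x·Tz)/Z and v = (−f·Ty + y·Tz)/Z for observer translation (Tx, Ty, Tz). All names are hypothetical.

```python
import math

def flow_vector(point_xyz, translation_xyz, focal_length=1.0):
    """Image velocity (u, v) of a static point for a translating pinhole observer."""
    X, Y, Z = point_xyz
    Tx, Ty, Tz = translation_xyz
    x = focal_length * X / Z  # image position of the point
    y = focal_length * Y / Z
    u = (-focal_length * Tx + x * Tz) / Z
    v = (-focal_length * Ty + y * Tz) / Z
    return u, v

sideways = (1.0, 0.0, 0.0)  # observer moves to the right at 1 unit per second
for depth in (2.0, 5.0, 10.0):
    u, v = flow_vector((0.0, 0.0, depth), sideways)
    print(f"depth {depth:4.1f}: flow = ({u:+.3f}, {v:+.3f}), magnitude = {math.hypot(u, v):.3f}")
```

For sideways movement the vectors point opposite to the direction of travel and shrink in proportion to 1/Z, so nearer points move faster in the image. In the same sketch, a point straight ahead during forward movement has zero flow, which is the situation in which expansion information becomes relevant.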

Another inference that can be made from optical flow structure occurs when two adjacent points in the image have significantly different velocity magnitudes; if those adjacent image points are associated with two distinct points in depth, then the difference between their velocity magnitudes will be greater. Such information is a useful cue regarding the separation of distinct surfaces. Yet this information usually accompanies cues such as accretion and deletion that will be explained shortly after expansion and contraction.

Expansion and Contraction

Imagine you are moving toward an object at an angular position of 0° (straight ahead). The angular velocity of the object will be 0, and so the magnitude of angular velocity will be useless for depth perception. In such a case, expansion and contraction information supports depth perception. As an observer moves closer to an object, the object appears to expand because the size of its retinal image increases. Conversely, as the observer moves away from the object, the object appears to contract within the retinal image. The degree of expansion or contraction is inversely proportional to the distance between the object and the eye. For example, assume you are in the middle of your room, looking at a tree outside the window. You are able to see only part of the tree. With each step you take toward the window, a bigger part of the tree becomes visible. Eventually, you will be able to see the whole tree when you reach the window. This experience can be explained as follows: while you are moving toward the window, the window expands more than the tree does because the window is closer to you. You can use such information to infer the depth order of the two objects. If you know your speed, you can also infer the absolute distance of the objects.
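
The window-and-tree example can be approximated numerically. The sketch below rests on assumptions that are not in the original text: each object is treated as a flat frontal target of fixed physical size, its angular size is 2·atan(s / 2d), and for small angles the relative rate of expansion (the rate of change of angular size divided by the angular size) is roughly v/d, so that distance is roughly v divided by that rate. The sizes, distances, and function names are hypothetical.

```python
import math

def angular_size_rad(size_m, distance_m):
    """Angular size of a frontal object of a given physical size and distance."""
    return 2 * math.atan(size_m / (2 * distance_m))

def relative_expansion_rate(size_m, distance_m, speed_m_s, dt=1e-3):
    """Finite-difference estimate of (d theta / dt) / theta while approaching at constant speed."""
    before = angular_size_rad(size_m, distance_m)
    after = angular_size_rad(size_m, distance_m - speed_m_s * dt)
    return (after - before) / (dt * before)

speed = 1.0  # hypothetical walking speed in m/s
window_rate = relative_expansion_rate(size_m=1.0, distance_m=2.0, speed_m_s=speed)
tree_rate = relative_expansion_rate(size_m=5.0, distance_m=20.0, speed_m_s=speed)

print(f"window expands at {window_rate:.3f} per second, tree at {tree_rate:.3f} per second")
# Small-angle estimate of absolute distance when the observer's speed is known.
print(f"estimated window distance: {speed / window_rate:.2f} m")
print(f"estimated tree distance:   {speed / tree_rate:.2f} m")
```

The nearer window expands about ten times faster than the tree, which gives the depth order, and dividing the known speed by each rate gives approximate absolute distances. The estimates are slightly too large because the small-angle approximation is only rough for an object as close as the window.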

Accretion and Deletion

When two opaque objects are located at different distances in depth, the nearer object partially or wholly covers or uncovers the more distant object as the observer moves (Gibson 1979). Imagine that you are sitting in a theatre, in the seventh row of seats, and trying to see the picture on the screen. Unfortunately, the head of the person sitting in front of you in the sixth row blocks part of the screen. If you lean forward, the person’s head will cover a greater part of the picture. Is the head getting bigger as you lean forward? Of course not. What is happening here is exactly the same as in the window-tree example: as you lean forward, both the picture and the person’s head expand; however, because the person is closer to you, the head expands more than the picture does. Therefore, you experience deletion of the picture on the screen. If you lean backward rather than forward, you experience accretion of the picture. Moreover, if the person sits farther from you and closer to the screen, the difference between the degrees of expansion or contraction of the head and the picture will be smaller, so the accretion or deletion will occur more slowly. Hence, the rate of accretion or deletion provides relative depth information about objects. Accretion and deletion occur not only during backward and forward movements of the observer but also during sideways movements, and the information revealed by sideways movements likewise allows you to infer depth order easily.

Static Cues

While you are looking through a peephole to see what is on the other side of a door or wall, binocular vision is not available at all and motion is very restricted, as you cannot move backward and forward or from side to side. Nonetheless, thanks to static cues, you can still experience depth in your view.

Static cues are also called pictorial cues and are often used by visual artists in two-dimensional paintings and photographs to convey a sense of depth. From time to time, you may come across paintings or photographs whose depiction of depth makes you feel strongly as if you were part of the scene. If you think you have never come across one, take a break from reading this entry and look at these paintings: The Dance Class by Edgar Degas (1874), A Sunday Afternoon on the Island of La Grande Jatte by Georges Seurat (1884), Cafe Terrace at Night by Vincent van Gogh (1888), Paris Street; Rainy Day by Gustave Caillebotte (1877), and The School of Athens by the Italian Renaissance artist Raphael (1511).


Occlusion

Considering Edgar Degas’s painting, “The Dance Class,” which ballerina is closest to you in the painting? Although it is a two-dimensional picture with no dynamical cues, you can confidently say that the ballerina on the left-hand side, who wears a big red ribbon barrette in her hair, is the closest. Your confidence relies on a basic depth cue called “occlusion”: the girl who seems closer partially occludes the other girls. Occlusion indicates relative distance but not absolute distance. It is hard to tell the absolute distance between the people in the picture based on occlusion alone.

Relative Size

If one of two similar or identical objects in an image looks bigger than the other, it gives the impression that the bigger-looking one is closer. Reconsider your answer to the question about the closest ballerina in “The Dance Class”: occlusion alone is not enough to say with confidence that the ballerina on the left-hand side, who wears a big red ribbon barrette in her hair, is the closest. She does not occlude the dancers on the right-hand side, but she appears the largest even though all the dancers should be about the same size. Thus, your confidence relies on two depth cues, “occlusion” and “relative size.”

Note that the logic of relative-size information applies to expansion and contraction in motion, and, together with occlusion, it is also related to accretion and deletion.

Familiar Size

Our knowledge about objects can help in depth perception too. For example, if a toothpick, a baby, and the Eiffel Tower all appear to be the same size in a picture, we will conclude that the toothpick is the closest object and the Eiffel Tower the farthest. Moreover, we can infer that the distance between the toothpick and the baby is smaller than the distance between the baby and the Eiffel Tower.
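
The inference in this example can be worked through with rough numbers. The sketch below is illustrative only: the physical sizes (a 6.5 cm toothpick, a 0.5 m baby, a 330 m tower) and the shared visual angle of 5 degrees are assumed values, and the function name is hypothetical. It uses the relation d = s / (2·tan(θ/2)) between known size s, visual angle θ, and distance d.

```python
import math

def distance_from_familiar_size_m(known_size_m, angular_size_deg):
    """Distance at which an object of known size subtends the given visual angle."""
    return known_size_m / (2 * math.tan(math.radians(angular_size_deg) / 2))

# Rough, assumed physical sizes in meters.
KNOWN_SIZES_M = {"toothpick": 0.065, "baby": 0.5, "Eiffel Tower": 330.0}
SAME_ANGLE_DEG = 5.0  # all three are assumed to look equally large in the picture

distances = {
    name: distance_from_familiar_size_m(size, SAME_ANGLE_DEG)
    for name, size in KNOWN_SIZES_M.items()
}
for name, d in distances.items():
    print(f"{name}: about {d:.1f} m away")
```

Under these assumptions the toothpick comes out less than a meter away, the baby a few meters away, and the tower several kilometers away, so the gap between the toothpick and the baby is indeed far smaller than the gap between the baby and the tower.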

Distance from Horizon Line

Objects and surfaces appear closer to the horizon as their distance increases. Picture yourself at a beach, watching people swimming in the sea; you do not hesitate to assume that the person who appears closest to the horizon is the farthest from the beach.

Shadows, Shading, and Lighting

Shading and lighting patterns on an object or surface provide powerful cues to its three-dimensional structure. You can make a circle appear convex or concave simply by changing the lighting and shading. The characteristics of cast shadows help us to determine the positions of objects and the shape of the environment. For example, the cast shadow of a flying bird is not attached to the bird. The cast shadow of a telephone pole does not look the same on a stone as it does on a street: the stone has a convex surface whereas the street is flat.

Atmospheric Perspective

Because of the atmospheric effect, objects that are further away appear bluer and less distinct than nearer objects.

Linear Perspective

Parallel lines appear to converge toward a vanishing point on or near the horizon. A railroad, viewed by someone standing on the tracks and looking along them, may be the best example of linear perspective. You can also see linear perspective in some of the paintings in the gallery of Eşref Armağan, the congenitally blind painter mentioned above.

Texture Gradient

Consider a walkway and a wall, two different surfaces that both consist of similar cobblestones. Both surfaces have a homogeneous physical texture. Yet, in the real world, we see a heterogeneous spatial pattern on the walkway: the cobblestones appear progressively denser with distance. Based on the rate of change in the density of the pattern, we can easily distinguish the walkway from the wall, which presents a homogeneous pattern. The increasing density of the pattern indicates an increasing angle from the eye, and that increasing angle yields increasing depth information.

To express this more generally: the magnitude of the gradient specifies the slant of a surface in the world with respect to the line of sight. Furthermore, simple changes in gradient can define corners and/or edges. If the change of gradient of density is smooth, it yields a corner; if there is a jump in the gradient of density, it yields an edge or, in other words, two separate surfaces (Reed 1988).
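
The difference between the walkway and the wall can be sketched with a simple projection. This illustration is not drawn from Reed (1988): it assumes a pinhole projection with focal length 1, an eye 1.6 m above a flat walkway, a frontal wall 4 m away, and cobblestone rows spaced every 0.5 m on both surfaces. All names and values are hypothetical.

```python
def ground_row_image_positions(distances_m, eye_height_m=1.6, focal_length=1.0):
    """Image position (below the horizon) of ground rows at the given distances."""
    return [focal_length * eye_height_m / z for z in distances_m]

def wall_row_image_positions(heights_m, wall_distance_m, focal_length=1.0):
    """Image position of rows on a frontal wall at a fixed distance."""
    return [focal_length * y / wall_distance_m for y in heights_m]

def spacings(values):
    """Absolute gaps between successive image positions."""
    return [abs(b - a) for a, b in zip(values, values[1:])]

rows_on_ground = [2.0 + 0.5 * i for i in range(6)]  # rows 2.0 m to 4.5 m ahead
rows_on_wall = [0.5 * i for i in range(6)]           # rows 0.0 m to 2.5 m high

ground = spacings(ground_row_image_positions(rows_on_ground))
wall = spacings(wall_row_image_positions(rows_on_wall, wall_distance_m=4.0))

print("walkway row spacing in the image:", [round(s, 3) for s in ground])
print("wall row spacing in the image:   ", [round(s, 3) for s in wall])
```

In the projected image the walkway rows crowd together with distance, giving a density gradient, whereas the rows on the frontal wall stay evenly spaced, giving a zero gradient. A steeper gradient therefore corresponds to a surface slanted more strongly away from the line of sight.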

The Gibsonian Approach to Depth Perception

Earlier theories attempted to explain visual perception as a sensation-based phenomenon and treated the retinal image as central to visual perception. Those theories struggled, however, to explain three-dimensional vision on the basis of two-dimensional retinal images. Some early theorists (e.g., George Berkeley) suggested that depth perception could not be explained by the senses alone; sensation was too impoverished to explain perception without recourse to knowledge about the world, such as that provided by experience. These theorists explained depth perception as occurring by first having the sensation and then inferentially constructing the space.

Gibson rejected this sensation-based approach to visual perception and introduced an information-based approach in its place. According to the information-based approach, visual perception cannot be explained on the basis of retinal images; perception is the detection of information, which exists as invariants in the optical array. The optical array is the pattern of structured light that reaches the eye at a given point of observation, and it contains all the information for visual perception directly, without any need for mental representation or memory. Retinal images are merely samples of the optical array. Hence, information is not a static two-dimensional image on the back of the eye. Instead, it is available to be detected in the optical array and is specific to surfaces and layouts. Depth is not really lost (Braund 2008).

Although Gibson eventually rejected the sensation-based approach, he made missteps and went through some confusion before developing his final theory. In his early studies, Gibson tested slant perception using different texture gradients (Reed 1988). In the experiment, subjects sat in front of an aperture in a wall. Through the aperture, they could see the texture of an upright screen behind the wall, while the edges of the screen were occluded by the wall. The upright screen was painted with different texture gradients, which allowed Gibson to manipulate the patterns of the retinal images in his participants. The subjects’ task was to judge the optical slant they observed on the screen. The hypothesis behind the experiment was that a gradient of texture density, with no other cues available, would yield the impression of a receding surface. The results revealed that subjects reported seeing slanted surfaces even though the surface behind the wall was always upright. Moreover, greater changes in texture density resulted in greater perceived slant. There was a problem, however: although texture gradient covaried with perceived slant, the slant was always underestimated.

Subsequently, Gibson decided to focus on perceptual psychophysics rather than sensory psychophysics. In his optical tunnel experiment, he studied the optical conditions that give rise to the perception of surfaces and space. The optical tunnel was neither a physical tunnel nor a special retinal image; the apparatus is described below. Gibson was no longer attempting to control the formation of retinal images; instead, he was controlling the stimulation available to observers’ visual systems, from which they obtained retinal images.

In the optical tunnel experiment, Gibson manipulated and studied optical transitions. The tunnel apparatus controlled the gradient of optical transitions in the available stimulation through different arrangements of the transitions and different observation positions. Plastic sheets, alternately black and white and smooth in surface, were placed one behind another, each with a carefully cut central hole so that it acted as an aperture for the sheets behind it. The tunnel was set up in two arrangements. In the first, the sheets were spaced unevenly in the world (an inverse gradient, heterogeneous), which produced a constant density of optical contrasts; the gradient in the optics was canceled (homogeneous), and the display looked flat. In the second, the sheets were spaced evenly in the world (no gradient, homogeneous), which produced an increasing density of optical contrasts; the optics carried a gradient (heterogeneous), and observers saw a visual tunnel.
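The relation between the spacing of the sheets in the world and the density of contrasts in the optics can be sketched as follows. This is a geometric illustration only: the hole radius, the sheet distances, and the helper `equal_angle_distances` are hypothetical and are not taken from Gibson's apparatus. The sketch assumes an eye on the axis of the tunnel, so that the edge of each hole lies at a visual angle of atan(radius / distance).

```python
import math


def edge_angles(distances, hole_radius=0.15):
    """Visual angle (degrees) of each hole edge for an eye on the tunnel axis."""
    return [math.degrees(math.atan(hole_radius / z)) for z in distances]


def angular_steps(angles):
    """Angular gap between successive hole edges, from nearest to farthest."""
    return [near - far for near, far in zip(angles, angles[1:])]


def equal_angle_distances(first, last, n, hole_radius=0.15):
    """Hypothetical sheet distances that make all angular steps equal."""
    a_first = math.atan(hole_radius / first)
    a_last = math.atan(hole_radius / last)
    return [
        hole_radius / math.tan(a_first + (a_last - a_first) * i / (n - 1))
        for i in range(n)
    ]


if __name__ == "__main__":
    even = [0.5 + 0.25 * i for i in range(8)]  # evenly spaced sheets (metres)
    flat = equal_angle_distances(0.5, 2.25, 8)  # unevenly spaced sheets
    print("even spacing, steps (deg):", [round(s, 2) for s in angular_steps(edge_angles(even))])
    print("uneven spacing, steps (deg):", [round(s, 2) for s in angular_steps(edge_angles(flat))])
```

Under these assumptions, evenly spaced sheets give angular steps that shrink toward the center, that is, an increasing optical density, whereas equal angular steps require sheets placed at progressively larger intervals in the world.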

In the tunnel experiment, Gibson provided the information for a surface even though no real surface was present. The optical structure specifies the same thing as the optic flow structure; the gradient in the optic flow is identical. He distinguished the structure of the world from the optical structure and studied the relation between them. The optical structure differs from the world structure, but the two are uniquely related. Depth is not lost; there is enough information in the structure of light.

The significant change in Gibson’s methodology was that he was no longer trying to make retinal images replicate surfaces; instead, he was trying to control the stimulation available to the eyes of his observers.


Depth perception is the perception of the environment in three dimensions. It is about knowing where things are, what they are, and what we can do with them. In this sense, depth perception serves two critical aspects of understanding our environment: identifying the shapes of objects and evaluating their spatial locations. Neither the haptic nor the auditory system alone can cover both aspects. The nature of haptic perception confines its use to arm’s-length distances. Yet the haptic system is proficient at evaluating the shapes of surfaces and objects, so it can tell us precisely what they are and what they afford. Haptic perception is relied on primarily by infants and blind people. The auditory system is the counterpart of the haptic system; for the purposes of depth perception, each completes the other. In other words, the haptic system is strong where the auditory system is weak, and vice versa. For example, the haptic system works well within arm’s length, and the auditory system takes over as the distance grows beyond arm’s length. Haptic perception indicates what an object is, whereas auditory perception indicates where the object is relative to our position. The visual system is not impaired by the absence of either or both of the haptic and auditory systems; vision alone can still provide complete depth perception.



References

  1. Abbott, E. A. (1884). Flatland: A romance of many dimensions. London: Seeley & Co.
  2. Bishop, P. O. (1970). Beginning of form vision and binocular depth discrimination in cortex. The neurosciences: Second study program (pp. 471–485). New York: Rockefeller.
  3. Boring, E. G. (1942). Sensation and perception in the history of experimental psychology. New York: D. Appleton-Century Company, Incorporated.
  4. Braund, M. J. (2008). From inference to affordance: The problem of visual depth-perception in the optical writings of Descartes, Berkeley, and Gibson. (Unpublished master’s thesis). Brock University, St. Catharines, Ontario.
  5. Gibson, J. J. (1950). The perception of the visual world. Boston: Houghton Mifflin.
  6. Gibson, J. J. (1979/1986). The ecological approach to visual perception. Hillsdale: Erlbaum.
  7. Gibson, J. J., Olum, P., & Rosenblatt, F. (1955). Parallax and perspective during aircraft landings. The American Journal of Psychology, 68(3), 372–385.
  8. Gibson, E. J., Gibson, J. J., Smith, O. W., & Flock, H. (1959). Motion parallax as a determinant of perceived depth. Journal of Experimental Psychology, 58(1), 40.
  9. Gilchrist, A. (2014). Johannes Kepler: The sky as a retinal image. Perception, 43(12), 1283–1285. https://doi.org/10.1068/p4312ed.
  10. Goodale, M. A. (2011). Transforming vision into action. Vision Research, 51(13), 1567–1587.
  11. Groh, J. M. (2014). Making space: How the brain knows where things are. Cambridge, MA/London: The Belknap Press of Harvard University Press.
  12. Heller, M. A. (2003). Haptic perceptual illusions. In Y. Hatwell, A. Streri, & E. Gentaz (Eds.), Touching for knowing (pp. 161–171). Amsterdam: John Benjamins.
  13. Heller, M. A., Wilson, K., Steffen, H., Yoneyama, K., & Brackett, D. D. (2003). Superior haptic perceptual selectivity in late-blind and very-low-vision subjects. Perception, 32, 499–511.
  14. Helmholtz, H. (1925). Treatise on physiological optics. III. The perceptions of vision (J. P. C. Southall, Ed.). New York: Optical Society of America.
  15. Howard, I. P. (2012). Oxford psychology series. Perceiving in depth, Vol. 1. Basic mechanisms. New York: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199764143.001.0001.
  16. Julesz, B. (1971). Foundations of cyclopean perception. Chicago: University of Chicago Press.
  17. Kennedy, J., & Juricevic, I. (2019). Esref Armagan and perspective in tactile pictures. https://www.researchgate.net/publication/228505950_Esref_Armagan_and_perspective_in_tactile_pictures.
  18. Mon-Williams, M., & Tresilian, J. R. (1999). Some recent studies on the extraretinal contribution to distance perception. Perception, 28, 167–181.
  19. Mon-Williams, M., Tresilian, J. R., & Roberts, A. (2000). Vergence provides veridical depth perception from horizontal retinal image disparities. Experimental Brain Research, 133, 407–413.
  20. Nakayama, K., & Loomis, J. M. (1974). Optical velocity patterns: Velocity-sensitive neurons and space perception. Perception, 3, 63–80.
  21. Palmer, S. E. (1999). Vision science: Photons to phenomenology. Cambridge, MA: The MIT Press.
  22. Pizlo, Z. (2001). Perception viewed as an inverse problem. Vision Research, 41(24), 3145–3161.
  23. Reed, E. S. (1988). James J. Gibson and the psychology of perception. New Haven: Yale University Press.
  24. Von Noorden, G. K. (1996). Binocular vision and ocular motility: Theory and management of strabismus (5th ed.). St. Louis/Sydney: Mosby.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Süleyman Demirel University, Isparta, Turkey

Section editors and affiliations

  • Peggy Mason (1)
  • Yuri Sugano (2)
  1. University of Chicago, Chicago, USA
  2. Neurobiology, University of Chicago, Chicago, USA