A type of learning in which the probability that a behavior recurs is increased or decreased by the consequences that follow the behavior. The three-term contingency is the simplest conceptual model of operant conditioning (Holland and Skinner 1961).
Operant conditioning draws on many techniques and procedures first investigated by E. L. Thorndike (1898) and later refined and extended by B. F. Skinner (1938). Although operant conditioning builds on the classical conditioning work of Ivan Pavlov (1927), it differs from classical conditioning in that it deals with the modification of “voluntary” (operant) behavior. An operant is a behavior that acts on the environment to produce a consequence, which the environment delivers in response to the operant. This consequence makes the organism more or less likely to repeat the behavior. Operant conditioning techniques are currently used in clinical therapy, typically as part of cognitive behavioral therapy.
The three-term contingency consists of a discriminative stimulus, an operant response, and the consequence of the behavior (a reinforcer or punisher). The discriminative stimulus is an antecedent stimulus, defined as a cue that signals the probable consequence of an operant response; that is, it signals whether the operant response will be reinforced or punished. Certain results follow when an organism responds to a discriminative stimulus. If the results are favorable, the response increases in frequency; if they are unfavorable, the response decreases.
In an example of operant conditioning, a hungry pigeon is placed in an operant box containing a feeder that dispenses feed when a lighted key is pecked. Initially, the pigeon walks around the inside of the box and accidentally pecks the key, which releases feed. Although the pigeon does not comprehend the relation between the lighted key and the feed, the frequency with which it pecks the lighted key and then moves to the location where the feed is dispensed gradually increases. In this case, the pigeon learns the three-term contingency, consisting of the operant box (discriminative stimulus), pecking the lighted key (operant response), and feed (reinforcer). The change in the frequency with which the pigeon pecks the lighted key represents the process of operant conditioning.
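The pigeon example can be sketched as a toy simulation. This is only an illustration under stated assumptions: the linear update rule, the starting probability, the learning rate, and every name below are hypothetical and are not part of the account given by Holland and Skinner (1961). The sketch only shows the qualitative point that a response followed by a reinforcer becomes more probable.

```python
import random
from dataclasses import dataclass


@dataclass
class ThreeTermContingency:
    """The three terms named in the text, stored as plain labels."""
    discriminative_stimulus: str
    operant_response: str
    consequence: str


# The pigeon example from the text, expressed as a three-term contingency.
PIGEON_EXAMPLE = ThreeTermContingency(
    discriminative_stimulus="operant box",
    operant_response="pecking the lighted key",
    consequence="feed",
)


def simulate_pigeon(trials=200, p_start=0.05, learning_rate=0.1, seed=0):
    """Return the peck probability after each trial.

    Assumption: each reinforced peck moves the peck probability a fixed
    fraction of the way toward 1.0. This update rule is illustrative only.
    """
    rng = random.Random(seed)
    p_peck = p_start
    history = []
    for _ in range(trials):
        pecked = rng.random() < p_peck
        if pecked:
            # Feed follows every peck, so pecking becomes more likely.
            p_peck += learning_rate * (1.0 - p_peck)
        history.append(p_peck)
    return history
```

Calling simulate_pigeon() returns a list of probabilities that never decreases, which mirrors the rising peck frequency described above. Early pecks are rare and accidental, and each reinforced peck makes the next one more likely.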
Reinforcement is the process of increasing or sustaining a behavior by its consequences. Two kinds of reinforcement exist: positive and negative reinforcement.
Positive reinforcement occurs when the frequency of a behavior is increased as a result of the presentation of favorable events or outcomes, known as positive reinforcers. This form of conditioning is termed reward training.
Negative reinforcement occurs when the frequency of a behavior is increased because it is followed by the removal of unfavorable events or outcomes, known as negative reinforcers. This form of conditioning is termed escape training.
Punishment is a process by which a behavior is decreased by its consequences. There are two kinds of punishment: positive and negative.
Positive punishment occurs when a behavior is decreased because it is followed by the presentation of unfavorable events or outcomes, which are known as positive punishers. This form of conditioning is termed punishment training.
Negative punishment occurs when a behavior is decreased because it is followed by the removal of favorable events or outcomes, which are known as negative punishers. This form of conditioning is termed omission training.
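The four processes defined above differ along two dimensions: whether an event is presented or removed, and whether the behavior increases or decreases. A minimal sketch of that two-by-two classification follows; the function name and the string labels for its arguments are assumptions chosen for illustration.

```python
def classify_consequence(stimulus_change, behavior_change):
    """Name the operant process for a given combination.

    stimulus_change: "presented" or "removed" (what happens after the behavior)
    behavior_change: "increases" or "decreases" (effect on the behavior)
    """
    table = {
        ("presented", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("presented", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    key = (stimulus_change, behavior_change)
    if key not in table:
        raise ValueError(f"unknown combination: {key!r}")
    return table[key]
```

For example, classify_consequence("removed", "increases") returns "negative reinforcement": an unfavorable event is removed and the behavior becomes more frequent.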
Punishment is more effective when it is combined with reinforcement, because appropriate alternative behaviors can then be learned. Without reinforcement, punishment is less effective because it only suppresses the inappropriate behavior.
Extinction is a process whereby the positive reinforcement of a previously reinforced behavior is discontinued. Organisms may exhibit resistance to extinction, by which a response continues even after the reinforcement ceases. The greater the resistance to extinction, the longer the response will continue.
Notably, extinction may produce adverse side effects; two commonly noted effects are a temporary increase in the frequency of the target response and an increase in aggression.
Shaping is a method for conditioning an organism to perform a new behavior. It is well described by its technical name: the method of successive approximations. To approximate something is to get close to it, and successive approximations condition an organism in small steps. Shaping works by starting with whatever the organism can already do and then reinforcing closer and closer approximations to a goal.
Five simple rules for shaping are as follows: (1) Ensure the target behavior is realistic and biologically possible. (2) Specify the entering and target behaviors. (3) Plan a small chain of behavioral changes leading from the entering behavior to the target behavior. (4) If a step proves too large, break it into smaller, more manageable steps. (5) Use reinforcers in small quantities to avoid satiation.
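Rules (2) through (4) can be sketched as a small planning routine. This is a simplified illustration, not a procedure from the cited sources: it assumes the behavior can be scored on a single numeric scale (for instance, a hypothetical distance or duration), and the function names and the halving strategy are assumptions.

```python
def plan_shaping_steps(entering, target, max_step):
    """List successive criteria from the entering behavior to the target.

    Assumption: behavior is scored as a number, and no single step may
    change the criterion by more than max_step.
    """
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    direction = 1 if target >= entering else -1
    steps = []
    current = entering
    while abs(target - current) > max_step:
        current += direction * max_step
        steps.append(current)
    steps.append(target)  # the final criterion is the target behavior itself
    return steps


def split_step(start, end):
    """Rule (4): replace a step that proved too large with two smaller ones."""
    midpoint = (start + end) / 2
    return [midpoint, end]
```

With these assumptions, plan_shaping_steps(0, 10, 4) yields the criteria [4, 8, 10]. If the move from 4 to 8 proves too large, split_step(4, 8) yields [6.0, 8], and each criterion is reinforced in turn.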
Schedule of Reinforcement
Several types of reinforcement schedules exist. If reinforcement occurs after each desired behavior, the situation is termed continuous reinforcement. However, if reinforcement occurs only after certain desired behaviors, it is referred to as partial reinforcement. A response learned under the latter conditions is more resistant to extinction, a phenomenon called the partial reinforcement effect.
Under a fixed-ratio schedule, a response is reinforced only after a specified number of responses; under a variable-ratio schedule, a response is reinforced after an unpredictable number of responses. Under a fixed-interval schedule, the first response made after a specified amount of time has elapsed is reinforced. Under a variable-interval schedule, the first response made after an unpredictable amount of time has passed is reinforced.
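The four schedules can be sketched as two small classes that decide, response by response, whether a reinforcer is delivered. The class and function names, the use of a uniform distribution for the variable schedules, and the time units are all assumptions made for illustration.

```python
import random


class RatioSchedule:
    """Reinforce the response that completes the required number of responses."""

    def __init__(self, draw_requirement):
        self.draw_requirement = draw_requirement  # callable giving the next count
        self.required = draw_requirement()
        self.count = 0

    def respond(self, now=None):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.draw_requirement()
            return True
        return False


class IntervalSchedule:
    """Reinforce the first response made after the required time has elapsed."""

    def __init__(self, draw_interval):
        self.draw_interval = draw_interval  # callable giving the next waiting time
        self.available_at = draw_interval()

    def respond(self, now):
        """Register a response at time `now`; return True if it is reinforced."""
        if now >= self.available_at:
            self.available_at = now + self.draw_interval()
            return True
        return False


def fixed_ratio(n):
    return RatioSchedule(lambda: n)


def variable_ratio(mean_n, rng):
    # Assumption: requirements drawn uniformly from 1 to 2 * mean_n - 1.
    return RatioSchedule(lambda: rng.randint(1, 2 * mean_n - 1))


def fixed_interval(duration):
    return IntervalSchedule(lambda: duration)


def variable_interval(mean_duration, rng):
    # Assumption: waiting times drawn uniformly from 0 to 2 * mean_duration.
    return IntervalSchedule(lambda: rng.uniform(0, 2 * mean_duration))
```

For example, fixed_ratio(3) reinforces every third response, while fixed_interval(10) ignores responses made before 10 time units have passed and reinforces the first one made afterward. Passing a seeded random.Random instance to the variable schedules keeps a simulation reproducible.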
The response styles of subjects, whether they are pigeons in an operant box or employees in a workplace, vary with the schedule used. Other factors being equal, a variable-ratio schedule produces the greatest number of responses from a subject in a given time period, whereas a fixed-ratio schedule fosters rapid learning of the desired response; the number of responses then remains steady, but is lower than that produced by a variable-ratio schedule. Fixed-interval schedules produce relatively few responses overall and a drop in the number of responses immediately following reinforcement, although the number increases as the time for reinforcement nears. Variable-interval schedules result in slower learning of the response, followed by a steady number of responses, but produce fewer responses than a fixed-ratio schedule.
Notably, the type of reinforcement schedule will have an impact on how quickly a behavior is extinguished.
Operant Conditioning Therapy
Operant conditioning therapy is a form of behavioral therapy that utilizes the procedures of shaping, token economy, chaining, response cost, time out, and stimulus control.
A token economy uses a token as a reinforcer. Tokens begin as essentially neutral stimuli and are of minor significance in and of themselves. However, as tokens become increasingly associated with the reinforcers for which they are exchanged, the tokens themselves can become mildly reinforcing. Chaining is another procedure based on shaping, but it is used to condition an entire complex series of different responses, not just one.
Response cost is the removal of a positive reinforcer after the occurrence of an undesirable response. Time out is the time during which a discriminative stimulus is not available. Stimulus control is the process of controlling discriminative stimuli.
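The bookkeeping behind a token economy can be sketched as a simple ledger. This is a hypothetical illustration: the class name, the price list, and the choice to implement response cost as a deduction of tokens are assumptions, not a protocol taken from the cited sources.

```python
class TokenEconomy:
    """Track tokens earned, removed, and exchanged for backup reinforcers."""

    def __init__(self, prices):
        # prices maps each backup reinforcer to its cost in tokens (assumed values).
        self.prices = dict(prices)
        self.balance = 0

    def earn(self, tokens=1):
        """Deliver tokens after a desirable response."""
        self.balance += tokens

    def response_cost(self, tokens=1):
        """Remove tokens after an undesirable response (never below zero)."""
        self.balance = max(0, self.balance - tokens)

    def exchange(self, reinforcer):
        """Trade tokens for a backup reinforcer; return True on success."""
        price = self.prices[reinforcer]
        if price > self.balance:
            return False
        self.balance -= price
        return True
```

In use, a therapist might create TokenEconomy({"extra recess": 3}), call earn() after each desirable response, and call exchange("extra recess") once enough tokens have accumulated. The exchange step is what ties the otherwise neutral tokens to the reinforcers they stand for.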
References and Further Reading
- Holland, J. G., & Skinner, B. F. (1961). The analysis of behavior: A program for self-instruction. New York: McGraw-Hill.
- Mazur, J. E. (2006). Learning and behavior (6th ed.). Upper Saddle River: Prentice Hall.
- Mednick, S. A., Higgins, J., & Kirschenbaum, J. (1975). Psychology: Explorations in behavior and experience. New York: Wiley.
- Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. Oxford, UK: Oxford University Press.
- Reynolds, G. S. (1975). A primer of operant conditioning (Rev. ed.). Glenview: Scott Foresman.
- Robbins, S. J., Schwartz, B., & Wasserman, E. A. (2001). Psychology of learning and behavior (5th ed.). New York: W. W. Norton.
- Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New York: Appleton-Century.
- Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
- Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals (Psychological review monograph supplement, Vol. 2, No. 4, Whole No. 8). Lancaster: Macmillan.