Posted by fiddlemath on August 7, 2013

Don’t Shoot the Dog, by Karen Pryor, is an excellent book on training via positive conditioning. It’s explicitly about animal training, but it’s clearly applicable to training yourself and other humans, too. I kept notes when I thought I was learning something new; this is my summary of things learned.

1. Reinforcement

1.1. Reinforcement vs. Rewards

A reinforcer is anything that, occurring in conjunction with an act, tends to increase the probability that the act will occur again. You can make an action more frequent with positive reinforcement; you cannot reinforce behavior that isn’t happening.

An “aversive” is anything a person or animal will work to avoid. A negative reinforcer is an aversive that can be halted or avoided by changing behavior – preferably, as soon as the new behavior starts, the aversive stops.

Pryor spends a while talking about “reward” and “punishment”, and how most people’s notions of these differ widely from positive and negative reinforcement.

1.2. Timing reinforcers

The timing of a reinforcer carries precise information. A “yes” at the right moment during a lesson is far more effective and informative than thorough praise five minutes later. Reinforcing too late is one of the most common difficulties – actions happen quickly, and you might be reinforcing the wrong things. It may even help to have someone else watch you for late reinforcers. “Laggardly reinforcement is the beginning trainer’s biggest problem.”

Reinforcing too late is common when rewarding behavior; reinforcing too early is common when trying to be encouraging, as in getting kids to do schoolwork or to think they’re smart. Both train unexpected or random behavior.

Timing is also important for negative reinforcement. If the negative reinforcer doesn’t cease exactly when behavior has been modified, it neither reinforces nor informs.

1.3. Scheduling reinforcers

Reinforcers should be as small as the learner will still notice: a small mouthful of food for an animal, or a single M&M for a human; a smile, a pat, a single “good”. If you keep the reinforcer small, the trainee will tire of it less quickly.

In animals, you can fit about 20 reinforcements into a session, and no more than about 80 per day, before they lose interest. This varies by species and age.

1.4. Jackpots

A jackpot is a surprising or “unearned” positive reinforcer of, say, ten times the usual size. It might relieve a feeling of oppression, resentment, or sullen inaction. Apparently, why it works isn’t really understood.

1.5. Conditioned reinforcers

A conditioned reinforcer is a signal deliberately presented before or during the delivery of a primary reinforcer (clickers, whistles, “good dog”). A conditioned reinforcer lets you pinpoint the timing of reinforcement: it delivers the reinforcement at the exact moment of the action, even when doing so with the primary reinforcer is impossible (e.g., training dolphins to jump).

This signal should be reserved for this purpose – don’t give it outside of reinforcement training. You can give all the love and affection you like outside of training, but reserve the “good job” signal for marking correct behavior. (Consider: even small children resent false praise!)

Train a conditioned reinforcer by pairing it with multiple other reinforcers; this way, the learner reacts positively even when currently satiated with the food, water, or praise you gave in training. In training, these signals should be followed by a primary reinforcer as soon as possible. You can tell when an animal recognizes the “good job” signal: it startles, and starts to seek the primary reinforcer.

The “good job” signal will end offered behavior. As such, you may need a second signal to say “good, keep going.” This keep-going signal doesn’t have to be trained directly with a primary reinforcer; the learner will soon start to recognize it as signalling an intermediate step if it’s followed often enough by the “good job” signal.

A conditioned negative reinforcer, if needed, should always be a warning that current behavior will lead to immediate punishment. It must precede the actual punishment, not merely accompany it (as with shouting while using a choke chain).

1.6. Reinforcement Schedules

Constant reinforcement is only needed during learning. To maintain a behavior, it’s important to switch to variable reinforcement. Dropping reinforcement suddenly will quickly extinguish the behavior; gradually moving to a longer, variable schedule will make the behavior persist.
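If it helps to see “switch to variable reinforcement, gradually” as a procedure, here’s a rough Python sketch. It’s my own illustration, not Pryor’s: it uses a random-ratio rule (each correct response pays off with probability 1/ratio) as a stand-in for a variable schedule, and the function name and numbers (`variable_schedule`, a target of one reinforcer per five responses, thinning every 20 responses) are made up for the example.

```python
import random


def variable_schedule(n_responses, target_ratio=5.0, step=0.5, stage_len=20, seed=0):
    """Decide, for each correct response, whether to reinforce it.

    Starts on continuous reinforcement (every response pays) and thins the
    schedule by `step` every `stage_len` responses, until it averages one
    reinforcer per `target_ratio` responses.
    """
    rng = random.Random(seed)
    ratio = 1.0  # continuous reinforcement while the behavior is being learned
    decisions = []
    for i in range(n_responses):
        if i > 0 and i % stage_len == 0:
            # Thin gradually; cutting reinforcement off abruptly would extinguish it.
            ratio = min(target_ratio, ratio + step)
        # Random-ratio rule: reinforce with probability 1/ratio, so payoffs
        # average one per `ratio` responses, but no single one is predictable.
        decisions.append(rng.random() < 1.0 / ratio)
    return decisions


if __name__ == "__main__":
    d = variable_schedule(200)
    print(sum(d[:20]), "of the first 20 responses reinforced")
    print(sum(d[-40:]), "of the last 40 responses reinforced")
```

The first 20 responses are always reinforced (ratio 1); after that, the learner is paid less and less often, and never on a pattern it can predict.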

Better still, once the behavior is on variable reinforcement, you can selectively reinforce the best versions of the learned behavior, shaping it. (e.g. training dolphins to jump high)

1.7. Exceptions to Variable Reinforcement

Do not put puzzles or tests on variable reinforcement. Roughly speaking, the learner can’t tell whether they performed correctly, and needs the feedback every time.

1.8. Long Reinforcement Schedules

Use variable, not fixed, schedules. Extremely long schedules sometimes lead to extinction, too – usually at a metabolic boundary, where the reward isn’t worth the energy to do the needed work.

Long reinforcement schedules also lead to “slow starts”; the behavior will happen, but even animals will put off starting the long sequence of behaviors as the schedule gets longer. Pryor seems to equate this to procrastination, and says that you can beat it by introducing a reinforcer for getting started.

1.9. Reinforcing yourself

Do it, do it. Most of us are vastly under-reinforced for our behavior.

2. Shaping

Shaping: teach successive approximations to desired behavior. Works because even trained behavior is variable.

Establish intermediate goals, starting with a behavior that already occurs sometimes.

2.1. The Ten Laws of Shaping

2.1.1. Raise criteria in increments small enough that the subject always has a realistic chance for reinforcement.

Raise criteria within the range that the subject is already achieving. The next step is not a function of the subject’s ability, but a function of the subject’s current behavior.
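One way to make “within the range the subject is already achieving” concrete, assuming the behavior can be scored by a number (say, jump height): set the criterion from the subject’s recent attempts rather than from what you think it could do. This sketch uses the median of the last few attempts, so roughly half of current attempts already qualify. The `Shaper` class, the window size, and the choice of the median are all my assumptions, not anything from the book.

```python
from collections import deque
from statistics import median


class Shaper:
    """Hypothetical helper: the criterion tracks current behavior, not ability."""

    def __init__(self, window=10):
        # Only the most recent `window` attempts count as "current behavior".
        self.recent = deque(maxlen=window)

    def criterion(self):
        # Median of recent attempts: about half of what the subject already
        # does meets it. With no history yet, any attempt counts.
        return median(self.recent) if self.recent else float("-inf")

    def observe(self, score):
        """Record one attempt and return True if it earns a reinforcer."""
        reinforced = score >= self.criterion()
        self.recent.append(score)
        return reinforced


if __name__ == "__main__":
    s = Shaper(window=3)
    print([s.observe(x) for x in [1.0, 2.0, 0.5, 1.2]])
```

With `window=3`, attempts of 1.0, 2.0, 0.5 and 1.2 come out reinforced, reinforced, not reinforced, reinforced: the bar moves with whatever the subject is currently doing, so a realistic chance of reinforcement is always there.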

2.1.2. Train one aspect of a behavior at a time; don’t try to shape for two criteria simultaneously.

“At a time”, here, means to only work on one criterion at a time, to avoid confusing the subject. If the task can be broken into separate components, do that. (e.g. putting involves getting angle and distance right. You can shape these separately!)

2.1.3. Put the current response on a variable schedule of reinforcement before adding or raising criteria.

If a learner has been earning reinforcers predictably, simply skipping reinforcers might be confusing. You have to train that, too – and train it before you can train other criteria. Once the subject has learned that a skipped reinforcer doesn’t mean “wrong”, just “try again”, you can start selecting for a new criterion.

2.1.4. When introducing a new criterion, temporarily relax the old ones.

Well-learned behavior may fall apart when learning new skills. Shape the new skills anyway; reinforce the combination of skills once each skill has been learned recently (and the subject is sometimes getting the combination).

2.1.5. Plan ahead of your subject, so that you’re prepared.

If your subject makes a sudden leap forward, you want to know what to reinforce. (Some especially intelligent animals, and certainly people, can learn to anticipate your program of shaping, and just go ahead and do each step immediately.) Even if these occasions are rare, they’re exciting for you and the subject, and you want to be prepared to use that excitement if it happens.

2.1.6. Stick to one shaper per behavior.

Different trainers will have different ideas of the same criterion.

2.1.7. If one shaping procedure is not eliciting progress, find another.

These things aren’t magic; there are plenty of programs leading to the same behavior.

2.1.8. Don’t interrupt a training session gratuitously; that constitutes a punishment.

This really only applies to formal training – giving lessons, training an animal – not informal settings, where smiles and fluid interaction suffice. Your attention matters; in a training setting, failing to give a reinforcer must be a considered action, and this means you have to attend to all behavior. Removing your attention is a rebuke; don’t do it frivolously.

2.1.9. If behavior deteriorates, “go back to kindergarten”; quickly review the whole shaping process with a series of easily earned reinforcers.

Sometimes you’re actually working under slightly changed circumstances, and that change of context “loosens” the training. You might not even know what’s changed. If trained behavior seems to be completely shot, just review the whole process. This can be quick.

2.1.10. End each session on a high note, if possible; but in any case, quit while you’re ahead.

Different subjects can take different amounts of time. (An hour may be about as long as a human can usefully learn; it’s certainly a traditional period in many contexts.) What you stop on matters more than how long you go. Always quit while you’re ahead, both to end sessions and to switch behaviors: move on as soon as some progress has been achieved. (Peak-end rule!) If you keep working on a behavior while the subject is tired and the behavior deteriorates, you’ll untrain it! If it’s not yet a good time to end the training session, this is a good time to move to a different behavior.

In fact, if you stop on that high note, and let that accomplishment be the most prominent memory, you’ll often see better performance in the next session than the best performance you just saw. If a session isn’t going to hit a high note (fatigue will set in before achievement, say), then end the session with an easy, guaranteed way to earn a reinforcer, so the whole session is remembered as being reinforcing. End with easy play or other games. Never introduce new material late in a session.

2.2. Shortcuts: Targeting, Mimicry, and Modeling

Targeting: teach an animal to touch its nose to a target. Then, you can elicit lots of other behavior by moving the target appropriately.

Mimicry: many learners, kids especially, learn well by copying behavior. Dogs are bad at this; cats are quite good at copying other cats, and occasionally noncats. (You may be able to train a cat by training a dog in front of the cat.)

If you want to demonstrate a gross physical skill to someone, do it with your back turned to them; if you want to demo a fine right-handed skill to a left-handed person, do it facing them.

Modeling: put the subject, manually, through the desired action. This actually doesn’t work well alone; you need to add shaping as well, reinforcing effort on the subject’s part. This way, you can shape the skill while fading away the modeling.

2.3. Special Subjects (e.g. yourself)

“The single most useful device in self-reinforcement, I found, was record keeping […]. I needed to record performance in such a way that improvement could be seen at a glance. I used graphs. Thus my guilt over a lapse could be assuaged by looking at the graphs and seeing that, even so, I was doing much better now than I had been six months previously.”

Training by computer, especially, can work well, because the program’s reinforcement is immediate and consistent, so it actually works.

2.4. Shaping Without Words

In formal training, when the subject is a willing party to being shaped, you can give instruction with words, and then shape them. This is fine, and helps.

In informal situations… people resent being shaped. Particularly if you’re shaping away behaviors they currently endorse for whatever reason. (Anger, sadness, ranting, whatever.) Be careful to notice tiny improvements in behavior, and be careful not to talk about the reinforcement unless you actually have the subject’s permission. (Probably an interesting conversation here.) In particular, if you talk about it, you’re bribing them; they learn to take actions for promised rewards, instead of learning the impulse preverbally. Don’t brag about it later, either.

(Manipulating others to make them stronger is a great idea. Manipulating others to your own ends, at their expense, is evil. Seriously fucked-up, nasty nasty evil shit. Incidentally, if I learn anyone’s using what I’ve taught here to screw with someone else’s life in destructive ways, I will go make their life interesting, see if I don’t.)

3. Stimulus Control

Training response to stimuli – like commands, or “triggers”.

To establish a cue, start with the behavior first. You can’t train the behavior if it’s not already happening. Once the behavior is trained, and on a variable schedule, you can start to reinforce the behavior only when it’s cued.

You can introduce the cue in many ways:

  • Produce the cue just as the behavior is starting, and reinforce completing the behavior.
  • Alternate between cue and no cue; reinforce only the behavior that follows the cue.
  • Shape response to the cue as a behavior itself; then shape that behavior into what you wish to train.

“Once your learner understands the rules, new cues can be attached to new behaviors practically instantly this way.”

3.1. Rules of stimulus control

Perfect stimulus control is defined by four (obvious) conditions; but each of these may need to be trained independently:

  • The behavior always occurs immediately after the stimulus.
  • The behavior never occurs, in training or work, without the stimulus.
  • The behavior never occurs in response to a different stimulus.
  • No other behavior occurs in response to this stimulus.

Pick signals that can be easily perceived.

You can train multiple signals for the same behavior, just not multiple behaviors for the same stimulus.
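Because each of the four stimulus-control conditions may need separate training, it can help to check them separately. Here’s a hypothetical sketch, assuming you log each trial as the cue you gave (or `None` for no cue) plus the set of behaviors you saw. It ignores the “immediately” part, and the function name and log format are made up for illustration.

```python
def stimulus_control(trials, cue, behavior):
    """Check the four stimulus-control conditions against a trial log.

    `trials` is a list of (stimulus, responses) pairs: `stimulus` is the cue
    given (None for no cue) and `responses` is the set of behaviors observed.
    Response timing ("immediately") is not modeled.
    """
    cued = [resp for stim, resp in trials if stim == cue]
    uncued = [resp for stim, resp in trials if stim is None]
    other = [resp for stim, resp in trials if stim is not None and stim != cue]
    return {
        # 1. The behavior always follows the cue.
        "occurs_after_cue": all(behavior in resp for resp in cued),
        # 2. The behavior never happens without the cue.
        "never_without_cue": all(behavior not in resp for resp in uncued),
        # 3. The behavior never happens in response to some other cue.
        "never_to_other_cue": all(behavior not in resp for resp in other),
        # 4. Nothing except this behavior happens in response to the cue.
        "nothing_else_to_cue": all(resp <= {behavior} for resp in cued),
    }


if __name__ == "__main__":
    log = [("whistle", {"sit"}), (None, set()), ("clap", {"spin"}),
           ("whistle", {"sit", "bark"}), (None, {"sit"})]
    print(stimulus_control(log, "whistle", "sit"))
```

In that example log, the dog sits on every whistle, but it also barks once on the whistle and sits once unprompted, so conditions 2 and 4 fail and are the ones that still need training.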

Nonprimary signals just need to have enough magnitude to be noticed; shouting isn’t even useful, unless your unpleasantness is an intentional aversive. Once a signal is trained, you can fade it until it’s barely perceptible, and still get the right response. (e.g. Clever Hans)

If you wish, you can shape speed of response. But be sure you’re training to a steady response-time criterion.

Behavior chains: a well-reinforced signal is an opportunity for reinforcement, so it becomes a desirable event in itself. That means you can reinforce one behavior by presenting the stimulus for another behavior. These are common; possibly anything that you remember linearly works like this: songs, the alphabet, taking a shower, getting dressed… and, in fact, you can debug those routines the same way you would debug a trained behavior chain.

In practice, you should train behavior chains backwards: don’t start training behavior n-1 until behavior n is learned and on cue. (e.g. Teaching a dog to play frisbee)
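The backward order can be written out as a list of training stages. A small sketch, with hypothetical step names (this is not Pryor’s actual frisbee breakdown): each stage adds one earlier behavior in front of the part that’s already learned, so every new behavior is followed by the cue for something familiar. The plan only gives the order; deciding whether behavior n is “learned and on cue” before moving on is still the trainer’s judgment.

```python
def backward_chaining_plan(chain):
    """Return training stages for a behavior chain, last behavior first.

    Each stage is the suffix of the chain to practice: the newly added
    behavior at the front, followed by the already-trained remainder.
    """
    return [chain[start:] for start in range(len(chain) - 1, -1, -1)]


if __name__ == "__main__":
    # Hypothetical steps for a fetch-style chain; illustrative only.
    steps = ["chase", "catch", "carry back", "drop at feet"]
    for stage, part in enumerate(backward_chaining_plan(steps), start=1):
        print(f"stage {stage}: " + " -> ".join(part))
```

Stage 1 trains only the final step; by the last stage, the whole chain runs from the first cue, and every step has been practiced as a lead-in to a step that was already solid.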

4. Untraining

Eight methods to get rid of unwanted behavior (four “bad fairies” and four “good fairies”):

  • Shoot the dog
  • Punishment (works badly!)
  • Negative reinforcement
  • Extinction
  • Train incompatible behavior
  • Put the behavior on cue, and then never give the cue (!)
  • Shape the absence – train the action’s opposite
  • Change the motivation – if you can understand why the unwanted behavior is happening, you can remove its cause. (e.g. don’t shop with hungry, tired children)

Note that some problems have multiple causes, and may require multiple solutions. One can use these techniques to alleviate physical and chemical addictions, but it’s tough; the techniques address the behavioral side of the problem, not the chemistry.

(e.g., biting fingernails – Train yourself to notice that you’re about to bite your fingernails; train that as a cue to do something incompatible. I now trim them when I notice that, so long as a fingernail clipper is handy. Also, try to remove the stress that causes the fingernail biting in the first place.)

5. Reinforcement in the Real World; 6. Clicker Training

Lots of examples of the above principles. Good to read to check your understanding, but I think there are no new high-level ideas here.

General Notes

Reinforcement vs. reward and punishment

The difference between “reinforcement” and the usual notion of “reward” and “punishment” is pretty stark. To increase the likelihood of an act, a reinforcer must happen as close in time to the event as you can get it. Rewards like year-end bonuses aren’t doing that; these may make goals more valuable, but they’re doing very little indeed to shape behavior on a subverbal level. Similarly, punishment that doesn’t stop immediately when the undesired behavior stops won’t shape behavior. Punishment that isn’t structured as negative reinforcement will not yield predictable results.

Note: self-punishment is particularly useless; you train down the act of punishing yourself more than you train down whatever behavior you’re punishing. This is unpleasant and useless!

Reinforcement trains the entire context!

You don’t get to pick what in the current context you’re reinforcing. You’re reinforcing everything in the trainee’s context at the time. If you only remember to click up a desired behavior when the trainee is in your living room, you’ll train being in the living room. Whenever you’re giving positive reinforcement for anything, interactively, you’re also training up interacting with you. For a pet this may be desirable and adorable; in other cases, less so.

And, vice versa – if you use lots of negative reinforcement, but only when the trainee is near you, then aside from whatever specific behaviors you think you’re training, you’re training not being near you. If this isn’t what you want to train, then you might not want to do negative reinforcement like this…

This is probably part of “fallout”, forward-referenced to Chapter 4. Should have a look.

The Training Game

Two players, subject and trainer. (Expect the group to get tired after about 6 rounds.)

Send the subject out of the room. Select a trainer, choose a behavior, and bring the subject back. Instruct the subject to “be active”: move around, be energetic. Without talking, and using only one nonverbal reinforcer (a clicker, a whistle, clapping or snapping), get the subject to perform your action.

Instruct the subject to return to the entry, after the first few reinforcements.

Quote: “The subject gets to discover that in this form of learning, brains don’t help. It doesn’t matter what you are thinking about; if you just keep moving around, collecting whistle sounds, your body will find out what to do without “your” help. This is an absolutely excruciating experience for brilliant, intellectual people.” If you’re the subject, don’t analyze too much, just go with what seems vaguely indicated.