Article 6 of 6 in CodeBlu Editorial Guides

Building Better Training

Published: May 25, 2026
Last updated: May 25, 2026

training
curriculum
procurement
evaluation


Table of Contents
1. The Problem This Guide Solves
2. Auditing the Training You Already Have
3. The Gap Between Policy and Practice
4. Evidence-Based Curriculum Design
5. Measurement and Assessment
6. A Procurement Guide for Training Providers
7. Putting It Together: A Modernization Sequence
8. Quick Reference Summary
9. Bibliography

1. The Problem This Guide Solves

Most agencies are not failing to train de-escalation. They are training it, often a great deal of it: mandated by the state, scheduled into in-service, documented in hours. The problem a modern training coordinator faces is rarely "we have no de-escalation training." It is "we have de-escalation training, and we cannot say with any confidence that it works."

That uncertainty is reasonable. The field has historically delivered de-escalation training in a form that is lecture-based, one-time, measured in seat hours, and assessed with a knowledge quiz, and that form produces training that is easy to document and hard to verify. An agency can prove its officers attended. It cannot prove its officers can do anything they could not do before.

The other five guides in this series describe what to teach: the discipline of de-escalation, mental health crisis response, the modern use-of-force framework, crisis communication, and officer wellness. This guide is different. It is about how to teach: how to audit what an agency already has, how to design or buy something better, and how to measure whether it worked. It is written for the people who make those decisions: training coordinators, training commanders, and the procurement staff who write and evaluate the contracts.

The argument running through it is simple and is supported by the evidence in Guide 1: de-escalation is a perishable skill, and skills are built by practice, feedback, and reinforcement, not by exposure to content. A modern curriculum is one that is designed and measured accordingly. Everything below follows from that.

Pull quote. "The question is not whether your agency trains de-escalation. It almost certainly does. The question is whether you can show what your officers can do afterward that they could not do before." A framing for training modernization.

2. Auditing the Training You Already Have

Modernization starts with an honest look at the current state. An agency cannot fix what it has not examined. The following is an audit framework: a structured set of questions a training coordinator can use to evaluate existing de-escalation training. It is organized into five areas.

2.1 Content audit: is the right thing being taught?

Does the curriculum reflect current frameworks (the decision-making-model approach described in Guide 3) rather than an outdated rigid continuum?
Does it integrate de-escalation with use of force and with tactics, or teach them as separate, disconnected blocks?
Does it cover crisis recognition and mental health response, as in Guide 2, not only general communication?
Is it honest about the limits of de-escalation, or does it oversell it (the failure mode described in Guide 1)?
Is the content current? When was it last revised, and against what?

2.2 Method audit: how is it being delivered?

What is the ratio of lecture to practice? An honest hour count of each, not the total.
Do officers actually practice the skill (talking, listening, deciding), or do they watch and discuss?
Is the practice conducted under realistic stress, or only in calm classroom conditions?
How many repetitions does an individual officer get? Not the class, the individual.
Does every officer get to practice, or do a few volunteers practice while the rest observe?

2.3 Reinforcement audit: does it persist?

Is de-escalation a one-time event (an academy block or an annual day), or is it reinforced across the year and the career?
Is there any refresher, any distributed practice, any structure that addresses skill decay?
Are skills reinforced through routine supervision and after-action review of real calls, or only in formal training?

2.4 Assessment audit: how do you know it worked?

How is the training currently assessed? A sign-in sheet, a knowledge quiz, a skills demonstration, or nothing?
Does assessment measure what officers know or what they can do?
Is there any rubric, any consistent criteria, or does assessment depend on the individual instructor's judgment?
Is any data collected, and if so, does anyone look at it?

2.5 Alignment audit: does it match policy and law?

Does the training match the agency's current use-of-force and de-escalation policy, or has policy moved while training stood still (or the reverse)?
Does it reflect current law and current state P.O.S.T. requirements?
Do the instructors teach what the policy says, or what they personally believe?

Quick reference: the five-area audit. Content (is the right thing taught?), Method (is it practiced, under stress?), Reinforcement (does it persist?), Assessment (do you know it worked?), Alignment (does it match policy and law?). An honest answer in all five areas tells a coordinator exactly where modernization has to start.

The most common audit finding, and a coordinator should expect it, is that the content is reasonable and the method is the problem: good material delivered as a lecture, once, with a quiz. That finding is good news, because method is fixable.

3. The Gap Between Policy and Practice

Every agency has a de-escalation policy. The policy is not the problem. The gap between the policy and what officers actually do on the street is the problem, and a modernization effort that does not name and target that gap will not close it.

3.1 Why the gap exists

The policy-practice gap is not usually caused by bad officers ignoring good policy. It is caused by structural disconnects, and naming them shows where to intervene.

Training teaches knowledge; the street demands skill. An officer can know the policy perfectly and still be unable to perform it under stress, because knowing and doing are different capacities built in different ways. A lecture closes the knowledge gap and leaves the skill gap wide open.

Skills decay; policy does not. The policy stays in the binder. The skill, untrained and unreinforced, fades. Months after the annual training, the officer's actual capability has dropped well below what the policy assumes.

Field culture overrides classroom content. New officers learn "how we really do it" from field training officers and from the informal culture of the shift. If that culture has not absorbed the policy, the classroom loses. This is why modernization cannot stop at the classroom.

The reward system contradicts the policy. If the agency's policy praises de-escalation but its informal culture and its evaluations reward fast clearance, arrest numbers, and "taking charge," officers will follow the rewards. Policy is words; incentives are behavior.

Conditions make the policy hard to follow. If policy assumes officers will slow down and call for resources, but staffing means officers are alone and rushed, the policy is asking for something the conditions do not allow.

3.2 What closes the gap

The gap closes when training builds actual skill, not just knowledge; when skill is reinforced so it does not decay; when field training and frontline supervision carry the same message as the classroom; when the agency's incentives and evaluations reward de-escalation behavior; and when leadership models and protects the practice, as Guide 5 describes in its discussion of trauma-informed leadership. No single one of these is sufficient. A new curriculum dropped into an agency whose culture and incentives still point the other way will not survive contact with the street.

Pull quote. "Officers do not do what the policy says. They do what they have been trained to do, reminded to do, rewarded for doing, and shown by their supervisors. A curriculum is one of five levers, and pulling only one of them moves very little." A framing of the policy-practice gap.

3.3 The implication for a coordinator

This section carries a hard message for a training coordinator: a better curriculum is necessary but not, by itself, sufficient. The coordinator should buy or build a better curriculum and should also be honest with command staff that curriculum is one lever among five. The most rigorous training in the country, dropped into a culture and an incentive structure that contradict it, will produce a well-documented gap. This is not an argument against better training. It is an argument for pairing it with the other four levers.

4. Evidence-Based Curriculum Design

If the audit in Section 2 finds, as it usually does, that method is the weak point, this section is the fix. It sets out the principles of training design that the evidence supports, drawn from adult learning research, skill-acquisition research, and the police-training evidence summarized in Guide 1.

4.1 Train skills as skills

The foundational principle. De-escalation, crisis communication, and use-of-force decision-making are skills, not bodies of knowledge. Skills are acquired by doing the thing, repeatedly, with feedback. No one learns to shoot, drive, or apply a control technique from a lecture, and de-escalation is no different. A curriculum whose dominant method is lecture is training the wrong type of capability. The single highest-value design change most agencies can make is to shift the lecture-to-practice ratio decisively toward practice.

4.2 Use deliberate practice

Not all practice is equal. Deliberate practice, the form of practice that actually builds expertise, has specific features: it targets a defined skill, it operates at the edge of current ability, it provides immediate and specific feedback, and it is repeated with correction. A scenario an officer runs once, with a vague "good job" afterward, is not deliberate practice. A scenario an officer runs, receives specific feedback on, and runs again with that feedback is. Curriculum design should build in the loop: attempt, specific feedback, correction, repeat.

4.3 Practice under realistic stress

Skills are, to a significant degree, state-dependent: a skill rehearsed only in a calm state does not transfer reliably to a high-arousal state. An officer who has practiced active listening only in a relaxed classroom discussion has not practiced the skill that matters, which is active listening while a person is screaming. Effective curricula introduce realistic emotional and time pressure into practice. This is the central rationale for scenario-based and simulation training, and it is the specific reason CodeBlu delivers practice as realistic voice scenarios rather than as reading and quizzes.

4.4 Maximize repetitions per officer

A recurring waste in police training is the role-play where two officers practice while twenty observe. Observation has some value; it is not practice. Curriculum design should be evaluated on repetitions per individual officer, not per class. This is one of the structural advantages of simulation and AI-driven practice: every officer can run many scenarios, on their own schedule, without waiting for a role-play partner, an instructor, or a room. An agency comparing options should ask, bluntly, how many real repetitions each officer gets.

4.5 Space practice over time

Skills decay, and massed practice (everything crammed into one day) produces a spike that fades fast. Distributed practice, the same total time spread across weeks and months, produces better retention. This argues against the one-time annual training day and in favor of shorter, repeated practice across the year. It also argues for a delivery method that makes frequent short practice logistically possible, something classroom-and-instructor models struggle with and on-demand models support.

4.6 Make it relevant and respect the adult learner

Adult learners, and experienced officers especially, engage with training that is clearly relevant to their actual job and that respects their experience. Scenarios should reflect the calls officers actually run, the ordinary welfare check and disorderly call as much as the dramatic event. Training that feels disconnected from the job, or that talks down to veteran officers, is tuned out regardless of its content. CodeBlu's scenario library is built around routine, recognizable calls for this reason.

4.7 Teach a decision-making process, not a script

As Guide 3 and Guide 4 both argue, real encounters do not follow scripts. A curriculum should teach a transferable decision-making process, such as the Critical Decision-Making Model, and the underlying principles of communication, so officers can generate the right response to a situation the curriculum never specifically showed them. A curriculum that teaches scripts produces officers who can handle the scripted scenario and freeze on the one next door.

4.8 Integrate, do not silo

The encounter does not announce which topic applies. De-escalation, crisis recognition, communication, use-of-force decision-making, and officer self-regulation all happen at once in a real call. Curriculum that teaches each as an isolated block, with separate instructors and no connective tissue, leaves the integration to the officer. Better curricula use scenarios that exercise several domains at once, which is how the field actually presents them.

Quick reference: evidence-based design principles. Train skills as skills (practice, not lecture). Use deliberate practice (targeted, edge-of-ability, immediate feedback, repeated). Practice under realistic stress. Maximize repetitions per individual officer. Space practice across time. Keep it relevant and respect the adult learner. Teach a decision process, not a script. Integrate domains rather than siloing them.

5. Measurement and Assessment

A curriculum that cannot be measured cannot be improved, defended, or proven. Yet measurement is the weakest part of de-escalation training at most agencies, and it is the part procurement staff and chiefs most need to strengthen, because it is what turns a training expense into a defensible, demonstrable investment.

5.1 Why "hours trained" is not a measure of anything

The most common training metric is hours delivered. Hours are easy to count and easy to report to a state, and they measure input, not outcome. An agency that delivered forty hours of de-escalation training knows it delivered forty hours. It does not know whether a single officer can de-escalate better than before. Hours are a compliance metric. They are not a performance metric, and they should never be mistaken for one.

5.2 A levels-of-evaluation framework

A useful structure for thinking about training measurement, drawn from established training-evaluation models such as the Kirkpatrick framework, distinguishes several levels of evidence, each harder to capture and more meaningful than the last.

Reaction: did officers find it useful? This is the easiest level to collect, via a post-course survey. It is real information, since officers who find training irrelevant disengage, but it measures satisfaction, not skill.
Learning: did officers gain knowledge or skill? This requires a before-and-after comparison. For a skill, that means a skills assessment, not only a knowledge quiz: a quiz measures what an officer knows, not whether the officer can perform.
Behavior: are officers doing it differently on the job? The level that matters most and is hardest to capture. It requires looking at real-world behavior: field observation, supervisor assessment, body-worn-camera review, and after-action review of actual calls.
Results: did agency outcomes change? The ultimate question: did use-of-force incidents, injuries to officers and citizens, complaints, or related outcomes move? This is the level at which the Louisville ICAT evaluation described in Guide 1 operated, and it is the level that justifies the whole effort.

A serious measurement program collects more than one level, and is honest that the higher levels are harder and slower. An agency that only ever measures reaction and hours is, in evaluation terms, measuring almost nothing about effectiveness.

5.3 Assess skill with a rubric

If de-escalation is a skill, it can and should be assessed against a rubric, the same way a defensive-tactics or firearms qualification is assessed against defined standards. A rubric defines observable behaviors across the dimensions that matter (communication, assessment and decision-making, tactical safety and positioning, empathy, and problem-solving) and rates each on a defined scale. A rubric does three things a vague judgment cannot: it makes assessment consistent across different evaluators, it makes feedback specific enough to act on, and it makes progress measurable over time. CodeBlu's after-action review is built as exactly this kind of structured, multi-dimension rubric, applied consistently to every scenario an officer runs.

5.4 Measure the individual, longitudinally

Class-level and event-level measurement hides the individuals who most need attention. The officer who is struggling is invisible in a class average. Modern training systems can track an individual officer's performance across many practice repetitions over time, which turns assessment from a single pass-or-fail gate into a development record: this officer has improved on communication, still struggles with reassessment under time pressure, and needs targeted work there. That longitudinal, individual picture is far more useful to a coordinator, and far more defensible to a reviewer, than a stack of completion certificates.

5.5 Use the data, and be honest about it

Measurement only matters if someone acts on it. Assessment data should feed back into the curriculum, where weak areas across many officers point to a curriculum fix, and into supervision, where an individual's pattern points to coaching. An agency should also be honest about what its data can and cannot show. A correlation between a training rollout and a drop in use of force is encouraging but is not, by itself, proof of causation, which is exactly why the randomized design of the Louisville study mattered. A coordinator who presents results honestly, including the uncertainty, builds more credibility than one who overclaims.

Quick reference: measurement. Hours trained is a compliance metric, not a performance metric. Evaluate at multiple levels: reaction, learning, behavior, and results. Assess skill against a rubric, not a knowledge quiz. Track individuals longitudinally, not just class averages. Feed the data back into curriculum and supervision. Report results honestly, including what the data cannot prove.

6. A Procurement Guide for Training Providers

Most agencies will buy at least part of their de-escalation training rather than build it. Procurement is therefore a training-quality decision, and the people writing and scoring the contracts are, in effect, making curriculum decisions. This section is a practical guide to evaluating a training provider. It applies to any provider, including CodeBlu, and an agency should hold every vendor, this one included, to the questions below.

6.1 Questions about evidence

What is the evidence base for your program specifically? Not "de-escalation is evidence-based" in general, but evidence for this provider's product. Be appropriately skeptical of a vendor who cites the field's research as if it were their own.
Has your program been evaluated independently? By whom, with what design, with what result? An honest vendor will distinguish between an independent evaluation and an in-house testimonial.
What outcomes do you claim, and what is the basis? A vendor claiming a specific percentage reduction in use of force should be able to say exactly where that number comes from and how it was measured.
What do you not claim? A vendor who admits the limits of their product, and the limits of de-escalation training generally, per Guide 1, is showing a kind of credibility a vendor who promises everything is not.

6.2 Questions about method

How much of the officer's time is practice versus passive content? Apply Section 4. A product that is mostly video and quiz is mostly the old model in new packaging.
How many real repetitions does each officer get? Per individual, not per class.
Does the practice involve realistic stress and genuine interaction, or is it recognition and multiple choice?
Does it teach a decision-making process or a script?
How does it handle reinforcement and spacing over time, or is it another one-time event?

6.3 Questions about assessment

How does the product assess skill? Against a rubric? On what dimensions? Does it measure doing or only knowing?
What data does the agency get back? Individual and longitudinal, or just completion records?
Can the agency see who is struggling and on what?
Does the assessment data support the agency's own measurement and accreditation needs?

6.4 Questions about fit, integrity, and operations

Does it align with our policy, our state P.O.S.T. requirements, and our state law? A national product may need configuration to match local policy and statutes. Ask how.
Can it be customized to the agency's own policies, scenarios, and context?
What are the honest total costs, including setup, per-seat or licensing fees, configuration, administration, and the officer time required?
What are the logistics? Scheduling, technology, infrastructure, support, and the burden on the training unit.
How are claims about other organizations and frameworks stated? A provider should credit the public frameworks it draws on accurately and should not claim partnership, certification, or endorsement it does not have. A vendor who overstates its relationships with respected institutions is telling the agency something about how it handles the truth.
Data privacy and security. Officer performance data is sensitive. Where is it stored, who can see it, how is it protected, and who owns it?

6.5 The integration point

The strongest position for an agency is usually not buying a single product as the entire training program, and not rejecting outside products to build everything in-house. It is integration: a clear-eyed agency curriculum, owned by the agency, into which bought components are fitted where they are strongest. A provider that delivers high-repetition, stress-realistic, well-assessed practice is solving the method problem the audit usually identifies. The agency still owns content currency, policy alignment, reinforcement, field-training and supervision consistency, and the incentive structure from Section 3. A good vendor relationship is honest about that division of labor. A vendor who claims to be the entire solution is overclaiming.

Pull quote. "Hold every training vendor, including the one you are leaning toward, to the same questions: what is the evidence for your product specifically, how much of it is real practice, how do you measure skill, and what do you not claim. A provider worth buying will welcome those questions." A procurement principle for de-escalation training.

7. Putting It Together: A Modernization Sequence

The preceding sections can be assembled into a practical sequence for a coordinator leading a modernization effort. It is offered as a sensible order of operations, not as a mandate.

Step 1: Audit honestly. Run the five-area audit from Section 2. Resist the urge to grade the agency generously. The audit's value is entirely in its honesty.

Step 2: Name the gap. Apply Section 3. Identify where the policy-practice gap lives in this agency: is it skill, reinforcement, field culture, incentives, or conditions? Usually it is several.

Step 3: Fix method first. Most audits find content is acceptable and method is the failure. Shifting from lecture to deliberate, stress-realistic, high-repetition, spaced practice is typically the highest-value change, and Section 4 is the design specification.

Step 4: Build measurement in from the start. Do not bolt assessment on afterward. Decide, before rollout, what will be measured and at which levels (Section 5), so there is a baseline to compare against.

Step 5: Buy deliberately. Where the agency buys components, use the Section 6 questions on every vendor, and integrate purchased components into an agency-owned curriculum rather than outsourcing the curriculum itself.

Step 6: Align the other levers. Curriculum is one of five levers. Work with command to align field training, frontline supervision, the incentive and evaluation structure, and leadership modeling, or the new curriculum will not survive contact with the street.

Step 7: Reinforce and iterate. Treat the curriculum as a living thing. Use the measurement data to revise it, keep practice distributed across the year and the career, and re-audit periodically.

Quick reference: the modernization sequence. Audit honestly. Name the gap. Fix method first. Build measurement in from the start. Buy deliberately. Align the other four levers. Reinforce and iterate. A curriculum is a process the agency owns, not a product it installs once.

8. Quick Reference Summary

The one-page version.

The real problem is rarely the absence of de-escalation training. It is training an agency cannot prove works, because it is lecture-based, one-time, and measured in hours.

Audit five areas: Content, Method, Reinforcement, Assessment, Alignment. The usual finding is that content is fine and method is broken, which is good news, because method is fixable.

The policy-practice gap is structural: training builds knowledge, not skill; skills decay; field culture overrides the classroom; incentives contradict the policy; and conditions make the policy hard to follow. A better curriculum is one of five levers.

Evidence-based design: train skills as skills with deliberate practice, under realistic stress, with maximum repetitions per individual officer, spaced across time, relevant to real calls, teaching a decision process rather than a script, integrating domains.

Measurement: hours trained proves nothing about performance. Evaluate at the reaction, learning, behavior, and results levels. Assess skill against a rubric. Track individuals longitudinally. Report honestly, including what the data cannot prove.

Procurement: hold every vendor, including the favored one, to the same questions: evidence for this product specifically, how much is real practice, how skill is measured, what they do not claim, and how honestly they describe their relationships with other organizations.

Modernization sequence: audit honestly, name the gap, fix method first, build in measurement, buy deliberately, align the other levers, reinforce and iterate.

9. Bibliography

Publicly available sources. Attributions require SME confirmation of editions, dates, and exact wording before customer release.

President's Task Force on 21st Century Policing. Final Report. Office of Community Oriented Policing Services, 2015.
Police Executive Research Forum. ICAT: Integrating Communications, Assessment, and Tactics. A Training Guide for Defusing Critical Incidents. PERF, 2016.
Engel, R. S., McManus, H. D., and Herold, T. D. "Does de-escalation training work? A systematic review and call for evidence in police use-of-force reform." Criminology & Public Policy, vol. 19, no. 3, 2020.
Engel, R. S., Corsaro, N., Isaza, G. T., and McManus, H. D. "Assessing the impact of de-escalation training on police behavior: Reducing police use of force in the Louisville, KY Metro Police Department." Criminology & Public Policy, 2022.
Kirkpatrick training-evaluation model and subsequent training-evaluation literature.
Research on deliberate practice and skill acquisition, associated with K. Anders Ericsson and others.
Adult learning theory and instructional-design literature applied to professional training.
State Peace Officer Standards and Training (P.O.S.T.) in-service training requirements.

Related CodeBlu guides: The Modern Officer's Guide to De-Escalation | Mental Health Crisis Response | Use of Force | Crisis Communication | The Officer's Wellness Imperative

Related CodeBlu scenarios: the scenario library is the practice layer this guide describes: high-repetition, stress-realistic, voice-based, and assessed against a consistent rubric.
