Pre/post and matched-cohort testing for executive coaching ROI
by Mentor Group

If you want credible evidence that executive coaching is working, you need a fair comparison and transparent maths. This guide shows how to design a simple pre/post test with a matched cohort, convert improvements into value, and report results in a way leaders accept.
It keeps the statistics light while staying grounded in good practice. For context, research syntheses find that executive coaching improves goal focus, self‑efficacy and resilience, the mechanisms that drive execution and, in turn, business outcomes. See Frontiers in Psychology (2023 meta‑analysis) and Theeboom et al. (2013).
1) What you’re testing (and why)
You’re asking a practical question: “Did coached leaders improve more than similar, non‑coached leaders over the same period?” That is the essence of a pre/post comparison with a matched cohort.
Matching reduces bias by making the coached and comparison groups look alike on key features (e.g., tenure, team size, market). If you can collect two time points (before and after) for both groups, you can also apply a simple difference‑in‑differences view to estimate how much extra change is associated with coaching.
2) Step‑by‑step design
Step A — Define the outcomes and the line of sight
Pick 3–5 leading indicators (e.g., coaching cadence/quality, decision cycle time, practice telemetry, forecast hygiene, psychological capital) and 1–2 lagging outcomes (e.g., win rate, forecast error, retention).
Step B — Select cohorts
Choose the coached cohort (e.g., 10–20 leaders) and a comparison cohort of similar size. Match on tenure, team size, region/market, and baseline performance. Propensity score matching is helpful when you have many factors; see UCLA’s overview of propensity score matching.
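If your candidate leaders sit in a flat table, the matching itself can be done in a few lines. The sketch below is a minimal illustration, assuming a pandas DataFrame with hypothetical columns (tenure_yrs, team_size, baseline_win_rate, coached); it estimates propensity scores with scikit‑learn and greedily pairs each coached leader with the nearest non‑coached leader within a caliper. It is a starting point, not a full matching workflow.

```python
# Minimal propensity-score matching sketch (column names are illustrative assumptions).
import pandas as pd
from sklearn.linear_model import LogisticRegression

def match_comparison(df, covariates, treated_col="coached", caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores."""
    # Propensity score: probability of being in the coached group given the covariates.
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df[treated_col])
    df = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

    treated = df[df[treated_col] == 1]
    pool = df[df[treated_col] == 0].copy()
    pairs = []
    for idx, row in treated.iterrows():
        if pool.empty:
            break
        dist = (pool["pscore"] - row["pscore"]).abs()
        best = dist.idxmin()
        if dist[best] <= caliper:      # only accept reasonably close matches
            pairs.append((idx, best))
            pool = pool.drop(best)     # match without replacement
    return pairs

# Hypothetical usage:
# leaders = pd.read_csv("leaders.csv")
# pairs = match_comparison(leaders, ["tenure_yrs", "team_size", "baseline_win_rate"])
```

Document any coached leaders left unmatched; dropping them silently is one of the quickest ways to bias the estimate.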
Step C — Baseline
Collect 8–12 weeks of pre‑coaching data for both cohorts on each metric. Freeze definitions and data sources.
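One lightweight way to “freeze” definitions is to write them down as a version‑controlled artefact at baseline. The snippet below is only an illustration; the metric names, sources and windows are assumptions, not a prescribed schema.

```python
# A minimal "frozen at baseline" metric dictionary (names and sources are illustrative).
BASELINE_METRICS = {
    "win_rate":            {"source": "CRM closed-won / closed total", "window_weeks": 12},
    "forecast_error_pct":  {"source": "committed vs actual bookings",  "window_weeks": 12},
    "decision_cycle_days": {"source": "decision log timestamps",       "window_weeks": 8},
    "one_to_one_cadence":  {"source": "calendar export",               "window_weeks": 8},
}
# Keep this file under version control and do not edit it once the baseline window starts.
```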
Step D — Intervention window
Run coaching for 12–16 weeks. Instrument behaviour changes lightly (e.g., 1:1 cadence %, decision cycle time, practice reps, hygiene checks, short self‑efficacy/resilience scales). For evidence that these capacities move with coaching, see Frontiers (2023 RCT meta‑analysis).
Step E — Post‑window measurement
Collect the same metrics again for both cohorts over the final 4–8 weeks of the intervention window.
Step F — Analysis
1) Compute pre → post change for each person; then average by cohort.
2) Estimate the coaching effect as (change_coached − change_comparison). This is the intuition behind difference‑in‑differences; see Columbia’s methods guide and the sketch after this list.
3) Sanity‑check with plots (before/after by cohort) and commentary on any confounders (e.g., territory changes).
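If the per‑person pre/post measures live in a table, steps 1 and 2 reduce to a few lines of code. This is a minimal sketch with hypothetical column names (win_rate_pre, win_rate_post, cohort), not a full analysis pipeline.

```python
# Difference-in-differences sketch on per-person pre/post measures
# (one row per leader; column names are assumptions).
import pandas as pd

def did_estimate(df, metric="win_rate"):
    """Average (post - pre) change per cohort, then coached minus comparison."""
    change = df[f"{metric}_post"] - df[f"{metric}_pre"]
    by_cohort = change.groupby(df["cohort"]).mean()
    return by_cohort["coached"] - by_cohort["comparison"]

# Hypothetical data:
df = pd.DataFrame({
    "cohort":        ["coached"] * 3 + ["comparison"] * 3,
    "win_rate_pre":  [0.21, 0.22, 0.23, 0.22, 0.21, 0.23],
    "win_rate_post": [0.25, 0.26, 0.27, 0.23, 0.22, 0.24],
})
print(did_estimate(df))  # ~ +0.03, i.e. a 3-point estimated coaching effect
```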
Step G — Convert to value (agreed “money bridges”)
- Win rate → bookings (e.g., a +3–4 point lift at steady volume is material).
- Forecast error → resource allocation (less wasted pursuit time, better prioritisation).
- Retention → avoided replacement cost (fees, ramp time, lost productivity).
- Time saved → capacity value (hours × value per hour).
3) A small worked example
Suppose the coached cohort’s median win rate moves from 22% → 26% (+4 pts) while the comparison cohort moves 22% → 23% (+1 pt). The estimated coaching effect is +3 pts. On an £8m yearly pipeline at a £40k average deal size, +3 pts equates to roughly £240k in incremental bookings (a directional illustration).
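The arithmetic behind that £240k figure is worth making explicit so Finance can audit it; the short calculation below simply restates the example’s assumptions in code.

```python
# Reproducing the worked example: win-rate lift converted to incremental bookings.
pipeline_value = 8_000_000      # £ yearly pipeline (from the example)
avg_deal_size  = 40_000         # £ average deal size
win_rate_lift  = 0.03           # difference-in-differences estimate: +3 points

deals_in_pipeline    = pipeline_value / avg_deal_size       # 200 deals
incremental_wins     = deals_in_pipeline * win_rate_lift    # ~6 deals
incremental_bookings = incremental_wins * avg_deal_size     # ~£240,000
print(f"≈ £{incremental_bookings:,.0f} incremental bookings (directional)")
```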
Add value from improved forecast hygiene (fewer pushed deals) and time saved from faster decisions. Keep all assumptions visible in a single table.
4) Reporting (one page, monthly)
- Scope: who’s in each cohort and why they were matched that way.
- Movement: leading indicators → lagging outcomes (simple charts).
- Attribution: pre/post + matched cohort; add a difference‑in‑differences view.
- Benefits: value via agreed conversions; ROI and payback maths (a short calculation follows this list).
- Risks/notes: any confounders or data caveats.
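The ROI and payback maths on that one page can be just as transparent as the benefit estimate. The sketch below reuses the £240k benefit from the worked example and assumes an illustrative programme cost; substitute your own agreed figures.

```python
# ROI and payback sketch (programme cost is an illustrative assumption).
annual_benefit = 240_000        # £ benefit agreed via the money bridges
programme_cost = 60_000         # £ coaching fees + leader time (assumption)

roi = (annual_benefit - programme_cost) / programme_cost      # e.g. 3.0 -> 300%
payback_months = programme_cost / (annual_benefit / 12)       # e.g. 3 months
print(f"ROI: {roi:.0%}, payback: {payback_months:.1f} months")
```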
5) Common pitfalls (and fixes)
- Changing metric definitions mid‑stream → Freeze definitions at baseline and don’t backfill.
- Unbalanced cohorts → Tighten matching or re‑weight; document any unavoidable differences.
- One‑off shocks (pricing change, territory split) → Note them and, if possible, exclude outlier periods from the post window.
- Over‑claiming causality → Present the estimate with plain‑English caveats and triangulate with qualitative evidence (manager notes, call reviews).
6) Governance and privacy
Keep data access limited, aggregate sensitive people metrics, and follow your DPA/ISO processes. Be explicit about purpose: helping leaders make better decisions and creating value responsibly.
Bottom Line
Q: What is a pre/post matched-cohort test for coaching impact?
A: It compares change over time for coached leaders against a similar, non‑coached group. Matching reduces bias; a difference‑in‑differences view estimates the additional change associated with coaching.
Q: How do we select a fair comparison group?
A: Match on tenure, team size, market and baseline performance. Where many factors exist, propensity score matching helps create balance between cohorts.
Q: What data should we collect and for how long?
A: Capture 8–12 weeks of baseline data, then 12–16 weeks during coaching, and repeat measures in the final 4–8 weeks. Track leading indicators (cadence, decision cycle time, practice, hygiene, psychological capital) and lagging outcomes (win rate, forecast error, retention).
Q: How do we convert improvements into financial value?
A: Agree conversions with Finance: win rate to bookings, forecast error to resource allocation, retention to avoided replacement cost, and time saved to capacity value.
Q: What pitfalls should we avoid?
A: Changing metric definitions mid‑stream, unbalanced cohorts, one‑off shocks during the window, and over‑claiming causality. Freeze definitions, document differences, note shocks, and present estimates with clear caveats.