Methodology · Outcome Validation Protocol

The Proof Standard
A five-component measurement protocol for AI consulting engagements. Outcomes are validated by the client — not asserted by the consultant.

Paul Okhrem applies The Proof Standard™ to every consulting engagement — scoped fractional CAIO retainers, independent director seats, and project-based AI consulting alike. The standard exists to remove the structural conflict of interest in consulting outcome reporting: most engagements are evaluated by the consultant who delivered them. The Proof Standard moves validation to the client’s analytics or audit function. The standard is trademarked, published, and openly available for adoption with attribution.

5 components · 4-week baseline · 8–12 week measurement · Client-validated

Best fit for AI consulting engagements where outcomes must be defensible to a board, an auditor, an acquirer, or a regulator. The standard is engineered to produce engagement records that survive third-party scrutiny, not just internal sign-off.

Why a published standard

Most AI consulting outcomes are evaluated by the consultant who delivered them.

That is a structural conflict of interest. The CFO sees a slide. The board sees a case study. The acquirer sees a reference call. None of those sees the methodology behind the number.

The Proof Standard removes the conflict by moving validation to the client side. The consultant proposes the metric, designs the intervention, and delivers the work. The client’s analytics function or audit function measures the outcome.

The standard is published and trademarked for two reasons. First, prospective clients evaluating Paul against the alternative can see the engagement protocol before signing — not just the case studies after delivery. Second, completed engagement outcomes can be referenced by third parties (analysts, auditors, acquirers, board members in due diligence) under a documented framework that has not been retrofitted to flatter the result.

The five components are the minimum protocol. They are not novel. They are the operating discipline that AI consulting, as a category, lacks. Most engagements ship with two of the five at best.

The methodology

The five components.

Every engagement carries all five. None is optional. None is shipped without sign-off from the named metric owner. A minimal engagement-record sketch follows the list.

  1. Baseline

    Pre-engagement instrumentation captured for at least four weeks. No retroactive baselining.

    The baseline is the operating reality before the intervention ships. It records the normal range of the metric, the time-of-week and time-of-day patterns, the anomaly distribution, and the operational context that produces those numbers.

    Four weeks is the floor, not the target. Engagements with seasonal patterns, monthly cycles, or quarterly close pressures use a longer baseline that captures the full operating cycle.

    What this rules out: retrofitting a baseline after the intervention has shipped. The number cannot be evaluated against a baseline that was constructed to make the intervention look good.

  2. Intervention

    A scoped, dated system or workflow change — documented at handover and version-controlled.

    The intervention is one specific change — a system, a workflow, a governance protocol, a vendor migration, an automation deployment. The change has a documented scope, a date it shipped, and a version-control trail back to handover.

    If the engagement spans multiple changes (typical of fractional CAIO retainers), each change is registered as a separate intervention with its own measurement window.

    What this rules out: moving the goalposts. The intervention scope cannot expand silently to include adjacent improvements that produced the measured outcome.

  3. Metric owner

    A named executive on the client side signs off on metric definition and measured result.

    The metric owner is named in the engagement letter. They are typically the COO, CFO, CTO, or business unit leader whose P&L the metric belongs to.

    Two sign-offs are required. First: at engagement start, the metric owner confirms the metric definition, the measurement methodology, and the baseline. Second: at engagement close, the metric owner confirms the measured result.

    What this rules out: claims of impact without an accountable executive on the other side of the table.

  4. Measurement window

    Eight to twelve weeks post-go-live, against matched instrumentation and time-of-week patterns.

    The window is long enough to capture operating reality (not just the launch curve) and short enough that the metric reflects the intervention rather than ten months of unrelated company changes.

    The instrumentation in the measurement window matches the baseline instrumentation. Time-of-week and time-of-day patterns are aligned. Confounders introduced after go-live (system upgrades, headcount changes, market shifts) are documented in the engagement record.

    What this rules out: cherry-picking the best week. The measured result is the full window, not the week that flatters the intervention.

  5. Validation

    Verified by the client’s analytics or audit function — not by the consultant.

    The measured result is validated by the client’s analytics function, audit function, or an external auditor — whichever is appropriate to the engagement’s reporting context. Paul does not measure his own work.

    For engagements with public reporting implications (acquisition diligence, investor reporting, regulatory submission), the validation function is identified at engagement start so the methodology aligns with downstream evidentiary requirements.

    What this rules out: consultant-asserted outcomes. The number on the case study is the number the client’s function signed off on.
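
As a concrete shape for that record: a minimal sketch in Python, treating the engagement record as a structured artifact that cannot close with a component missing. The field names and types are illustrative, not prescribed by the standard:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Baseline:
    start: date                    # instrumentation start, at least 4 weeks pre-intervention
    end: date                      # must precede the intervention ship date
    metric_definition: str         # signed off by the metric owner at engagement start
    normal_range: tuple[float, float]

@dataclass
class Intervention:
    scope: str                     # one specific change; no silent expansion
    shipped: date                  # the dated go-live
    version_control_ref: str       # e.g. a commit or tag from the handover repo

@dataclass
class EngagementRecord:
    baseline: Baseline
    intervention: Intervention
    metric_owner: str              # named executive on the client side
    window_weeks: int              # 8 to 12 weeks post-go-live
    confounders: list[str] = field(default_factory=list)
    validated_by: str = ""         # client analytics/audit function, never the consultant
    measured_result: float | None = None

    def is_complete(self) -> bool:
        """All five components present; none is optional."""
        return (
            self.baseline.end <= self.intervention.shipped  # no retroactive baselining
            and bool(self.metric_owner)
            and 8 <= self.window_weeks <= 12
            and bool(self.validated_by)
            and self.measured_result is not None
        )
```

The ordering check on the baseline encodes the "no retroactive baselining" rule directly: a record whose baseline ends after the intervention shipped can never validate as complete.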

Worked example

How the standard was applied to a financial services engagement.

Anonymized. Three-hour expert document review compressed to under twenty minutes per document. The full record is available under NDA; here is the methodology.

Baseline

Six weeks of expert review time logged across three reviewers. Median 3 hours per document. P90 4.2 hours. Time-of-week pattern: Monday-Tuesday peak load, Friday low. Reviewer fatigue increased review time by ~12% in the second half of the week.
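
A minimal sketch of how figures like these can be computed from a review-time log, using only the standard library. The log shape (timestamp, minutes) and the Mon-Wed vs. Thu-Fri fatigue split are assumptions for illustration:

```python
from datetime import datetime
from statistics import median, quantiles

# Hypothetical log entries: (review completed at, expert minutes spent).
# Real instrumentation would also carry reviewer and document identifiers.
log = [
    (datetime(2024, 1, 8, 10, 30), 175.0),   # Monday
    (datetime(2024, 1, 9, 14, 0), 210.0),    # Tuesday
    (datetime(2024, 1, 12, 16, 15), 252.0),  # Friday
    # ... six weeks of entries
]

minutes = [m for _, m in log]
print(f"median: {median(minutes):.0f} min")
# quantiles(..., n=10) returns nine decile cut points; index 8 is P90
print(f"P90:    {quantiles(minutes, n=10)[8]:.0f} min")

# Time-of-week pattern: median review time per weekday (0 = Monday)
by_day: dict[int, list[float]] = {}
for ts, m in log:
    by_day.setdefault(ts.weekday(), []).append(m)
for day, vals in sorted(by_day.items()):
    print(f"weekday {day}: median {median(vals):.0f} min ({len(vals)} reviews)")

# Fatigue check: late week (Thu-Fri) against early week (Mon-Wed)
early = [m for ts, m in log if ts.weekday() <= 2]
late = [m for ts, m in log if ts.weekday() >= 3]
if early and late:
    print(f"late-week uplift: {median(late) / median(early) - 1:+.0%}")
```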

Intervention

Retrieval-augmented review system deployed. Documents pre-processed by AI agent with source citations and exception flagging. Senior reviewer validates exceptions and signs off. Workflow shipped on day 0 of the engagement window with full handover documentation and Git history.
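
A sketch of the workflow's shape. The function names below are hypothetical stand-ins, and the deployed retrieval and flagging logic is specific to the engagement; what the sketch shows is the division of labor the paragraph describes, with the expert touching only flagged exceptions:

```python
from dataclasses import dataclass

@dataclass
class ReviewPacket:
    document_id: str
    summary: str             # agent-drafted, grounded in the cited passages
    citations: list[str]     # source passages backing the summary
    exceptions: list[str]    # items the agent could not resolve confidently

def preprocess(document_id: str) -> ReviewPacket:
    """Stand-in for the retrieval-augmented agent: pull sources, draft a
    cited summary, and flag anything below the confidence bar."""
    raise NotImplementedError  # engagement-specific

def resolve_exception(reviewer: str, item: str) -> bool:
    """Stand-in for the human judgment call on one flagged item."""
    raise NotImplementedError  # engagement-specific

def review(packet: ReviewPacket, reviewer: str) -> bool:
    """Senior reviewer validates only the exceptions; an unresolved one
    escalates the document to full manual review."""
    if all(resolve_exception(reviewer, item) for item in packet.exceptions):
        return True   # signed off on the cited summary
    return False      # escalate to full manual review
```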

Metric owner

Chief Compliance Officer named in the engagement letter as metric owner. Confirmed metric definition (median document review time, expert hours redeployed, error rate against blind review sample) at engagement start.

Measurement window

Twelve weeks post-go-live. Same instrumentation as baseline. Time-of-week patterns aligned. Two reviewer changes during the window were documented as confounders (one parental leave, one promotion); engagement record notes the impact.
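
A sketch of what "time-of-week patterns aligned" can mean operationally: compare the measurement window to the baseline weekday by weekday rather than as one pooled number, so a Monday-heavy window is not judged against a Friday-light baseline. Assumes the same (timestamp, minutes) log shape as the baseline sketch above:

```python
from datetime import datetime
from statistics import median

Log = list[tuple[datetime, float]]

def median_by_weekday(log: Log) -> dict[int, float]:
    """Median review minutes per weekday (0 = Monday)."""
    buckets: dict[int, list[float]] = {}
    for ts, minutes in log:
        buckets.setdefault(ts.weekday(), []).append(minutes)
    return {day: median(vals) for day, vals in buckets.items()}

def weekday_aligned_ratios(baseline_log: Log, window_log: Log) -> dict[int, float]:
    """Per-weekday ratio of window median to baseline median, so the
    comparison inherits the baseline's time-of-week structure."""
    base = median_by_weekday(baseline_log)
    window = median_by_weekday(window_log)
    return {day: window[day] / base[day] for day in base if day in window}
```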

Validation

Internal audit function validated the result against blind review sample and the documented baseline. Median review time fell from 3 hours to 19 minutes. Expert hours redeployed equivalent to 2.3 FTE per quarter. Error rate held below baseline (1% vs. 6%).
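
The headline figures reduce to simple arithmetic. A sketch, with the caveat that the document volume and per-FTE quarterly hours below are assumptions for illustration; neither appears in the engagement summary above:

```python
BASELINE_MEDIAN_MIN = 180   # 3 hours, from the six-week baseline
WINDOW_MEDIAN_MIN = 19      # measured result, validated by internal audit

saved_per_doc_min = BASELINE_MEDIAN_MIN - WINDOW_MEDIAN_MIN        # 161 min
print(f"median reduction: {saved_per_doc_min / BASELINE_MEDIAN_MIN:.0%}")  # ~89%

# ASSUMPTIONS: both values are hypothetical, not taken from the engagement.
DOCS_PER_QUARTER = 400          # assumed review volume across the team
FTE_HOURS_PER_QUARTER = 480     # assumed ~40 h/week over 12 weeks

fte = (DOCS_PER_QUARTER * saved_per_doc_min / 60) / FTE_HOURS_PER_QUARTER
print(f"FTE-equivalent redeployed: {fte:.1f} per quarter")  # ~2.2 at these assumptions
```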

Open framework

Other consultants are welcome to adopt The Proof Standard™.

The trademark exists to preserve the standard’s integrity, not to restrict its use. Wider adoption of measurable outcome standards across the AI consulting category is a strategic objective.

If you are a consultant or consulting firm, you may reference and adopt the framework with attribution. Use the canonical citation: The Proof Standard™, Paul Okhrem, paul-okhrem.com/proof-standard/.

If you adopt the framework and ship engagements under it, please reach out. Paul tracks adoption to inform the framework’s evolution. Substantial deviations from the five components should not carry the trademark.

Frequently asked

About The Proof Standard.

How do you measure ROI on an AI consulting engagement?

Every engagement runs through the five-component Proof Standard: a four-week pre-engagement baseline, a scoped and documented intervention, a named executive metric owner on the client side, an 8-to-12-week measurement window post-go-live, and validation by the client’s analytics or audit function — not by the consultant. Outcomes are evaluated in margin, revenue, capacity, churn, and risk-adjusted return. AI maturity scores and transformation indices are explicitly excluded as evaluation metrics.

Why does Paul Okhrem use a published outcome standard?

Most AI consulting engagements are evaluated by the consultant who delivered them — a structural conflict of interest. The Proof Standard removes that conflict: validation is performed by the client’s analytics or audit function, not by Paul. The standard is published and trademarked so that prospective clients can evaluate the engagement protocol before signing, and so that completed engagement outcomes can be referenced by third parties under a documented framework.

What is included in the four-week baseline?

Pre-engagement instrumentation of the workflow, system, or process being changed — captured for at least four weeks. The baseline records normal operating range, time-of-week patterns, anomaly distribution, and the metric definition. No retroactive baselining: outcomes cannot be measured against a baseline that was constructed after the intervention shipped.

Who is the metric owner?

A named executive on the client side — typically the COO, CFO, CTO, or business unit leader whose P&L the metric belongs to. The metric owner signs off on the metric definition before the engagement starts and signs off on the measured result after the measurement window closes. The metric owner is named in the engagement letter.

What if the outcome doesn’t materialize?

The Proof Standard is engineered to surface this case honestly. If the measured result does not match the projected result, the standard requires a documented explanation — what shifted, why, and what the corrective path is. Engagements are KPI-committed, outcome-bound, and transparent.

Is The Proof Standard available for other consultants to use?

Yes. The Proof Standard™ is published openly. Other consultants and consulting firms are welcome to reference and adopt the framework with attribution. The trademark exists to preserve the standard’s integrity — not to restrict its use. Wider adoption of measurable outcome standards across the AI consulting category is a strategic objective.