Your Plant Doesn’t Have a Reliability Problem. It Has a Definition Problem.


May 24

Written By Kyle Gredvig

Most plants don’t have a reliability problem.

They have a definition problem. (And they try to solve it with a binder. Naturally.)

You can’t “prioritize work” when everyone’s using different math in their head.

So the system drifts, the backlog lies, and the same bad actor kicks the door in every week.

The real problem (and why “Paper Lean” keeps winning)

If your reliability program lives in a PowerPoint deck and a 400-page SOP library, it’s not a program.

It’s a costume.

Here’s the reality: the floor doesn’t care what you intended.

It only responds to what you defined, ranked, executed, and verified.

So understand: operational excellence is not “doing more maintenance.”

It’s running a repeatable system that turns chaos into priorities—and priorities into results.

Phase 1: The Foundation (Definition & Alignment)

Before you touch an asset, you set the Rules of Engagement.

Because “high impact” is meaningless until you force it to mean something.

Inputs

Site-specific financial data (margin per unit)
Labor rates
Safety protocols

Tool Needed

Impact Definition Table

Activity

Define “Low / Medium / High” for two pillars:

Operational Loss (time thresholds)
- Example: Low < 2 hrs, Med 2–8 hrs, High > 8 hrs
Repair Cost (dollar thresholds)
- Example: Low < $500, Med $500–$5k, High > $5k

Output

A localized Risk Strategy Matrix everyone agrees on (yes, even Production).

This is where the grown-up work starts.

Because once the thresholds are set, excuses get louder (which is how you know you did it right).
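To make the Impact Definition Table concrete, here is a minimal Python sketch that turns the example thresholds above into a rating function. The names (`Thresholds`, `rate`, `impact`) are hypothetical, and the rule that overall impact is the worse of the two pillar ratings is an assumption, not something the table itself dictates. Swap in your own site numbers.

```python
from dataclasses import dataclass

LEVELS = ("Low", "Medium", "High")  # ordered from least to most severe


@dataclass(frozen=True)
class Thresholds:
    """Cut-offs for one pillar of the Impact Definition Table."""
    low_below: float   # values strictly below this rate as Low
    high_above: float  # values strictly above this rate as High


# Example thresholds from the article; replace with site-specific numbers.
DOWNTIME_HOURS = Thresholds(low_below=2, high_above=8)
REPAIR_COST_USD = Thresholds(low_below=500, high_above=5000)


def rate(value: float, t: Thresholds) -> str:
    """Rate a single value as Low, Medium, or High against one pillar."""
    if value < t.low_below:
        return "Low"
    if value > t.high_above:
        return "High"
    return "Medium"  # boundary values (e.g. exactly 2 or 8 hrs) land here


def impact(downtime_hours: float, repair_cost_usd: float) -> str:
    """Assumption: overall impact is the worse of the two pillar ratings."""
    ratings = (
        rate(downtime_hours, DOWNTIME_HOURS),
        rate(repair_cost_usd, REPAIR_COST_USD),
    )
    return max(ratings, key=LEVELS.index)
```

With these numbers, a 1-hour stop that costs $6,000 to repair rates High, because the cost pillar overrides the short downtime. If your site would rather average the pillars, that is a different rule, and it should be written down just as explicitly.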

Phase 2: Problem Identification & Ranking

You don’t fix everything at once.

You fix what matters.

Inputs

Maintenance logs (SAP/CMMS)
Operator “nuisance” reports
Scrap/rework data

Tool Needed

The Problem Ranking Matrix

Activity

List your known issues (example: “Heat Sealer Bad Seals”).

Then plot them by Frequency (Weekly/Monthly/Yearly) vs Impact (from Phase 1).

Output

A prioritized Bad Actor List—your real to-do list.

This is how you stop letting the loudest person in the morning meeting run your maintenance strategy. (A bold concept.)
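The ranking step can be sketched in a few lines of Python. This is one possible scoring, not the only one: the 1 to 3 scores per level and the helper name `rank_bad_actors` are assumptions, and every issue other than the heat sealer (along with all the frequency and impact ratings) is hypothetical sample data.

```python
# Assumed scoring: each level maps to 1-3, and rank = frequency x impact.
FREQUENCY_SCORE = {"Yearly": 1, "Monthly": 2, "Weekly": 3}
IMPACT_SCORE = {"Low": 1, "Medium": 2, "High": 3}


def rank_bad_actors(issues):
    """Return (name, frequency, impact) tuples sorted by score, highest first."""
    def score(issue):
        _name, frequency, impact = issue
        return FREQUENCY_SCORE[frequency] * IMPACT_SCORE[impact]

    return sorted(issues, key=score, reverse=True)


# Hypothetical ratings for illustration only.
issues = [
    ("Conveyor Belt Tracking", "Monthly", "Low"),   # score 2
    ("Heat Sealer Bad Seals", "Weekly", "Medium"),  # score 6
    ("Gearbox Failure", "Yearly", "High"),          # score 3
]

bad_actor_list = rank_bad_actors(issues)
top_bad_actor = bad_actor_list[0][0]
```

Here the weekly medium-impact sealer problem outranks the yearly high-impact gearbox failure. Whether that ordering is right for your plant is exactly the argument the matrix is supposed to force into the open.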

Phase 3: Root Cause Analysis (RCA)

Now you take the #1 item and you do the work nobody wants to do: you ask “why” until the story gets inconvenient.

Inputs

The top bad actor (not the whole zoo)

Tools Needed

Fishbone (Ishikawa) Template
5-Why Worksheet

Activity

Brainstorm across categories: Man, Machine, Method, Material, Measurement, Mother Nature.

Example (Heat Sealer Bad Seals):

Machine: heat tape worn / heating element drift
Method: conveyor speed mismatch
Material: bag contamination or film variation

Output

Identified Root Causes (not symptoms)

If your RCA ends with “operator error,” congratulations—you just built a blame machine.

Real RCA produces causes you can engineer out, standardize out, or train out.
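One way to keep the Fishbone honest is to store it as data and check it. The sketch below is a hypothetical helper, not a standard RCA tool: it lists which of the six categories nobody has brainstormed yet and flags causes that stop at blame. The phrases in `BLAME_PHRASES` are an assumption; tune them to how your site writes things up.

```python
CATEGORIES = ("Man", "Machine", "Method", "Material", "Measurement", "Mother Nature")

# Assumption: wording that signals the RCA stopped at blame, not cause.
BLAME_PHRASES = ("operator error", "human error")


def unexplored(fishbone):
    """Return the categories that have no candidate causes yet."""
    return [c for c in CATEGORIES if not fishbone.get(c)]


def blame_only_causes(fishbone):
    """Return causes whose wording matches a blame phrase."""
    return [
        cause
        for causes in fishbone.values()
        for cause in causes
        if any(phrase in cause.lower() for phrase in BLAME_PHRASES)
    ]


# The Heat Sealer Bad Seals example from above.
heat_sealer = {
    "Machine": ["heat tape worn", "heating element drift"],
    "Method": ["conveyor speed mismatch"],
    "Material": ["bag contamination", "film variation"],
}

gaps = unexplored(heat_sealer)
flagged = blame_only_causes(heat_sealer)
```

For the heat sealer example, `gaps` comes back with Man, Measurement, and Mother Nature, which tells the team where the brainstorm still has holes before anyone runs a 5-Why.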

Phase 4: Strategy Assignment & Tool Selection

Once you know the root cause, you choose the prevention tool.

Not your favorite tool. The right tool.

Inputs

Root causes
Implementation factors (lead time, complexity, cost)

Tool Needed

Strategy Selection Logic Tree

Activity

Pick the best strategy:

PM (Preventive): time-based replacement (swap heat tape every 30 days)
PdM (Predictive): technology-based monitoring (IR sensor on sealer temp)
Operator Care: inspection/cleaning standard (clean bag area every 4 hours)
Redesign: engineering change (upgrade element, shielding, guides, controls)

Output

An updated Asset Maintenance Plan

This is where “calendar roulette” dies—because you stop scheduling work by tradition and start scheduling it by logic. (The plant might actually survive this.)
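A Strategy Selection Logic Tree can be as simple as a handful of yes/no questions asked in a fixed order. The sketch below is an illustration under stated assumptions: the four questions, their order, and the names `RootCause` and `select_strategy` are all hypothetical, and a real tree would also weigh lead time, complexity, and cost.

```python
from dataclasses import dataclass


@dataclass
class RootCause:
    description: str
    design_flaw: bool = False          # the cause is built into the equipment
    measurable_signal: bool = False    # degradation shows in a reading (e.g. temperature)
    predictable_wear: bool = False     # the part wears out on a roughly known interval
    operator_detectable: bool = False  # routine cleaning or inspection can catch it


def select_strategy(cause: RootCause) -> str:
    """Walk the assumed decision order and return the first strategy that fits."""
    if cause.design_flaw:
        return "Redesign"
    if cause.measurable_signal:
        return "PdM"
    if cause.predictable_wear:
        return "PM"
    if cause.operator_detectable:
        return "Operator Care"
    return "Needs more RCA"  # no question answered yes: the cause is not understood yet


# Heat sealer causes from Phase 3; the flags are hypothetical judgments.
heat_tape = RootCause("heat tape worn", predictable_wear=True)
element_drift = RootCause("heating element drift", measurable_signal=True)
contamination = RootCause("bag contamination", operator_detectable=True)

plan = {c.description: select_strategy(c) for c in (heat_tape, element_drift, contamination)}
```

The useful part is not the code, it is the fixed order. Writing the questions down means two leaders looking at the same root cause land on the same strategy, and the fallback branch makes "we don't know yet" a visible answer.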

Phase 5: Execution & Loop-Back

This is the part that separates frameworks from fairy tales.

Inputs

The new PM/inspection tasks

Tools Needed

Standard Operating Procedure (SOP) Templates
Feedback Loop Sheet

Activity

Train operators/maintenance on the new strategy.

Then watch the frequency.

Did “Weekly” become “Monthly” or “Yearly”?

Or did the team just get better at re-labeling the same defect?

Output

A Continuous Improvement (CI) Report that proves the loop closed.

If you don’t loop back, you didn’t improve anything.

You just moved the problem to a new folder.
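The loop-back check itself is simple arithmetic: count failures in a window before the change and the same window after it. Here is a minimal sketch, assuming you can export failure dates for the asset from your CMMS. The function names, the 30-day default, and the sample dates are all hypothetical.

```python
from datetime import date, timedelta


def failures_in_window(failure_dates, start, end):
    """Count failures with start <= date < end."""
    return sum(1 for d in failure_dates if start <= d < end)


def loop_back(failure_dates, change_date, window_days=30):
    """Compare failure counts in equal windows before and after the change."""
    window = timedelta(days=window_days)
    before = failures_in_window(failure_dates, change_date - window, change_date)
    after = failures_in_window(failure_dates, change_date, change_date + window)
    return {"before": before, "after": after, "improved": after < before}


# Hypothetical failure history for one asset; the strategy changed on June 1.
failures = [
    date(2024, 5, 3), date(2024, 5, 10), date(2024, 5, 17),
    date(2024, 5, 24), date(2024, 5, 31),  # roughly weekly before the change
    date(2024, 6, 20),                     # one failure after the change
]

result = loop_back(failures, change_date=date(2024, 6, 1))
```

Count by asset and failure mode, not by whatever label ended up on the work order. Otherwise a re-labeled defect disappears from the "after" window and the report proves nothing.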

The “Master Tool Bag” (what to build next)

If you want this to run without heroics, you need a small set of repeatable tools:

Site Input Sheet
- Captures the “ballpark” financial/time thresholds so Phase 1 is real.
Risk Ranking Matrix
- Your 3x3 or 5x5 grid that makes prioritization visible (and arguable in daylight).
RCA Workbook
- Fishbone + 5-Why in one place—so RCA doesn’t die in someone’s notebook.
Strategy Assist Sheet
- A cheat sheet that helps leaders choose PM vs PdM vs operator care vs redesign based on lead time, complexity, and cost.

This replaces the binder religion with something better: a system that actually runs.

Common failure modes (how plants sabotage this)

Vague thresholds (“high impact” = “I feel like it”)
Ranking by volume instead of frequency × impact
RCA theater (meetings that output “retrain operators” and call it a day)
Tool favoritism (PM everything, because… tradition)
No enforcement loop (SOPs exist, but reality doesn’t comply)

Paper Lean loves these mistakes.

They keep the slide decks looking good while the bearings keep dying.

What to do this week (3–7 actions)

Build a one-page Impact Definition Table with time + cost thresholds.
Get Production + Maintenance to sign off on the Risk Strategy Matrix (yes, sign off).
Pull your top 20 chronic issues and create a Problem Ranking Matrix.
Select the #1 bad actor and run Fishbone + 5-Why until you have engineerable causes.
Use a Strategy Selection Logic Tree to choose the prevention strategy (document why).
Update the Asset Maintenance Plan and create the SOP + inspection points.
Set a 30-day loop-back: prove the frequency moved—or admit it didn’t.

Mic drop: Reliability isn’t a program you announce—it’s a system you enforce.
