Your Plant Doesn’t Have a Reliability Problem. It Has a Definition Problem.
Most plants don’t have a reliability problem.
They have a definition problem. (And they try to solve it with a binder. Naturally.)
You can’t “prioritize work” when everyone’s using different math in their head.
So the system drifts, the backlog lies, and the same bad actor kicks the door in every week.
The real problem (and why “Paper Lean” keeps winning)
If your reliability program lives in a PowerPoint deck and a 400-page SOP library, it’s not a program.
It’s a costume.
Here’s the reality: the floor doesn’t care what you intended.
It only responds to what you defined, ranked, executed, and verified.
Operational excellence is not “doing more maintenance.”
It’s running a repeatable system that turns chaos into priorities—and priorities into results.
Phase 1: The Foundation (Definition & Alignment)
Before you touch an asset, you set the Rules of Engagement.
Because “high impact” is meaningless until you force it to mean something.
Inputs
Site-specific financial data (margin per unit)
Labor rates
Safety protocols
Tool Needed
Impact Definition Table
Activity
Define “Low / Medium / High” for two pillars:
Operational Loss (time thresholds)
Example: Low < 2 hrs, Med 2–8 hrs, High > 8 hrs
Repair Cost (dollar thresholds)
Example: Low < $500, Med $500–$5k, High > $5k
Output
A localized Risk Strategy Matrix everyone agrees on (yes, even Production).
This is where the grown-up work starts.
Because once the thresholds are set, excuses get louder (which is how you know you did it right).
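Thresholds only end arguments if everyone applies them the same way. Here is a minimal Python sketch of the Impact Definition Table, using the example thresholds above. The function names are hypothetical, and the rule that overall impact is the worse of the two pillars is an assumption, not something this framework prescribes.

```python
# Example thresholds copied from the text; replace with site-specific values.
DOWNTIME_HOURS = (2.0, 8.0)         # Low < 2 h, Med 2-8 h, High > 8 h
REPAIR_COST_USD = (500.0, 5000.0)   # Low < $500, Med $500-$5k, High > $5k

LEVELS = ("Low", "Medium", "High")


def classify(value, thresholds):
    """Rate one pillar. Values exactly on a boundary count as Medium."""
    low_limit, high_limit = thresholds
    if value < low_limit:
        return "Low"
    if value <= high_limit:
        return "Medium"
    return "High"


def impact(downtime_hours, repair_cost_usd):
    """Overall impact = the worse of the two pillar ratings (an assumption)."""
    ratings = (
        classify(downtime_hours, DOWNTIME_HOURS),
        classify(repair_cost_usd, REPAIR_COST_USD),
    )
    return max(ratings, key=LEVELS.index)


# A 3-hour stop with a $300 repair rates Medium: downtime drives it.
short_stop = impact(3.0, 300.0)
# A 1-hour stop with a $6,000 repair rates High: cost drives it.
costly_fix = impact(1.0, 6000.0)
```

If Production and Maintenance disagree with a rating, the argument is now about two numbers in one table, not about opinions.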
Phase 2: Problem Identification & Ranking
You don’t fix everything at once.
You fix what matters.
Inputs
Maintenance logs (SAP/CMMS)
Operator “nuisance” reports
Scrap/rework data
Tool Needed
The Problem Ranking Matrix
Activity
List your known issues (example: “Heat Sealer Bad Seals”).
Then plot them by Frequency (Weekly/Monthly/Yearly) vs Impact (from Phase 1).
Output
A prioritized Bad Actor List—your real to-do list.
This is how you stop letting the loudest person in the morning meeting run your maintenance strategy. (A bold concept.)
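The ranking step can be sketched in a few lines of Python. The 1-to-3 scores and the multiplication rule are assumptions for illustration (a 3x3 grid); only “Heat Sealer Bad Seals” comes from this article, and its ratings here are made up, as are the other two issues.

```python
# Assumed 1-3 scoring for a 3x3 grid; adjust to your own matrix.
FREQUENCY_SCORE = {"Yearly": 1, "Monthly": 2, "Weekly": 3}
IMPACT_SCORE = {"Low": 1, "Medium": 2, "High": 3}


def rank_bad_actors(issues):
    """Score each (name, frequency, impact) issue and sort worst first."""
    scored = [
        (name, FREQUENCY_SCORE[frequency] * IMPACT_SCORE[impact])
        for name, frequency, impact in issues
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


issues = [
    ("Label Printer Jam", "Weekly", "Low"),           # hypothetical issue
    ("Heat Sealer Bad Seals", "Weekly", "High"),      # ratings are illustrative
    ("Conveyor Belt Tracking", "Monthly", "Medium"),  # hypothetical issue
]

bad_actor_list = rank_bad_actors(issues)
for name, score in bad_actor_list:
    print(f"{score:>2}  {name}")
```

The top row of that printout is the input to Phase 3. Everything else waits.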
Phase 3: Root Cause Analysis (RCA)
Now you take the #1 item and you do the work nobody wants to do: you ask “why” until the story gets inconvenient.
Inputs
The top bad actor (not the whole zoo)
Tools Needed
Fishbone (Ishikawa) Template
5-Why Worksheet
Activity
Brainstorm across categories: Man, Machine, Method, Material, Measurement, Mother Nature.
Example (Heat Sealer Bad Seals):
Machine: heat tape worn / heating element drift
Method: conveyor speed mismatch
Material: bag contamination or film variation
Output
Identified Root Causes (not symptoms)
If your RCA ends with “operator error,” congratulations—you just built a blame machine.
Real RCA produces causes you can engineer out, standardize out, or train out.
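One way to keep the fishbone out of someone’s notebook is to store it as plain data. The Python sketch below is an assumption about how that could look, not a prescribed template: it holds the six categories and rejects blame-style entries so the team has to keep asking why. The phrase list and function names are hypothetical.

```python
CATEGORIES = ("Man", "Machine", "Method", "Material", "Measurement", "Mother Nature")

# Assumed list of blame-style phrases that signal a symptom, not a root cause.
BLAME_PHRASES = ("operator error", "human error")


def new_fishbone(problem):
    """Create an empty fishbone with one branch per category."""
    return {"problem": problem, "causes": {category: [] for category in CATEGORIES}}


def add_cause(fishbone, category, cause):
    """Add a candidate cause, refusing unknown categories and blame entries."""
    if category not in fishbone["causes"]:
        raise ValueError(f"Unknown category: {category}")
    if any(phrase in cause.lower() for phrase in BLAME_PHRASES):
        raise ValueError(f"'{cause}' is a symptom; keep asking why")
    fishbone["causes"][category].append(cause)


# The heat sealer example from the text.
fishbone = new_fishbone("Heat Sealer Bad Seals")
add_cause(fishbone, "Machine", "heat tape worn")
add_cause(fishbone, "Machine", "heating element drift")
add_cause(fishbone, "Method", "conveyor speed mismatch")
add_cause(fishbone, "Material", "bag contamination or film variation")
```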
Phase 4: Strategy Assignment & Tool Selection
Once you know the root cause, you choose the prevention tool.
Not your favorite tool. The right tool.
Inputs
Root causes
Implementation factors (lead time, complexity, cost)
Tool Needed
Strategy Selection Logic Tree
Activity
Pick the best strategy:
PM (Preventive): time-based replacement (swap heat tape every 30 days)
PdM (Predictive): technology-based monitoring (IR sensor on sealer temp)
Operator Care: inspection/cleaning standard (clean bag area every 4 hours)
Redesign: engineering change (upgrade element, shielding, guides, controls)
Output
An updated Asset Maintenance Plan
This is where “calendar roulette” dies—because you stop scheduling work by tradition and start scheduling it by logic. (The plant might actually survive this.)
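A logic tree is just a fixed sequence of questions. The Python sketch below is one hypothetical version: the four yes/no questions and their order are assumptions, and it leaves out the implementation factors (lead time, complexity, cost) that a real tree would weigh. Treat it as a starting point to argue over, not as the tree.

```python
from dataclasses import dataclass


@dataclass
class RootCause:
    description: str
    design_flaw: bool = False           # can only be fixed by changing the equipment
    operator_preventable: bool = False  # cleaning or inspection prevents it
    measurable_warning: bool = False    # a sensor can see it coming
    wears_on_schedule: bool = False     # fails at a fairly predictable age


def select_strategy(cause):
    """Walk the assumed question order and return the first strategy that fits."""
    if cause.design_flaw:
        return "Redesign"
    if cause.operator_preventable:
        return "Operator Care"
    if cause.measurable_warning:
        return "PdM"
    if cause.wears_on_schedule:
        return "PM"
    return "Back to RCA"  # no question matched: the cause is not understood yet


# Root causes from the heat sealer example, flagged to match the text's pairings.
plan = {
    cause.description: select_strategy(cause)
    for cause in [
        RootCause("heat tape worn", wears_on_schedule=True),
        RootCause("heating element drift", measurable_warning=True),
        RootCause("bag contamination", operator_preventable=True),
    ]
}
```

Writing the questions down in a fixed order is the point: it forces “document why” into the choice itself.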
Phase 5: Execution & Loop-Back
This is the part that separates frameworks from fairy tales.
Inputs
The new PM/inspection tasks
Tools Needed
Standard Operating Procedure (SOP) Templates
Feedback Loop Sheet
Activity
Train operators/maintenance on the new strategy.
Then watch the frequency.
Did “Weekly” become “Monthly”?
Or did the team just get better at re-labeling the same defect?
Output
A Continuous Improvement (CI) Report that proves the loop closed.
If you don’t loop back, you didn’t improve anything.
You just moved the problem to a new folder.
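The loop-back check is simple arithmetic: count failures in an equal window before and after the change. A minimal Python sketch, assuming you can export failure dates for one bad actor from your CMMS; the function names, the 30-day default, and the sample dates are all illustrative.

```python
from datetime import date, timedelta


def failures_in_window(failure_dates, start, days):
    """Count failures on or after start and before start + days."""
    end = start + timedelta(days=days)
    return sum(1 for d in failure_dates if start <= d < end)


def loop_back(failure_dates, change_date, window_days=30):
    """Compare failure counts in equal windows before and after the change."""
    before_start = change_date - timedelta(days=window_days)
    before = failures_in_window(failure_dates, before_start, window_days)
    after = failures_in_window(failure_dates, change_date, window_days)
    return {"before": before, "after": after, "improved": after < before}


# Made-up data: weekly failures in February, one failure after the March 1 change.
failure_dates = [
    date(2024, 2, 2), date(2024, 2, 9), date(2024, 2, 16), date(2024, 2, 23),
    date(2024, 3, 20),
]
report = loop_back(failure_dates, change_date=date(2024, 3, 1))
```

Count by asset and symptom, not by failure code, so a re-labeled defect still shows up in the “after” column.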
The “Master Tool Bag” (what to build next)
If you want this to run without heroics, you need a small set of repeatable tools:
Site Input Sheet: captures the “ballpark” financial/time thresholds so Phase 1 is real.
Risk Ranking Matrix: your 3x3 or 5x5 grid that makes prioritization visible (and arguable in daylight).
RCA Workbook: Fishbone + 5-Why in one place, so RCA doesn’t die in someone’s notebook.
Strategy Assist Sheet: a cheat sheet that helps leaders choose among PM, PdM, operator care, and redesign based on lead time, complexity, and cost.
This replaces the binder religion with something better: a system that actually runs.
Common failure modes (how plants sabotage this)
Vague thresholds (“high impact” = “I feel like it”)
Ranking by volume instead of frequency × impact
RCA theater (meetings that output “retrain operators” and call it a day)
Tool favoritism (PM everything, because… tradition)
No enforcement loop (SOPs exist, but reality doesn’t comply)
Paper Lean loves these mistakes.
They keep the slide decks looking good while the bearings keep dying.
What to do this week (7 actions)
Build a one-page Impact Definition Table with time + cost thresholds.
Get Production + Maintenance to sign off on the Risk Strategy Matrix (yes, sign off).
Pull your top 20 chronic issues and create a Problem Ranking Matrix.
Select the #1 bad actor and run Fishbone + 5-Why until you have engineerable causes.
Use a Strategy Selection Logic Tree to choose the prevention strategy (document why).
Update the Asset Maintenance Plan and create the SOP + inspection points.
Set a 30-day loop-back: prove the frequency moved—or admit it didn’t.
Mic drop: Reliability isn’t a program you announce—it’s a system you enforce.