Your Plant Doesn’t Have a Reliability Problem. It Has a Definition Problem.

Most plants don’t have a reliability problem.

They have a definition problem. (And they try to solve it with a binder. Naturally.)

You can’t “prioritize work” when everyone’s using different math in their head.

So the system drifts, the backlog lies, and the same bad actor kicks the door in every week.

The real problem (and why “Paper Lean” keeps winning)

If your reliability program lives in a PowerPoint deck and a 400-page SOP library, it’s not a program.

It’s a costume.

Here’s the reality: the floor doesn’t care what you intended.

It only responds to what you defined, ranked, executed, and verified.

So understand: operational excellence is not “doing more maintenance.”

It’s running a repeatable system that turns chaos into priorities—and priorities into results.

Phase 1: The Foundation (Definition & Alignment)

Before you touch an asset, you set the Rules of Engagement.

Because “high impact” is meaningless until you force it to mean something.

Inputs

  • Site-specific financial data (margin per unit)

  • Labor rates

  • Safety protocols

Tool Needed

  • Impact Definition Table

Activity

Define “Low / Medium / High” for two pillars:

  1. Operational Loss (time thresholds)

    • Example: Low < 2 hrs, Med 2–8 hrs, High > 8 hrs

  2. Repair Cost (dollar thresholds)

    • Example: Low < $500, Med $500–$5k, High > $5k

Output

  • A localized Risk Strategy Matrix everyone agrees on (yes, even Production).

This is where the grown-up work starts.

Because once the thresholds are set, excuses get louder (which is how you know you did it right).
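
To make "force it to mean something" concrete, here is a minimal Python sketch of an Impact Definition Table using the example thresholds above. The names (Impact, classify, overall_impact) are hypothetical, and two choices are assumptions rather than anything the framework dictates: boundary values count as Medium, and the worse of the two pillars sets the overall rating.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Example thresholds from the text; replace with your site's agreed numbers.
DOWNTIME_HOURS = (2.0, 8.0)         # Low < 2 hrs, Med 2-8 hrs, High > 8 hrs
REPAIR_COST_USD = (500.0, 5000.0)   # Low < $500, Med $500-$5k, High > $5k


def classify(value: float, thresholds: tuple[float, float]) -> Impact:
    """Map a value to Low/Medium/High; both boundary values count as Medium."""
    low_limit, high_limit = thresholds
    if value < low_limit:
        return Impact.LOW
    if value <= high_limit:
        return Impact.MEDIUM
    return Impact.HIGH


def overall_impact(downtime_hours: float, repair_cost_usd: float) -> Impact:
    """Assumption: the worse of the two pillars sets the overall rating."""
    return max(
        classify(downtime_hours, DOWNTIME_HOURS),
        classify(repair_cost_usd, REPAIR_COST_USD),
    )


# A short stop with an expensive repair still rates High.
print(overall_impact(downtime_hours=1.5, repair_cost_usd=6200).name)
```

Once the thresholds live in one place like this, "high impact" stops being a matter of opinion and becomes a lookup.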

Phase 2: Problem Identification & Ranking

You don’t fix everything at once.

You fix what matters.

Inputs

  • Maintenance logs (SAP/CMMS)

  • Operator “nuisance” reports

  • Scrap/rework data

Tool Needed

  • The Problem Ranking Matrix

Activity

List your known issues (example: “Heat Sealer Bad Seals”).

Then plot them by Frequency (Weekly/Monthly/Yearly) vs Impact (from Phase 1).

Output

  • A prioritized Bad Actor List—your real to-do list.

This is how you stop letting the loudest person in the morning meeting run your maintenance strategy. (A bold concept.)
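
The Frequency vs Impact plot can be reduced to a simple score. The sketch below assumes a 1-to-3 score for each axis and multiplies them; the text names the categories but not the weights, so the numbers are placeholders. Every issue except the heat sealer is made up for illustration.

```python
from dataclasses import dataclass

# Assumed scores: the text names the categories but not numeric weights.
FREQUENCY_SCORE = {"Yearly": 1, "Monthly": 2, "Weekly": 3}
IMPACT_SCORE = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Issue:
    name: str
    frequency: str  # "Weekly", "Monthly", or "Yearly"
    impact: str     # "Low", "Medium", or "High" (from the Phase 1 table)

    @property
    def score(self) -> int:
        return FREQUENCY_SCORE[self.frequency] * IMPACT_SCORE[self.impact]


def bad_actor_list(issues: list[Issue]) -> list[Issue]:
    """Highest frequency x impact score first; ties keep their input order."""
    return sorted(issues, key=lambda issue: issue.score, reverse=True)


# Hypothetical issues, except the heat sealer example from the text.
issues = [
    Issue("Labeler jam", "Monthly", "Low"),
    Issue("Heat Sealer Bad Seals", "Weekly", "Medium"),
    Issue("Compressor trip", "Yearly", "High"),
]

for rank, issue in enumerate(bad_actor_list(issues), start=1):
    print(rank, issue.name, issue.score)
```

The weekly, medium-impact heat sealer outranks the yearly, high-impact compressor trip here. Whether that ordering is right for your site is exactly the argument the matrix is meant to surface.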

Phase 3: Root Cause Analysis (RCA)

Now you take the #1 item and you do the work nobody wants to do: you ask “why” until the story gets inconvenient.

Inputs

  • The top bad actor (not the whole zoo)

Tools Needed

  • Fishbone (Ishikawa) Template

  • 5-Why Worksheet

Activity

Brainstorm across categories: Man, Machine, Method, Material, Measurement, Mother Nature.

Example (Heat Sealer Bad Seals):

  • Machine: heat tape worn / heating element drift

  • Method: conveyor speed mismatch

  • Material: bag contamination or film variation

Output

  • Identified Root Causes (not symptoms)

If your RCA ends with “operator error,” congratulations—you just built a blame machine.

Real RCA produces causes you can engineer out, standardize out, or train out.

Phase 4: Strategy Assignment & Tool Selection

Once you know the root cause, you choose the prevention tool.

Not your favorite tool. The right tool.

Inputs

  • Root causes

  • Implementation factors (lead time, complexity, cost)

Tool Needed

  • Strategy Selection Logic Tree

Activity

Pick the best strategy:

  • PM (Preventive): time-based replacement (swap heat tape every 30 days)

  • PdM (Predictive): technology-based monitoring (IR sensor on sealer temp)

  • Operator Care: inspection/cleaning standard (clean bag area every 4 hours)

  • Redesign: engineering change (upgrade element, shielding, guides, controls)

Output

  • An updated Asset Maintenance Plan

This is where “calendar roulette” dies—because you stop scheduling work by tradition and start scheduling it by logic. (The plant might actually survive this.)
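
A Strategy Selection Logic Tree can be as small as a handful of yes/no questions. The sketch below is one hypothetical version: the RootCause fields and the branch order are assumptions for illustration, not rules the framework prescribes. Your own tree should also weigh lead time, complexity, and cost.

```python
from dataclasses import dataclass


@dataclass
class RootCause:
    description: str
    wears_on_schedule: bool = False     # degrades at a fairly predictable rate
    measurable_drift: bool = False      # a sensor can see it coming
    operator_preventable: bool = False  # routine cleaning/inspection stops it
    inherent_to_design: bool = False    # no amount of upkeep removes it


def select_strategy(cause: RootCause) -> str:
    """Hypothetical branch order; tune it to your site's lead time, complexity, and cost."""
    if cause.inherent_to_design:
        return "Redesign"
    if cause.operator_preventable:
        return "Operator Care"
    if cause.measurable_drift:
        return "PdM"
    if cause.wears_on_schedule:
        return "PM"
    return "Unassigned: revisit the RCA"


# Root causes taken from the heat sealer example in Phase 3.
causes = [
    RootCause("Heat tape worn", wears_on_schedule=True),
    RootCause("Heating element drift", measurable_drift=True),
    RootCause("Bag contamination", operator_preventable=True),
]

for cause in causes:
    print(f"{cause.description}: {select_strategy(cause)}")
```

Writing the questions down matters more than the exact order: it forces the "document why" step and makes tool favoritism visible.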

Phase 5: Execution & Loop-Back

This is the part that separates frameworks from fairy tales.

Inputs

  • The new PM/inspection tasks

Tools Needed

  • Standard Operating Procedure (SOP) Templates

  • Feedback Loop Sheet

Activity

Train operators/maintenance on the new strategy.

Then watch the frequency.

Did “Weekly” become “Quarterly”?

Or did the team just get better at re-labeling the same defect?

Output

  • A Continuous Improvement (CI) Report that proves the loop closed.

If you don’t loop back, you didn’t improve anything.

You just moved the problem to a new folder.
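
The loop-back check is simple arithmetic: count failures in equal windows before and after the change. Here is a minimal sketch, assuming you can export failure dates for the asset from your CMMS. The function name, the 30-day default window, and the sample dates are all hypothetical.

```python
from datetime import date, timedelta


def loop_back(events: list[date], change_date: date, window_days: int = 30) -> dict:
    """Compare failure counts in equal windows before and after a change."""
    window = timedelta(days=window_days)
    before = sum(change_date - window <= d < change_date for d in events)
    after = sum(change_date <= d < change_date + window for d in events)
    return {"before": before, "after": after, "improved": after < before}


# Hypothetical failure dates for one asset; the new strategy started on March 1.
heat_sealer_failures = [
    date(2024, 2, 2), date(2024, 2, 9), date(2024, 2, 16), date(2024, 2, 23),
    date(2024, 3, 20),
]

print(loop_back(heat_sealer_failures, change_date=date(2024, 3, 1)))
```

Counting every failure on the asset, rather than one defect code, is a deliberate choice: it makes it harder for a re-labeled defect to pass as an improvement.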

The “Master Tool Bag” (what to build next)

If you want this to run without heroics, you need a small set of repeatable tools:

  1. Site Input Sheet

    • Captures the “ballpark” financial/time thresholds so Phase 1 is real.

  2. Risk Ranking Matrix

    • Your 3x3 or 5x5 grid that makes prioritization visible (and arguable in daylight).

  3. RCA Workbook

    • Fishbone + 5-Why in one place—so RCA doesn’t die in someone’s notebook.

  4. Strategy Assist Sheet

    • A cheat sheet that helps leaders choose PM vs PdM vs operator care vs redesign based on lead time, complexity, and cost.

This replaces the binder religion with something better: a system that actually runs.

Common failure modes (how plants sabotage this)

  • Vague thresholds (“high impact” = “I feel like it”)

  • Ranking by volume (whoever complains loudest) instead of frequency × impact

  • RCA theater (meetings that output “retrain operators” and call it a day)

  • Tool favoritism (PM everything, because… tradition)

  • No enforcement loop (SOPs exist, but reality doesn’t comply)

Paper Lean loves these mistakes.

They keep the slide decks looking good while the bearings keep dying.

What to do this week (7 actions)

  1. Build a one-page Impact Definition Table with time + cost thresholds.

  2. Get Production + Maintenance to sign off on the Risk Strategy Matrix (yes, sign off).

  3. Pull your top 20 chronic issues and create a Problem Ranking Matrix.

  4. Select the #1 bad actor and run Fishbone + 5-Why until you have engineerable causes.

  5. Use a Strategy Selection Logic Tree to choose the prevention strategy (document why).

  6. Update the Asset Maintenance Plan and create the SOP + inspection points.

  7. Set a 30-day loop-back: prove the frequency moved—or admit it didn’t.

Mic drop: Reliability isn’t a program you announce—it’s a system you enforce.
