How to Build a Living Maintenance Strategy (That Actually Reduces Risk)


May 8

Most maintenance “strategies” are just a wish list with a binder.

They look organized. They feel responsible. And they fail the first time a “minor” part takes down the main line.

Here’s the reality: maintenance isn’t a checklist. It’s risk management with grease under its nails (and consequences that show up at 2:00 AM).

The real problem: the set-and-forget villain

The villain is the belief that if you document maintenance, you’ve managed maintenance.

You print PMs, hand them out, and call it a strategy. Then the floor teaches you the lesson you refused to price: risk doesn’t care about your paperwork.

So understand: a real strategy behaves like an investment portfolio.

You don’t spread money evenly across every stock “to be fair.” You allocate capital where the downside is violent.

In maintenance, your capital is time, parts, planning bandwidth, and attention.

The framework: a Living Strategy (not a chore list)

1) The “Secondary Criticality” Trap

The biggest criticality mistake is ranking an asset in a vacuum.

You see a $50 cooling fan and mark it “Green Zone.” Then that fan fails, the $100k VFD it cools cooks itself, and you buy 12 hours of downtime on the main line.

That fan was never green.

The fix is simple (and unforgiving): don’t rank by replacement cost.

Rank by total impact of absence—including failure chains.

If a “small” component can take down a “big” system, it inherits the system’s criticality.
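One way to make that inheritance mechanical is to walk the failure chain downstream and let each component take the worst zone it can reach. What follows is a minimal sketch, assuming you have a hand-built dependency map. The asset names, the `RANK` ordering, and the `effective_zone` helper are all illustrative assumptions, not features of any particular CMMS.

```python
# Hypothetical ordering of zones: a higher number means more critical.
RANK = {"Green": 0, "Yellow": 1, "Red": 2}


def effective_zone(asset, own_zone, downstream):
    """Return the worst zone reachable from `asset` along its failure chain.

    own_zone:   asset -> zone assigned in isolation (e.g. by replacement cost)
    downstream: asset -> list of assets that go down if this asset fails
    """
    seen = {asset}
    stack = [asset]
    best = own_zone[asset]
    while stack:
        current = stack.pop()
        if RANK[own_zone[current]] > RANK[best]:
            best = own_zone[current]
        for nxt in downstream.get(current, []):
            if nxt not in seen:  # guards against loops in the dependency map
                seen.add(nxt)
                stack.append(nxt)
    return best


# Illustrative data only: the fan, VFD, and main line from the example above.
own_zone = {"cooling_fan": "Green", "vfd": "Yellow", "main_line": "Red"}
downstream = {"cooling_fan": ["vfd"], "vfd": ["main_line"]}

print(effective_zone("cooling_fan", own_zone, downstream))  # Red
```

The point of the sketch is that the fan's own score never changes. What changes is that nobody gets to read that score without reading the chain behind it.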

2) The “Maturity Gate”: Earning the Right to Predict

Everyone wants predictive maintenance (PdM).

Sensors. Dashboards. Alerts. The shiny object.

But if your basics aren’t real, PdM is just an expensive way to generate more noise (and more excuses).

I’ve watched plants spend $50k on vibration sensors while techs were still using the wrong grease or skipping oil changes.

That’s not modernization.

That’s putting a stethoscope on a patient you refuse to feed.

The rule: you must pass the Standard Work Gate.

If you’re not ~95% compliant on basic time-based maintenance (TBM), meaning on time, done correctly, and verified, you haven’t earned prediction.

You don’t buy a Ferrari when you can’t change the oil in the truck.
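The gate can be computed rather than argued about. Here is a rough sketch, assuming each completed work order carries three flags. The `WorkOrder` fields and the idea that "verified" means checked by someone other than the signer are assumptions made for illustration. The 0.95 threshold is the ~95% bar from the rule above.

```python
from dataclasses import dataclass

GATE_THRESHOLD = 0.95  # the ~95% bar from the rule above


@dataclass
class WorkOrder:
    on_time: bool
    done_correctly: bool
    verified: bool  # assumption: checked by someone other than the signer


def gate_compliance(orders):
    """Fraction of work orders that were on time AND correct AND verified."""
    if not orders:
        return 0.0
    passed = sum(
        1 for o in orders if o.on_time and o.done_correctly and o.verified
    )
    return passed / len(orders)


def passes_gate(orders, threshold=GATE_THRESHOLD):
    """True only when verified compliance meets the threshold."""
    return gate_compliance(orders) >= threshold


# Illustrative data: 100 PMs signed off, but 6 were never verified.
orders = [WorkOrder(True, True, True)] * 94 + [WorkOrder(True, True, False)] * 6
print(passes_gate(orders))  # False
```

Note that a work order counts only when all three flags are true. A PM that was signed off on time but never verified scores the same as one that was skipped, which is the whole point of the gate.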

3) The storeroom: the physicality of risk

Your storeroom is the checkbook of your strategy.

If a Red Zone asset has a proprietary motor with a 16-week lead time and it’s not in the crib, your “strategy” is a bedtime story.

This is where leaders have to get honest.

You don’t stock parts because it’s fun.

You stock them because the risk tax of not having them is higher than the carrying cost.

Use your RPS (Risk Priority Score) to make that tradeoff explicit.

And yes, it will annoy someone in finance (briefly).

Downtime will annoy them longer.
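To make the tradeoff explicit for a single spare, you can put both sides of it in dollars per year. This is a back-of-the-envelope sketch, not the RPS calculation itself. The function names, the worst-case assumption that the line stays down for the full lead time, the 25% carrying rate, and every number in the example are placeholders to swap for your own.

```python
def annual_risk_tax(failures_per_year, lead_time_hours, downtime_cost_per_hour):
    """Expected yearly cost of having no spare on the shelf.

    Worst-case assumption: the asset stays down for the whole lead time.
    """
    return failures_per_year * lead_time_hours * downtime_cost_per_hour


def annual_carrying_cost(part_cost, carrying_rate=0.25):
    """Yearly cost of holding the part; the 25% rate is a placeholder."""
    return part_cost * carrying_rate


def should_stock(failures_per_year, lead_time_hours, downtime_cost_per_hour,
                 part_cost, carrying_rate=0.25):
    """Stock the spare when the risk tax exceeds the carrying cost."""
    risk = annual_risk_tax(failures_per_year, lead_time_hours,
                           downtime_cost_per_hour)
    carry = annual_carrying_cost(part_cost, carrying_rate)
    return risk > carry


# Hypothetical numbers: a 16-week lead time, one failure every 20 years,
# $5,000 per hour of downtime, and a $20,000 motor.
lead_time_hours = 16 * 7 * 24
print(should_stock(0.05, lead_time_hours, 5_000, 20_000))  # True
```

Even with a failure rate that low, the long lead time dominates the math. That is the conversation to have with finance, with the inputs on the table.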

4) Closing the loop: the Post-Mortem Requirement

A strategy is only “living” if it learns from the blood on the floor.

Two audits make this real:

The “Miss” Audit (required): Every unplanned failure on a Red or Yellow asset triggers a 5-Why Root Cause Analysis.
- Did the strategy fail?
- Or did we fail the strategy (compliance, parts, planning, training)?

The “Over-Maintenance” Audit (required): If you’ve done a TBM inspection 50 times and never found a fault, you’re over-maintaining.
- Extend the interval.
- Simplify the check.
- Or kill it.

Preventive work that never prevents anything is just ritual.
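Both audit triggers are simple enough to run as a report instead of relying on memory. Here is a small sketch, assuming you can export failure events and PM history from your system. The `PMTask` fields and both function names are hypothetical. The Red/Yellow rule and the 50-inspection threshold come from the two audits above.

```python
from dataclasses import dataclass

RCA_ZONES = {"Red", "Yellow"}  # zones where a miss triggers a 5-Why
MIN_INSPECTIONS = 50           # threshold from the Over-Maintenance Audit


@dataclass
class PMTask:
    name: str
    inspections_done: int
    faults_found: int


def needs_rca(zone, planned):
    """Miss Audit: any unplanned failure on a Red or Yellow asset."""
    return (not planned) and zone in RCA_ZONES


def over_maintenance_candidates(tasks, min_inspections=MIN_INSPECTIONS):
    """Over-Maintenance Audit: enough history and zero faults ever found."""
    return [
        t.name
        for t in tasks
        if t.inspections_done >= min_inspections and t.faults_found == 0
    ]


# Illustrative PM history only.
tasks = [
    PMTask("guard_visual_check", 62, 0),
    PMTask("gearbox_oil_sample", 55, 4),
    PMTask("belt_tension_check", 12, 0),
]
print(over_maintenance_candidates(tasks))  # ['guard_visual_check']
```

The output is a candidate list, not a verdict. Someone still decides whether each task gets extended, simplified, or killed.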

Tooling + deliverables (what to build, what it produces, what it replaces)

Build these assets like a system, not a paperwork campaign:

Failure-chain criticality model (Total Impact scoring)
- Produces: a force-ranked list of Red/Yellow/Green assets and components that inherit criticality via dependency
- Replaces: “replacement cost” scoring and gut-feel rankings

Standard Work Gate dashboard (compliance that’s verified, not self-reported)
- Produces: a real readiness signal for PdM adoption
- Replaces: paper compliance and calendar roulette

RPS-backed stocking policy (crib rules tied to risk and lead time)
- Produces: a defensible min/max and insurance-stock list
- Replaces: hope-based purchasing and emergency freight

RCA + action closure loop for Red/Yellow misses
- Produces: systemic fixes, not repeat failures
- Replaces: war stories and hero maintenance

Over-maintenance kill list
- Produces: reclaimed labor hours and fewer low-value PMs
- Replaces: sacred-cow PM routes

Common failure modes (how this gets sabotaged)

Ghost signing: PMs “completed” because the system punishes honesty.
Paper compliance: metrics say 98% while the floor says “we didn’t touch that.”
Shiny-object PdM: sensors deployed while lubrication basics are still optional.
Calendar roulette: critical work scheduled… then constantly displaced by “urgent” noise.
Storeroom theater: critical spares missing, but the ERP says “inventory accurate.”

What to do this week (real actions)

Pick one top downtime system and map the failure chain—identify the “small parts” that can kill it.
Re-score those components by Total Impact, not cost.
Audit TBM compliance on that system: on-time, correct, verified (not self-attested).
Identify one long-lead critical spare and decide: stock, dual-source, or accept the risk tax (explicitly).
Require a 5-Why for the next unplanned miss on a Red/Yellow asset—no exceptions.
Find one PM that never finds defects and extend/kill it (prove you’re allowed to improve the system).

Maintenance isn’t a chore list—it’s the business hedging its own risk, in public, with receipts.


Kyle Gredvig