The Living Strategy: Moving Beyond Break-Fix to Maintenance That Sticks


Oct 6

You wouldn't run your business without a sales plan, a marketing plan, or a balance sheet. So why are so many operational leaders content to run their most expensive assets without a formalized, adaptable maintenance strategy?

The true path to operational excellence isn't found in a binder of unused procedures; it’s in a Living Strategy—a continuous, data-driven system that protects your assets, prevents crisis, and maximizes profitability.

The cost of not having one isn't just a repair bill. It's an entire facility grinding to a halt, a missed deadline that tanks customer trust, and a reactive, stressful work environment. It's time to stop the chaos and treat your maintenance strategy like the critical business plan it is.

The First Step: Stop Treating All Assets the Same

The most fundamental mistake we see is applying a blanket maintenance schedule to every piece of equipment, from the mission-critical press to the non-essential office air conditioner. This leads to one of two expensive outcomes: over-maintaining low-risk assets or, far worse, neglecting high-risk assets until it's too late.

To make an effective strategy, you must first define your risk using Asset Criticality.

How to Use the Criticality Risk Matrix

Maintenance is a finite resource. To deploy it intelligently, you need to understand the Consequence of failure combined with the Probability of it occurring. This is the foundation of your strategy.

| Factor | Definition | Examples |
| --- | --- | --- |
| Severity (Consequence) | What is the pain, impact, or cost if this asset fails? | Catastrophic (safety, environmental breach, long downtime) vs. Minor (less than 4 hours of downtime, low cost). |
| Frequency (Probability) | How often is the asset likely to fail based on history, age, or operational conditions? | Frequent (happening often, high probability) vs. Improbable (highly unlikely, once in a lifetime). |

Decoding the Criticality Score: What the Numbers Mean

The numbers inside the matrix (1 through 20) are Risk Priority Scores (RPS). The matrix has 20 cells (4 Severity levels × 5 Frequency levels), and each cell is ranked by the product of its Severity rank and its Frequency rank, with 1 being the highest risk and 20 the lowest. The color coding tells you the required level of maintenance action:

| Score Range | Action Level | Priority Level | Maintenance Strategy Required |
| --- | --- | --- | --- |
| Red (Scores 1-5) | HIGH Risk | Immediate Intervention | Demands strategic intervention using sophisticated proactive measures like Predictive Maintenance (PdM) to prevent catastrophic failure. |
| Yellow (Scores 6-11) | MEDIUM Risk | Scheduled Priority | Requires robust, time-based PMs and consistent inspections (your defined "Medium" priority) to minimize the chance of escalation. |
| Green/Accept (Scores 12-20) | LOW Risk | Routine Monitoring | Strategy can be minimal, often limited to basic inspections or a run-to-fail approach, as consequences are minor and manageable. |
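As a rough illustration, the score bands in the table above can be written as a small lookup. This is only a sketch: the function name `action_level` is made up for this example, and the band edges are taken directly from the table.

```python
def action_level(score: int) -> str:
    """Return the action level for a 1-20 Risk Priority Score."""
    if not 1 <= score <= 20:
        raise ValueError("score must be between 1 and 20")
    if score <= 5:       # Red band: scores 1-5
        return "HIGH"
    if score <= 11:      # Yellow band: scores 6-11
        return "MEDIUM"
    return "LOW"         # Green/Accept band: scores 12-20


print(action_level(3))   # a score in the Red band
print(action_level(14))  # a score in the Green band
```

A lookup like this is handy when the scores live in a spreadsheet or CMMS export and you want the action level filled in consistently for every asset.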

How to Assign the Criticality Ranks

The final RPS is generated by assigning a numerical rank to both Severity and Frequency based on the definitions provided in the chart's axes.

1. Ranking the Severity (The Consequence)

This step assesses the magnitude of the loss if the asset fails. The severity categories run from I (worst) to IV (mildest), but the numerical rank used in the calculation is inverted: the worst outcome, Catastrophic (I), gets the highest rank of 4.

| Severity Level (Your Chart) | Numerical Rank | Definition | Focus |
| --- | --- | --- | --- |
| Catastrophic (I) | 4 | Death, permanent disability, significant environmental breach, or total loss of production (downtime > 2 days). | Highest Priority Consequence. |
| Critical (II) | 3 | Personal injury, major damage ($1M to $10M), or significant production loss (downtime > 24 hrs). | |
| Marginal (III) | 2 | Moderate cost ($10K to $100K), or minor loss of availability (downtime < 24 hrs). | |
| Minor (IV) | 1 | Low cost (damage < $10K), or minimal downtime (downtime < 4 hrs). | Lowest Priority Consequence. |

2. Ranking the Frequency (The Probability)

This step assesses the likelihood of the failure occurring. Frequency uses a direct ranking, where the highest probability gets the highest numerical rank.

| Frequency Level (Your Chart) | Numerical Rank | Definition | Focus |
| --- | --- | --- | --- |
| Frequent | 5 | Highly likely to occur (≥1 per 1,000 hrs). | Highest Probability. |
| Probable | 4 | Likely to occur (≥1 per 10,000 hrs). | |
| Occasional | 3 | Could occur sometimes (≥1 per 100,000 hrs). | |
| Remote | 2 | Unlikely to occur (≥1 per 1,000,000 hrs). | |
| Improbable | 1 | Extremely rare (≥1 per 10,000,000 hrs). | Lowest Probability. |
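If you track failures and operating hours, the frequency rank can be derived from the observed failure rate rather than assigned by feel. The sketch below uses the thresholds from the table above. The function name is hypothetical, and it makes one assumption: any rate below the Remote threshold, including zero recorded failures, is treated as Improbable.

```python
# (rank, label, minimum failures per operating hour), highest rate first.
# Thresholds come from the frequency table; the 0.0 floor for
# "Improbable" is an assumption so that every rate gets a rank.
FREQUENCY_LEVELS = [
    (5, "Frequent", 1 / 1_000),
    (4, "Probable", 1 / 10_000),
    (3, "Occasional", 1 / 100_000),
    (2, "Remote", 1 / 1_000_000),
    (1, "Improbable", 0.0),
]


def frequency_rank(failures: int, operating_hours: float) -> tuple[int, str]:
    """Return (rank, label) for an observed failure history."""
    if failures < 0 or operating_hours <= 0:
        raise ValueError("need failures >= 0 and operating_hours > 0")
    rate = failures / operating_hours
    for rank, label, min_rate in FREQUENCY_LEVELS:
        if rate >= min_rate:
            return rank, label
    raise AssertionError("unreachable: the last level has a floor of 0.0")


# Three failures in 2,000 operating hours is more than 1 per 1,000 hours.
print(frequency_rank(3, 2_000))
```

Short observation windows make this estimate noisy, so treat the computed rank as a starting point for discussion rather than a final answer.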

3. Calculating the Risk Priority Score (RPS)

Once you have the numerical ranks, multiply them to get the risk product for the cell where they intersect:

Risk Product = Severity Rank × Frequency Rank

The product is then mapped to the matrix's 1-20 priority list, where the largest possible product (4 × 5 = 20) receives RPS 1.

Example: A critical pump with a history of breakdown every few months would be rated:

Severity: Critical (II) → 3
Frequency: Frequent → 5
Product: 3 × 5 = 15. This product (15) maps to Score 3 (HIGH) in your matrix, demanding immediate, aggressive maintenance action.

The final 1 to 20 number simply gives you a single, objective rank to compare assets across your entire operation, ensuring resources are allocated where the risk (Severity×Frequency) is highest.
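The calculation above can be sketched in a few lines of Python. The rank values come from the two tables. The way tied products are ordered is an assumption made for this example (higher severity first), so individual scores for tied cells may differ from your printed chart. All names here are illustrative.

```python
from itertools import product

SEVERITY = {"Catastrophic": 4, "Critical": 3, "Marginal": 2, "Minor": 1}
FREQUENCY = {"Frequent": 5, "Probable": 4, "Occasional": 3,
             "Remote": 2, "Improbable": 1}


def build_rps_index() -> dict[tuple[int, int], int]:
    """Rank all 20 (severity, frequency) cells; 1 is the highest risk."""
    cells = sorted(
        product(SEVERITY.values(), FREQUENCY.values()),
        # Largest product first; ties broken by higher severity (assumption).
        key=lambda cell: (-(cell[0] * cell[1]), -cell[0]),
    )
    return {cell: score for score, cell in enumerate(cells, start=1)}


RPS_INDEX = build_rps_index()


def risk_priority_score(severity: str, frequency: str) -> int:
    """Look up the 1-20 score for a named severity and frequency level."""
    return RPS_INDEX[(SEVERITY[severity], FREQUENCY[frequency])]


# The pump example: Critical (3) x Frequent (5) = 15, the third-largest product.
print(risk_priority_score("Critical", "Frequent"))  # 3
```

With this ordering, the five cells whose product is 12 or more land in scores 1-5, the six cells with products from 6 to 10 land in scores 6-11, and the remaining nine cells land in scores 12-20, which lines up with the Red, Yellow, and Green ranges.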

The Maintenance Strategy Spectrum: Your Tool Box

Your strategy is your instruction manual for each asset based on its criticality score. It's not one thing; it's a spectrum of proactive measures.

1. The Essential Foundation: Time-Based Maintenance (TBM)

Also known as Scheduled Maintenance or simply Preventive Maintenance (PM), this is the starting line for nearly every proactive strategy. Tasks like lubrication, changing filters, and simple inspections are performed on a fixed schedule (every 3 months, every 1,000 hours, etc.) regardless of the asset's actual condition.

Best For: Assets where wear and tear is predictable and where the TBM cost is low. It's the minimum required to keep warranties valid and catch obvious issues.
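A fixed schedule is simple enough to express as a single check. The sketch below assumes a task carries both a calendar interval and a run-hour interval and comes due on whichever is reached first. That "whichever comes first" rule and the function name are assumptions for illustration, not a rule from any particular CMMS.

```python
from datetime import date, timedelta


def pm_is_due(last_done: date, today: date, interval_days: int,
              hours_since_last: float, interval_hours: float) -> bool:
    """True when either the calendar or the run-hour interval has elapsed."""
    calendar_due = today - last_done >= timedelta(days=interval_days)
    usage_due = hours_since_last >= interval_hours
    return calendar_due or usage_due


# A 90-day / 1,000-hour task last completed on 1 January 2024.
print(pm_is_due(date(2024, 1, 1), date(2024, 2, 1), 90, 400, 1_000))
print(pm_is_due(date(2024, 1, 1), date(2024, 2, 1), 90, 1_200, 1_000))
```

Notice that nothing in the check looks at the asset's condition. That is the defining trait of TBM, and also its main limitation.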

2. The Smart Investment: Predictive Maintenance (PdM)

When you move to your HIGH criticality assets, you need to eliminate unnecessary downtime and over-maintenance. This is where Predictive Maintenance (PdM) comes in.

PdM uses technologies like vibration analysis, thermal imaging, and oil particle counting to monitor the actual condition of the asset. Maintenance is only performed when a leading indicator shows a problem developing—not just because the calendar says so.

Best For: High-risk, expensive, and complex equipment. PdM converts your maintenance from a cost center into a strategic tool that maximizes run time.
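By contrast, a condition-based trigger looks only at measurements. Here is a deliberately simple sketch: raise a work order when the most recent readings stay above an alert limit. The limit, the "three in a row" rule, and the sample vibration values are all hypothetical. Real PdM programs typically use trend analysis and equipment-specific limits.

```python
def needs_work_order(readings: list[float], alert_limit: float,
                     consecutive: int = 3) -> bool:
    """True when the last `consecutive` readings all exceed `alert_limit`."""
    if len(readings) < consecutive:
        return False
    return all(r > alert_limit for r in readings[-consecutive:])


# Hypothetical vibration velocity readings, oldest first.
rising = [2.1, 2.3, 4.8, 5.1, 5.4]
one_spike = [2.1, 5.0, 2.2, 2.3, 2.4]

print(needs_work_order(rising, alert_limit=4.5))     # sustained rise
print(needs_work_order(one_spike, alert_limit=4.5))  # isolated spike
```

Requiring several consecutive readings is one way to avoid dispatching a technician for a single noisy measurement.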

3. The Future: Prescriptive Maintenance (RxM)

The most advanced organizations are moving toward Prescriptive Maintenance. This builds on PdM by using advanced analytics and machine learning to not only predict when a failure will occur, but also to prescribe the optimal action to prevent it. In some systems, it can even automate the work order or adjust operating parameters without human intervention.

Change That Sticks: Why Strategy Must Be a Living Document

A great maintenance strategy isn't something you write once, file away, and forget. It must follow the philosophy of Making Change That Sticks—it has to be audited, improved, and culturally adopted.

Data Must Drive the Strategy

You are always learning about your assets. Did a critical gearbox on a time-based schedule fail sooner than expected? Analyze the data and adjust the strategy. Perhaps a simple time-based task should be upgraded to Condition-Based Maintenance (CBM) supported by condition-monitoring technology. Did a low-risk asset prove more robust than anticipated? Scale back its maintenance to save labor time.

The data from your work orders and failure reports must feed back into the risk matrix and the defined strategy for each asset.
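One way to close that loop is to recompute each asset's observed failure rate from work-order history on a regular basis. The record layout below (dictionaries with `asset` and `type` keys) and the asset IDs are invented for this sketch. Your CMMS export will look different.

```python
from collections import defaultdict


def failures_per_1000_hours(work_orders: list[dict],
                            operating_hours: dict[str, float]) -> dict[str, float]:
    """Observed failures per 1,000 operating hours for each asset."""
    counts: defaultdict[str, int] = defaultdict(int)
    for wo in work_orders:
        if wo["type"] == "failure":  # ignore PMs and inspections
            counts[wo["asset"]] += 1
    return {
        asset: 1000 * counts[asset] / hours
        for asset, hours in operating_hours.items()
        if hours > 0
    }


work_orders = [
    {"asset": "P-101", "type": "failure"},
    {"asset": "P-101", "type": "pm"},
    {"asset": "P-101", "type": "failure"},
    {"asset": "HVAC-2", "type": "pm"},
]
hours = {"P-101": 4_000, "HVAC-2": 2_000}

print(failures_per_1000_hours(work_orders, hours))
# {'P-101': 0.5, 'HVAC-2': 0.0}
```

A value of 1.0 or more corresponds to the Frequent threshold (≥1 per 1,000 hrs). When an asset's observed rate crosses into a different frequency level, that is the cue to revisit its score and its assigned strategy.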

The Overlooked Strategic Pillars

A maintenance strategy is more than just PM checklists; it requires operational support:

Storeroom Protocol: Your strategy is useless if the required spare parts have a 9-month lead time and aren't on the shelf when you need them. The risk matrix must dictate your stocking decisions. If an asset is HIGH risk, keep its critical spares on hand, even when they are expensive.
SOP Documentation: The strategy for your highest-risk assets must be documented in a clear Standard Operating Procedure (SOP). This ensures that when a failure occurs, your team knows exactly how to respond to minimize the consequence and get the fix done right the first time.

Conclusion: Beyond the Binder, Building Resilience

The difference between a failing operation and a resilient, profitable one is the difference between chaos and a documented, adaptable maintenance strategy. Start with the risk matrix, define the right approach for each asset, and commit to letting your data keep the strategy alive.


Kyle Gredvig