
Backups are easy to buy and hard to operationalise

Why recoverability should be treated as a leadership and governance question rather than a technical comfort signal.

Infrastructure & Operations · backups · recovery · service resilience · infrastructure leadership

In view

  • Topic: Infrastructure & Operations
  • Maturity: carefully framed publication
  • Edited for publication and safe disclosure.

Operational lens

  • Pillar: Infrastructure & Operations
  • Format: Operational essay
  • Reading time: 7 minutes

Editorial note

Carefully framed
  • Some examples are deliberately abstracted to keep the judgement useful without exposing private systems, people, weaknesses or operational detail. Deliberately left out of this piece:
  • Exact backup schedules or retention settings
  • Recovery validation routines or operational proof points
  • Failover design, recovery sequence or service priority order

1. Grounded opening

Most organisations can point to backup technology. Fewer can explain their recovery position with the same confidence.

That gap matters more than people like to admit. Backup conversations are often reassuring by default. Status reports exist. Protection is assumed to be in place. Renewal costs are being paid. On paper, it all sounds responsible. The language creates comfort before the operating discipline has earned it.

I have become increasingly sceptical of resilience claims that begin and end with possession of backup tooling. Buying the technology is usually the easy part. The harder part is deciding what level of recoverability is actually being claimed, who reviews whether that claim still holds and what happens when a comfortable assumption turns out to be weaker than it sounded in a meeting.

That is why I do not think backup work sits neatly inside infrastructure procurement or routine admin. In live environments, it is a leadership question about what level of recoverability the organisation is genuinely prepared to own.

2. What the issue actually is

The weak version of the problem is that some backup arrangements are old, fragmented or underpowered.

That is true, but it is not the point that interests me most.

The stronger version is that organisations often confuse the existence of backups with the existence of a recovery position. Those are not the same thing. A copy can exist and still be poorly understood, weakly reviewed, badly aligned to the service it is meant to support or too dependent on assumption rather than operational proof.

That is where the real control problem lives.

Backup work becomes serious only when it answers operational questions cleanly. What level of dependency is the organisation really carrying? What kind of interruption would become unacceptable quickly? What evidence do we have that the recovery position is still behaving as expected? Who owns the answer when it is not?

Once you look at it that way, the conversation changes. It stops being mainly about products and storage locations. It becomes about recoverability, ownership, review discipline and the organisation’s appetite for living with uncertainty.

3. Why it matters in practice

This matters because the quality of backup work is rarely judged on an ordinary day.

It is judged when something fails awkwardly, when a service has to be recovered under pressure, when routine maintenance exposes how old an assumption has become, or when leadership wants an honest answer about whether a critical function could be restored well enough to support the organisation.

At that point, vague comfort disappears quickly.

If the backup position is strong, the organisation may still be dealing with an ugly event, but it is at least dealing with it from an understood position. If it is weak, the problem becomes larger than the original fault. Ownership becomes blurred. Confidence drops. Time is lost to checking whether the copies are usable, current or sufficient. The conversation moves from recovery into damage control.

That is why I think backup discipline belongs firmly at Head of IT level. A weak backup position does not stay a technical inconvenience for long. It quickly becomes a service question, a governance question and a credibility question. The organisation is no longer deciding only whether data is recoverable. It is revealing whether it understood its own dependency on that data in the first place.

4. What had to be balanced

The awkward part is that backup design always competes with something else.

There is the pressure to modernise infrastructure without turning every improvement into a long resilience programme. There is the cost of stronger assurance, the operational reality of limited change windows and the fact that leadership expectations are rarely uniform across every service. There is also the burden of reviewing and maintaining the whole position after the initial project enthusiasm has gone elsewhere.

That means sensible backup design is mostly trade-off work.

Some choices favour simplicity. Some favour stronger assurance. Some reduce day-to-day overhead while making recovery assumptions harder to defend. Some improve confidence but create heavier operational dependence elsewhere. None of that removes the need for the organisation to understand what it is really choosing.

That is why I am wary of backup conversations that sound too neat. The important part is not pretending the trade-offs disappear. It is making them explicit enough that the organisation can choose them honestly.

5. What changed or what the work clarified

What this work clarified for me is that backup technology becomes useful only when recovery thinking becomes operational.

That sounds obvious, but it is easy to miss in practice. A team can spend meaningful time improving resilience tooling and still leave itself with a weaker recovery position than it thinks if the surrounding discipline stays vague.

The most useful shift was moving from “do we have backups?” to “what recovery position are we actually claiming?” That forces harder questions. Is the ownership model clear? Are the review signals trusted or merely assumed? Are the stated expectations proportionate to the organisation’s dependence on the service, or simply inherited from whatever was easiest to set up at the time?
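To make that shift tangible, here is a minimal sketch of what an explicit recovery claim could look like when it is written down as a record rather than carried as an assumption. Everything in it, from the field names to the values, is invented for illustration; it is not drawn from any real environment, tool or schedule.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RecoveryClaim:
    """One service's stated recovery position, held next to the backup tooling."""
    service: str                 # the service the claim supports, not the backup job name
    owner: str                   # the named role accountable for the claim staying true
    rpo_hours: int               # data loss the organisation says it will tolerate
    rto_hours: int               # outage the organisation says it will tolerate
    last_proven_restore: date    # the last time a restore was demonstrated, not assumed
    evidence: str                # where that proof lives, so the review is checkable

# Illustrative values only: the point is that the claim is explicit and reviewable.
ledger = RecoveryClaim(
    service="finance-ledger",
    owner="Head of IT",
    rpo_hours=24,
    rto_hours=8,
    last_proven_restore=date(2024, 3, 1),
    evidence="restore test record, March 2024",
)
```

None of the numbers matter. What matters is that every field has to be defended by somebody, which is exactly what a vague sense of "having backups" never forces.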

It also sharpened a broader point that comes up elsewhere in infrastructure work. A lot of resilience weakness first appears as comfort. The organisation feels protected because the language around protection is familiar. It is only later, usually under pressure, that the difference between having backup technology and having recovery discipline becomes visible.

That is why I see backup work as part of service leadership rather than background infrastructure housekeeping. A stronger backup position is not just a better technical arrangement. It is a more honest answer to the question of what the organisation can recover, how confidently and under what assumptions.

6. What stayed messy

No backup position becomes perfectly tidy.

Some parts of the estate remain harder to restore cleanly than anyone would like. Some dependencies are awkward because they span responsibilities and assumptions that were never designed to align. Some recovery expectations remain partly political, because what matters most to users and what matters most to leadership do not always line up neatly in the first conversation.

There is also a cultural problem that never goes away completely. Backup review is rarely the most glamorous operational work, so it can drift unless somebody keeps it close to service ownership. People are usually happy to approve the idea of resilience. They are less enthusiastic about the ongoing discipline of checking, reviewing, documenting and occasionally admitting that a comfortable assumption needs to be challenged.

That is not a reason to lower the standard. It is a reminder that the real work begins after the design has been bought.

7. Broader lesson

The broader lesson is that backup maturity should be judged by recoverability, not by how many layers of technology are involved.

That is an important distinction because infrastructure teams can sound very reassuring while still leaving the organisation with more uncertainty than it realises. A more elaborate protection stack is not automatically a strong recovery position. Reassuring language is not automatically evidence that the service can be brought back in a way the organisation would accept on a bad day.

Once you judge it by recoverability, the standard becomes stricter and more useful. Ownership matters more. Review discipline matters more. Service prioritisation matters more. Honest communication about limits matters more.
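As a rough illustration of what that review discipline could look like once it becomes routine rather than occasional, the sketch below flags any claim whose proven restore has gone stale against an agreed interval. It continues the hypothetical RecoveryClaim records from the earlier sketch; the ninety-day interval is an invented example, not a recommendation.

```python
from datetime import date, timedelta

def stale_claims(claims, review_interval_days=90, today=None):
    """Return the claims whose last proven restore is older than the agreed interval."""
    today = today or date.today()
    cutoff = today - timedelta(days=review_interval_days)
    return [claim for claim in claims if claim.last_proven_restore < cutoff]

# Anything this returns is a recovery position the organisation is still quoting
# without current evidence, which is exactly the comfort this essay warns against.
for claim in stale_claims([ledger]):
    print(f"{claim.service}: last proven {claim.last_proven_restore}, owner {claim.owner}")
```

The check itself is trivial. The discipline is in agreeing the interval, naming the owner and acting on what the list says.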

That is where infrastructure judgement becomes governance judgement. The organisation is not just deciding what it wants to protect. It is deciding what level of recovery confidence it is prepared to stand behind.

8. Closing

I do not think backup work is undervalued because people forget it exists. I think it is undervalued because people mistake buying it for finishing it.

That is the mistake.

Backups become credible when leadership can describe the recovery position without mistaking hope for discipline. That requires ownership, review discipline and a firmer account of what the organisation is actually prepared to stand behind.

That is when backup work stops being a purchase and becomes an operating standard.

About the publication

I write about infrastructure, security, governance and service delivery in complex organisations, with a focus on how decisions hold up under real operational pressure.