Single Point of Failure (SPOF) Audit
Client Self-Assessment
Purpose
This audit identifies where delivery, reliability, or growth depends too heavily on a single person, decision path, or system. Single points of failure increase execution risk, slow response under pressure, and make scale fragile.
What this is
A fast, practical assessment across Policy, People, and Technology that highlights risk concentration and prioritizes high-leverage fixes.
What this is not
Not a compliance exercise.
Not a tool inventory.
Not a re-organization.
How to Use This Audit
- Complete the checklist (15–30 minutes)
- Score each section
- Review red flags
- Prioritize fixes, starting with the highest-scoring area
1. Policy (Governance and Decision Flow)
Check all that apply
☐ Key decisions depend on one person being available
☐ Decision authority is implicit or undocumented
☐ Objections can block progress without alternatives or deadlines
☐ Client commitments exceed internal capacity controls
Healthy signals
- Clear decision ownership
- Written escalation paths
- Time-bound objections
- Defaults that allow work to proceed
Red flag
If one absence causes decision paralysis, you have a policy single point of failure.
2. People (Knowledge, Ownership, Capacity)
Check all that apply
☐ Only one person can deploy or fix production
☐ One person owns a client relationship end-to-end
☐ Critical knowledge lives in people, not documentation
☐ Senior staff routinely step in to unblock work
Healthy signals
- At least two owners per critical responsibility
- Clear primary and secondary ownership
- Runbooks for recurring work
- Predictable handoffs
Red flag
“Ask them, they’re the only one who knows” indicates a people single point of failure.
3. Technology (Systems and Infrastructure)
Check all that apply
☐ One admin account or credential controls production
☐ No tested rollback or recovery path
☐ Infrastructure changes are manual
☐ Monitoring alerts go to one person
Healthy signals
- Infrastructure defined as code
- Centralized access and secrets
- Tested recovery paths
- Group-based alerting
Red flag
“Don’t touch that system” indicates a technology single point of failure.
SPOF Risk Scoring
Score each area from 0 to 2
- 0 = No meaningful risk
- 1 = Partial or emerging risk
- 2 = Clear single point of failure
Record your scores
- Policy:
- People:
- Technology:
Interpretation (total of the three area scores)
- 0–2 → Healthy
- 3–4 → Latent risk (address proactively)
- 5–6 → Active execution risk (address immediately)
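The scoring arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of the audit itself: it assumes the interpretation bands apply to the sum of the three area scores, and the names `AREAS` and `interpret` are hypothetical.

```python
# Hypothetical helper; the bands mirror the Interpretation list above and
# assume they apply to the sum of the three area scores.
AREAS = ("Policy", "People", "Technology")


def interpret(scores):
    """Return (total, band, areas ordered from highest score to lowest)."""
    for area in AREAS:
        if scores.get(area) not in (0, 1, 2):
            raise ValueError(f"{area} must be scored 0, 1, or 2")

    total = sum(scores[area] for area in AREAS)
    if total <= 2:
        band = "Healthy"
    elif total <= 4:
        band = "Latent risk (address proactively)"
    else:
        band = "Active execution risk (address immediately)"

    # sorted() is stable, so tied areas keep the Policy, People, Technology order.
    priority = sorted(AREAS, key=lambda area: scores[area], reverse=True)
    return total, band, priority


example = {"Policy": 1, "People": 2, "Technology": 0}
total, band, priority = interpret(example)
print(total, band, priority)
```

With the example scores, the total is 3, which falls in the latent-risk band, and People comes first in the fix order because it has the highest score.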
What to Fix First (80/20 Guidance)
Start with changes that:
- Reduce dependency on individuals
- Clarify decision authority
- Make recovery boring and repeatable
Common high-leverage fixes:
- Add a secondary owner
- Write a one-page runbook
- Introduce a default decision rule
- Route alerts to a group
- Automate one manual step
Write an Executive Summary (Optional)
Our highest execution risk comes from [Policy / People / Technology], specifically [X]. This creates fragility during normal operations and significant risk under stress. Addressing [Y] will materially reduce dependency on individuals and restore predictable execution.
Why this matters
Single points of failure rarely show up during calm periods. They surface under pressure: during incidents, growth, or key absences. This audit helps remove hidden fragility before it turns into outages, missed deadlines, or burnout.
Next step
Use this audit as a baseline and re-run it quarterly or after major organizational or technical changes.
See Related Work
We'll share past work that is relevant to your context and the risks you're facing, and review it with you to confirm it applies.
Discuss Relevant Work