Incident postmortem
Definition
The written document produced after a production incident review, capturing the timeline of events, root cause analysis, contributing factors, impact assessment, and corrective action items — serving as both an accountability record and an organizational learning artifact.
Incident postmortem and incident review describe related but distinct things: the review is the meeting; the postmortem is the document. The distinction matters because the document lives on. Well-written incident postmortems accumulate into an organization's institutional memory — patterns emerge, recurring root causes get addressed, and new team members learn from failures they didn't experience.
Standard incident postmortem sections:
- Summary: Two to three sentences. What happened, when, impact magnitude.
- Timeline: Chronological event log with timestamps.
- Root cause: The single underlying cause, written precisely.
- Contributing factors: Other conditions that enabled the root cause or amplified the impact.
- Impact: Users affected, duration, revenue or data implications.
- Action items: Changes to be made, with owner and due date.
- Lessons learned: Takeaways for the team and organization.
Good postmortem writing: Precise, factual, blameless. Avoid passive voice that obscures agency. Name systems and processes, not people. 'The deploy pipeline allowed the change to reach production without a canary stage' is more useful than 'the engineer deployed without testing.'
Distribution: Incident postmortems should be shared with the team, relevant stakeholders, and leadership. Public postmortems (shared with customers) are a trust-building practice for companies that commit to radical transparency.
Start the draft from the whiteboard: snap the incident review timeline and root cause diagrams with BoardSnap, then use the structured summary as the draft skeleton.
Examples
- Google's SRE team maintains a library of internal incident postmortems that teams consult when investigating similar issues — the accumulated knowledge prevents repeat incidents.
- A cloud infrastructure provider publishes sanitized incident postmortems on its status page within 72 hours of any significant outage.
- A startup writes its first formal incident postmortem after a database migration gone wrong — the document becomes the template for all future reviews.
- A team's incident postmortem reveals that the same contributing factor — no circuit breaker on the payment processor API — appeared in three separate incidents over 18 months.
Snap a incident postmortem. Ship its actions.
BoardSnap turns any whiteboard — including this one — into a summary and action plan.