Database outage

Incident Report for Demio

Postmortem

Summary

  • A broken database table caused a partial system outage.

Impact

The following Demio components didn’t work:

  • Registration Page. Visitors couldn’t register for Events.
  • Webinar Room. The join links didn’t work.
  • Event Notification and Reminder Emails.
  • Event Activity page.
  • Event Registration reports and exports didn’t work.
  • User Dashboard, Billing. All user updates made during the outage were rolled back to the previous state by the database recovery.

Root Cause Analysis

  • One of the database tables was damaged by an internal database engine bug that was triggered while the table was being modified.

Resolution and Recovery

  • We restored the database from a “continuous backup” snapshot.
  • All data created or modified during the outage period was lost because we rolled the database back to an earlier state.
  • Total system recovery time: 3 hours and 15 minutes.

Action Points

  • We identified and reproduced the root cause of the outage.
  • We found a way to avoid this scenario on the current database engine version.
  • We made a plan to upgrade the database engine to a version in which the root cause is fixed.
Posted Apr 18, 2024 - 09:37 EDT

Resolved

One of the database tables was damaged while it was being modified. The database was rolled back and restored.
Posted Apr 05, 2024 - 08:30 EDT