A code change released to the Webinar Room component caused a regression and a partial outage for a specific group of users
Impact
Attendees couldn’t join some automated events properly.
Only scheduled automated events with private chat preferences were affected. All other events worked fine.
Root Cause Analysis
While fixing another bug, the engineering team unintentionally introduced a code change that caused this incident.
The QA team didn’t test the new code change properly and missed a new bug in the updated code. The end-to-end automated tests didn’t detect the system behavior change in the Staging environment.
The updated code was then deployed to Production, where it caused the regression.
Resolution and Recovery
The engineering team identified the root cause of the regression and applied a new hotfix that solved the problem.
Action Points
We will improve our QA end-to-end automatic testing script to cover more combinations of event settings.
We will improve our manual QA testing process to avoid missing such cases.
We will continue improving our codebase to make it more reliable and fault-tolerant to newly introduced changes.
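As an illustration of the first action point, covering more combinations of event settings could be approached by generating a test matrix. The following is only a minimal sketch: the setting names and values (`event_type`, `schedule`, `chat_preference`) and the helper `setting_combinations` are hypothetical placeholders, not our actual event options or test code.

```python
from itertools import product

# Hypothetical setting dimensions, chosen for illustration only.
# The real list of event options is not part of this report.
EVENT_SETTINGS = {
    "event_type": ["live", "automated"],
    "schedule": ["scheduled", "on_demand"],
    "chat_preference": ["public", "private", "disabled"],
}


def setting_combinations(settings):
    """Yield one dict per combination of the given setting values."""
    names = list(settings)
    for values in product(*(settings[name] for name in names)):
        yield dict(zip(names, values))


combos = list(setting_combinations(EVENT_SETTINGS))

# Each combination would drive one end-to-end "attendee joins the room" check.
for combo in combos:
    print(combo)
```

With a matrix like this, the configuration involved in this incident (a scheduled automated event with a private chat preference) is exercised as one case among all the others instead of depending on someone remembering to test it.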
Posted Aug 21, 2024 - 09:28 EDT
Resolved
The issue is fixed.
Posted Aug 20, 2024 - 05:37 EDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Aug 20, 2024 - 05:07 EDT
Identified
The issue is affecting only automated events with the "Private" chat preference. We've identified the cause of the issue and are currently testing a fix.
Posted Aug 20, 2024 - 04:21 EDT
Update
The issue is affecting some old automated events. As a workaround, we recommend creating a brand new automated event until the issue is fixed.