Cluster B - IMAP/POP/Webmail
Incident Report for OpenSRS
Postmortem

Incident Date: August 27, 2020
Incident Number: PR-1273

On August 27, 2020, at 2:34 PM ET, the Tucows Hosted Email platform experienced service interruptions impacting IMAP, POP, and Webmail in Cluster B.

The engineering team traced the issue to a previously unknown bug in a stable kernel version that caused file system lockups.

At 3:56 PM ET, the engineering team restored service by manually failing over and upgrading the impacted device's file system.

At 5:36 PM ET, a second service interruption occurred and lasted 12 minutes. It was caused by the same kernel bug.

On August 28, 2020, at 1:00 AM ET, Tucows performed emergency maintenance to further stabilize the email systems in Cluster B.

Tucows is in contact with external vendors to investigate the root cause and to plan the rollout of a permanent fix for the identified kernel bug.

Thank you,

Tucows Engineering Team

Posted Aug 31, 2020 - 15:07 UTC

Resolved
This incident has been resolved.
Posted Aug 27, 2020 - 22:44 UTC
Monitoring
We have identified that the issue was caused by instability resulting from the previous Webmail outage. The team has addressed the stability issue, and Webmail has recovered successfully.

Incident Start Time: 08-27-2020 21:36:00 UTC
Incident End Time: 08-27-2020 21:48:00 UTC
Total Duration: 12 minutes
Posted Aug 27, 2020 - 22:34 UTC
Update
We are currently experiencing a service degradation affecting Cluster B. The engineering team has been engaged and is investigating the issue.

Client Impact: Cluster B users will be unable to log in using IMAP, POP, or Webmail.
Posted Aug 27, 2020 - 21:55 UTC
Investigating
Intermittent login failure
Posted Aug 27, 2020 - 21:51 UTC
This incident affected: Hosted Email (Cluster B, Webmail).