Hostedemail Cluster A: IMAP, POP, and Webmail
Incident Report for OpenSRS
Postmortem

https://www.opensrsstatus.com/incidents/wtst5h6c1kbf

Incident Date: October 19, 2021
Incident Number: PR-2476

On October 19, 2021, at 11:18 PM ET, Tucows’ Hosted Email platform experienced a service interruption that affected email retrieval over POP, IMAP, and Webmail and delayed inbound email.

The service interruption was caused by a kernel bug on the affected network storage device, which produced high load on the system.

On October 20, 2021, at 1:29 AM ET, the Engineering team restored service by restarting the affected systems to alleviate the high load.

At 10:40 PM ET, a second service interruption caused by the same kernel bug was observed; it lasted 23 minutes.

At 11:03 PM ET, the Engineering team restarted the affected network storage devices to stabilize the hosted email environment.

Tucows is investigating the cause and developing a plan to roll out a permanent fix for the identified bug.

Tucows remains committed to continuing the hosted email migration to the new cloud in order to maintain a scalable and stable hosted email environment.

Thank you, Tucows Engineering Team

Posted Oct 22, 2021 - 13:57 UTC

Resolved
All Cluster A services have now been restored. We are marking this incident as resolved and will continue to monitor the services.

Incident Start Time: 10-20-2021 03:18:00 UTC
Incident End Time: 10-20-2021 05:29:00 UTC
Total Duration: 2 hours, 11 minutes
Posted Oct 20, 2021 - 05:57 UTC
Investigating
Email services (IMAP/POP/Webmail/Inbound emails) on Cluster A are offline again. Our Engineering team is investigating the cause, and we will provide updates as they are available.
Posted Oct 20, 2021 - 05:07 UTC
Update
Please note that we are still seeing intermittent issues with Cluster A IMAP, POP, and Webmail.
We will update when we have more information or when the issue is resolved.
Posted Oct 20, 2021 - 04:38 UTC
Monitoring
All services have been restored as of 03:54 UTC. We are now monitoring and will post a further update when we consider this resolved. Thank you again for your patience.
Posted Oct 20, 2021 - 04:13 UTC
Investigating
We are currently experiencing an incident that is impacting Hostedemail on Cluster A: IMAP, POP, and Webmail.
Our engineering team has been engaged and is investigating. We will post updates shortly and appreciate your patience.
Posted Oct 20, 2021 - 03:48 UTC
This incident affected: Hosted Email (Cluster A, Webmail).