API Degradation/Timeouts
Incident Report for OpenSRS
Postmortem

Incident Date: June 5, 2022
Incident Number: PR-3206

On June 5, 2022 at 12:30 PM ET, Tucows’ Domains platform experienced intermittent service interruptions impacting OpenSRS and Enom. During this incident, users were unable to purchase domains or perform domain lookups, and registrations and transfers were delayed.

The service interruption was caused by unexpected behaviour on one of the storage devices, which led to API timeouts.

On June 6, 2022 at 6:54 PM ET, the engineering team increased the timeout configuration to reduce order processing delays. At 7:55 PM ET, they deployed a code change and restarted the affected device to stabilize the impacted environments.

Tucows has an immediate plan to investigate the root cause of the unexpected behaviour. 

Tucows is reviewing the architecture to help prevent future interruptions.

Tucows will review and enhance monitoring to improve visibility and to address issues in a timely manner.

Thank you,

Tucows Engineering Team

Posted Jun 27, 2022 - 18:26 UTC

Resolved
Our engineering team has applied a code fix and restarted the affected nodes, resolving the issues that prevented registration/transfer orders from being processed.

Incident Start Time: 06-05-2022 16:30:00 UTC
Incident End Time: 06-06-2022 23:55:00 UTC
Total Duration: 1 day, 7 hours, 25 minutes
Posted Jun 07, 2022 - 01:01 UTC
Update
Our engineering team is rolling out a deployment in order to resolve the issue.

We will continue to post updates as the deployment progresses.
Posted Jun 06, 2022 - 21:43 UTC
Update
Our engineering and dev teams continue to troubleshoot the issue. We will provide more updates as they become available.
Posted Jun 06, 2022 - 17:40 UTC
Investigating
We are currently experiencing an incident that is impacting domain lookups; this may delay the processing of registration/transfer orders. Our Dev team has been engaged and is currently investigating the issue. We will provide updates once we have them available.
Posted Jun 06, 2022 - 15:58 UTC
This incident affected: APIs (OpenSRS API, OpenHRS API).