SystemDNS Nameservers
Incident Report for OpenSRS
Postmortem

On March 14th, 2024 at 11:34 am EDT, the Tucows Domains DNS platform stopped responding to requests due to a data inconsistency between our DNS server and the underlying database.

Tucows continuously improves its DNS platform and its services. During a routine software update, a data issue was identified that affected domain resolution on our SystemDNS platform.

The DNS and Network Engineering teams identified the issue immediately and began mitigating it. When it became clear that resolving the underlying issue would be time-consuming, the engineering team initiated a fail-over procedure, which was fully completed at 1:07 pm EDT.

We are taking multiple steps to prevent a recurrence and improve reliability in the future.

  • Improve tooling to identify and resolve possible data race conditions.
  • Improve fail-over process to initiate faster resolution.
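
For illustration only, a consistency check of the kind the first item points toward could compare the zone serial recorded in the database with the serial the DNS server is actually serving. The sketch below is hypothetical: the function name, the dictionary inputs, and the sample zones are assumptions made for the example and are not drawn from the actual tooling.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[int], Optional[int]]


def find_serial_mismatches(
    db_serials: Dict[str, int],
    served_serials: Dict[str, int],
) -> List[Mismatch]:
    """Return (zone, db_serial, served_serial) for every zone that disagrees.

    A zone present on only one side is reported with None for the
    missing serial. Both inputs are hypothetical: in practice they
    would be loaded from the database and from the DNS server.
    """
    mismatches: List[Mismatch] = []
    for zone in sorted(set(db_serials) | set(served_serials)):
        db_serial = db_serials.get(zone)
        served_serial = served_serials.get(zone)
        if db_serial != served_serial:
            mismatches.append((zone, db_serial, served_serial))
    return mismatches


# Made-up sample data, not taken from the incident.
db = {"example.com": 2024031401, "example.net": 2024031402}
served = {"example.com": 2024031401, "example.net": 2024031399}

for zone, db_serial, served_serial in find_serial_mismatches(db, served):
    print(f"{zone}: database={db_serial} served={served_serial}")
```

Running such a comparison before and after a software update would be one way to surface an inconsistency early, before it affects resolution.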

We apologize for the inconvenience this service disruption may have caused.

Thank you,

Tucows Domains Operations Team

Posted Mar 19, 2024 - 13:48 UTC

Resolved
Our engineering team has deployed a fix. Services have been restored.

Start Time: 03/14/24 15:43 UTC
End Time: 03/14/24 17:09 UTC
Posted Mar 14, 2024 - 17:32 UTC
Identified
Our engineers have identified the problem and are currently working to resolve it.
Posted Mar 14, 2024 - 16:21 UTC
Update
We are currently experiencing a minor incident with our nameservers that is impacting our domains services. Our engineering team has been engaged and is investigating.
Posted Mar 14, 2024 - 16:06 UTC
Investigating
We are investigating an issue regarding our SystemDNS.com nameservers.

We will provide an update once we have additional information.
Posted Mar 14, 2024 - 15:50 UTC
This incident affected: DNS (SystemDNS).