Network Connectivity Issues
Incident Report for OpenSRS
Postmortem

Last night we experienced a sequence of events that led to intermittent degraded performance across many OpenSRS services. At 01:00 UTC we were the target of a sophisticated DNS attack, which was followed by an unrelated double failure of core network equipment at our main Canadian data center, caused by an undocumented software limitation. We recovered quickly from the equipment failure but continued to experience the DNS attack until 13:10 UTC, when the attack was stopped and systems started responding reliably again. The network equipment failure made it harder for us to identify that we were under a DNS attack and slowed our response.

This complex combination of events impacted the following services:

- OpenSRS API services
- Cluster A and B mail services
- opensrs.com, which includes help@opensrs.com email and live chat
- DNS service through systemdns.com and mdnsservice.com
- Hosted Registrar Services (HRS)

We are working on creating more separation between the components of our DNS infrastructure to improve resilience to similar attacks in the future.

We encourage you to check for orders that may have failed during the service interruption. We also expect to see inbound mail delivery delays, as third-party email providers will have queued up mail for delivery to our system. Delayed mail will be delivered throughout the day.
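For resellers who keep their own order logs, the check suggested above could be sketched roughly as follows. This is only an illustration: the record layout (`id`, `submitted_at`, `status`), the `"completed"` status value, and the `orders_to_review` helper are hypothetical and are not part of any OpenSRS API. The window assumes the attack began at 01:00 UTC on Sep 29, 2017, as inferred from the timeline in this report.

```python
from datetime import datetime, timezone

# Incident window from this report (the date is inferred from the posted
# timeline): the attack began around 01:00 UTC and services were responding
# reliably again at 13:10 UTC.
INCIDENT_START = datetime(2017, 9, 29, 1, 0, tzinfo=timezone.utc)
INCIDENT_END = datetime(2017, 9, 29, 13, 10, tzinfo=timezone.utc)


def orders_to_review(orders):
    """Return orders submitted during the incident window that did not complete.

    `orders` is a list of dicts in a hypothetical local format; adapt the
    field names to however your own system records orders.
    """
    return [
        order
        for order in orders
        if INCIDENT_START <= order["submitted_at"] <= INCIDENT_END
        and order["status"] != "completed"
    ]


# Hypothetical sample records, for illustration only.
sample_orders = [
    {"id": "A-1", "status": "failed",
     "submitted_at": datetime(2017, 9, 29, 0, 30, tzinfo=timezone.utc)},
    {"id": "A-2", "status": "failed",
     "submitted_at": datetime(2017, 9, 29, 4, 15, tzinfo=timezone.utc)},
    {"id": "A-3", "status": "completed",
     "submitted_at": datetime(2017, 9, 29, 6, 0, tzinfo=timezone.utc)},
    {"id": "A-4", "status": "failed",
     "submitted_at": datetime(2017, 9, 29, 14, 0, tzinfo=timezone.utc)},
]

for order in orders_to_review(sample_orders):
    print(order["id"], order["submitted_at"].isoformat(), order["status"])
```

Timestamps are kept timezone-aware in UTC so they compare directly against the times quoted in this report.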

We have been monitoring the stability of our services and now consider the issue resolved. We will continue to monitor the problem and provide updates if the status changes. More information can be found on the OpenSRS Status page.

If you have any technical difficulties with any OpenSRS services, please contact our support team so we can investigate the problem. We apologize for any inconvenience this issue has caused.

Posted Sep 29, 2017 - 19:51 UTC

Resolved
We have not seen any further issues on any of our systems. We now consider this issue resolved.
Posted Sep 29, 2017 - 14:51 UTC
Monitoring
We've been monitoring the stability of our services and now consider this issue resolved. We will continue to monitor the problem and provide an update if the status changes.

We expect to see inbound mail delivery delays, as third-party mail providers will have queued up mail for delivery to our system. Delayed mail will be delivered throughout the day, according to each third-party mail provider's delayed sending policy.

If you have any technical difficulties with any OpenSRS services, please contact our support team so we can investigate the problem.

Thank you for your patience as we worked through this issue.
Posted Sep 29, 2017 - 13:10 UTC
Update
We are happy to report that at this time the following services are responding reliably:
OpenSRS API Services
Cluster A and B mail services
opensrs.com which includes help@opensrs.com email and live chat
DNS service through systemdns.com and mdnsservice.com

Services that are still affected:
DNS Syncing - updates to any DNS records using our systemdns.com and mdnsservice.com services
HRS Control Panel (not manage.opensrs.com)

Throughout the service interruption, third-party mail providers will have queued up mail for delivery to our system. Delayed mail will be delivered throughout the day, according to each third-party mail provider's delayed sending policy.

We are still monitoring the systems to ensure they remain fully functional and will continue to provide updates until we are confident this is fully resolved.
Posted Sep 29, 2017 - 12:27 UTC
Update
Our operations team has been all hands on deck this morning working to resolve the issue. We are now seeing significant improvement in the intermittent DNS issue impacting all OpenSRS services; however, services are not yet fully restored.

The operations team is working as quickly as possible to fully restore service, but no ETA has been published.

We will continue to provide updates through this communication channel as we progress through the issue.

Thank you for your patience.
Posted Sep 29, 2017 - 10:55 UTC
Update
OpenSRS DNS service continues to experience intermittent issues. As of right now there is no estimated time for resolution. The services which are impacted are:
opensrs.com including help@opensrs.com and live chat
API Services
Cluster A and B
DNS service through systemdns.com and mdnsservice.com
Posted Sep 29, 2017 - 10:05 UTC
Update
OpenSRS DNS service continues to experience intermittent issues. We don't have an estimated time for resolution right now. The services which are impacted are:
opensrs.com including help@opensrs.com and live chat
API Services
Cluster A and B
DNS service through systemdns.com and mdnsservice.com
Posted Sep 29, 2017 - 09:13 UTC
Update
OpenSRS DNS service continues to experience intermittent issues. Due to the nature of the issue, all services are impacted, including:
opensrs.com including help@opensrs.com and live chat
API Services
Cluster A and B
DNS service through systemdns.com and mdnsservice.com

Thank you in advance for your patience as we continue to work through this service disruption.
Posted Sep 29, 2017 - 08:04 UTC
Update
DNS services are still recovering and may respond slowly or intermittently. Because of this we consider access to the following services degraded:
opensrs.com including help@opensrs.com and live chat
API Services
Cluster A and B
DNS service through systemdns.com and mdnsservice.com
Posted Sep 29, 2017 - 07:01 UTC
Update
The operations team has made progress towards resolution and most services have been restored. We still see reports of some sites and services being offline, either intermittently or because of DNS caching, but at this time the following services are restored:

Cluster A and B webmail services
opensrs.com which includes support via help@opensrs.com and live chat
DNS service through systemdns.com and mdnsservice.com
OpenSRS API Services
Posted Sep 29, 2017 - 05:49 UTC
Update
OpenSRS DNS services are currently offline. This will affect the ability to reach hosted email services, the API, and opensrs.com, including help@opensrs.com support and live chat.

Our operations team continues to work on the issue and we will continue to provide updates as they become available.
Posted Sep 29, 2017 - 05:19 UTC
Update
The network issue is also impacting OpenSRS DNS services on systemdns.com and mdnsservice.com, as well as the API. Our operations team is still working on resolving the issue as quickly as possible. We will continue to update the status as we work through this issue.
Posted Sep 29, 2017 - 03:52 UTC
Update
Our operations team continues to work on stabilizing the network. Control panels and email service are now loading as expected, but API connections may still be degraded for some users.
Posted Sep 29, 2017 - 02:48 UTC
Identified
Our operations team has identified network connectivity issues. These are causing degraded performance when loading our control panels and webmail. We are working to restore service as quickly as possible and will update this status once we have more information.
Posted Sep 29, 2017 - 01:44 UTC
This incident affected: Hosted Email (Cluster A, Cluster B, Webmail), APIs (OpenSRS API, OpenHRS API, Email API), Control Panels (Reseller Control Panel, Classic RWI, End User Control Panel, Storefront), DNS (SystemDNS, Domain Forwarding), SSL, and Reseller Support.