Incident Summary
On May 1, 2023, at 5:00 PM EDT, our system alerted us to a surge in request volume on Cluster
A of our Hosted Email solution. Our Development Operations (DevOps) Team began
investigating and eventually engaged our Security Operations (SecOps) Team, who confirmed
that a large-scale Distributed Denial of Service (DDoS) attack was underway and moved to
manually block the offending IPs. We successfully mitigated the attack; however, the
unprecedented volume of requests overwhelmed our authentication service, which caused
failures to log in and to send and receive email. It also exposed a bug in our statistics process
that was consuming excessive memory, which further delayed our recovery.
By May 2, 2023, at 2:10 PM EDT, our DevOps Team had implemented fixes for the
authentication service issue and the statistics process bug. We then began to slowly ramp up
processing of our email service to full capacity. By May 3, 2023, at 8:00 AM EDT, the backlog of
pending inbound and outbound email was completely cleared.
Timeline of Events
May 1, 2023, 5:00 PM EDT - Our system alerted us to a surge in request volume on Cluster A of
our Hosted Email service. Our DevOps Team began investigating. Also around this time, our
Support Team began to see customer reports about service issues.
May 1, 2023, 5:03 PM EDT - Our Support Team posted the first status page update about the
incident, informing customers that we were experiencing service issues on Cluster A and
actively investigating the root cause.
May 1, 2023, 5:36 PM EDT - Our SecOps Team was engaged to further investigate, as the
number of requests to Cluster A was growing rapidly.
May 1, 2023, 5:47 PM EDT - Our SecOps Team confirmed it was a DDoS attack and began to
manually block the abusive IPs.
May 1, 2023, 6:00 PM EDT - The DDoS attack was successfully mitigated. However, our
DevOps Team was still seeing log-in issues and email request failures. They continued to
investigate.
May 2, 2023, 6:19 AM EDT - We pinpointed an issue with our authentication service. Our
DevOps Team began to explore potential solutions.
May 2, 2023, 1:45 PM EDT - We discovered that memory utilization was growing faster than
expected and identified a bug in our statistics process as the cause.
May 2, 2023, 2:10 PM EDT - DevOps promoted fixes for both the authentication service issue
and the statistics process bug. We then began to slowly ramp up processing of our hosted email service
to full capacity. Users began to experience successful log-in attempts, and our service began to
process the backlog of pending inbound and outbound email requests.
May 2, 2023, 7:00 PM EDT - We officially marked the incident as closed. The backlog of
pending email requests was completely cleared by May 3, 2023, at 8:00 AM EDT.
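The intervals between the key events above can be worked out with a short script. This is only an illustrative sketch: the timestamps are copied from the timeline, while the event labels and the `elapsed` helper are names invented here for the example.

```python
from datetime import datetime

# Timestamps copied from the timeline above. All are EDT, so no
# timezone conversion is needed to compare them.
FMT = "%Y-%m-%d %H:%M"
events = {
    "alert": datetime.strptime("2023-05-01 17:00", FMT),
    "attack_mitigated": datetime.strptime("2023-05-01 18:00", FMT),
    "fixes_promoted": datetime.strptime("2023-05-02 14:10", FMT),
    "incident_closed": datetime.strptime("2023-05-02 19:00", FMT),
    "backlog_cleared": datetime.strptime("2023-05-03 08:00", FMT),
}


def elapsed(start: str, end: str) -> str:
    """Return the time between two named events as 'Hh MMm'."""
    total_minutes = int((events[end] - events[start]).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


for name in ("attack_mitigated", "fixes_promoted", "incident_closed", "backlog_cleared"):
    print(f"alert -> {name}: {elapsed('alert', name)}")
```

With these timestamps, the fixes were promoted 21 hours and 10 minutes after the first alert, the incident was closed after 26 hours, and the backlog was cleared 39 hours after the first alert.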
Impact Analysis
The root cause of this service interruption was an authentication service failure caused by
unprecedentedly high traffic during a DDoS attack. Our recovery time was then further delayed
by a bug in our statistics process that caused it to consume excessive memory. The
authentication service is a critical component of the Hosted Email infrastructure; most other
services within Hosted Email run through the authentication service in order to maintain a
secure environment and access the metadata required to process requests. Consequently,
when it began to fail, the service impact was substantial.
Once the necessary fixes were deployed and Hosted Email was made fully operational, there
remained a backlog of pending email requests that had accumulated during the downtime. Our
system protects against email loss by creating a queue of inbound and outbound emails. During
normal operation, this queue is very small. However, during the event, a sizable backlog
accumulated, which took our service, once fully restored, 8 hours to clear. No data or emails
were lost. All backlogged email was time-stamped according to when it was delivered, as per
standard operating procedure.
Response and Mitigation
Our DevOps Team started investigating the surge in traffic on May 1, 2023, at 5:00 PM EDT, in
response to a system alert. The rate of requests to our system continued to increase, and on May
1, 2023, at 5:36 PM EDT, our DevOps Team engaged our SecOps Team to further investigate.
SecOps confirmed it was a DDoS attack and, at 5:47 PM EDT on May 1, 2023, started to
manually block the IPs responsible for the spike in requests. By 6:00 PM EDT, this action had
successfully mitigated the attack, and no further spikes in request volume occurred. While
request volume was rising, our Support Team saw an increasing number of reports of Webmail
log-in failures and failures to send and receive email.
When our Hosted Email service did not recover as expected following the DDoS attack, DevOps
began investigating the cause of the continued service interruption. By May 2, 2023, at 6:19 AM
EDT, they had pinpointed the issue with our authentication service, and at 1:45 PM EDT, they
discovered the bug in our statistics process. At 2:10 PM EDT, they put in place fixes for both
issues. The authentication traffic was split between two services, instead of being directed
through a single service. The statistics bug was addressed by correcting the problematic code.
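To illustrate the idea of splitting authentication traffic, here is a minimal sketch that alternates requests between two backends in round-robin order. It is an assumption made for illustration only: the report does not say how the traffic was actually divided, and the `AuthRouter` class and backend names are hypothetical.

```python
from collections import Counter
from itertools import cycle


class AuthRouter:
    """Distribute authentication requests across several backends in turn."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        # cycle() repeats the backends endlessly in their given order.
        self._next_backend = cycle(backends)

    def route(self, request_id):
        """Pick the backend that should handle this request."""
        return next(self._next_backend)


# Hypothetical backend names standing in for the two services.
router = AuthRouter(["auth-service-1", "auth-service-2"])
assignments = [router.route(i) for i in range(6)]
print(Counter(assignments))
```

With two backends, each one receives half of the requests, so neither carries the full load that previously went through a single service.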
Lessons Learned
The root cause of the outage was a failure of our authentication service to sufficiently scale to
accommodate the severe spike in request volume. Prior to this event, the authentication service
had been identified as a service that needed to be better optimized. This incident will expedite
the process of rebuilding this service as the limitations have been clearly demonstrated.
Conclusion
While this service interruption was precipitated by a DDoS attack, the root cause was the
inability of our authentication service to adequately scale. We’re confident in the steps we’re
taking to mitigate this specific issue. This incident had a significant impact on our resellers and
their customers, and we are committed to addressing your concerns and questions.
We value our customer relationships, many of which are decades long, and we want to continue
to nurture and build long-lasting partnerships.
If you have any questions or feedback, please contact our Customer Service Team.
Thank you,
Tucows Domains Team