Spreedly API Errors
Incident Report for Spreedly
Postmortem

November 11th, 2021 — Intermittent Core 500 errors due to DDoS on id.spreedly.com

A Distributed Denial of Service (DDoS) attack, launched against Spreedly for the purpose of extortion, degraded service for transactions. In response, Spreedly implemented a robust and comprehensive rate-limiting policy, which mitigated the attack and allowed services to return to normal operation.

What Happened

On November 11th and 12th, 2021, Spreedly’s id service (https://id.spreedly.com/) was the target of a Distributed Denial of Service (DDoS) attack, specifically in the form of high-volume HTTPS requests to legitimate, publicly available (unauthenticated) URL endpoints. The attack coincided with Spreedly receiving extortion demands.

During the DDoS attack, request volumes exceeding 10,000x our usual transactions-per-second throughput exhausted the available resources on the id system, leaving the administrative console and reporting dashboard largely unavailable. Because the core transaction processing system depends on the id system, a portion of requests to Core received HTTP 500 errors during each new wave of the attack.

Spreedly responded by first blocking and then rate-limiting the attackers’ requests. The attackers then pivoted to additional endpoints, resulting in a second wave of service degradation. Spreedly again blocked those requests and subsequently implemented broad rate limiting on the id service, which prevented further attacks from succeeding.
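For readers unfamiliar with rate limiting, the following is a minimal, hypothetical sketch of a per-client token-bucket limiter in Python. It is not Spreedly's implementation: this report does not describe the algorithm, client keys, or limits that were used. The class names, the choice of client key (for example, source IP address), and the `rate` and `capacity` values are all assumptions for illustration.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a normal burst is not rejected
        self.last = now

    def allow(self, now: float) -> bool:
        # Credit tokens for the time elapsed since the last check, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a server would typically answer HTTP 429 Too Many Requests


class PerClientRateLimiter:
    """Hypothetical helper: keeps one TokenBucket per client key (e.g. source IP)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be tested deterministically
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_key: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_key] = bucket
        return bucket.allow(now)
```

With `PerClientRateLimiter(rate=5, capacity=10)`, each client can burst up to 10 requests and then sustain about 5 requests per second; anything beyond that is rejected. A per-IP limit alone does little against traffic spread across many addresses, which is one reason broad, service-wide limits can also matter. The sketch never evicts idle buckets, which a real deployment would need to do.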

Next Steps

Spreedly is dedicated to providing the most robust service possible to our clients and will continue to improve the reliability of our services through additional caching and retry mechanisms and by further decoupling the interdependencies between our applications.

Posted Nov 16, 2021 - 18:54 EST

Resolved
During a period of monitoring, Spreedly engineers identified recurrences of 500 errors on the Core API from 20:52 to 21:04 UTC and from 22:01 to 22:12 UTC.

After the additional errors were mitigated, all systems were confirmed to be stable and functioning as of 22:25 UTC. The incident is considered resolved.

A post-incident review will be published with additional details. We apologize for any inconvenience and disruption to service.
Posted Nov 11, 2021 - 17:57 EST
Monitoring
A fix has been implemented addressing the 500 errors on Spreedly's Core API.

We are actively monitoring the results.
Posted Nov 11, 2021 - 15:18 EST
Identified
We have identified the cause of the API errors.

We are currently working on implementing a fix.

Updates will be provided as they become available.
Posted Nov 11, 2021 - 15:02 EST
Investigating
We have identified an issue causing intermittent 500 errors on Spreedly's Core API.

This is impacting some transactions and requests to Spreedly's API.

Updates will be provided as they become available.
Posted Nov 11, 2021 - 14:53 EST
This incident affected: Core Transactional API.