Spreedly API Errors
Incident Report for Spreedly
Postmortem

June 8, 2021 — Edge network partner outage

Spreedly’s edge network partner, which routes all traffic to Spreedly from around the globe, had a major service outage lasting approximately 49 minutes, immediately followed by a partial service outage lasting approximately 24 minutes, for a total outage time of up to 73 minutes. During this outage, most web requests to Spreedly applications, including transaction processing requests, were served error pages. Requests that received these error pages did not reach Spreedly’s applications.

What Happened

Our edge network provider reports that, beginning at 09:47 UTC (5:47 AM EDT) on Tuesday, June 8, 2021, it served errors to approximately 85% of all requests across its network. Spreedly’s automated systems detected the outage, raised alerts, and mobilized the appropriate response teams. The outage prevented most requests to the Spreedly platform from succeeding. Spreedly attempted to route traffic around the edge network partner, but the partner’s systems recovered before that attempt completed, so we canceled it. The edge network provider began to recover at 10:36 UTC (6:36 AM EDT), and normal service was restored by 11:00 UTC (7:00 AM EDT). The edge network partner also reports that it began releasing a permanent fix for the defect that originally caused the outage at 17:25 UTC.
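
The durations quoted in the summary can be cross-checked against the timestamps above. The following is a minimal sketch in Python. It assumes the major outage corresponds to the window from 09:47 UTC to 10:36 UTC and the partial outage to the window from 10:36 UTC to 11:00 UTC, which is our reading of the timeline rather than something the provider stated in these terms. The variable and function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# EDT is UTC-4; a fixed offset is enough for a single-day timeline.
EDT = timezone(timedelta(hours=-4), "EDT")

# Timestamps taken from the report (June 8, 2021, UTC).
outage_start = datetime(2021, 6, 8, 9, 47, tzinfo=UTC)
recovery_start = datetime(2021, 6, 8, 10, 36, tzinfo=UTC)
service_restored = datetime(2021, 6, 8, 11, 0, tzinfo=UTC)


def minutes_between(start: datetime, end: datetime) -> int:
    """Return the whole number of minutes from start to end."""
    return int((end - start).total_seconds() // 60)


def as_edt(moment: datetime) -> str:
    """Format a timezone-aware timestamp as HH:MM in EDT."""
    return moment.astimezone(EDT).strftime("%H:%M")


# Assumed mapping of the two outage phases onto the timeline.
major_minutes = minutes_between(outage_start, recovery_start)
partial_minutes = minutes_between(recovery_start, service_restored)
total_minutes = major_minutes + partial_minutes

print(f"Major outage:   {major_minutes} min, from {as_edt(outage_start)} EDT")
print(f"Partial outage: {partial_minutes} min, from {as_edt(recovery_start)} EDT")
print(f"Total:          {total_minutes} min, restored {as_edt(service_restored)} EDT")
```

Under that reading, the two windows come to 49 and 24 minutes, matching the 73-minute upper bound given in the summary.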

Next Steps

Spreedly is investigating approaches to improve its network resiliency at the edge and to enhance its downtime monitoring and incident resolution procedures.

Posted Jun 15, 2021 - 17:13 EDT

Resolved
After the fix was deployed, all systems appear to be stable and functioning. We consider the incident resolved.

We are still investigating to understand the specific causes of the incident and any residual impact. A post-incident review will be published.

We apologize for any inconvenience and disruption to service.
Posted Jun 08, 2021 - 08:49 EDT
Monitoring
A fix has been implemented and traffic is resuming to normal levels.

We are actively monitoring the results.
Posted Jun 08, 2021 - 07:05 EDT
Identified
We have identified the cause of the API errors.

An outage at our content distribution provider is currently affecting our service, and we are working on implementing a fix.

Updates will be provided as they become available.
Posted Jun 08, 2021 - 06:47 EDT
Investigating
We have identified an issue causing intermittent 503 errors on Spreedly's Core API.

This is impacting all transactions and requests to Spreedly's API.

Updates will be provided as they become available.
Posted Jun 08, 2021 - 06:21 EDT
This incident affected: Core Transactional API.