Spreedly API Errors
Incident Report for Spreedly
Postmortem

Intermittent network failures within Spreedly’s cloud infrastructure led to communication failures between some internal services. Once a core component of the Spreedly infrastructure had been unreachable for a period of time, all remaining traffic was refused, and customers received a “502” error response to their requests. Customers were unable to process transactions for approximately 27 minutes.

What Happened

To maximize overall system availability and security, some core Spreedly components operate with ephemeral memory storage. The contents of such storage are available only via a network connection (not disk). During an intermittent network failure within AWS’s infrastructure, the storage was unavailable and the core system was consequently unable to process requests. A majority of customers’ transactions were affected between 13:00 UTC and 13:27 UTC. Impacted transactions were those that received a “502 Gateway Unreachable” response.

Approximately an hour later, the intermittent network failures within AWS recurred, but this time the impact was limited to a degradation of service, rather than a full outage, lasting less than eight minutes. Impacted customers were those that received a “500 Internal Server Error” response between 14:40 UTC and 14:47 UTC.
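
The timeline above can be checked with a short calculation. The following is a minimal sketch in Python, not part of Spreedly's tooling. It assumes the UTC windows fall on Feb 8, 2021 (the date of the status updates for this incident) and treats EST as a fixed UTC-5 offset. All variable and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
EST = timezone(timedelta(hours=-5))  # assumption: fixed offset, no DST in February


def minutes_between(start, end):
    """Whole minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


# Impact windows quoted in this report; the calendar date is an assumption.
outage_start = datetime(2021, 2, 8, 13, 0, tzinfo=UTC)
outage_end = datetime(2021, 2, 8, 13, 27, tzinfo=UTC)
degraded_start = datetime(2021, 2, 8, 14, 40, tzinfo=UTC)
degraded_end = datetime(2021, 2, 8, 14, 47, tzinfo=UTC)

# Status update timestamps, which were posted in EST.
investigating_posted = datetime(2021, 2, 8, 8, 20, tzinfo=EST)
resolved_posted = datetime(2021, 2, 8, 8, 39, tzinfo=EST)

outage_minutes = minutes_between(outage_start, outage_end)
degraded_minutes = minutes_between(degraded_start, degraded_end)
notice_delay = minutes_between(outage_start, investigating_posted)
gap_after_resolved = minutes_between(resolved_posted, degraded_start)

print(outage_minutes, degraded_minutes, notice_delay, gap_after_resolved)
# prints: 27 7 20 61
```

Under these assumptions, the first window lasts 27 minutes and the second lasts 7. The second window begins 61 minutes after the incident was marked resolved, which is consistent with "approximately an hour later" and with the action item on resolution criteria.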

 

Actions

  • Improve the automatic failover process and service restart capabilities to build additional resiliency into the application environment.
  • Improve our criteria for considering an incident fully resolved.
Posted Feb 10, 2021 - 16:50 EST

Resolved
Since the fix was deployed, all systems appear to be stable and functioning. We are considering the incident resolved.

We are still investigating to understand the specific causes of the incident and will publish a post-mortem.

We apologize for any inconvenience and disruption to service.
Posted Feb 08, 2021 - 08:39 EST
Monitoring
A fix has been implemented addressing the API errors.

We are actively monitoring the results.
Posted Feb 08, 2021 - 08:27 EST
Investigating
We have identified an issue causing intermittent 500 errors on Spreedly's Core API.

This is impacting transactions and requests to Spreedly's API.

Updates will be provided as they become available.
Posted Feb 08, 2021 - 08:20 EST
This incident affected: Core Transactional API.