Intermittent network failures within Spreedly’s cloud infrastructure led to communication failures between some internal services. Once a core component of the Spreedly infrastructure was unreachable for a period of time, all remaining traffic was refused, resulting in customers receiving a “502” error response to their request. Customers were unable to process transactions for approximately 27 minutes.
To maximize overall system availability and security, some core Spreedly components operate with ephemeral memory storage. The contents of such storage are available only via network connection (not disk). During an intermittent network failure within AWS’s infrastructure, the storage was unavailable and the core system was consequently unable to process requests. A majority of customers’ transactions were affected between 13:00 UTC and 13:27 UTC. Impacted transactions were those that received a “502 Gateway Unreachable” response.
Approximately an hour later, intermittent network failures within AWS continued, but this occurrence was limited to a degradation of service (as opposed to a full outage) lasting less than eight minutes. Impacted customers were those who received a “500 Internal Server Error” response between 14:40 UTC and 14:47 UTC.