Spreedly API Errors
Incident Report for Spreedly
Postmortem

October 29, 2021 — Intermittent 500 response codes from Spreedly's Core API

At 2021-10-29 1:00AM UTC, requests to the Spreedly Core API began intermittently returning 500 error response codes for a period of approximately 5 minutes. An automated recovery of Spreedly's internal systems was triggered, and all systems resumed normal operations at approximately 1:05AM UTC.

What Happened

At 2021-10-29 1:02AM UTC, internal monitoring detected an elevated number of error responses being returned from the Spreedly Core API. Engineers were paged and began investigating. The errors were caused by an internal system that the Core API depends on becoming partially unavailable, beginning at 1:00AM UTC. An automated antivirus scan that runs on this dependent system constrained resources on a subset of its hosts. The automated health check process then deemed those hosts "unhealthy" and removed them from service. New hosts were automatically brought into service, and normal operations resumed at approximately 1:05AM UTC.
Approximately 4,500 requests received a 500 error response during this time.

Next Steps

Spreedly engineers have made changes to mitigate the effects of the automated antivirus scan so that it should no longer cause the dependent system to become unresponsive.

Posted Nov 12, 2021 - 15:07 EST

Resolved
Spreedly systems detected an intermittent failure and recovered automatically. The incident is now considered resolved.

We are still investigating to understand the specific causes of the incident and any residual impact. A post-incident review will be published.

We apologize for any inconvenience and disruption to service.
Posted Oct 29, 2021 - 22:44 EDT
Monitoring
A fix addressing the API errors has been implemented.

We are actively monitoring the results.
Posted Oct 29, 2021 - 21:54 EDT
Investigating
We have identified an issue that was causing intermittent 500 errors on Spreedly's Core API.

Updates will be provided as they become available.
Posted Oct 29, 2021 - 21:54 EDT
This incident affected: Core Transactional API.