Spreedly API Errors
Incident Report for Spreedly

At 09:54 UTC on 2021-09-18, Spreedly detected an increase in 500 response codes being returned to customers. Spreedly immediately investigated and identified that a portion of requests to one instance of the Core API service were returning 500 errors because DNS lookups against our upstream DNS service provider were failing. The DNS issues on this host were resolved by 10:18 UTC, and the service resumed normal operation.

What Happened

At 09:54 UTC on 2021-09-18, internal monitoring detected an unusual number of errors being returned to customers by the Core API service. Engineers were paged and began investigating. The issue was isolated to a subset of requests to a single host in the cluster, and it resolved without intervention at 10:18 UTC. Over the course of this 24-minute degradation of service, approximately 7% of all requests were affected by DNS lookup failures to Spreedly’s upstream DNS service.

Next Steps

Spreedly will pursue additional monitoring and resiliency improvements to DNS services within the API environment with a goal of reducing recovery time in the face of upstream service failures.
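
As an illustration only, one common way to reduce the impact of upstream DNS failures is to retry failed lookups and fall back to the last known good answer. The sketch below is a generic example and does not describe Spreedly's actual implementation. The class name StaleTolerantResolver, its parameters, and the default values are all hypothetical.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple


def system_resolver(hostname: str) -> List[str]:
    """Resolve a hostname to a sorted list of IP strings using the OS resolver."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class StaleTolerantResolver:
    """Hypothetical helper: caches answers, retries failures, serves stale data."""

    def __init__(
        self,
        resolve: Callable[[str], List[str]] = system_resolver,
        ttl_seconds: float = 30.0,      # assumed freshness window
        retries: int = 2,               # extra attempts after the first failure
        backoff_seconds: float = 0.1,   # base delay, doubled on each retry
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._retries = retries
        self._backoff = backoff_seconds
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, hostname: str) -> List[str]:
        cached = self._cache.get(hostname)
        # Serve a fresh cached answer without contacting the upstream resolver.
        if cached is not None and self._clock() - cached[0] < self._ttl:
            return cached[1]

        for attempt in range(self._retries + 1):
            try:
                addresses = self._resolve(hostname)
            except OSError:  # socket.gaierror is a subclass of OSError
                if attempt < self._retries:
                    self._sleep(self._backoff * (2 ** attempt))
                continue
            self._cache[hostname] = (self._clock(), addresses)
            return addresses

        # Every attempt failed: prefer a stale answer over returning an error.
        if cached is not None:
            return cached[1]
        raise OSError(f"DNS lookup failed for {hostname} and no cached answer exists")
```

Serving a stale answer trades correctness for availability. It helps when the upstream resolver is briefly unavailable, but it can direct traffic to an outdated address if the record changed during the outage.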

Posted Sep 24, 2021 - 15:20 EDT

After deploying the fix, all systems appear to be stabilized and functioning. The incident is being considered resolved.

We are still investigating to understand the specific causes of the incident and any residual impact. A post incident review will be published.

We apologize for any inconvenience and disruption to service.
Posted Sep 18, 2021 - 07:23 EDT

A fix has been implemented addressing the API errors.

We are actively monitoring the results.
Posted Sep 18, 2021 - 07:12 EDT

We have identified the cause of the API errors.

Updates will be provided as they become available.
Posted Sep 18, 2021 - 07:11 EDT

We have identified an issue that was causing intermittent 500 errors on Spreedly's Core API.

This was impacting a small percentage of transactions and requests to Spreedly's API.

These errors seem to have subsided but we are currently investigating their cause.

Updates will be provided as they become available.
Posted Sep 18, 2021 - 06:32 EDT

This incident affected: Core Transactional API.