502 Bad Gateway Errors
Incident Report for Spreedly
Postmortem

On January 25, 2023, an application code deployment caused health check failures, which reduced our computing capacity and impaired our ability to service API requests.

What Happened

At 2023-01-25 22:41 UTC, an application code deployment removed application health endpoints that critical infrastructure monitoring depended on. As a result, otherwise healthy instances were incorrectly marked for termination, and replacement instances were created. Beginning at 2023-01-25 23:09 UTC, there were at times too few instances available to serve the volume of traffic, which impaired our ability to service API requests. Spreedly reverted the problematic code and launched new instances, which restored service levels for all customers.
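
The failure mode described above can be illustrated with a minimal sketch of a pre-deployment check that refuses a release when a monitored health endpoint no longer responds. This is an illustration under stated assumptions, not Spreedly's actual implementation: the paths `/health` and `/ready`, the in-memory routing table, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical contract: the paths that infrastructure monitoring probes.
REQUIRED_HEALTH_PATHS: Tuple[str, ...] = ("/health", "/ready")

# A handler returns an (HTTP status, body) pair.
Handler = Callable[[], Tuple[int, str]]


def build_routes() -> Dict[str, Handler]:
    """Return a hypothetical application routing table."""
    return {
        "/health": lambda: (200, "ok"),
        "/ready": lambda: (200, "ok"),
        "/v1/transactions": lambda: (200, "[]"),
    }


def probe(routes: Dict[str, Handler], path: str) -> int:
    """Return the status a monitor would see for path (404 if unrouted)."""
    handler = routes.get(path)
    if handler is None:
        return 404
    status, _body = handler()
    return status


def failing_health_paths(
    routes: Dict[str, Handler],
    required: Sequence[str] = REQUIRED_HEALTH_PATHS,
) -> List[str]:
    """List the required health paths that do not answer with 200."""
    return [path for path in required if probe(routes, path) != 200]


def safe_to_deploy(routes: Dict[str, Handler]) -> bool:
    """A release is acceptable only if every monitored path still answers."""
    return not failing_health_paths(routes)
```

Running a check of this kind in a release pipeline would flag a build whose routing table has lost `/health` before the monitor starts marking healthy instances for termination. Whether such a gate fits a given deployment process depends on how the health-check contract is documented and owned.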

Next Steps

Spreedly is committed to a thorough review and redesign of our change release processes and culture, with a focus on better documentation and cross-team training on cross-API interface requirements, on deployment monitoring, and on more automatic paging for actionable alerts.

Conclusion

We deeply apologize to our customers for this service interruption and for its impact on the businesses they have entrusted to Spreedly.

Posted Jan 31, 2023 - 12:26 EST

Resolved
We have confirmed that this issue has been resolved and customers are no longer experiencing 502 Bad Gateway errors.

A post-incident review will be published.

We apologize for the impact this issue caused to affected customers.
Posted Jan 25, 2023 - 20:47 EST
Monitoring
We have implemented a fix and services are restored. We are currently monitoring and confirming with impacted customers that they are no longer experiencing these errors.
Posted Jan 25, 2023 - 19:56 EST
Identified
We have identified an issue that is causing customers to receive 502 Bad Gateway errors. Our team is working to resolve the issue, and we will post an update shortly.
Posted Jan 25, 2023 - 19:15 EST
This incident affected: Core Transactional API.