On February 28th at 18:30 UTC, a deployment caused a required database table to become locked and unavailable to the Spreedly API. As a result, Spreedly's transactional API returned elevated rates of 500 and 502 error responses. Approximately 9% of overall API requests were affected over a period of about 34 minutes. The core API recovered once the database returned to normal operation.
At 18:30 UTC, as part of a code deployment, schema updates were applied to a database and automatically locked it. This lock prevented core Spreedly APIs from performing the functions required to process transactions normally. Errors in the core API began at 18:31 UTC. Automated monitoring alerted the Spreedly team at 18:33 UTC, and the team identified the issue. The team rolled back the deployment and removed the database lock, and no further errors were observed after 19:05 UTC.
Spreedly has updated its internal roadmap with specific items intended to reduce the transactional API's reliance on secondary systems, and to improve monitoring and resiliency when those secondary systems are disrupted.