Between 13:50 UTC on April 30 and 16:31 UTC on May 1, code deployed to production caused all transactions at the Checkout.com gateway to report success regardless of their actual results.
At approximately 13:50 UTC on April 30, an update containing API version upgrade changes for the Checkout.com gateway was applied to the production system. This update had insufficient logic to determine whether certain transactions failed. The primary result was authorizations that failed at the gateway but that Spreedly reported as successful. Follow-up transactions based on these erroneous authorizations also failed and were continually retried. A fix was applied at 16:31 UTC on May 1, for a total duration of roughly 26 hours and 40 minutes.
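One way to picture the missing logic is a success check that fails closed: a transaction counts as successful only when the gateway response positively confirms it. The following is a minimal sketch under stated assumptions, not Spreedly's or ActiveMerchant's actual code. The response fields `approved` and `response_code` and the success code `"00"` are hypothetical and do not describe the Checkout.com API.

```python
from typing import Any, Mapping

# Hypothetical codes that positively confirm approval (assumption for illustration only).
SUCCESS_CODES = frozenset({"00"})


def is_successful(response: Mapping[str, Any]) -> bool:
    """Return True only when the response explicitly confirms success.

    Missing or unexpected fields are treated as failure, so a change in the
    response format cannot silently turn declines into reported successes.
    """
    approved = response.get("approved")    # hypothetical field name
    code = response.get("response_code")   # hypothetical field name
    return approved is True and isinstance(code, str) and code in SUCCESS_CODES


# A confirmed approval passes; an empty or partial response does not.
print(is_successful({"approved": True, "response_code": "00"}))
print(is_successful({}))
```

The design choice here is that an unrecognized response shape defaults to failure rather than success, which is the safer error when an API version changes.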
The Checkout.com API changes introduced a particularly tricky error case that we did not account for before deploying the code. Our tests did not cover the combination of factors that led to this error, so they could not help diagnose the issue as it was happening. In addition, our monitoring processes did not flag this situation as an issue due to the subtlety of its expression, and we became aware of the problem only when customers reported it.
We will evaluate the following activities to prevent this situation from happening in the future: 1) perform a more in-depth review of API upgrade documentation to ensure that the impact of ActiveMerchant changes is sufficiently understood, 2) construct and deploy more robust tests that cover both success and failure detection, and 3) implement additional monitoring processes to detect this subtle type of issue rather than relying on customer reports.
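As one hedged illustration of the monitoring idea, a check could watch how often follow-up transactions fail on authorizations that were reported as successful, since that mismatch was the visible symptom in this incident. This is only a sketch, not a description of Spreedly's monitoring: the `WindowStats` shape, the function name, and the threshold values are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Follow-up counts for one gateway over one monitoring window (hypothetical shape)."""
    followups_attempted: int  # follow-ups made against authorizations reported as successful
    followups_failed: int     # how many of those follow-ups failed at the gateway


def should_alert(
    stats: WindowStats,
    max_failure_rate: float = 0.2,  # assumed threshold, would need tuning per gateway
    min_followups: int = 20,        # assumed minimum volume, avoids alerting on tiny samples
) -> bool:
    """Return True when follow-ups fail more often than the allowed rate."""
    if stats.followups_attempted < min_followups:
        return False
    failure_rate = stats.followups_failed / stats.followups_attempted
    return failure_rate > max_failure_rate


# Many failed follow-ups on "successful" authorizations should raise an alert.
print(should_alert(WindowStats(followups_attempted=50, followups_failed=40)))
```

A signal like this does not depend on the gateway response format, so it could still fire when the success-detection logic itself is the part that is wrong.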
We apologize for any disruption this incident may have caused. You rely on Spreedly to report transaction status accurately and we will take steps to ensure you can conduct your business confidently.