During a routine maintenance event, a piece of software responsible for reporting activity to the systems that power customer reports and API listing endpoints silently failed to start. During the 40 minutes in which the software was non-functional, transactions and payment methods created in this timeframe would not have appeared in API calls that list transactions and payment methods. This reporting issue did not affect the outcome of customer transactions, only the ability of Spreedly systems to report that data to customers. After correcting the issue, Spreedly backfilled all available payment method and transaction data to reporting systems.
On 2024-05-30 at 15:15 UTC, production traffic was switched to a newly updated system responsible for generating customer activity data. This system silently failed to produce the necessary information, which resulted in a data gap. After an automated alert fired at 15:37 UTC, the team halted their maintenance work and reverted the system to a known good state by 15:50 UTC. After confirming that all systems were functioning normally and that no transactions were impacted, the team gathered all available data and replayed it into the system so that reporting systems would accurately reflect the activity from that period.
Spreedly will continue to work to eliminate the possibility of silent failures of this functionality, add testing to prevent non-functional reporting software from receiving production traffic, and introduce additional automated monitoring that will more quickly identify any loss of reporting volume.
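As an illustration only, the kind of volume monitoring described above can be sketched as a comparison between the number of events processed and the number that reached reporting in each time window. This is a minimal sketch and not Spreedly's actual implementation: the `WindowCount` type, the `find_reporting_gaps` function, the thresholds, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowCount:
    """Counts for one time window (all field names are hypothetical)."""
    window_start: str  # start of the window, e.g. "15:15" UTC
    processed: int     # events handled by the transaction system
    reported: int      # events that reached the reporting system


def find_reporting_gaps(windows, min_ratio=0.95, min_processed=10):
    """Return the start times of windows where reporting lags processing.

    min_ratio and min_processed are illustrative thresholds, not real
    settings. Windows with very little traffic are skipped so that a
    handful of delayed events does not trigger an alert.
    """
    gaps = []
    for w in windows:
        if w.processed < min_processed:
            continue  # too little traffic to judge this window
        if w.reported / w.processed < min_ratio:
            gaps.append(w.window_start)
    return gaps


if __name__ == "__main__":
    # Made-up sample data for demonstration purposes only.
    sample = [
        WindowCount("15:10", processed=200, reported=200),
        WindowCount("15:15", processed=210, reported=0),
        WindowCount("15:20", processed=5, reported=0),
    ]
    print(find_reporting_gaps(sample))
```

Comparing reported volume against processed volume, rather than against a fixed number, is one way to catch a reporting component that is running but silently producing nothing, while tolerating normal swings in traffic.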
Spreedly values the trust our customers place in us to handle one of the most critical aspects of their business, and we welcome the opportunity to discuss any questions, concerns, or comments you have regarding this incident. Please reach out to our support staff or your account management team with your concerns.