A vendor service in the critical path for some Core transactions experienced an unrelated failure during maintenance, resulting in customer-facing errors.
One of our external vendors regularly performs routine maintenance on databases to maintain security and reliability. During one such maintenance window, on April 6 at 21:37 UTC, the vendor automatically migrated our application from one database copy to another (a standard operation). This failover coincided with an incident in their application engine, which prevented our application from restarting with the new database. As a result, a critical secondary service was unavailable, causing customer errors from 21:41 to 22:05 UTC for some classes of transactions (such as tokenizing new cards). Once the vendor resolved their application engine issue, our application started successfully and normal operation resumed.
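The report does not describe our application's restart logic, so the following is only an illustrative sketch: a client that retries its database connection with exponential backoff and jitter can ride out a brief failover window instead of failing outright. All names here (`connect_with_backoff`, `flaky_connect`) are hypothetical.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.5):
    """Try to (re)connect, retrying with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)


# Simulated connection that fails twice (database still failing over),
# then succeeds -- standing in for the vendor-hosted database.
state = {"calls": 0}

def flaky_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("database not yet available after failover")
    return "connected"


print(connect_with_backoff(flaky_connect, base_delay=0.01))
```

In this sketch the third attempt succeeds, so the call returns "connected"; had the outage outlasted all five attempts, the final `ConnectionError` would propagate to the caller.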
We are in the process of moving this critical secondary service's databases to a new hosting provider, which will give us more control over database scaling and maintenance windows.
We apologize for this service disruption, and we will continue to drive internal resiliency and availability initiatives to reduce the impact of third-party outages.