Increase in Internal Server Errors on USAePay
Incident Report for Spreedly
Postmortem

Between Jan 8, 2020 6:14 PM UTC and Jan 9, 2020 7:53 PM UTC, transactions using the USAePay gateway integration failed with an HTTP 500 Internal Server Error.

What Happened

On Jan 8, 2020 at 6:14 PM UTC, Spreedly engineers began deploying a change to the application to improve an unrelated gateway integration. After the deploy process completed, requests to Spreedly that used the USAePay gateway type failed with an HTTP 500 Internal Server Error, indicating a crash in the Spreedly core application. At 6:30 PM UTC, our automated error monitoring software first alerted us to this error condition.

While investigating this crash, we redeployed the code with no functional changes to narrow the scope of the investigation. Despite this redeploy, the application continued to raise alerts on the USAePay gateway integration. We then discovered that the deployment process was defective and no longer caused the application to load the new code. We repaired the defect in the deployment process and completed a successful deploy on Jan 9 at 7:53 PM UTC. Once the new version of the application was loaded, the errors in the USAePay gateway integration stopped.

Next Steps

We will evaluate the following activities to prevent this situation from happening in the future:

  1. Investigate and correct the condition that, after a small percentage of deployments, causes the application to crash on certain gateway integrations until it is reloaded.
  2. Investigate how the deploy process developed a defect preventing the application from reloading.
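
As an illustration of the failure mode described above, where a deploy completes but the application keeps running the old code, a deploy process could compare the revision it just shipped with the revision the running application reports. The following is a minimal sketch under stated assumptions, not Spreedly's actual tooling: `verify_deploy`, `fetch_running_revision`, and the revision strings are hypothetical names introduced here for illustration only.

```python
import time
from typing import Callable


def verify_deploy(
    expected_revision: str,
    fetch_running_revision: Callable[[], str],
    attempts: int = 5,
    delay_seconds: float = 0.0,
) -> bool:
    """Return True once the running application reports the expected revision.

    `fetch_running_revision` is a hypothetical callable; in practice it might
    query a version endpoint or read a revision file from the live process.
    """
    for attempt in range(attempts):
        if fetch_running_revision() == expected_revision:
            return True
        # Wait before polling again, but not after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # Simulate an application that never reloads: it keeps reporting the old revision.
    stale = verify_deploy("new-rev", lambda: "old-rev", attempts=3)
    print("deploy verified" if stale else "deploy did not load new code")
```

A check like this would turn a silent reload failure into an explicit deploy failure, rather than leaving it to be discovered through downstream error alerts.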

Conclusion

We apologize for any disruption this incident may have caused and are taking steps to ensure that our systems continue to be resilient for our customers.

Posted Jan 21, 2020 - 12:03 EST

Resolved
A fix for this issue has been deployed. The outage lasted from Jan 8, 2020 6:14 PM UTC until Jan 9, 2020 7:53 PM UTC.

We have confirmed that Internal Server Errors on USAePay have stopped.
Posted Jan 09, 2020 - 16:23 EST
Monitoring
A fix has been implemented, and we are monitoring the results. Currently, USAePay 500 errors have stopped.
Posted Jan 09, 2020 - 15:40 EST
Investigating
We are currently investigating a recent increase in 500 errors on the USAePay gateway.
Posted Jan 09, 2020 - 15:07 EST
This incident affected: Core Transactional API.