Dashboard experiencing delays
Incident Report for Spreedly
Postmortem

A high volume of test transactions from a customer caused the post-transaction data pipeline that populates the Dashboard data source to drop a subset of its messages.

What Happened

On April 7, 2020, at 10:10 am EDT, Spreedly began receiving alerts indicating that our core systems were under unusually high load. The requests causing this load were test transactions originating from a customer.

At 10:24 am EDT, messages passing through the post-transaction data pipeline began to fail, which caused data in the Dashboard system to become out of date. A subset of these messages continued to fail until 12:00 pm EDT, when Spreedly fully recovered the service. The failed messages were replayed over the next few hours, and the Dashboard data source was fully repopulated at 4:24 pm EDT.

Transaction processing was not impacted during these events. The only effect was a temporary delay in the delivery of data to Dashboard.

Next Steps

Spreedly will evaluate rate limiting for test transactions and will reassess the data pipeline’s load-balancing mechanisms.
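
For illustration only, one common way to rate limit test transactions per customer is a token bucket. The sketch below is a generic example and does not describe Spreedly's implementation or plans. The class names, the `rate` and `capacity` parameters, and the `customer_id` and `is_test` arguments are all hypothetical.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        # Refill in proportion to the time elapsed since the last call.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerCustomerTestLimiter:
    """Hypothetical limiter: one bucket per customer, applied to test transactions only."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, customer_id, is_test):
        # Live (non-test) transactions are never limited in this sketch.
        if not is_test:
            return True
        bucket = self.buckets.get(customer_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[customer_id] = bucket
        return bucket.allow()
```

In this sketch a customer that exceeds its test budget is throttled without affecting live traffic or other customers. The actual limits would depend on measured pipeline capacity.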

Posted Apr 10, 2020 - 15:10 EDT

Resolved
Dashboard results are now fully caught up and reporting in real time.
Posted Apr 07, 2020 - 16:51 EDT
Identified
Real-time reporting has returned to normal. All new transaction activity is being represented correctly in Dashboard, although transactions that occurred between 2:30 pm UTC and 4:00 pm UTC are still being repopulated.
Posted Apr 07, 2020 - 12:27 EDT
Investigating
We are currently seeing degraded performance on the Spreedly Insights Dashboard. Our capacity has now returned to normal, and we expect to see an improvement in real-time reporting shortly.

We are actively monitoring to ensure that everything is working as expected.
Posted Apr 07, 2020 - 10:48 EDT
This incident affected: Supporting Services (Dashboard).