Small gap in reporting data
Incident Report for Spreedly
Postmortem

Summary

Around 17:53 UTC on Friday, October 6th, an upstream service provider rotated shared credentials. For about two hours afterward, we were unable to reach the event pipeline service responsible for propagating changes to secondary systems. This resulted in a temporary gap in reporting data for our query API endpoints, such as listing payment methods and transactions, as well as a gap in data in Insights. At no point were revenue-affecting transactions down or inaccessible.

Next Steps

We will be looking closely at how to propagate changes in our upstream provider’s configuration more quickly. However, because the affected systems are secondary, we are also willing to accept some amount of downtime or maintenance. For these situations, we will look at investing in our recovery toolchain so that we can fill in the associated gaps in data and completely restore functionality more quickly.

Conclusion

The transactional service of Spreedly is architected to be as isolated as possible from secondary services. We prioritize the uptime of revenue-affecting API calls over secondary concerns like reporting and visualization, though we acknowledge that many businesses rely on these capabilities for some of their business processes. While we still consider these secondary functions to be sacrificial during systems instability, we also hold ourselves accountable for their operation and uptime, and we apologize for the inconvenience caused by their degradation.

Posted Oct 12, 2017 - 17:58 EDT

Resolved
All missing transaction records from the incident (October 6, 2017, 18:12 UTC to 20:08 UTC) have been replayed into secondary systems, and all functionality has been restored. We apologize for any inconvenience caused by the inability to query the API or by the gap in data visualization within Insights.
Posted Oct 10, 2017 - 16:45 EDT
Identified
We've updated this incident to reflect that the secondary API calls (those not associated with financial transactions) are still impacted, as is Insights. The only impact remains that transactions and payment methods stored during the small affected window are not being returned by the API or shown in Insights graphs. The transactions and payment methods themselves are not affected, only their inclusion in these secondary systems.
Posted Oct 10, 2017 - 11:29 EDT
Update
We continue to see stability from the upstream vendor and are making progress in restoring the ability to list the transactions and payment methods stored during the approximately two-hour affected window on Friday. We will continue to share our progress and will notify customers once that capability is fully restored.
Posted Oct 10, 2017 - 09:26 EDT
Update
The connection to our upstream vendor has been stable since approximately 3:57pm EDT on Friday. Some customers may be experiencing a degraded ability to list transactions or payment methods stored during the incident. We are continuing to work on this and will share updates on our progress until that capability is fully restored.
Posted Oct 09, 2017 - 09:49 EDT
Monitoring
We were unable to connect to an upstream system from approximately 2:12pm EDT to 4:08pm EDT. This incident did not affect production transactions, and we did not observe any transaction errors. Some customers may notice a degraded ability to list transactions or payment methods stored within this timeframe. We are working on determining the exact duration and will begin restoring full query capability for all data within this timeframe. We will provide a postmortem once available.
Posted Oct 06, 2017 - 16:59 EDT