Core Saturation
Incident Report for Spreedly
Resolved
We have the rate limit in place now, and do not expect any more saturation events.
Posted May 20, 2017 - 16:50 EDT
Identified
We're still working to get the rate limit in place, and saw a handful of 503 status codes returned at ~20:00 UTC. Request load has since quieted back down.
Posted May 20, 2017 - 16:13 EDT
Update
We just realized we had the wrong time range on the original posting - we've edited it to be correct.
Posted May 20, 2017 - 15:41 EDT
Monitoring
For about seven minutes, from ~17:56-18:03 UTC, Spreedly came under heavy request load to an endpoint that was not well rate-limited, and experienced saturation of our request handling capacity. As a result, we returned 503 status codes for a subset of requests. Based on the cause, we're pretty confident that each affected request either never went through or completed successfully - no indeterminate transaction states should have resulted.

The situation had resolved itself by the time the incident response team was able to investigate. This update is late because we focused first on validating the cause. We're working now to get a better rate limit in place to prevent an immediate recurrence, and will get an event post-mortem up early next week.
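
For readers unfamiliar with the mechanism, a per-endpoint rate limit of the kind mentioned above is often built as a token bucket. The following is only an illustrative sketch, not a description of Spreedly's actual implementation: the `TokenBucket` class, the `handle_request` helper, the capacity and refill values, and the choice of a 429 response for rejected requests are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter (illustrative; not Spreedly's code)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)                  # maximum burst size
        self.refill_per_second = float(refill_per_second)
        self.clock = clock                               # injectable for testing
        self.tokens = float(capacity)                    # start with a full bucket
        self.last_refill = clock()

    def allow(self):
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add tokens for the time elapsed, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def handle_request(bucket):
    """Return an assumed HTTP status: 429 when over the limit, 200 otherwise."""
    return 200 if bucket.allow() else 429
```

Rejecting excess requests cheaply at the limiter, before they occupy a request handler, is what would keep a burst against one endpoint from saturating capacity shared with all other requests.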
Posted May 20, 2017 - 15:17 EDT