Intermittent Request Errors

Incident Report for Spreedly

Resolved

The affected storage node's state has been manually reset and it has been added back to the cluster. The system is fully operational and the incident is resolved.

Posted Feb 05, 2016 - 16:23 EST

Monitoring

We have identified that the impacted storage node did not shut down cleanly, resulting in it starting up in a locked state. After verifying that it was safe to do so with the storage vendor, we shut down that storage node and the system is operating normally again. We will be investigating why the storage node did not shut down cleanly and, when we have manually resolved the state, will mark this issue resolved.

Posted Feb 05, 2016 - 16:17 EST

Update

A storage node is behaving erratically after being rebooted for maintenance. We have updated the API services to not use the erratic storage node, though there are still errors from the cluster as it attempts to integrate with this node. It is safe to retry failed API calls to route around this partial outage.

Posted Feb 05, 2016 - 16:04 EST

Investigating

We are investigating elevated rate of errors for Spreedly API calls.

Posted Feb 05, 2016 - 15:49 EST