Spreedly Core Certificate Errors
Incident Report for Spreedly
Postmortem

Summary

An upstream provider allowed one of their public certificates to expire. This caused some user agents with a common but older software component to be unable to communicate with Spreedly Core, Id, and Billing, until the expired certificate was removed.

What Happened

On Tuesday, May 30, 2000, an organization identified as “AddTrust AB” issued a TLS public certificate with the name “AddTrust External CA Root” and an expiration date twenty years in the future. This certificate became widely installed in the trust stores of many operating systems, browsers, and other applications which use the TLS public-key infrastructure.

The OpenBSD project forked LibreSSL from OpenSSL 1.0.1g in April 2014 as a response to the Heartbleed security vulnerability.

The widely used TLS library OpenSSL had not been designed specifically to handle cross-signed certificates. In October 2011, an issue was reported titled “#2634: Fail to verify server with a trusted CA root in the middle of the chain.” It was fixed in February 2015, in the then-upcoming OpenSSL 1.1.0 branch. Because the change was not considered a bug fix, it was not backported to the OpenSSL 1.0 branch. OpenSSL 1.1.0 was first released in August 2016.

The certificate authority with control over the “AddTrust External CA Root” certificate knew that it would eventually expire. Rather than simply renewing it with a new expiration date, they created a new root certificate with a new key pair and support for newer encryption algorithms. Because this new root certificate wasn’t yet included in trust stores around the Internet, they also provided a backwards-compatible fallback by cross-signing it with the “AddTrust External CA Root.” This way, even machines that had not had a recent trust store update would still trust the new root certificate. Gradually, the new root certificate was added to trust stores.

On November 20, 2019, Spreedly engineers renewed the wildcard certificate used by Spreedly Core and Id. The new certificate had a certificate chain that included the nearly-expired cross-signing root certificate.

On May 12, 2020, Spreedly engineers renewed the certificate on billing.spreedly.com. This certificate also had a certificate chain that included the nearly-expired cross-signing root certificate.

At 10:48AM UTC on Saturday, May 30, 2020, precisely twenty years after it was issued, the “AddTrust External CA Root” certificate expired. Because the newer root certificates were also present in up-to-date trust stores, up-to-date user agents no longer considered this legacy certificate when calculating the validity of the Spreedly certificate. However, older user agents still checked the validity of the legacy root certificate, incorrectly rejected the chain as expired, and so were unable to communicate with Spreedly Core, Id, and Billing. Applications which use OpenSSL below 1.1.1 and LibreSSL below 3.2.0 are known to have been affected.
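The difference between the two kinds of user agent can be illustrated with a toy model. This is only a sketch and not the actual OpenSSL or LibreSSL path-building code: the `Cert` class, the two verify functions, the names “New Root” and “Example Intermediate CA,” and every date other than the AddTrust expiration time are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_after: datetime


def unexpired(cert, now):
    return now < cert.not_after


def legacy_verify(presented, trust_store, now):
    # Toy legacy behaviour: follow every certificate the server presented,
    # then look up the issuer of the last one in the trust store.
    root = trust_store.get(presented[-1].issuer)
    if root is None:
        return False
    return all(unexpired(c, now) for c in presented + [root])


def modern_verify(presented, trust_store, now):
    # Toy modern behaviour: stop as soon as a certificate's issuer is an
    # unexpired trusted root, ignoring anything presented after it.
    for cert in presented:
        if not unexpired(cert, now):
            return False
        anchor = trust_store.get(cert.issuer)
        if anchor is not None and unexpired(anchor, now):
            return True
    return False


# Expiration time taken from the report; all other dates are hypothetical.
ADDTRUST_EXPIRY = datetime(2020, 5, 30, 10, 48, tzinfo=timezone.utc)
FAR_FUTURE = datetime(2038, 1, 1, tzinfo=timezone.utc)
LEAF_EXPIRY = datetime(2021, 11, 20, tzinfo=timezone.utc)

leaf = Cert("*.example.test", "Example Intermediate CA", LEAF_EXPIRY)
intermediate = Cert("Example Intermediate CA", "New Root", FAR_FUTURE)
# Assumption: the cross-signed copy of the new root expires with its signer.
cross_signed = Cert("New Root", "AddTrust External CA Root", ADDTRUST_EXPIRY)
presented = [leaf, intermediate, cross_signed]

trust_store = {
    "New Root": Cert("New Root", "New Root", FAR_FUTURE),
    "AddTrust External CA Root": Cert(
        "AddTrust External CA Root", "AddTrust External CA Root", ADDTRUST_EXPIRY
    ),
}

before = datetime(2020, 5, 29, 12, 0, tzinfo=timezone.utc)
after = datetime(2020, 5, 30, 12, 0, tzinfo=timezone.utc)
```

In this model both verifiers accept the chain at `before`. At `after`, `legacy_verify` rejects it while `modern_verify` still accepts it, and `legacy_verify` accepts it again once the server stops presenting the cross-signed certificate (`presented[:2]`).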

Around 8:37AM EDT, Spreedly engineers updated our certificates to no longer present the expired “AddTrust External CA Root” certificate, which resolved the issue with agents communicating with Spreedly Core and Id. The certificate on Spreedly Billing was similarly updated around 2:50PM EDT.

The duration of the issue was around 1 hour 49 minutes on Spreedly Core and Id. The Spreedly Dashboard was unusable during that time. Spreedly Billing was affected for around 7 hours and 58 minutes.
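Because the expiration time is given in UTC and the fix time in EDT, the Core and Id duration can be checked with a short calculation. This is a sketch only; the variable names are illustrative, and EDT is assumed to be UTC-4.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EDT is a fixed offset of four hours behind UTC.
EDT = timezone(timedelta(hours=-4), "EDT")

# Times as stated in this report.
expired_at = datetime(2020, 5, 30, 10, 48, tzinfo=timezone.utc)
core_fixed_at = datetime(2020, 5, 30, 8, 37, tzinfo=EDT)

# Subtracting timezone-aware datetimes accounts for the offset difference.
core_outage = core_fixed_at - expired_at
print(core_outage)
```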

LibreSSL 3.2.0 was released May 31, 2020. It includes in its release notes “use non-expired certificates first when building a certificate chain.”

On June 1, 2020 the expired certificate was removed from the trust store of Spreedly Dashboard’s host operating system image.

Next Steps

Spreedly will work to automatically verify that every certificate in our presented certificate chain has an expiration date far enough in the future.
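A check of this kind might look like the following minimal sketch. It is not Spreedly's actual tooling: the function name `expiring_certificates`, the 90-day threshold, and the sample chain (including the leaf's expiration date) are assumptions for illustration. It relies on `ssl.cert_time_to_seconds` from the Python standard library to parse expiration dates in the `notAfter` text format.

```python
import ssl
from datetime import datetime, timedelta, timezone


def expiring_certificates(chain, now, min_remaining=timedelta(days=90)):
    """Return (subject, time remaining) for each certificate that is too close
    to expiry. `chain` is a list of (subject, notAfter text) pairs, where
    notAfter looks like 'May 30 10:48:00 2020 GMT'."""
    problems = []
    for subject, not_after_text in chain:
        not_after = datetime.fromtimestamp(
            ssl.cert_time_to_seconds(not_after_text), tz=timezone.utc
        )
        remaining = not_after - now
        if remaining < min_remaining:
            problems.append((subject, remaining))
    return problems


# Hypothetical chain as it might have looked on the day of a renewal.
sample_chain = [
    ("*.example.test", "Nov 20 00:00:00 2021 GMT"),
    ("AddTrust External CA Root", "May 30 10:48:00 2020 GMT"),
]
renewal_day = datetime(2020, 5, 12, tzinfo=timezone.utc)
flagged = expiring_certificates(sample_chain, renewal_day)
```

With these sample values only the “AddTrust External CA Root” entry is flagged, because it has fewer than 90 days remaining on the renewal day. Checking the whole presented chain, rather than only the leaf certificate, is the point of the exercise.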

Conclusion

As with incidents of any complexity, a “Swiss cheese” cascade of small problems caused this incident. If the “AddTrust” certificate had been renewed instead of replaced, or if all applications using OpenSSL-derived libraries had been updated, or if the “AddTrust” certificate had no longer been included in Sectigo-issued SSL certificate chains, or if the “AddTrust” certificate had no longer been included in trust stores, May 30 would have been a quieter weekend on the Internet. It’s only when every one of these “holes in the cheese” lines up that failure occurs.

To read more about how this certificate expiration affected people around the internet, we recommend Scott Helme’s article “The Impending Doom of Expiring Root CAs and Legacy Certificates.” Spreedly happened to be in the first wave, but this will continue to affect webservers across the Internet until these holes have been closed.

Posted Jun 12, 2020 - 17:12 EDT

Resolved
The SSL certificate chain has been updated. As of now, all systems appear to be stabilized and functioning. The incident is being considered resolved.

We are still investigating to understand the specific causes of the incident and will publish a post-mortem.

We apologize for any inconvenience and disruption to service.
Posted May 30, 2020 - 08:42 EDT
Identified
We've identified a CA Root certificate that has expired.

This appears to be impacting certain API/web clients. Most notably: openssl and curl.

We will provide updates as they become available.
Posted May 30, 2020 - 08:33 EDT
Update
The errors are also impacting the Spreedly dashboard.
Posted May 30, 2020 - 08:13 EDT
Investigating
We are investigating errors indicating that the SSL certificate for `https://core.spreedly.com` has expired.

Updates will be provided as they become available.
Posted May 30, 2020 - 08:11 EDT
This incident affected: Supporting Services (Core Secondary API, Dashboard) and Core Transactional API.