In the end, a tweet from Jeff Johnson pinpointed the root cause: Apple's OCSP responder service was overloaded, so macOS could not verify the cryptographic certificates of application developers.
But why is an OCSP responder on the critical path for launching applications? In this article, we'll briefly discuss code signing, how the Online Certificate Status Protocol (OCSP) works, why it is deeply flawed, and some better alternatives. Unlike other write-ups of this incident, I want to discuss the practical cryptographic aspects (at a high level) and offer a balanced perspective.
Code Signing
On the developer portal, Apple explains the purpose of code signing:
Code signing your app assures users that it is from a known source and the app hasn't been modified since it was last signed. Before your app can integrate app services, be installed on a device, or be submitted to the App Store, it must be signed with a certificate issued by Apple.
In other words, for an app to be trusted on macOS, it must be signed with the developer's own certificate, which is backed by a key pair. Keychain is used to generate a unique "Developer ID" certificate, which comes with a private key for the developer to use and a public key for distribution. Once Apple has signed the Developer ID certificate, the developer can use the private key to create a cryptographic signature on each release of their app.
When the application is launched, its signature is verified against the public key in the developer's certificate. The certificate itself is then checked to ensure that it has not expired (certificates are usually valid for one year) and that it is ultimately signed by the Apple root certificate. There can also be intermediate certificates in the chain up to the root certificate. This is a "chain of trust": the Developer ID certificate signed the application, the intermediate certificate signed the Developer ID certificate, and the Apple root certificate signed the intermediate certificate. Any Apple device can check this chain of trust and therefore approve the launch of the app.
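The chain-of-trust walk described above can be sketched in a few lines of Python. This is a toy model under loud assumptions, not Apple's implementation: the signatures use textbook RSA built from small Mersenne primes (no padding, not secure), and the `Cert` class, the `issue` and `verify_launch` helpers, and all certificate names are hypothetical stand-ins for real X.509 structures.

```python
import hashlib
from dataclasses import dataclass
from datetime import date

def make_key(p, q, e=65537):
    """Toy textbook-RSA key pair from two primes. Illustrative only, NOT secure."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))  # (public, private)

def _digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(private, data):
    n, d = private
    return pow(_digest(data, n), d, n)

def verify(public, data, signature):
    n, e = public
    return pow(signature, e, n) == _digest(data, n)

@dataclass
class Cert:
    subject: str
    issuer: str
    public_key: tuple
    not_after: date
    signature: int = 0  # issuer's signature over tbs()

    def tbs(self):
        """The 'to be signed' bytes: every field except the signature."""
        return f"{self.subject}|{self.issuer}|{self.public_key}|{self.not_after}".encode()

def issue(subject, public_key, not_after, issuer_name, issuer_private):
    cert = Cert(subject, issuer_name, public_key, not_after)
    cert.signature = sign(issuer_private, cert.tbs())
    return cert

def verify_launch(app, app_signature, chain, trusted_root_key, today):
    """chain is ordered leaf first, root last."""
    leaf, root = chain[0], chain[-1]
    if not verify(leaf.public_key, app, app_signature):
        return False  # app was altered or signed with a different key
    if any(today > cert.not_after for cert in chain):
        return False  # some certificate in the chain has expired
    if root.public_key != trusted_root_key:
        return False  # chain does not end at the pinned root
    # Each certificate must be signed by the next one up; the root signs itself.
    issuers = chain[1:] + [root]
    return all(cert.issuer == parent.subject
               and verify(parent.public_key, cert.tbs(), cert.signature)
               for cert, parent in zip(chain, issuers))

# Hypothetical three-level hierarchy: root -> intermediate -> developer.
root_pub, root_priv = make_key(2**61 - 1, 2**127 - 1)
mid_pub, mid_priv = make_key(2**31 - 1, 2**107 - 1)
dev_pub, dev_priv = make_key(2**19 - 1, 2**89 - 1)
expiry = date(2030, 1, 1)
root = issue("Toy Root CA", root_pub, expiry, "Toy Root CA", root_priv)
mid = issue("Toy Intermediate CA", mid_pub, expiry, "Toy Root CA", root_priv)
dev = issue("Developer ID: Example Dev", dev_pub, expiry, "Toy Intermediate CA", mid_priv)

app = b"example app binary"
app_sig = sign(dev_priv, app)
chain = [dev, mid, root]
print(verify_launch(app, app_sig, chain, root_pub, date(2025, 1, 1)))         # True
print(verify_launch(app + b"!", app_sig, chain, root_pub, date(2025, 1, 1)))  # False
```

Note that nothing in this walk needs the network: signature, expiry, and chain checks all run locally. Only the revocation question requires asking someone else.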
This is similar to the Internet's TLS public key infrastructure, but also fundamentally different, because Apple has complete control over its own chain of trust. Other CAs cannot issue valid code signing certificates, since every certificate must chain back to Apple.
If verification fails, the user is shown an alarming warning dialog.
Revocation
What happens if a developer violates Apple policies or loses their private key? The certification authority must be able to revoke the issued certificates immediately. If a certificate is being used maliciously, it is unacceptable to wait days or months for it to expire naturally; otherwise a leaked private key would render the entire system useless.
This is the situation certificate revocation is for. It adds a step to the signature verification process: asking the certification authority whether the certificate is still valid.
On the Internet, the simplest way to do this is for the CA to give you a Certificate Revocation List (CRL) containing the serial numbers of all revoked certificates, and for you to check that the certificate is not on the list. However, browsers stopped using this approach as the lists grew longer and longer, especially after severe vulnerabilities like Heartbleed forced mass certificate revocations.
Online Certificate Status Protocol (OCSP) is an alternative that allows certificates to be validated in real time. Each certificate embeds the URL of an OCSP responder, which you query to find out whether the certificate has been revoked. In Apple's case, this is ocsp.apple.com. So now, in addition to verifying the cryptographic validity of the signature, every time you launch an application, macOS performs a real-time check with Apple (with some caching) that Apple still considers the developer's certificate to be legitimate.
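The "with some caching" part can be illustrated with a small time-to-live cache in front of the responder. This is a minimal sketch, not how macOS implements it: `CachedStatusChecker`, the `query_responder` callable, and the TTL value are all hypothetical, and the responder is modelled as a plain function returning "good" or "revoked".

```python
import time

class CachedStatusChecker:
    """Remembers each responder answer for ttl_seconds before asking again."""

    def __init__(self, query_responder, ttl_seconds, clock=time.monotonic):
        self._query = query_responder
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # serial -> (status, expires_at)

    def status(self, serial):
        now = self._clock()
        cached = self._cache.get(serial)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: no network request needed
        status = self._query(serial)
        self._cache[serial] = (status, now + self._ttl)
        return status

# Usage with a stub responder that counts how often it is contacted.
calls = []

def stub_responder(serial):
    calls.append(serial)
    return "good"

checker = CachedStatusChecker(stub_responder, ttl_seconds=300)
checker.status("serial-001")
checker.status("serial-001")
print(len(calls))  # 1: the second launch was answered from the cache
```

The trade-off is visible in the single `ttl_seconds` parameter: a longer TTL means fewer requests to the responder, but a revocation takes longer to reach clients that already hold a cached "good" answer.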
OCSP availability issue
The big problem with OCSP is that the external service becomes a single point of failure. What happens if the OCSP responder is down or unreachable? Do we refuse to accept the certificate (hard-fail)? Or do we carry on as if the check had succeeded (soft-fail)?
Apple has to use soft-fail behavior; otherwise applications would not launch offline. All major browsers also implement soft-fail behavior, since OCSP responders have traditionally been unreliable and the browser wants to load the site even if the CA's responder is temporarily down.
But soft-fail is not a good option either, because an attacker with control over the network can block requests to the responder, and the check will simply be skipped. In fact, exactly this "fix" was widely circulated on Twitter during the incident: blocking traffic to ocsp.apple.com with a line in the /etc/hosts file. Many people will leave this line in place for a while, since disabling OCSP does not cause any noticeable problems.
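The difference between the two policies, and why blocking the responder defeats soft-fail, fits in a short sketch. Everything here is hypothetical: `may_launch`, the stub responders, and the serial numbers are illustrative names, not part of any real OCSP client.

```python
class ResponderUnavailable(Exception):
    """Raised by a responder stub when it cannot be reached."""

def may_launch(query_responder, serial, soft_fail=True):
    """Decide whether to launch, given a callable returning 'good' or 'revoked'."""
    try:
        status = query_responder(serial)
    except ResponderUnavailable:
        # No answer at all: soft-fail lets the app run, hard-fail blocks it.
        return soft_fail
    return status == "good"

REVOKED = {"serial-666"}  # hypothetical revoked serial numbers

def healthy_responder(serial):
    return "revoked" if serial in REVOKED else "good"

def blocked_responder(serial):
    # Models an attacker (or a hosts-file entry) dropping traffic to the responder.
    raise ResponderUnavailable(serial)

print(may_launch(healthy_responder, "serial-666"))                   # False
print(may_launch(blocked_responder, "serial-666"))                   # True: check skipped
print(may_launch(blocked_responder, "serial-001", soft_fail=False))  # False: offline launch refused
```

The second call is the attack: a revoked certificate is accepted because the responder could not be reached. The third call is the cost of the alternative: a perfectly valid app refuses to start without connectivity.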
Incident
If Apple's OCSP validation is soft-fail, why did applications hang when the OCSP responder was down? Probably because this was a different kind of failure: the OCSP responder was not completely down. It was just performing badly.
Under the load from millions of users around the world upgrading to macOS Big Sur, Apple's servers slowed down and stopped answering OCSP requests properly. But they were still working well enough that the soft-fail path was never triggered.
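One way to picture this failure mode: a soft-fail that only triggers when the connection fails outright offers no protection against a responder that accepts the request and then answers slowly. The sketch below is an assumption about the general mechanism, not a description of Apple's code; `check_with_deadline` and `slow_responder` are hypothetical names, and the delays are scaled down to fractions of a second.

```python
import concurrent.futures
import time

def check_with_deadline(query_responder, serial, timeout_s):
    """Return the responder's answer, or 'skipped' if none arrives in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(query_responder, serial)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "skipped"  # treat a slow answer like an unreachable responder
    finally:
        pool.shutdown(wait=False)  # do not block the caller on the slow request

def slow_responder(serial, delay_s=0.3):
    # Reachable, so a connection-level soft-fail never fires, but slow to answer.
    time.sleep(delay_s)
    return "good"

start = time.monotonic()
result = check_with_deadline(slow_responder, "serial-001", timeout_s=0.05)
elapsed = time.monotonic() - start
print(result, f"{elapsed:.2f}s")
```

With a short deadline the launch proceeds after a bounded wait; without one, every launch waits for as long as the overloaded server takes to respond.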
OCSP privacy issue
In addition to the availability issue, OCSP was not designed with privacy in mind in the first place. A basic OCSP request is an unencrypted HTTP request to the OCSP responder containing the certificate's serial number. This means that not only the responder can see which certificate you are interested in, but so can your ISP and anyone else intercepting packets. Apple can build an ordered list of which developers' apps you open, and so can outsiders.
Encryption could have been added, and there is a better, more private variant called OCSP stapling, but Apple has done neither. OCSP stapling doesn't really make sense in this scenario, but it illustrates that OCSP does not have to leak data by default.
Better future
The incident sparked a lively discussion in the community, with one side stating, "Your computer is not really yours," and the other arguing, "Establishing trust in applications is difficult, but Apple does it well." I am trying to show that OCSP is a terrible way to manage certificate revocation in any case, and that it will lead to more incidents related to responder availability and privacy in the future. In my opinion, making application launch depend on OCSP is a bad engineering decision. At least in the short term, Apple has mitigated the damage by increasing response caching times.
Fortunately, a better method for revoking certificates, CRLite, is close to maturity. It compresses the full set of certificate revocation lists down to a reasonable size. Scott Helme's blog provides a good summary of how CRLite uses Bloom filters to bring back the older approach, used before OCSP, of checking against a list of revoked certificates.
macOS devices could periodically receive updates to this CRL and perform the checks locally on the device, which would address both the availability and privacy issues of OCSP. On the other hand, since the Developer ID revocation list is much smaller than the list of all revoked PKI certificates, it is worth asking why Apple does not simply use CRLs. Perhaps they do not want to disclose which certificates have been revoked.
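To make the Bloom filter idea concrete, here is a minimal sketch of a filter cascade in the spirit of CRLite. It is not the real CRLite format: the class and function names, the filter sizing, and the serial numbers are all assumptions chosen for illustration. The key requirement is that the builder knows every issued certificate, revoked or not, so that each level can record the false positives of the level before it.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits, num_hashes, salt):
        self.size, self.k, self.salt = size_bits, num_hashes, salt
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{self.salt}:{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def build_cascade(revoked, valid, bits_per_item=8, num_hashes=3):
    """Level 0 holds the revoked set; each later level holds the previous level's false positives."""
    levels = []
    include, exclude = set(revoked), set(valid)
    while include:
        bf = BloomFilter(max(8, bits_per_item * len(include)), num_hashes, salt=len(levels))
        for item in include:
            bf.add(item)
        levels.append(bf)
        false_positives = {item for item in exclude if item in bf}
        include, exclude = false_positives, include
    return levels

def is_revoked(levels, serial):
    """Exact for serials known at build time; runs locally with no network request."""
    for depth, bf in enumerate(levels):
        if serial not in bf:
            # Falling out at an odd depth means the serial is on the revoked side.
            return depth % 2 == 1
    return len(levels) % 2 == 1

revoked = {f"serial-{i}" for i in range(50)}       # hypothetical revoked serials
valid = {f"serial-{i}" for i in range(50, 1000)}   # hypothetical unrevoked serials
levels = build_cascade(revoked, valid)
print(len(levels), sum(len(bf.bits) for bf in levels))  # number of levels, total bytes
```

A device holding only `levels` can answer revocation queries for every known certificate without contacting anyone, which is exactly the property that removes both the availability and the privacy problem.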
Conclusion
Overall, this incident was a good reason to reflect on the trust model promoted by organizations such as Apple and Microsoft. Malware has become more sophisticated, and most people are unable to determine whether it is safe to run certain binaries. Code signing seems like a great cryptographic way to establish trust for applications and at least link applications with well-known developers. And revoking certificates is a necessary part of that trust.
But a few flaws in the OCSP verification process spoil the cryptographic elegance of code signing and verification. OCSP is also widely used for TLS certificates on the Internet, but failures there are less catastrophic because there are many CAs and browsers largely ignore such failures. Moreover, people are used to seeing websites become unavailable from time to time, but they do not expect the same from applications on their own computers. macOS users were unsettled that their own apps were affected by Apple's infrastructure issues. However, this is an inevitable result of making certificate validation depend on external infrastructure, and no infrastructure is 100% reliable.
Scott Helme also raises concerns about the power CAs would gain if certificate revocation really worked effectively. Even if you are not worried about the potential for censorship, mistakes will sometimes happen, and they have to be weighed against the security benefits. As one developer discovered when Apple mistakenly revoked his certificate, the risk of working on a closed platform is that you can be locked out of it.