(This time posting to the list with the right email address; sorry for the duplicate email, Jeremy and Ryan.)
I’d like to point out that the precedent has not been to require CAs to revoke all certificates in the face of a CAA implementation flaw, but merely to re-execute the CAA checks and determine which certificates remain authorized and so need not be revoked. Specifically, a flaw in Let’s Encrypt’s CAA implementation was discovered last year (https://bugzilla.mozilla.org/show_bug.cgi?id=1462735) and it did not necessitate the revocation of all valid Let’s Encrypt certificates at the time. The incident report is light on details, but it sounds like only those certificates which failed the CAA recheck were revoked. Furthermore, in Let’s Encrypt’s case, there was insufficient logging of the original pre-issuance CAA lookup results, which may not be the case with Digicert here.
In light of this, I believe that the revocation of only the 16 certificates would align with precedent.
From: dev-security-policy on behalf of Jeremy Rowley via dev-security-policy
Sent: Friday, May 10, 2019 16:54
Subject: RE: CAA record checking issue
The difference is we actually have the data at time of issuance. It just wasn’t correctly relied on for these specific certs. I think this means there is an open question on whether the issuance even was a mis-issuance since the CAA information was collected…even if it wasn’t perfect.
This is why we’re revising the approach to say “Were the certs actually mis-issued? If yes, revoke. If no, then don’t revoke.”
I was looking at it like a law. You may think you trespassed by walking on some grass. But if permission was granted at the time to walk on the grass, then you never actually violated a rule (even if you didn’t know about the permission). If permission was granted later, you still broke that law and are accountable, even if no penalty is applied. Here, we didn’t appropriately store the information but the data may have been stored and checked in a process. More succinctly, the difference is that a broken process may still result in compliantly issued certificates, which is different from broken certs that are then remediated. If I can prove the compliance at the time the cert was issued, then the certs shouldn’t be revoked.
Does that make sense? I can certainly revoke all 1100 if that’s the preferred approach, but I figure that with a few days’ time I can better answer the question of what the results were when a normally compliant process broke.
Oh, one other factor is that the system wasn’t exploitable. The break was between two internal processes talking to each other, so the errors couldn’t result in certificates issued to a bad actor. It was also a very low volume compared to normal issuance. Neither of these is a good reason or excuse. Instead, they are the reasons we thought we should perhaps not revoke all the certs until we better understand the compliance implications.
From: Ryan Sleevi
Sent: Friday, May 10, 2019 2:16 PM
To: Jeremy Rowley
Cc: email@example.com; firstname.lastname@example.org
Subject: Re: CAA record checking issue
On Fri, May 10, 2019 at 3:55 PM Jeremy Rowley wrote:
The analysis was basically that all the verification documents are still good, which means if we issued the cert today, the issuance would pass without further checks (since the data itself is good for 825 days). Because of this, customers with domains that didn’t prohibit Digicert in their CAA record (anywhere in the chain) could simply reissue the certificate without a problem. We could require this of all customers. For the 16, issuance would fail if the CAA check was performed today. Therefore, we want to revoke those.
The one reason I wanted more time to respond is that we think we may have most CAA records in our Splunk data for the time of issuance. Our new plan is that we will revoke all certs unless we can confirm the CAA record was permissive at the time of issuance. I don’t know the number of certs that we will revoke yet. I’ll post an update when we compare the Splunk data to the issuance data.
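As a rough illustration of the plan described above (revoke unless the logged CAA data shows the record was permissive at the time of issuance), here is a minimal Python sketch. The record format, the function names `caa_permits` and `should_revoke`, and the issuer identifier `digicert.com` are assumptions made for illustration and do not describe Digicert's actual tooling. The sketch deliberately ignores `issuewild`, the critical flag, and climbing the DNS tree.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical log format: one (tag, value) pair per CAA record seen at issuance.
CaaRecord = Tuple[str, str]


def caa_permits(records: Sequence[CaaRecord], ca_domain: str) -> bool:
    """Return True if the CAA record set does not forbid issuance by ca_domain.

    Simplified: only "issue" properties are considered.
    """
    # Take the issuer domain name, dropping any ";"-separated parameters.
    issuers = [
        value.split(";", 1)[0].strip().lower()
        for tag, value in records
        if tag.lower() == "issue"
    ]
    if not issuers:
        # No issue properties at all: CAA places no restriction on issuance.
        return True
    return ca_domain.lower() in issuers


def should_revoke(logged: Optional[Sequence[CaaRecord]], ca_domain: str) -> bool:
    """Revoke unless logged data proves the record was permissive at issuance."""
    if logged is None:
        # No log entry for this issuance: compliance cannot be shown, so revoke.
        return True
    return not caa_permits(logged, ca_domain)
```

Note the distinction between `None` (no log data was found, so the certificate is revoked) and an empty list (the lookup was logged and returned no CAA records, which is permissive).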
Thanks for answering. I was hoping you had a more thorough analysis ;) I do have other questions about the implementation details, but I'll add those to the bug, so we can focus this discussion on the immediate remediation steps.
I guess my reservation with such an approach (and this is more a metapoint) is this: consider issuing an EV certificate without having the supporting documentation and/or without validating the documentation. You later come back to the documents, validate them, and find out you got lucky - the information was actually correct, even though the controls failed and the process wasn't followed. Do you revoke the certificates, on the basis the process failed, or do you not revoke them, because they were eventually consistent?
This might sound like a hypothetical, but it's a question this industry has faced in the past, and browsers have reached different conclusions than CAs. It's not immediately clear to me how the proposed response here differs from those past responses, and this may highlight some of the differences in philosophies here. An analysis that considered these past events, and how they were received by the community, and how there may be different facts here that lead to different conclusions, would be useful in both validating and justifying the proposed course of action.
The real problem was that the CA would kick off a request to the CAA checker. If the CA encountered an error, the request would time out. The CAA checker may still have checked the CAA records appropriately, but the CA never pulled the information to verify issuance authorization. So it’s a mis-issuance unless we can pull the data and prove it wasn’t. Combing through the archive data will take a while.
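The failure mode described above (the lookup is kicked off, but its result is never pulled after an error or timeout) is the kind of gap that a fail-closed check is meant to prevent. The following is a hypothetical Python sketch of that general pattern, not a description of the actual system. The names `authorize_issuance`, `fetch_caa`, and `CaaCheckUnavailable`, and the issuer `example-ca.test`, are invented for illustration.

```python
from typing import Callable, Sequence, Tuple

CaaRecord = Tuple[str, str]  # hypothetical (tag, value) pair


class CaaCheckUnavailable(Exception):
    """Raised by the (hypothetical) CAA checker client when no result is available."""


def authorize_issuance(
    domain: str,
    ca_domain: str,
    fetch_caa: Callable[[str], Sequence[CaaRecord]],
) -> bool:
    """Return True only if a CAA result was actually retrieved and permits issuance."""
    try:
        records = fetch_caa(domain)
    except (TimeoutError, CaaCheckUnavailable):
        # Fail closed: without a retrieved result there is no authorization.
        return False
    # Simplified evaluation: only "issue" properties, parameters ignored.
    issuers = [
        value.split(";", 1)[0].strip().lower()
        for tag, value in records
        if tag.lower() == "issue"
    ]
    return not issuers or ca_domain.lower() in issuers
```

The design choice is that an error path can never be mistaken for a permissive answer: issuance proceeds only when the result was both retrieved and evaluated.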