Thunderbird, RNP and the importance of a good API





I recently had a chat with a Thunderbird developer about API design. During this conversation, I shared my thoughts on RNP, a new implementation of OpenPGP that Thunderbird recently started using instead of GnuPG.



The developer was skeptical of my claim that the RNP API needs to be improved, and asked, "Isn't it subjective which API is better and which is worse?" I agree that we don't have good metrics for evaluating APIs. But I disagree that we are, in principle, unable to judge an API.



In fact, I suspect that most experienced programmers recognize a bad API when they see one. In this article I will try to develop a good heuristic, based on my own experience working with (and on) GnuPG, Sequoia and RNP. Then I'll take a look at the RNP API. Unfortunately, not only is this API easy to misuse, it is also misleading, and it should not yet be used in contexts where security is critical. But Thunderbird's target audience includes people known to be at risk, such as journalists, activists, lawyers and their communication partners; all of these people need protection. In my opinion, this means that Thunderbird should reconsider whether to use RNP.



Note: I also suggest reading this email: Let's Use GPL Libraries in Thunderbird!, which I sent to the Thunderbird development planning mailing list.



What are the traits of a bad API?



Before Justus, Kai and I started Sequoia, the three of us worked on GnuPG. We not only dug into gpg ourselves, but also talked to and collaborated with many downstream users of gpg. People had a lot of good things to say about GnuPG.





As for criticism of gpg, the most significant complaints about the API were of two kinds. The first boils down to this: the gpg API is too dogmatic. For example, gpg uses a keyring-based approach, so you can only view or use an OpenPGP certificate once it has been imported into your personal keyring. But some developers want to look at a certificate first and only then import it. For example, when you look up a certificate on a key server by its fingerprint, you can check that the returned certificate really is the one you asked for, because the URL is self-authenticating. This can be done with gpg, but only through a workaround that goes against its built-in programming model. The basic idea is this: create a temporary directory, add a configuration file to it, tell gpg to use that directory instead of the default one, import the certificate there, check the certificate, and then delete the temporary directory. This is an official recommendation, added by Justus on the basis of our conversations with downstream gpg users. Yes, this method works. But it requires operating-system-specific code, it is slow, and it is a frequent source of bugs.
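To give a feeling for how much ceremony this workaround involves, here is a rough sketch in Rust. It is not taken from the GnuPG documentation: the helper names (gpg_in, create_private_dir, import_and_list, inspect_cert) are invented for illustration, it assumes a gpg binary on the PATH, it skips the configuration file step, and the directory name it picks is predictable, which a real implementation would have to avoid.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Returns a `gpg` command that uses `homedir` instead of the user's keyring.
fn gpg_in(homedir: &Path) -> Command {
    let mut cmd = Command::new("gpg");
    cmd.arg("--batch").arg("--homedir").arg(homedir);
    cmd
}

/// gpg complains about home directories that other users can access, so the
/// directory is created with restrictive permissions. This is the
/// operating-system-specific part.
#[cfg(unix)]
fn create_private_dir(path: &Path) -> io::Result<()> {
    use std::os::unix::fs::DirBuilderExt;
    fs::DirBuilder::new().mode(0o700).create(path)
}

#[cfg(not(unix))]
fn create_private_dir(path: &Path) -> io::Result<()> {
    fs::create_dir(path)
}

/// Imports the certificate into the throwaway keyring and lists its contents.
fn import_and_list(homedir: &Path, cert_file: &Path) -> io::Result<String> {
    let status = gpg_in(homedir).arg("--import").arg(cert_file).status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "gpg --import failed"));
    }
    let out = gpg_in(homedir)
        .args(["--with-colons", "--list-keys"])
        .output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// Inspects a certificate without touching the user's own keyring.
fn inspect_cert(cert_file: &Path) -> io::Result<String> {
    // Hypothetical naming scheme: predictable names are unsafe in a shared /tmp.
    let homedir = std::env::temp_dir().join(format!("gpg-inspect-{}", std::process::id()));
    create_private_dir(&homedir)?;
    let listing = import_and_list(&homedir, cert_file);
    // Clean up even if gpg failed.
    let _ = fs::remove_dir_all(&homedir);
    listing
}
```

Even in this reduced form, the caller has to spawn two processes, manage a directory on disk and parse text output, just to look at a certificate.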



The other kind of criticism, which we came across many times, is that you need to know a lot of non-obvious things to use gpg without misusing it. Put another way, you have to be very careful when using the gpg API to avoid inadvertently introducing a vulnerability into your code.



To better understand the second concern, consider the EFAIL vulnerabilities. The main problem with gpg's decryption API is that, when decrypting a message, gpg emits plaintext even if the input was corrupted. gpg does return an error in this case, but some programs display the corrupted plaintext anyway. And why not? Surely it is better to show at least part of the message than nothing at all, right? Well, the EFAIL vulnerabilities demonstrate how an attacker can take advantage of this to inject a web bug into an encrypted message. When the user views the message, the web bug exfiltrates its contents. Ouch.



So, whose fault is this bug? The GnuPG developers insisted that the problem lies at the application level, with applications that use gpg incorrectly:



It is recommended that mail user agents respect the DECRYPTION_FAILED status code and do not display data, or at least find an appropriate way to display potentially corrupted mail without creating an oracle and informing the user that the mail does not inspire confidence.



gpg signaled an error; the applications did not honor the API contract. I have to agree with the GnuPG developers, and add: the gpg interface was (and still is) a ticking time bomb, because it does not guide the user toward the right course of action. On the contrary, the easy and seemingly helpful thing to do is the wrong one. Unfortunately, APIs of this kind are common in GnuPG.
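One way to defuse this kind of time bomb is to shape the API so that corrupted plaintext is simply unreachable. The following is a minimal sketch in Rust, not a description of gpg or of any real library: the XOR "cipher" and the checksum are toy stand-ins for real decryption and for OpenPGP's integrity protection, and every name is made up for illustration.

```rust
#[derive(Debug, PartialEq)]
enum DecryptError {
    /// The message was modified; no plaintext is released.
    IntegrityCheckFailed,
}

/// Toy stand-in for a real integrity check (NOT cryptographically secure).
fn toy_checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// Toy stand-in for a cipher: XOR every byte with a fixed key byte.
/// Applying it twice returns the original data.
fn toy_xor(data: &[u8]) -> Vec<u8> {
    data.iter().map(|&b| b ^ 0x5a).collect()
}

/// Returns the plaintext only if it passes the integrity check.
/// Because the result is a `Result`, a caller cannot reach corrupted
/// plaintext by ignoring a status code.
fn decrypt(ciphertext: &[u8], expected_checksum: u32) -> Result<Vec<u8>, DecryptError> {
    let plaintext = toy_xor(ciphertext);
    if toy_checksum(&plaintext) != expected_checksum {
        return Err(DecryptError::IntegrityCheckFailed);
    }
    Ok(plaintext)
}
```

With an interface of this shape, displaying a damaged message takes deliberate extra work instead of being the path of least resistance.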



What makes a good API?



Realizing these two things - that the gpg API is too dogmatic and difficult to use properly - shaped my plans. When we started the Sequoia project, we agreed that we wanted to avoid these mistakes. Based on our observations, we adopted two tests that we still use as guides when developing the Sequoia API. First, in addition to any high-level API, there must be a low-level API that is not dogmatic, in the sense that it does not prevent the user from doing anything that is not forbidden. Second, the API should guide the user toward doing the right thing, by making the right thing easy to do and the most obvious choice.



To accomplish these two slightly conflicting goals, making everything possible while preventing mistakes, we rely particularly heavily on two tools: types and examples. Types make it hard to use an object in an unintended way, because the API contract is formalized and checked at compile time, and can even force specific conversions. Examples - code snippets - get copied. Good examples therefore not only teach users how to use a function correctly, they also strongly influence how users will actually use it.



Types



I'll show you with an example how we use types in Sequoia, and how they help us make a good API. To make the example clearer, it will be useful to recall some context regarding OpenPGP.





OpenPGP



There are several fundamental data types in OpenPGP, namely certificates, components (such as keys and user IDs), and binding signatures. The root of a certificate is the primary key, which fully determines the certificate's fingerprint (fingerprint = Hash(primary key)). A certificate usually includes components such as subkeys and user IDs. OpenPGP binds a component to a certificate using a so-called binding signature. Because the fingerprint is simply a hash of the primary key, and components are bound to the primary key by signatures, further components can be added later. Binding signatures also carry properties, so it is possible to change a component, for example to extend the validity period of a subkey. As a consequence, multiple valid signatures can be associated with a particular component. Binding signatures are not only fundamental, but also integral to the OpenPGP security mechanism.



Since there can be many valid binding signatures, we need a way to choose the right one. As a first approximation, suppose the signature we want is the most recent valid signature that is not expired, not revoked, and not dated in the future. But what is a valid signature? In Sequoia, a signature must not only pass the mathematical check, it must also be consistent with the policy. For example, because of its weakened collision resistance, we only allow SHA-1 in a very small number of situations. (Paul Schaub, who works on PGPainless, recently wrote at length about these complexities.) Forcing the API user to keep all of these considerations in mind creates a breeding ground for vulnerabilities. In Sequoia, the easy way to get the expiration time is the safe way. Consider the following code, which works as expected:



let p = &StandardPolicy::new();
 
let cert = Cert::from_str(CERT)?;
for k in cert.with_policy(p, None)?.keys().subkeys() {
    println!("Key {}: expiry: {}",
             k.fingerprint(),
             if let Some(t) = k.key_expiration_time() {
                 DateTime::<Utc>::from(t).to_rfc3339()
             } else {
                 "never".into()
             });
}
      
      







cert is a certificate. We start by applying a policy to it. (Policies are user-defined, but as a rule StandardPolicy is not only sufficient but also the most appropriate choice.) In effect, this creates a view of the certificate in which only the components with a valid binding signature are visible. Importantly, the view also changes some methods and provides a number of new ones. The keys method, for example, has been changed to return ValidKeyAmalgamation instead of KeyAmalgamation. (It is called an amalgamation because the result includes not only the Key but also all of the signatures associated with it; some think it would be better called a Katamari. ¯\_(ツ)_/¯) A ValidKeyAmalgamation has a valid binding signature according to the criteria above. It also provides methods like key_expiration_time, which only make sense for a valid key! Also note that the return type of key_expiration_time is ergonomic. Instead of a raw value, key_expiration_time returns a SystemTime, which is safe and easy to use.
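The pattern behind such a view can be sketched in a few lines of plain Rust. The types below are simplified stand-ins invented for illustration, not Sequoia's actual definitions: a binding signature is reduced to a flag saying whether it verified and satisfied the policy, plus an optional validity period.

```rust
use std::time::{Duration, SystemTime};

/// Simplified binding signature (hypothetical, not Sequoia's type).
struct Binding {
    /// Whether the signature verified and satisfied the policy.
    acceptable: bool,
    /// Validity period relative to the key's creation time;
    /// `None` means that the key does not expire.
    validity: Option<Duration>,
}

/// A key as found in a certificate; its binding signature may be missing or bad.
struct Key {
    creation_time: SystemTime,
    binding: Option<Binding>,
}

/// A key known to have an acceptable binding signature. In a real library
/// the fields would be private, so `valid_keys` would be the only way to
/// obtain one.
struct ValidKey<'a> {
    key: &'a Key,
    binding: &'a Binding,
}

impl<'a> ValidKey<'a> {
    /// Only defined on `ValidKey`, so it can never silently fall back to a
    /// default for a key without a usable binding signature.
    fn key_expiration_time(&self) -> Option<SystemTime> {
        self.binding.validity.map(|v| self.key.creation_time + v)
    }
}

/// Returns a view containing only the keys with an acceptable binding signature.
fn valid_keys(keys: &[Key]) -> Vec<ValidKey<'_>> {
    keys.iter()
        .filter_map(|key| match &key.binding {
            Some(binding) if binding.acceptable => Some(ValidKey { key, binding }),
            _ => None,
        })
        .collect()
}
```

The point of the design is that asking for the expiration time of a key with a bad or missing binding signature is not a runtime error to remember to check for: the method does not exist on that type.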



In line with our first principle, "allow everything", the developer can still access the individual signatures and examine their subpackets to find out from a different binding signature when the key expires. But compared with the way the Sequoia API leads you to determine a key's expiration time correctly, any other approach means working against the API. In our opinion, that is what a good API looks like.



Examples



The 1.0 release of the Sequoia library took place in December 2020. Nine months earlier, we had reached feature completeness and could have released. But we waited. We spent the following nine months adding documentation and examples to the public API. Take a look at the documentation for the Cert data structure to see what the result looks like. As noted in our post, we did not manage to provide an example for every single function, but we got quite far. As a bonus, writing the examples also uncovered a few rough edges, which we smoothed out along the way.



After the release, we were able to talk to many of the developers who included Sequoia in their code. A common thread through their feedback was recognition of how useful both the documentation and the examples were. We can confirm that although this is our code, we look into the documentation almost daily and copy our own examples. It's easier. Since the examples show how to use a particular function correctly, why re-do it from scratch?



RNP API



RNP is a fresh implementation of OpenPGP, developed primarily by Ribose. About two years ago, Thunderbird decided to integrate Enigmail into Thunderbird and, at the same time, to replace GnuPG with RNP. The fact that Thunderbird chose RNP is not only flattering for RNP; it also means that RNP has arguably become the most widely used OpenPGP implementation for mail encryption.



Criticism is easily perceived as negativity. I want to be very clear: I think that the work Ribose is doing is good and important, and I am grateful to them for investing time and effort in a new implementation of OpenPGP. The OpenPGP ecosystem badly needs more diversity. But that is no excuse for releasing an immature product for use in a security-critical context.



Security-critical infrastructure



Unfortunately, RNP has not yet reached a state where, in my opinion, it can be safely deployed. Enigmail was used not only by people concerned about the privacy of their data, but also by journalists, activists and lawyers who care about their own safety and the safety of their interlocutors. In a 2017 interview, Benjamin Ismail, head of the Asia-Pacific chapter of Reporters Without Borders, said:



We mainly use GPG to communicate freely with our sources. The information they give us about human rights and violations of these rights is not safe for them, therefore it is necessary to protect the integrity of our conversations. 



Interview with Benjamin Ismail of Reporters Without Borders



It is critical that Thunderbird continues to provide these users with the safest possible experience, even during this transition period.



RNP and subkey binding signatures



When discussing how we use types in Sequoia to make the API harder to misuse, I showed how to find out the expiration time of a key in just a few lines of code. I want to start with an example of how someone inexperienced with OpenPGP or RNP might implement the same functionality using RNP. The following code iterates over the subkeys of a certificate (key) and prints the expiration time of each subkey. As a reminder, the expiration time is stored in the subkey binding signature, and a value of 0 indicates that the key never expires.



int i;
for (i = 0; i < sk_count; i ++) {
  rnp_key_handle_t sk;
  err = rnp_key_get_subkey_at(key, i, &sk);
  if (err) {
    printf("rnp_key_get_subkey_at(%d): %x\n", i, err);
    return 1;
  }
 
  uint32_t expiration_time;
  err = rnp_key_get_expiration(sk, &expiration_time);
  if (err) {
    printf("#%d (%s). rnp_key_get_expiration: %x\n",
           i + 1, desc[i], err);
  } else {
    printf("#%d (%s) expires %"PRIu32" seconds after key's creation time.\n",
           i + 1, desc[i],
           expiration_time);
  }
}
      
      





I tested this code on a certificate with five subkeys. The first subkey has a valid binding signature and does not expire; the second has a valid binding signature and will expire in the future; the third has a valid binding signature but has already expired; the fourth has an invalid binding signature, according to which the subkey will expire in the future; the fifth has no binding signature at all. Here's the output:



#1 (doesn't expire) expires 0 seconds after key's creation time.

#2 (expires) expires 94670781 seconds after key's creation time.

#3 (expired) expires 86400 seconds after key's creation time.

#4 (invalid sig) expires 0 seconds after key's creation time.

#5 (no sig) expires 0 seconds after key's creation time.
      
      







First, note that the call to rnp_key_get_expiration succeeds regardless of whether the subkey has a valid binding signature, an invalid binding signature, or no binding signature at all. If you read the documentation, this behavior is a little surprising. It says:



Get the key's expiration time in seconds.

Note: 0 means that the key doesn't expire.

      
      





Since the key expiration time is stored in the binding signature, I, as an OpenPGP expert, read this as: a call to rnp_key_get_expiration only succeeds if the subkey has a valid binding signature. In fact, if there is no valid binding signature, the function simply defaults to 0, which, given the note in the documentation, an API user will interpret as: this key never expires.



To improve this code, we first need to check whether the key has a valid binding signature. Several functions that do just that were recently added to RNP to address CVE-2021-23991. In particular, the RNP developers added the rnp_key_is_valid function, which reports whether a key is valid. This addition improves the situation, but it requires the developer to explicitly opt in to these security-critical checks (rather than explicitly opting out of checks that are performed by default, as is the case with Sequoia). Because security checks do not contribute to the functionality, they are easy to forget: the code works even if no security checks are performed. And since it takes expert knowledge to know what needs to be checked, the checks do get forgotten.



The following code provides security checks and skips any keys that rnp_key_is_valid deems invalid:



int i;
for (i = 0; i < sk_count; i ++) {
  rnp_key_handle_t sk;
  err = rnp_key_get_subkey_at(key, i, &sk);
  if (err) {
    printf("rnp_key_get_subkey_at(%d): %x\n", i, err);
    return 1;
  }
 
  bool is_valid = false;
  err = rnp_key_is_valid(sk, &is_valid);
  if (err) {
    printf("rnp_key_is_valid: %x\n", err);
    return 1;
  }
 
  if (! is_valid) {
    printf("#%d (%s) is invalid, skipping.\n",
           i + 1, desc[i]);
    continue;
  }
 
  uint32_t expiration_time;
  err = rnp_key_get_expiration(sk, &expiration_time);
  if (err) {
    printf("#%d (%s). rnp_key_get_expiration: %x\n",
           i + 1, desc[i], err);
  } else {
    printf("#%d (%s) expires %"PRIu32" seconds after key's creation time.\n",
           i + 1, desc[i],
           expiration_time);
  }
}

      
      







Output:



#1 (doesn't expire) expires 0 seconds after key's creation time.

#2 (expires) expires 94670781 seconds after key's creation time.

#3 (expired) is invalid, skipping.

#4 (invalid sig) is invalid, skipping.

#5 (no sig) is invalid, skipping.
      
      







This code correctly skips the two keys that do not have a valid binding signature, but it also skips the expired key, which is probably not what we wanted, even though the documentation does warn us that this function "checks ... expiration dates."



Although there are times when we do not want to use an expired key or certificate, sometimes we still need to work with one. For example, if a user forgets to extend a key's validity, they should be able to see that the key has expired, inspect the certificate, and extend the key. Although gpg --list-keys does not show expired keys, expired subkeys are still visible when editing a certificate, so that the user can extend their validity:



$ gpg --edit-key 93D3A2B8DF67CE4B674999B807A5D8589F2492F9

Secret key is available.

sec  ed25519/07A5D8589F2492F9

     created: 2021-04-26  expires: 2024-04-26  usage: C   

     trust: unknown       validity: unknown

ssb  ed25519/1E2F512A0FE99515

     created: 2021-04-27  expires: never       usage: S   

ssb  cv25519/8CDDC2BC5EEB61A3

     created: 2021-04-26  expires: 2024-04-26  usage: E   

ssb  ed25519/142D550E6E6DF02E

     created: 2021-04-26  expired: 2021-04-27  usage: S   

[ unknown] (1). Alice <alice@example.org>
      
      







There are other situations in which an expired key should not be treated as invalid. Suppose, for example, that Alice sends Bob a signed message saying "I will pay you 100 euros in a year," and the signing key expires in six months. When the year is over, does Alice owe Bob the money on the strength of this signature? Yes, I think so. The signature was valid when it was made. The fact that the key has since expired is irrelevant. Of course, once the key has expired, signatures made with it after the expiration time should be considered invalid. Likewise, a message should not be encrypted to an expired key.



In short, whether a key should be considered valid depends heavily on the context. rnp_key_is_valid is better than nothing, but, despite its name, this function is not nuanced enough to determine whether a key is valid.



As part of the same commit, a second function was added: rnp_key_valid_till. This function returns "a timestamp before which the key can be considered valid ... If the key was never valid, zero is returned as the value." Using this function, we can determine whether a key was ever valid by checking whether the function returns a non-zero value:



int i;
for (i = 0; i < sk_count; i ++) {
  rnp_key_handle_t sk;
  err = rnp_key_get_subkey_at(key, i, &sk);
  if (err) {
    printf("rnp_key_get_subkey_at(%d): %x\n", i, err);
    return 1;
  }
 
  uint32_t valid_till;
  err = rnp_key_valid_till(sk, &valid_till);
  if (err) {
    printf("rnp_key_valid_till: %x\n", err);
    return 1;
  }
 
  printf("#%d (%s) valid till %"PRIu32" seconds after epoch; ",
         i + 1, desc[i], valid_till);
 
  if (valid_till == 0) {
    printf("invalid, skipping.\n");
    continue;
  }
 
  uint32_t expiration_time;
  err = rnp_key_get_expiration(sk, &expiration_time);
  if (err) {
    printf("rnp_key_get_expiration: %x\n", err);
  } else {
    printf("expires %"PRIu32" seconds after key's creation time.\n",
           expiration_time);
  }
}
      
      







Results:



#1 (doesn't expire) valid till 1714111110 seconds after epoch; expires 0 seconds after key's creation time.

#2 (expires) valid till 1714111110 seconds after epoch; expires 94670781 seconds after key's creation time.

#3 (expired) valid till 1619527593 seconds after epoch; expires 86400 seconds after key's creation time.

#4 (invalid sig) valid till 0 seconds after epoch; invalid, skipping.

#5 (no sig) valid till 0 seconds after epoch; invalid, skipping.
      
      







Now we get the results we wanted! We display the correct expiration times for the first three subkeys, and report that the last two subkeys are invalid.



But let's take a closer look at rnp_key_valid_till. First, in OpenPGP the key expiration time is stored as an unsigned 32-bit offset from the key's creation time, which is itself an unsigned 32-bit value. The function should therefore use a wider type, or at least check for overflow. (I reported this issue and it has already been fixed.)
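To make the overflow concrete: the sum of two 32-bit values can need 33 bits. The sketch below uses hypothetical function names (they are not RNP's) to contrast an addition that wraps around in 32 bits with a version that widens to 64 bits first and also maps the special value 0 to "does not expire".

```rust
/// Naive version: the sum silently wraps around in 32 bits.
fn expires_at_wrapping(creation: u32, validity: u32) -> u32 {
    creation.wrapping_add(validity)
}

/// Safer version: widen before adding. A validity period of 0 means that
/// the key does not expire, so it is mapped to `None` instead of a time.
fn expires_at(creation: u32, validity: u32) -> Option<u64> {
    if validity == 0 {
        None
    } else {
        Some(u64::from(creation) + u64::from(validity))
    }
}
```

For a key created at 4,000,000,000 seconds after the epoch with a validity period of 500,000,000 seconds, the wrapping version yields 205,032,704, a timestamp in the past, while the widened version yields 4,500,000,000.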



But even if we ignore this flaw, the function is still strange. In OpenPGP, a key can be valid during several separate periods of time. Say a key expires on July 1 and the user only extends it on July 10. From July 1 to July 10 the key was not valid, and signatures made during that time should also be considered invalid. So what should the function return for such a key? More importantly, how should a user of such an API interpret the result? Is it even appropriate to have such an API? (Yes, I asked.)



In Sequoia, we went the other way. Instead of returning the time until which a key is valid, we turn the question around; the API user asks: is this key valid at time t? In our experience, that is all that was actually needed in every case we know of.
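A rough sketch of such a query in Rust, under simplifying assumptions: a binding signature is reduced to its creation time and a validity period, revocations and policy checks are left out, and all names are illustrative rather than Sequoia's API. The function picks the binding signature in force at time t (the most recent one that already existed at t) and checks the key against it.

```rust
/// Simplified binding signature: when it was made, and how long after the
/// key's creation the key stays valid (`None` means no expiration).
struct Binding {
    created: u64,
    validity: Option<u64>,
}

/// Simplified key with all of its binding signatures.
struct Key {
    created: u64,
    bindings: Vec<Binding>,
}

impl Key {
    /// Is this key valid at time `t` (in seconds since the Unix epoch)?
    fn is_valid_at(&self, t: u64) -> bool {
        if t < self.created {
            return false;
        }
        // The binding signature in force at `t` is the newest one that
        // already existed at `t`.
        let active = self
            .bindings
            .iter()
            .filter(|b| b.created <= t)
            .max_by_key(|b| b.created);
        match active {
            // No binding signature existed yet: not valid.
            None => false,
            Some(b) => match b.validity {
                None => true,
                Some(v) => t < self.created.saturating_add(v),
            },
        }
    }
}
```

For the key from the example above, which expires on July 1 and is extended on July 10, this query answers "yes" for a time in May, "no" for July 5, and "yes" again for July 20, so no single "valid until" timestamp is needed.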



Don't think that I am singling out this particular issue with the RNP API. It is just a complication that I had been thinking about recently. When we reimplemented the RNP API to create an alternative OpenPGP backend for Thunderbird, we ran into many similar issues.



Conclusion



The mistakes made by the RNP developers are understandable and excusable. OpenPGP is complex, like many other protocols. But it cannot be greatly simplified if we want to keep its flexible and robust PKI, and not just have a file encryption tool.



However, the RNP API is dangerous, and Thunderbird is used in security-critical contexts. In a 2017 interview, Michal 'Rysiek' Wozniak of the Organized Crime and Corruption Reporting Project (OCCRP) made it clear that lives are at stake:



I really strongly believe that if we hadn't been using GnuPG all this time, many of our informants and journalists would be in danger or behind bars ...



Interview with Michal 'Rysiek' Wozniak of the Organized Crime and Corruption Reporting Project



What does this mean for Thunderbird? I see three options. First, Thunderbird could switch back to Enigmail. You might think that porting Enigmail to Thunderbird 78 would be difficult, but I have heard from many Thunderbird developers that it is technically feasible with some effort. But one of the reasons Thunderbird chose to move away from Enigmail is the enormous amount of time the Enigmail developers had to spend helping users install and configure GnuPG correctly. This path is therefore not ideal.



Second, Thunderbird could switch to a different OpenPGP implementation. There are quite a few to choose from nowadays. Personally, I think Thunderbird should switch to Sequoia. Of course, I'm a Sequoia developer, so I'm biased. But it's not about the money: a foundation pays me, and on the open market I could probably earn twice what I earn now. I work on Sequoia to protect users. But even leaving aside the advantages of the Sequoia API and implementation, Thunderbird would gain in one more respect: we have already done the integration work. A few weeks ago we released Octopus, an alternative OpenPGP backend for Thunderbird. It not only has feature parity with RNP, but also adds a number of previously missing features, such as integration with gpg, fixes some security issues, and meets several non-functional requirements.



Third, Thunderbird could stop supporting OpenPGP altogether. I don't like this option. But I have been worried more than once about the safety of Thunderbird's most vulnerable users, and I believe that offering no OpenPGP support at all may be safer than the status quo.











