
<!--
SPDX-FileCopyrightText: 2023 The "Notes on OpenPGP" project
SPDX-License-Identifier: CC-BY-SA-4.0
-->

(certificates_chapter)=
# Certificates

OpenPGP fundamentally hinges on the concept of "OpenPGP certificates," also known as "OpenPGP public keys." These certificates are complex data structures essential for identity verification, data encryption, and digital signatures. Understanding their structure and function is pivotal to effectively applying the OpenPGP standard.

An OpenPGP certificate, by definition, does not contain private key material.

## Terminology: Understanding "keys"

The term "(cryptographic) keys" is central to grasping the concept of OpenPGP certificates. However, it can refer to different entities, making it a potentially confusing term. Let's clarify those differences.

### Public vs. private keys

The term "key," without additional context, can refer to either public or private asymmetric key material. Additionally, symmetric keys may be used in OpenPGP to encrypt private key material, adding a layer of security and complexity.

### Layers of keys in OpenPGP

In OpenPGP, the term "key" may refer to three distinct layers, each serving a unique purpose:

1. A (bare) ["cryptographic key"](asymmetric_key_pair) comprises the private and/or public parameters forming a key. For instance, in the case of an RSA private key, the key consists of the exponent `d` along with the prime numbers `p` and `q`.
2. An OpenPGP *component key* includes either an "OpenPGP primary key" or an "OpenPGP subkey." It is a building block of an OpenPGP certificate, consisting of a cryptographic keypair coupled with some invariant metadata, such as key creation time.
3. An "OpenPGP certificate" (or "OpenPGP key") consists of several component keys, identity components, and other elements. These certificates are dynamic, evolving over time as components are added, expire, or are marked as invalid.

The following section will delve into the OpenPGP-specific layers (2 and 3) to provide a clearer understanding of their roles within OpenPGP certificates.

For a discussion of private key material in OpenPGP, see the chapter {ref}`private_key_chapter`. Bindings that connect the components of a certificate are discussed in our chapter {ref}`component_signatures_chapter`. For much more detail on the internal (packet) structure of certificates and keys, refer to our chapter {ref}`zoom_certificates`. Additionally, managing certificates and understanding their authentication and trust models are vital topics. While this document only touches upon these aspects briefly, they are integral to working proficiently with OpenPGP.

## Structure of OpenPGP certificates

An OpenPGP certificate (or "OpenPGP key") is a collection of an arbitrary number of elements[^packets]:

[^packets]: In technical terms, the elements of an OpenPGP certificate are a collection of "packets." Each component key and identity component is internally represented as a packet. Another common type of packet is the "signature" packet, which connects the components of a certificate.

- Component keys
- Identity components
- Additional metadata, including connections between the certificate's components

This documentation collectively refers to component keys and identity components as "the components of a certificate."

```{figure} diag/OpenPGP_Certificate.png
:name: fig-openpgp-certificate
:alt: Depicts a box with white background and the title "OpenPGP certificate". In the box several other boxes and accompanying texts, representing component keys and User IDs, are shown. There are three component keys boxes with a green frame, each with a dotted lower-left section, that shows the text "key creation time" and the green public key symbol in the lower right area. All three have a title, a unique fingerprint below the box and a unique capability keyword, perpendicular to the box on the right side. The top-most component key box has a light-green background, with the title "Component Key (primary)" and capability keyword "certification". The second-to-top component key box has a white background, with the title "Component Key" and capability keyword "encryption". The lowest component key box has a white background, with the title "Component Key" and capability keyword "signing". There are two User ID boxes, each with a black frame, open to top left and lower right corner. Both boxes have a user icon on the top left side, the title "User ID" on the top right side and a User ID string at the bottom. The top box has "Alice Adams <alice@example.org>" and the lower box has "Alice" as User ID string.

Typical components in an OpenPGP certificate
```
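
The structure described above can be sketched as a small data model. This is only an illustration: the class and field names (`ComponentKey`, `UserID`, `Certificate`, `signatures`) are hypothetical and do not correspond to any particular OpenPGP library, and signature packets are treated as opaque bytes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ComponentKey:
    """A component key: public key material plus invariant metadata."""
    public_key_material: bytes  # the bare cryptographic public key
    creation_time: datetime     # fixed once the component key is created


@dataclass(frozen=True)
class UserID:
    """An identity component, e.g. "Alice Adams <alice@example.org>"."""
    value: str


@dataclass
class Certificate:
    """A primary key plus any number of further components."""
    primary_key: ComponentKey
    subkeys: list[ComponentKey] = field(default_factory=list)
    user_ids: list[UserID] = field(default_factory=list)
    # Signature packets that connect the components (opaque in this sketch).
    signatures: list[bytes] = field(default_factory=list)

    def components(self) -> list:
        """Component keys first, then identity components."""
        return [self.primary_key, *self.subkeys, *self.user_ids]


# Placeholder key material, not real public keys.
created = datetime(2023, 1, 1, tzinfo=timezone.utc)
cert = Certificate(primary_key=ComponentKey(b"\x01" * 32, created))
cert.subkeys.append(ComponentKey(b"\x02" * 32, created))
cert.user_ids.append(UserID("Alice Adams <alice@example.org>"))
```

The component types are frozen while the certificate itself is mutable, mirroring the idea that components stay fixed while the certificate as a whole can grow over time.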

Every element in an OpenPGP certificate revolves around a central component: the *OpenPGP primary key*. The primary key acts as a personal *certification authority* (CA) for the certificate's owner, enabling cryptographic statements regarding subkeys, identities, expiration, revocation, and more.

```{note}
OpenPGP certificates tend to have a long lifespan, with the potential for modifications (typically by their owner) over time. Components may be added or invalidated throughout a certificate's lifetime. However, once published, components [cannot be removed](append-only) from certificates.
```

## Component keys

An OpenPGP certificate usually contains multiple component keys. Component keys serve in one of two roles: either as an "OpenPGP primary key" or as an "OpenPGP subkey."

OpenPGP component keys logically consist of an [asymmetric cryptographic keypair](asymmetric_key_pair) and a creation timestamp. Once created, these attributes of a component key remain fixed (for ECDH keys, two additional parameters are part of a component key's constitutive data[^ecdh-parameters]).

[^ecdh-parameters]: For [ECDH](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-10.html#name-algorithm-specific-part-for-ecd) component keys, two additional algorithm parameters are integral to the component key's constitutive and immutable properties. Those parameters specify a hash function and a symmetric encryption algorithm.

```{figure} diag/Component_Key.svg
:name: fig-component-key
:alt: Depicts a box with white background and no title. In the box one other box is shown. The inner box has a green frame, with a dotted lower-left section, that shows the text "key creation time" and the green public key symbol, as well as the red-dotted private key symbol in the lower right area. In the top left of the inner box the text reads "Component Key".

An OpenPGP component key
```

Component keys containing private key material also include metadata specifying the password protection scheme. This is another facet of metadata, akin to the aforementioned creation timestamp and additional parameters for certain algorithms. However, this discussion focuses on OpenPGP certificates, in which the component keys contain only the public part of their cryptographic key data. For information on private keys in OpenPGP, see {numref}`private_key_chapter`.

### Fingerprint

Each OpenPGP component key possesses an *OpenPGP fingerprint*. This fingerprint is derived from the public key material, the creation timestamp, and, when relevant, the ECDH parameters.

```{figure} diag/Fingerprint.png
:name: fig-fingerprint
:alt: Depicts a box with white background and the title "Fingerprint of an OpenPGP component key". Inside, another box with a green frame, the title "Component Key", the text "key creation time" on the lower left and the green public key symbol on the lower right is shown. Below the component key box a fingerprint in a box with a light-yellow background and a yellow dotted line is depicted. The word "Fingerprint" is shown left of the box with the fingerprint and both are connected with a yellow dotted line.

Every OpenPGP component key is identifiable by a fingerprint.
```

The fingerprint of our example OpenPGP component key is `C0A5 8384 A438 E5A1 4F73 7124 26A4 D45D  BAEE F4A3 9E6B 30B0 9D55 13F9 78AC CA94`[^keyid].

[^keyid]: In OpenPGP version 4, the rightmost 64 bits were sometimes used as a shorter identifier, called "Key ID."
For example, an OpenPGP version 4 certificate with the fingerprint `B3D2 7B09 FBA4 1235 2B41 8972 C8B8 6AC4 2455 4239` might be referenced by the 64-bit Key ID `C8B8 6AC4 2455 4239` or formatted as `0xC8B86AC424554239`.  
Historically, even shorter 32-bit identifiers were used, like this: `2455 4239`, or `0x24554239`. Such identifiers still appear in very old documents about PGP. However, [32-bit identifiers have been long deemed unfit for purpose](https://evil32.com/). At one point, 32-bit identifiers were called "short Key ID," while 64-bit identifiers were referred to as "long Key ID."

```{note}
In practice, the fingerprint of a component key is used like a unique identifier.

However, formally, a fingerprint is not unique. For every component key, other component keys with the same fingerprint exist, in theory. But because fingerprints are calculated using a [cryptographic hash algorithm](crypto-hash), it is practically impossible to find two different component keys that have the same fingerprint.
 ```
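
As a rough illustration of how a fingerprint depends on both the key material and the creation time, the following sketch computes a version 6 fingerprint as we understand the crypto refresh: SHA-256 over the octet `0x9B`, a four-octet length, and the public key packet body. The helper names are hypothetical, the key material is a placeholder rather than a real public key, and algorithm ID 27 (Ed25519) is taken from the crypto refresh. The last helper derives a version 4 Key ID from a fingerprint, as described in the footnote above.

```python
import hashlib
import struct


def public_key_body_v6(creation_time: int, algorithm: int, key_material: bytes) -> bytes:
    """Serialize a version 6 public key packet body."""
    return (
        bytes([6])                              # version
        + struct.pack(">I", creation_time)      # creation time (Unix seconds)
        + bytes([algorithm])                    # public key algorithm ID
        + struct.pack(">I", len(key_material))  # length of the key material
        + key_material                          # algorithm-specific public key data
    )


def fingerprint_v6(body: bytes) -> str:
    """SHA-256 over 0x9B, a four-octet length, and the packet body."""
    digest = hashlib.sha256(b"\x9b" + struct.pack(">I", len(body)) + body)
    return digest.hexdigest().upper()


def key_id_v4(fingerprint: str) -> str:
    """Rightmost 64 bits of a version 4 fingerprint, as 16 hex digits."""
    return fingerprint.replace(" ", "")[-16:]


dummy_key = bytes(32)  # placeholder, NOT a real Ed25519 public key
fp_a = fingerprint_v6(public_key_body_v6(1_700_000_000, 27, dummy_key))
fp_b = fingerprint_v6(public_key_body_v6(1_700_000_001, 27, dummy_key))
key_id = key_id_v4("B3D2 7B09 FBA4 1235 2B41 8972 C8B8 6AC4 2455 4239")
```

Here `fp_a` and `fp_b` differ even though the key material is identical, because the creation time is part of the hashed data.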

### Primary key

The OpenPGP primary key is a component key that serves a distinct, central role in an OpenPGP certificate:

- Its fingerprint acts as an identifier for the entire OpenPGP certificate.
- It facilitates lifecycle operations, such as adding or invalidating subkeys or identities within a certificate.

```{admonition} Terminology
:class: note

In the RFC, the OpenPGP primary key is occasionally referred to as "top-level key." Informally, it has also been termed the "master key."
```

### Subkeys

Modern OpenPGP certificates typically include several subkeys in addition to the primary key, although these subkeys are optional.

While subkeys have the same structural attributes as the primary key, they fulfill different roles. Subkeys are cryptographically linked with the primary key, a relationship further discussed in {numref}`binding_subkeys`.

```{figure} diag/Subkeys.png
:name: fig-subkeys
:alt: Diagram depicting three component keys. The primary key is positioned at the top, designated for certification. Below it, connected by arrows, are two subkeys labeled as "for encryption" and "for signing," respectively.

OpenPGP certificates can contain multiple subkeys.
```

(identity_components)=
## Identity components

Identity components in an OpenPGP certificate are used by the certificate holder to state that they are known by a certain identifier (like a name, or an email address).

### User IDs in OpenPGP certificates

OpenPGP certificates can contain multiple [User IDs](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-10.html#name-user-id-packet-tag-13). Each User ID associates the certificate with an identity.

```{figure} diag/user_ids.png
:name: fig-user-ids
:alt: Depicts a diagram with white background and the title "User IDs". Inside, a public primary component key for certification and a User ID is shown. A green arrow points from component key to User ID and is annotated with a signature.

Relationship of User ID to primary component key in an OpenPGP certificate
```

A typical User ID identity is a UTF-8-encoded string composed of a name and an email address. By convention, User IDs align with the format described in [RFC2822](https://www.rfc-editor.org/rfc/rfc2822) as a *name-addr*.

For further conventions on User IDs, refer to the document [draft-dkg-openpgp-userid-conventions-00](https://datatracker.ietf.org/doc/draft-dkg-openpgp-userid-conventions/), dated 25 August 2023.

**Split User IDs**

One proposed variant for encoding identities in User IDs is to use ["split User IDs"](https://dkg.fifthhorseman.net/blog/2021-dkg-openpgp-transition.html#split-user-ids). Although this format is uncommon, there are currently no significant technical barriers to implementing it[^dkg-split].

[^dkg-split]: Historically, the OpenPGP ecosystem faced challenges in this context. For further details, refer to Daniel Kahn Gillmor's January 2019 article, ["What were Separated User IDs"](https://dkg.fifthhorseman.net/blog/2019-dkg-openpgp-transition.html#what-were-separated-user-ids).

The rationale for split User IDs lies in the distinction between a name and an email address, which represent two separate facets of an individual's identity. Separating these elements simplifies the process for third parties tasked with certifying that an identity is legitimately connected to a certificate.

Consider this scenario: A third party is confident about the email-based identity of an individual (e.g., `<alice@example.org>`) and is willing to certify it. However, they might not have sufficient knowledge about the person's name-based identity (e.g., `Alice Adams`), so they are unwilling to extend the same level of certification. Split User IDs address this dichotomy by allowing distinct certification processes for each type of identity.
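
To make the difference between conventional and split User IDs concrete, here is a minimal sketch that separates a User ID string into its name and email facets. The function name and the regular expression are illustrative assumptions. User IDs are free-form UTF-8 strings, and this pattern covers only the simple `Name <email>` shape, not the full RFC 2822 grammar.

```python
import re

# Hypothetical pattern for the conventional "Name <email>" shape.
_NAME_ADDR = re.compile(r"(?P<name>[^<>]*?)\s*<(?P<email>[^<>\s@]+@[^<>\s@]+)>")


def classify_user_id(user_id: str) -> dict:
    """Split a User ID into its name and email facets, where present."""
    text = user_id.strip()
    match = _NAME_ADDR.fullmatch(text)
    if match is None:
        # No angle-bracketed address found: treat the whole string as a name.
        return {"name": text or None, "email": None}
    return {"name": match["name"] or None, "email": match["email"]}


combined = classify_user_id("Alice Adams <alice@example.org>")
email_only = classify_user_id("<alice@example.org>")
name_only = classify_user_id("Alice Adams")
```

With split User IDs, a certifier could sign only the `email_only` identity without making any statement about the name.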

(primary_user_id)=
### Implications of the Primary User ID

Within a certificate, a specific User ID is designated as the [Primary User ID](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-10.html#name-primary-user-id).

Each User ID carries associated preference settings, such as preferred encryption algorithms (detailed in {numref}`zooming_in_user_id`). When a certificate is used in the context of a specific identity, the preferences associated with that identity component are used. When a certificate is used without reference to a specific identity, the preferences associated with the direct key signature or the primary User ID take precedence by default.

The primary User ID was historically the main store for preferences that apply to the certificate as a whole. For more on this, see {ref}`primary-metadata`.

### User attributes in OpenPGP

While [user attributes](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-10.html#name-user-attribute-packet-tag-1) are similar to User IDs, they are less commonly used.

Currently, the OpenPGP standard prescribes only one format to be stored in user attributes: an [image](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-10.html#name-the-image-attribute-subpack), in JPEG format. Typically, this image represents the key owner, although this is not required.

## Linking the components

To form an OpenPGP certificate, individual components are interconnected by the certificate holder using their OpenPGP software. Within OpenPGP, this process is termed "binding," as in "a subkey is bound to the primary key." These bindings are realized using cryptographic signatures. An in-depth discussion of this topic can be found in {ref}`component_signatures_chapter`.

In very abstract terms, the primary key of a certificate acts as a root of trust or "certification authority." It is responsible for:

- issuing signatures that express the certificate holder's intent to use specific subkeys or identity components;
- conducting other lifecycle operations, including setting expiration dates and marking components as invalidated or "revoked."

Because components are bound using digital signatures, recipients of an OpenPGP certificate need only validate the authenticity of the primary key in order to use the certificate for their communication partner. Traditionally, this is done by manually verifying the *fingerprint* of the primary key. Once the validity of the primary key is confirmed, the validity of the remaining components can be automatically assessed by the user's OpenPGP software. Generally, components are valid parts of a certificate if there is a statement signed by the certificate's primary key endorsing this validity.

## Metadata capabilities, preferences, and storage

OpenPGP certificates, their component keys, and identities possess metadata that is not stored within the components it pertains to. Instead, this metadata is stored within signature packets, which are integral to the structure of an OpenPGP certificate.

Key attributes, such as capabilities (like *signing* or *encryption*) and expiration times, are examples of metadata not stored in the component key data. How this metadata is stored depends on the component:

- **Primary key metadata** is defined either through a direct key signature on the primary key (preferred in OpenPGP version 6), or by associating the metadata with the [Primary User ID](primary_user_id).

- **Subkey metadata** is defined within the [subkey binding signature](binding_subkeys) that links the subkey to the certificate.

- **User ID metadata** is associated via the [certifying self-signature](bind_ident) that links the identity to the certificate.

It is crucial to note that the components of an OpenPGP certificate remain static after their creation. The use of signatures to store metadata allows for subsequent modifications without altering the original components. For instance, a certificate holder can update the expiration time of a component by issuing a new, superseding signature.
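
The idea of a superseding signature can be sketched as follows. The `SelfSignature` type and its fields are hypothetical stand-ins for real signature packets, and the sketch deliberately ignores signature verification, revocations, and signatures dated in the future. It relies on key expiration being expressed as a number of seconds after the key's creation time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SelfSignature:
    """Hypothetical stand-in for a binding or direct key signature."""
    created: int                         # signature creation time (Unix seconds)
    key_validity_seconds: Optional[int]  # None means "does not expire"


def effective_expiration(key_created: int, signatures: list[SelfSignature]) -> Optional[int]:
    """Expiration time implied by the most recent signature, or None."""
    if not signatures:
        return None
    latest = max(signatures, key=lambda sig: sig.created)
    if latest.key_validity_seconds is None:
        return None
    # The validity period is an offset from the key's creation time.
    return key_created + latest.key_validity_seconds


history = [
    SelfSignature(created=1_100, key_validity_seconds=500),
    SelfSignature(created=2_000, key_validity_seconds=5_000),  # supersedes the first
]
expires = effective_expiration(key_created=1_000, signatures=history)
```

The component key itself (created at `1_000`) never changes. Only the newer signature moves the expiration time from `1_500` to `6_000`.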

### Defining operational capabilities of component keys with key flags

Each component key has a set of ["key flags"](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-10.html#key-flags) that delineate the operations a key can perform.

Commonly used key flags include:

- **Certification**: enables issuing third-party certifications
- **Signing**: allows the key to sign data
- **Encryption**: allows the key to encrypt data
- **Authentication**: primarily used for SSH authentication[^auth-flag]

[^auth-flag]: It's important to note that the function of the [authentication](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#name-authentication-via-digital-) key flag is unrelated to the authentication process used in certifying OpenPGP identities and linking them to certificates. Rather, this flag indicates a mechanism that uses cryptographic signatures to confirm control of private key material with a remote system.

```{note}
Distinct component keys handle specific operations. Only the primary key can be used for certification, although it can have additional capabilities. Subkeys can be used for signing, encryption, and authentication but cannot have the certification capability. A component key can technically have multiple capabilities. It is considered good practice, however, to [use separate keys for each capability](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#section-10.1.5-7). 

Notably, in many algorithms, encryption and signing-related functionalities (i.e., certification, signing, authentication) are mutually exclusive, because the algorithms only support one of those two families of operations[^key-flag-sharing].
```
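
Key flags are stored as a bit field, so a set of capabilities can be modeled with a flag enumeration. The sketch below uses the bit values from the first octet of the key flags subpacket as we understand the specification. Note that "encryption" is split into two flags there (communications and storage), and that less common flags are omitted. The class and function names are illustrative.

```python
from enum import IntFlag


class KeyFlags(IntFlag):
    CERTIFY = 0x01
    SIGN = 0x02
    ENCRYPT_COMMUNICATIONS = 0x04
    ENCRYPT_STORAGE = 0x08
    AUTHENTICATION = 0x20


def describe(flags_octet: int) -> list[str]:
    """Names of the known flags that are set in the given octet."""
    return [flag.name for flag in KeyFlags if flags_octet & flag]


primary_flags = describe(0x03)  # certification and signing
subkey_flags = describe(0x0C)   # both encryption flags
```

A primary key that certifies and signs would carry the value `0x03`, while a typical encryption subkey would carry `0x0C`.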

[^key-flag-sharing]: With ECC algorithms, it's impossible to combine encryption functions with those intended for signing. For example, ed25519 is specifically used for signing; cv25519 is designated for encryption.

### Algorithm preferences and feature signaling

OpenPGP demonstrates significant ["cryptographic agility"](https://en.wikipedia.org/wiki/Cryptographic_agility). It doesn't rely on a single fixed set of algorithms. Instead, it defines a suite of cryptographic primitives from which users (or their applications) can choose.

This agility facilitates the easy adoption of new cryptographic primitives into the standard, allowing for a seamless transition. Users can gradually migrate to new cryptographic mechanisms without disruption.

However, this approach requires that OpenPGP software determine the cryptographic mechanisms that a set of communication partners can handle and prefer. OpenPGP employs several mechanisms for this purpose, which allow negotiation between sender and recipient. It's important to note that OpenPGP is not an online scheme; thus, this negotiation is effectively one-way. The active party interprets the preferences expressed in the certificate of the passive party.

Key negotiation mechanisms in OpenPGP include:

- [Preferred hash algorithms](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#preferred-hashes-subpacket)
- [Preferred symmetric ciphers for v1 SEIPD](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#preferred-v1-seipd)  
- [Preferred AEAD ciphersuites](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#preferred-v2-seipd)
- [Features subpacket](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#features-subpacket)
- [Preferred compression algorithms](https://www.ietf.org/archive/id/draft-ietf-openpgp-crypto-refresh-12.html#preferred-compression-subpacket)

Beyond these explicitly expressed preferences, implementations also deduce capabilities of communication partners based on the version of the OpenPGP certificate they possess.
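
The one-way negotiation described above might look roughly like this for choosing a symmetric cipher. This is a simplified sketch, not the procedure of any particular implementation: the function name is hypothetical, algorithms are represented by plain strings, candidates are tried in the first recipient's order (an arbitrary choice made here), and the fallback to AES-128 assumes it is the mandatory-to-implement cipher.

```python
def choose_symmetric_cipher(
    sender_supported: list[str],
    recipient_preferences: list[list[str]],
    fallback: str = "AES128",
) -> str:
    """Pick a cipher the sender supports and every recipient lists.

    Each inner list holds one recipient's preferences, most preferred first.
    """
    if not recipient_preferences:
        return fallback
    first, *others = recipient_preferences
    for cipher in first:
        if cipher in sender_supported and all(cipher in prefs for prefs in others):
            return cipher
    # No common choice was expressed: fall back to the assumed default.
    return fallback


chosen = choose_symmetric_cipher(
    sender_supported=["AES256", "AES128"],
    recipient_preferences=[["AES256", "AES192"], ["AES192", "AES256"]],
)
```

The sender never contacts the recipients. All inputs come from the sender's own capabilities and from the preferences stored in the recipients' certificates.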

#### User ID-specific preferences

As a starting point, a certificate has a set of preferences that apply generally. These are defined either in a direct key signature, or via the primary User ID of the certificate.

Additionally, OpenPGP allows modeling User ID-specific preferences. The idea is that a user may prefer a different suite of algorithms on their private email account compared to their work email account. Such identity-specific preferences can be expressed on the certifying signatures that bind User IDs to a certificate.
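
A lookup of the applicable preferences could be sketched like this. The function and parameter names are hypothetical, preferences are reduced to a simple list of algorithm names, and the order in which the direct key signature and the primary User ID are consulted is an assumption made for this illustration.

```python
from typing import Optional


def effective_preferences(
    per_user_id: dict[str, list[str]],
    primary_user_id: str,
    direct_key: Optional[list[str]] = None,
    user_id: Optional[str] = None,
) -> list[str]:
    """Return the preference list that applies in the given context."""
    if user_id is not None and user_id in per_user_id:
        return per_user_id[user_id]  # identity-specific preferences
    if direct_key is not None:
        return direct_key  # certificate-wide, from the direct key signature
    # Certificate-wide, from the primary User ID (assumed to be checked last).
    return per_user_id.get(primary_user_id, [])


prefs = {
    "Alice Adams <alice@example.org>": ["AES256"],
    "Alice Adams <alice@work.example>": ["AES128"],
}
work = effective_preferences(
    prefs,
    primary_user_id="Alice Adams <alice@example.org>",
    user_id="Alice Adams <alice@work.example>",
)
general = effective_preferences(prefs, primary_user_id="Alice Adams <alice@example.org>")
```

In this sketch, mail sent to the work address uses `work`, while any use without a specific identity falls back to `general`.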

## Revocations

When a certificate owner needs to invalidate certain components of their certificate, or even the entire certificate, they accomplish this through "revocation." Revoking the primary key renders the entire certificate invalid.

Notably, revocations are not the only means by which components can become invalid. Other factors, such as the passing of a component's expiration time, can also render components invalid.

For more detailed information on revoking specific components of a certificate, see the section on {ref}`self-revocations`.

## Third-party (identity) certifications

Since the inception of OpenPGP, third-party identity certifications have been a cornerstone of its ecosystem. The original PGP designers, starting with Phil Zimmermann, advocated for decentralized trust models over reliance on centralized authorities. This decentralized approach in OpenPGP is known as the ["Web of Trust."](wot)

Third-party certifications are statements by OpenPGP users confirming that a user with a specific identity is the owner of a particular OpenPGP certificate.

For example, Bob's OpenPGP software may issue a certification stating that Bob has checked that the User ID `Alice Adams <alice@example.org>` and the certificate with the fingerprint `AAA1 8CBB 2546 85C5 8358 3205 63FD 37B6  7F33 00F9 FB0E C457 378C D29F 1026 98B3` are legitimately linked.

This process assumes that Bob knows the person known as `Alice Adams` and is confident that `alice@example.org` is indeed Alice's email address. Bob also verifies that the certificate his OpenPGP software associates with Alice matches the one Alice uses. In essence, both users must have a certificate for Alice with an identical fingerprint. In OpenPGP version 6, manual fingerprint comparison by end-users is discouraged, with a replacement verification mechanism still under development. The verification process must occur over a sufficiently secure channel, such as an end-to-end encrypted video call or a face-to-face meeting.

For more on third-party certifications, see {ref}`third_party_cert`.

## Advanced topics

```{admonition} TODO
:class: warning

This section only contains notes and still needs to be written
```

### Certificate management / Evolution of a certificate over time

Minimized versions, merging, effective "append only" semantics, ...

### "Naming" a certificate in user-facing contexts - fingerprints and beyond

```{admonition} TODO
:class: warning

In v4, a 20 byte fingerprint in hex representation was used to name certificates, even in user-facing contexts.

For v6, this type of approach is discouraged, but a replacement mechanism is still pending.
```

### Merging

- How to merge two copies of the same certificate?
- Canonicalization

### How to generate a "minimized" certificate?

### When are certificates valid?

- Full certificate: Primary revoked/key expired/binding signature expired,
- Subkey: Revoked/key expired/binding signature expired
- User ID: revoked, binding expired, ...

### Best practices regarding Key Freshness

```{admonition} TODO
:class: warning

- Expiry
- Subkey rotation

Wiktor suggests to check: https://blogs.gentoo.org/mgorny/2018/08/13/openpgp-key-expiration-is-not-a-security-measure/ for important material
```

### Metadata about the primary key: In Direct Key Signature vs. in Primary User ID, in v4 and v6

```{admonition} TODO
:class: warning

write
```

### Metadata leak of Social Graph

(unbound_user_ids)=
### Adding unbound User IDs to a certificate

```{admonition} TODO
:class: warning

references/links missing
```

Some OpenPGP subsystems may add User IDs to a certificate that are not bound to the primary key by the certificate's owner. This can be useful for storing local identity information (e.g., Sequoia's public store attaches "pet-names" to certificates in this way).

### Third-party certification flooding

While a convenience for consumers, indiscriminately accepting and integrating third-party identity certifications comes with significant risks.

Without any restrictions in place, malicious entities can flood a certificate with excessive certifications. Called "certificate flooding," this form of digital vandalism grossly expands the certificate size, making the certificate cumbersome and impractical for users.

It also opens the door to potential denial-of-service attacks, rendering the certificate non-functional or significantly impeding its operation.

The popular [SKS keyserver network experienced certificate flooding firsthand](https://dkg.fifthhorseman.net/blog/openpgp-certificate-flooding.html), causing it to shut down operations in 2019.