pgpainless/misc/OpenPGPMessageFormat.md

9.9 KiB

Pushdown Automaton for the OpenPGP Message Format

See RFC4880 §11.3. OpenPGP Messages for the formal definition.

A simulation of the automaton can be found here.

RFC4880 defines the grammar of OpenPGP messages as follows:

   OpenPGP Message :- Encrypted Message | Signed Message |
                      Compressed Message | Literal Message.

   Compressed Message :- Compressed Data Packet.

   Literal Message :- Literal Data Packet.

   ESK :- Public-Key Encrypted Session Key Packet |
          Symmetric-Key Encrypted Session Key Packet.

   ESK Sequence :- ESK | ESK Sequence, ESK.

   Encrypted Data :- Symmetrically Encrypted Data Packet |
         Symmetrically Encrypted Integrity Protected Data Packet

   Encrypted Message :- Encrypted Data | ESK Sequence, Encrypted Data.

   One-Pass Signed Message :- One-Pass Signature Packet,
               OpenPGP Message, Corresponding Signature Packet.

   Signed Message :- Signature Packet, OpenPGP Message |
               One-Pass Signed Message.

   In addition, decrypting a Symmetrically Encrypted Data packet or a
   Symmetrically Encrypted Integrity Protected Data packet as well as
   decompressing a Compressed Data packet must yield a valid OpenPGP
   Message.

This grammar can be translated into a pushdown automaton with the following graphical representation:

graph LR
    start((start)) -- "ε,ε/m#" --> pgpmsg((OpenPGP Message))
    pgpmsg -- "Literal Data,m/ε" --> literal((Literal Message))
    literal -- "ε,#/ε" --> accept((Valid))
    literal -- "Signature,o/ε" --> sig4ops((Corresponding Signature))
    sig4ops -- "Signature,o/ε" --> sig4ops
    sig4ops -- "ε,#/ε" --> accept
    pgpmsg -- "OnePassSignature,m/o" --> ops((One-Pass-Signed Message))
    ops -- "ε,ε/m" --> pgpmsg
    pgpmsg -- "Signature,m/ε" --> signed((Signed Message))
    signed -- "ε,ε/m" --> pgpmsg
    pgpmsg -- "Compressed Data,m/ε" --> comp((Compressed Message))
    comp -. "ε,ε/m" .-> pgpmsg
    comp -- "ε,#/ε" --> accept
    comp -- "Signature,o/ε" --> sig4ops
    pgpmsg -- "SKESK|PKESK,m/k" --> esks((ESKs))
    pgpmsg -- "Sym. Enc. (Int. Prot.) Data,m/ε" --> enc
    esks -- "SKESK|PKESK,k/k" --> esks
    esks -- "Sym. Enc. (Int. Prot.) Data,k/ε" --> enc((Encrypted Message))
    enc -. "ε,ε/m" .-> pgpmsg
    enc -- "ε,#/ε" --> accept
    enc -- "Signature,o/ε" --> sig4ops

Formally, the PDA is defined as M = (\mathcal{Q}, \Sigma, \Upgamma, \delta, q_0, Z, F), where

  • \mathcal{Q} is a finite set of states
  • \Sigma is a finite set which is called the input alphabet
  • \Upgamma is a finite set which is called the stack alphabet
  • \delta is a finite set of \mathcal{Q}\times(\Sigma\cup\textbraceleft\epsilon\textbraceright)\times\Upgamma\times\mathcal{Q}\times\Upgamma^*, the transition relation
  • q_0\in\mathcal{Q} is the start state
  • Z\in\Upgamma is the initial stack symbol
  • F\subseteq\mathcal{Q} is the set of accepting states

In our diagram, the initial state is q_0 = \text{start}. The initial stack symbol is Z=\epsilon (TODO: Make it #?). The set of accepting states is F=\textbraceleft\text{valid}\textbraceright. \delta is defined by the transitions shown in the graph diagram.

The input alphabet \Sigma consists of the following OpenPGP packets:

  • \text{Literal Data}: Literal Data Packet
  • \text{Signature}: Signature Packet
  • \text{OnePassSignature}: One-Pass-Signature Packet
  • \text{Compressed Data}: Compressed Data Packet
  • \text{SKESK}: Symmetric-Key Encrypted Session Key Packet
  • \text{PKESK}: Public-Key Encrypted Session Key Packet
  • \text{Sym. Enc. Data}: Symmetrically Encrypted Data Packet
  • \text{Sym. Enc. Int. Prot. Data}: Symmetrically Encrypted Integrity Protected Data Packet

Additionally, \epsilon is used to transition without reading OpenPGP packets or popping the stack.

The following stack alphabet \Upgamma is used:

  • m: OpenPGP Message
  • o: One-Pass-Signature packet.
  • k: Encrypted Session Key
  • #: Terminal for valid OpenPGP messages

Note: The standards document states, that Marker Packets shall be ignored as well. For the sake of readability, those transitions are omitted here.

The dotted line indicates a nested transition. For example, the transition (\text{Compressed Message}, \epsilon, \epsilon, \text{OpenPGP Message}, m) indicates, that the content of the Compressed Data packet itself is an OpenPGP Message.