# MSC4048: Authenticated key backup

The [server-side key
backups](https://spec.matrix.org/unstable/client-server-api/#server-side-key-backups)
allow clients to store event decryption keys so that when the user logs in to
a new device, they can decrypt old messages. The current algorithm encrypts
the event keys using an asymmetric algorithm, allowing clients to upload keys to
the backup without necessarily being able to read from the backup. For example,
this allows a partially-trusted client to read (and back up the keys for)
current messages without being able to read old messages.

However, since the event decryption keys are encrypted using an asymmetric
algorithm, this allows anyone who knows the public key to write to the backup.
As a result, keys loaded from the backup must be marked as unauthenticated,
leading to [usability
issues](https://github.com/vector-im/element-web/issues/14323).

[MSC3270](https://github.com/matrix-org/matrix-spec-proposals/pull/3270) tries
to fix this issue by using a symmetric, authenticated encryption algorithm,
which ensures that only someone who knows the secret key can write to the
backup. However, this removes the ability for a client to write to the backup
without being able to read from it.

We propose to continue using an asymmetric encryption algorithm in the backup,
but to ensure authenticity by producing a MAC using a key derived from the
backup's decryption key.

## Proposal

A user who has a key backup derives a new backup MAC key by performing HKDF on
the backup decryption key (as raw unencoded bytes) with no salt and an info
parameter of `"MATRIX_BACKUP_MAC_KEY"` and generating 32 bytes (256 bits):

backup_mac_key = HKDF("", decryption_key, "MATRIX_BACKUP_MAC_KEY", 32)
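The formula above does not name the HKDF hash function. The sketch below assumes SHA-256, which matches the other HKDF uses in this proposal. It also assumes the decryption key is the raw 32-byte Curve25519 private key. The function names are hypothetical, and this is an illustration rather than a reference implementation.

```python
import hashlib
import hmac


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) instantiated with SHA-256 (assumed hash)."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA-256 output is limited to 8160 bytes")
    # Extract. "No salt" means HashLen zero bytes; HMAC zero-pads short keys,
    # so an empty salt and 32 zero bytes give the same PRK.
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_backup_mac_key(decryption_key: bytes) -> bytes:
    """backup_mac_key = HKDF("", decryption_key, "MATRIX_BACKUP_MAC_KEY", 32)"""
    if len(decryption_key) != 32:
        raise ValueError("expected the raw 32-byte backup decryption key")
    return hkdf_sha256(b"", decryption_key, b"MATRIX_BACKUP_MAC_KEY", 32)
```

The derivation is deterministic. Any client holding the decryption key, whether from Secret Storage or from a secret share, arrives at the same MAC key without coordinating with other clients.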

The backup MAC key can be shared via [the Secrets
module](https://spec.matrix.org/unstable/client-server-api/#secrets) under the
name `m.megolm_backup.v1.mac`. Note that if the backup decryption key (the
secret using the name `m.megolm_backup.v1`) is shared, then the backup MAC key
does not need to be shared as it can be derived from the backup decryption
key. Since the backup decryption key is usually stored in Secret Storage, the
backup MAC key does not need to be stored.

### `m.backup.v2.curve25519-aes-sha2`

A new backup algorithm is defined, identified by the name
"`m.backup.v2.curve25519-aes-sha2`". In addition to incrementing the version
number, this name drops the "megolm", as it is expected that other types of
keys may be stored in it, for example [MLS
groups](https://github.com/matrix-org/matrix-spec-proposals/pull/4038).

The intention of creating a new backup algorithm is to prevent an attacker from
uploading additional keys that cannot be authenticated.

The `auth_data` is the same as with `m.megolm_backup.v1.curve25519-aes-sha2`.

The `session_data` is constructed as follows:

1. Encode the session key to be backed up as a JSON object using the
`SessionDataV2` format defined below.
2. Generate an ephemeral Curve25519 key, and perform an ECDH with the ephemeral
key and the backup’s public key to generate a shared secret. The public half
of the ephemeral key, encoded using unpadded base64, becomes the `ephemeral`
property of the `session_data`.
3. Using the shared secret, generate 80 bytes by performing an HKDF using
SHA-256 as the hash, with a salt of 32 bytes of 0, and with the empty string
as the info. The first 32 bytes are used as the AES key, the next 32 bytes
are discarded, and the last 16 bytes are used as the AES initialization
vector. (This is the same as the key generation for
`m.megolm_backup.v1.curve25519-aes-sha2`, except that the generated MAC key
is discarded since it is unused.)
4. Stringify the JSON object, and encrypt it using AES-CBC-256 with PKCS#7
padding. This encrypted data, encoded using unpadded base64, becomes the
`ciphertext` property of the `session_data`.
5. Encode the `session_data` as canonical JSON, as would be done when [signing
JSON](https://spec.matrix.org/unstable/appendices/#signing-details), and
calculate the HMAC-SHA-256 MAC using the backup MAC key. The MAC is
base64-encoded (unpadded), and becomes the `backup_mac` property of the
`unsigned` property of `session_data`.
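The sketch below covers the two steps that need only standard hashing. Step 3 splits the 80 HKDF bytes into an AES key and an IV, discarding the middle 32 bytes. Step 5 computes `unsigned`.`backup_mac` over the canonical JSON. The ECDH (step 2) and AES-CBC (step 4) require a cryptography library and are omitted here. The helper names are hypothetical. The canonical-JSON helper is simplified and is adequate only for the string-valued `session_data` used here.

```python
import base64
import hashlib
import hmac
import json


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def split_v2_keys(shared_secret: bytes) -> tuple[bytes, bytes]:
    """Step 3: derive the AES-256 key and CBC IV from the ECDH shared secret."""
    okm = hkdf_sha256(bytes(32), shared_secret, b"", 80)
    aes_key = okm[0:32]
    # okm[32:64] is the v1 MAC key; the v2 format discards it.
    iv = okm[64:80]
    return aes_key, iv


def unpadded_b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii").rstrip("=")


def canonical_json(obj: dict) -> bytes:
    """Simplified canonical JSON, as for signing: drop `signatures` and
    `unsigned`, sort keys, no whitespace, UTF-8."""
    stripped = {k: v for k, v in obj.items() if k not in ("signatures", "unsigned")}
    return json.dumps(stripped, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def add_backup_mac(session_data: dict, backup_mac_key: bytes) -> dict:
    """Step 5: return a copy of session_data with unsigned.backup_mac set."""
    mac = hmac.new(backup_mac_key, canonical_json(session_data), hashlib.sha256).digest()
    result = dict(session_data)
    result["unsigned"] = {**session_data.get("unsigned", {}), "backup_mac": unpadded_b64(mac)}
    return result
```

Because `unsigned` is excluded from the MAC input, calling `add_backup_mac` again on its own output yields the same MAC.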

Thus the `session_data` property has `ephemeral`, `ciphertext`, and `unsigned`
properties, with the `unsigned` property having a `backup_mac` property.
Keys without an `unsigned`.`backup_mac` property, or with an incorrect MAC,
must be ignored.

When verifying the MAC, the `session_data` is encoded as canonical JSON,
following the same procedure as when signing JSON. That is, all properties
other than `signatures` and `unsigned` are included. Putting the MAC in
`unsigned` allows clients to reuse existing code for serializing JSON for
signing.
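A minimal verification sketch follows. It recomputes the MAC over the canonical JSON and compares the result in constant time. A missing, malformed, or mismatched `backup_mac` all produce `False`, and the caller should then ignore the key. The function names are hypothetical, and the canonical-JSON helper is the same simplified one as in the previous sketch.

```python
import base64
import hashlib
import hmac
import json


def canonical_json(obj: dict) -> bytes:
    stripped = {k: v for k, v in obj.items() if k not in ("signatures", "unsigned")}
    return json.dumps(stripped, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def b64decode_unpadded(text: str) -> bytes:
    # Restore the padding stripped by unpadded base64, rejecting bad characters.
    return base64.b64decode(text + "=" * (-len(text) % 4), validate=True)


def backup_mac_is_valid(session_data: dict, backup_mac_key: bytes) -> bool:
    unsigned = session_data.get("unsigned")
    if not isinstance(unsigned, dict):
        return False
    encoded = unsigned.get("backup_mac")
    if not isinstance(encoded, str):
        return False
    try:
        received = b64decode_unpadded(encoded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    expected = hmac.new(backup_mac_key, canonical_json(session_data), hashlib.sha256).digest()
    return hmac.compare_digest(received, expected)
```

This check needs only the cleartext `session_data`, so it runs before any decryption.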

The `SessionDataV2` has algorithm-dependent and algorithm-independent
properties. The algorithm-independent properties are:

- `algorithm`: (required string) the end-to-end message encryption algorithm that the
key is for. The values are the same as for the `algorithm` property in the
`m.room_key` event. For example, for Megolm keys, this is
`m.megolm.v1.aes-sha2`.
- `unauthenticated`: (optional string) if not present, the key is considered to
be authenticated, that is, the device that uploaded the key to the backup
believes that the key belongs to the recorded sender, as defined by the key
algorithm (with `m.megolm.v1.aes-sha2`, the sender is given in the
`sender_key` property). A key is considered to be authenticated if: a) the
key was received via an Olm-encrypted `m.room_key` event from the
`sender_key`, b) the key was received via a trusted key forward
([MSC3879](https://github.com/matrix-org/matrix-spec-proposals/pull/3879)),
or c) the key was downloaded from the key backup where it is marked as
authenticated, and the data can be authenticated (for example using the
method from this proposal).

If the key is not considered to be authenticated, this property indicates the
source of the key. Currently defined values are: `m.undefined`, which
indicates that the source is not specified; `m.legacy-v1`, which indicates
that the key was an unauthenticated key from a
`m.megolm_backup.v1.curve25519-aes-sha2` backup ([see
below](#migrating-keys)); and `m.forwarded_room_key`, which indicates that
the key came from an untrusted key forward. (FIXME: do we also want to
encode the source of the key forward?) Clients may create other values to
specify other sources, using the Java package naming convention; clients
should treat unknown values as `m.undefined`.

For the `m.megolm.v1.aes-sha2` algorithm, the algorithm-dependent properties
are the `forwarding_curve25519_key_chain`, `sender_claimed_keys`, `sender_key`,
and `session_key` properties defined for
`m.megolm_backup.v1.curve25519-aes-sha2`.
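For illustration, the sketch below builds the plaintext `SessionDataV2` object for a Megolm key and reads back its `unauthenticated` property. Values this client does not recognize are mapped to `m.undefined`, as required above. The helper names are hypothetical, and the set of known sources reflects only the values defined in this proposal.

```python
from typing import Optional

KNOWN_UNAUTHENTICATED_SOURCES = {"m.undefined", "m.legacy-v1", "m.forwarded_room_key"}


def megolm_session_data_v2(session_key: str, sender_key: str,
                           sender_claimed_keys: dict, forwarding_chain: list,
                           unauthenticated: Optional[str] = None) -> dict:
    """Plaintext SessionDataV2 for an m.megolm.v1.aes-sha2 key (before encryption)."""
    data = {
        "algorithm": "m.megolm.v1.aes-sha2",  # algorithm-independent
        "forwarding_curve25519_key_chain": forwarding_chain,
        "sender_claimed_keys": sender_claimed_keys,
        "sender_key": sender_key,
        "session_key": session_key,
    }
    # Omitting the property marks the key as authenticated.
    if unauthenticated is not None:
        data["unauthenticated"] = unauthenticated
    return data


def unauthenticated_source(data: dict) -> Optional[str]:
    """None if the key is authenticated, else its (normalized) source."""
    source = data.get("unauthenticated")
    if source is None:
        return None
    if not isinstance(source, str) or source not in KNOWN_UNAUTHENTICATED_SOURCES:
        return "m.undefined"
    return source
```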

### `m.megolm_backup.v1.curve25519-aes-sha2`

Megolm keys may be uploaded to an `m.megolm_backup.v1.curve25519-aes-sha2`
backup using the `m.backup.v2.curve25519-aes-sha2` format, provided the
`session_data` also contains the `mac` property as required for the
`m.megolm_backup.v1.curve25519-aes-sha2` algorithm.

The [construction of the `session_data`
property](https://spec.matrix.org/unstable/client-server-api/#backup-algorithm-mmegolm_backupv1curve25519-aes-sha2)
thus becomes:

1. Encode the session key to be backed up as a JSON object using the
`SessionData` format.
2. Generate an ephemeral Curve25519 key, and perform an ECDH with the ephemeral
key and the backup’s public key to generate a shared secret. The public half
of the ephemeral key, encoded using unpadded base64, becomes the `ephemeral`
property of the `session_data`.
3. Using the shared secret, generate 80 bytes by performing an HKDF using
SHA-256 as the hash, with a salt of 32 bytes of 0, and with the empty string
as the info. The first 32 bytes are used as the AES key, the next 32 bytes
are used as the MAC key, and the last 16 bytes are used as the AES
initialization vector.
4. Stringify the JSON object, and encrypt it using AES-CBC-256 with PKCS#7
padding. This encrypted data, encoded using unpadded base64, becomes the
`ciphertext` property of the `session_data`.
5. Pass the raw encrypted data (prior to base64 encoding) through HMAC-SHA-256
using the MAC key generated above. The first 8 bytes of the resulting MAC
are base64-encoded, and become the `mac` property of the `session_data`.
6. Encode the `session_data` as canonical JSON, as would be done when [signing
JSON](https://spec.matrix.org/unstable/appendices/#signing-details), and
calculate the HMAC-SHA-256 MAC using the backup MAC key. The MAC is
base64-encoded (unpadded), and becomes the `backup_mac` property of the
`unsigned` property of `session_data`.
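The sketch below shows how steps 5 and 6 combine. The legacy `mac` covers only the raw ciphertext, truncated to 8 bytes. The backup MAC is computed last, so it also covers the `mac` property. The inputs are assumed to come from steps 2 to 4, which need a cryptography library and are not shown: the ephemeral public key, the raw ciphertext, and the v1 MAC key (HKDF bytes 32 to 64). The sketch also assumes unpadded base64 for `mac`. The function name is hypothetical.

```python
import base64
import hashlib
import hmac
import json


def unpadded_b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii").rstrip("=")


def canonical_json(obj: dict) -> bytes:
    stripped = {k: v for k, v in obj.items() if k not in ("signatures", "unsigned")}
    return json.dumps(stripped, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def v1_session_data_with_backup_mac(ephemeral_public: bytes, raw_ciphertext: bytes,
                                    v1_mac_key: bytes, backup_mac_key: bytes) -> dict:
    # Step 5: legacy MAC over the raw ciphertext, keeping only the first 8 bytes.
    legacy_mac = hmac.new(v1_mac_key, raw_ciphertext, hashlib.sha256).digest()[:8]
    session_data = {
        "ephemeral": unpadded_b64(ephemeral_public),
        "ciphertext": unpadded_b64(raw_ciphertext),
        "mac": unpadded_b64(legacy_mac),
    }
    # Step 6: backup MAC over the canonical JSON, which now includes `mac`.
    backup_mac = hmac.new(backup_mac_key, canonical_json(session_data), hashlib.sha256).digest()
    session_data["unsigned"] = {"backup_mac": unpadded_b64(backup_mac)}
    return session_data
```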

FIXME: should the server compare the `unsigned`.`backup_mac` property when a
client uploads a key to the backup, when deciding whether to keep the existing
key or replace it with a new key?

To simplify logic, clients may treat `m.backup.v2.curve25519-aes-sha2`-format
keys with the same semantics as `m.megolm_backup.v1.curve25519-aes-sha2` keys
when they are in an `m.megolm_backup.v1.curve25519-aes-sha2` backup. That is,
clients may treat all keys in an `m.megolm_backup.v1.curve25519-aes-sha2` backup
as being unauthenticated, regardless of the presence or absence of the
`unsigned`.`backup_mac` property in the cleartext `session_data` property.
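The sketch below summarizes the resulting trust decision for a downloaded key. It takes the optional simplification for v1 backups described above. The function name and its three result labels are hypothetical.

```python
from typing import Optional

V1_ALGORITHM = "m.megolm_backup.v1.curve25519-aes-sha2"
V2_ALGORITHM = "m.backup.v2.curve25519-aes-sha2"


def classify_backed_up_key(backup_algorithm: str, backup_mac_valid: bool,
                           unauthenticated: Optional[str]) -> str:
    """Return "ignore", "authenticated" or "unauthenticated".

    `backup_mac_valid` is the result of checking unsigned.backup_mac;
    `unauthenticated` is the property from the decrypted data, if present.
    """
    if backup_algorithm == V2_ALGORITHM:
        if not backup_mac_valid:
            # Missing or incorrect backup_mac: the key must be ignored.
            return "ignore"
        return "authenticated" if unauthenticated is None else "unauthenticated"
    if backup_algorithm == V1_ALGORITHM:
        # Optional simplification: every key in a v1 backup is treated as
        # unauthenticated, whether or not it carries unsigned.backup_mac.
        return "unauthenticated"
    raise ValueError(f"unsupported backup algorithm: {backup_algorithm}")
```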

#### Migrating keys

When migrating keys from an `m.megolm_backup.v1.curve25519-aes-sha2` backup to
an `m.backup.v2.curve25519-aes-sha2` backup, keys that lack an
`unsigned`.`backup_mac` property in the cleartext `session_data` property, or
that have an invalid MAC, must have the `unauthenticated` property set to
`m.legacy-v1` in the encrypted `SessionData` (regardless of whether the key
originally had an `unauthenticated` property), and must have an
`unsigned`.`backup_mac` property added to the cleartext `session_data`. If the
same backup decryption key is used for the old and new backups, keys that have
an existing `unsigned`.`backup_mac` property with a valid MAC may be uploaded to
the new backup unchanged, as they are already valid
`m.backup.v2.curve25519-aes-sha2`-format keys.
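A sketch of the per-key migration decision follows. A key is copied unchanged only if the decryption key is shared and its `backup_mac` verifies. Otherwise, its decrypted `SessionData` is marked `m.legacy-v1` before being re-encrypted and MACed as described earlier. The re-encryption itself is not shown. The helper names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def canonical_json(obj: dict) -> bytes:
    stripped = {k: v for k, v in obj.items() if k not in ("signatures", "unsigned")}
    return json.dumps(stripped, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def backup_mac_is_valid(session_data: dict, backup_mac_key: bytes) -> bool:
    unsigned = session_data.get("unsigned")
    encoded = unsigned.get("backup_mac") if isinstance(unsigned, dict) else None
    if not isinstance(encoded, str):
        return False
    try:
        received = base64.b64decode(encoded + "=" * (-len(encoded) % 4), validate=True)
    except ValueError:
        return False
    expected = hmac.new(backup_mac_key, canonical_json(session_data), hashlib.sha256).digest()
    return hmac.compare_digest(received, expected)


def can_reuse_unchanged(session_data: dict, backup_mac_key: bytes,
                        same_decryption_key: bool) -> bool:
    """True if the v1 key may be uploaded to the v2 backup as-is."""
    return same_decryption_key and backup_mac_is_valid(session_data, backup_mac_key)


def legacy_plaintext(decrypted_v1: dict) -> dict:
    """Decrypted SessionData to re-encrypt (then MAC) for the v2 backup."""
    plaintext = dict(decrypted_v1)
    # Set unconditionally, overwriting any existing `unauthenticated` value.
    plaintext["unauthenticated"] = "m.legacy-v1"
    return plaintext
```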

## Potential issues

For users with existing backups, in order to start storing keys in the backup
using this format, the user may need to enter their Secret Storage key so that
the client can obtain the backup decryption key, if it does not already have it
cached, in order to derive the backup MAC key. If a user has multiple clients,
one client may try to obtain the backup MAC key from other clients using Secret
Sharing, but it does not have a way of knowing which clients, if any, have the
backup MAC key.

## Alternatives

As mentioned above, we could switch to using a symmetric encryption algorithm
for the key backup. However, this is not backwards-compatible, and does not
allow for clients that can write to the backup without reading.

Rather than using a new MAC key, we could use an existing signing key, such as
one of the cross-signing keys. This would remove the need for users to enter
their Secret Storage key to add the new signing key. However, this means that
a user cannot create a key backup without also using cross-signing. Using a
separate key also allows the user to give someone else (such as a bot)
permission to write to their backups without allowing them to perform any
cross-signing operations.

A previous version of this MSC used a signing key that was generated randomly.
The method presented in the current version has the following advantages:

- No changes to `AuthData` are necessary, so a new backup version is not
required.
- A MAC is faster to calculate. The main advantage of a signature is that it
allows one to verify the signature without knowing the private key, but in
this case, reading is a more privileged action than writing, and writers
already need to know the private/secret key.
- Since the MAC key is derived from the decryption key, two clients can be
upgraded at the same time without interfering with each other, as they will
derive the same MAC key.
- The MAC is calculated after encryption, and hence is verified before
decryption, so we know that it is authenticated before we do any processing
on it.

A disadvantage of the currently-proposed method versus the previous proposal is
that migration requires that the user give the client access to the backup
decryption key in order to derive the MAC key. However, in both proposals,
most clients would require that the user enter their default SSSS key, which
would give them access to the decryption key anyway.

## Security considerations

Being able to prove authenticity of keys may affect the deniability of
messages: if a user has a Megolm session in their key backup that is MAC'ed by
their backup MAC key, and the session data indicates that it originated from
one of their devices, this could be used as evidence that the Megolm session
did in fact come from them.

This is somewhat mitigated by the fact that obtaining the Megolm session
requires the decryption key for the backup. In addition, the deniability
property mainly refers to the fact that a recipient cannot prove the
authenticity of the message to a third party, and usually is not concerned with
preventing self-incrimination. And in fact, a confiscated device may already
have enough information to sufficiently prove that the device's owner sent a
message.

## Unstable prefix

Until this MSC is accepted, the following unstable names should be used:

- the algorithm name `org.matrix.msc4048.curve25519-aes-sha2` should
be used in place of the name `m.backup.v2.curve25519-aes-sha2`,
- the property name `org.matrix.msc4048.unauthenticated` should be used in place
of `unauthenticated` in the `SessionData` object,
- the property name `org.matrix.msc4048.backup_mac` should be used in place of
the `backup_mac` property in the `unsigned` property, and
- the secret name `org.matrix.msc4048.mac` should be used in place of
`m.megolm_backup.v1.mac`.

### Migration to stable names

After this MSC is accepted, clients that understand the
`org.matrix.msc4048.curve25519-aes-sha2` algorithm name should
migrate the user to a backup using the accepted version of the
`m.backup.v2.curve25519-aes-sha2` algorithm. Keys that use the unstable
property names should be re-uploaded using the stable names.

This includes migrating
`org.matrix.msc4048.curve25519-aes-sha2`-format keys uploaded to
`m.megolm_backup.v1.curve25519-aes-sha2` backups.
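The sketch below illustrates the renames involved in migrating to stable names. Renaming `unauthenticated` changes the encrypted plaintext, so the key must be re-encrypted and given a fresh `backup_mac`. By contrast, `unsigned` is excluded from the MAC input, so moving the MAC from `org.matrix.msc4048.backup_mac` to `backup_mac` leaves its value valid. The helper names are hypothetical.

```python
STABLE_NAMES = {
    # backup algorithm name
    "org.matrix.msc4048.curve25519-aes-sha2": "m.backup.v2.curve25519-aes-sha2",
    # secret name for the backup MAC key
    "org.matrix.msc4048.mac": "m.megolm_backup.v1.mac",
}

UNSTABLE_UNAUTHENTICATED = "org.matrix.msc4048.unauthenticated"
UNSTABLE_BACKUP_MAC = "org.matrix.msc4048.backup_mac"


def stabilize_plaintext(decrypted: dict) -> tuple[dict, bool]:
    """Rename the unstable property; the flag says whether re-encryption is needed."""
    out = dict(decrypted)
    if UNSTABLE_UNAUTHENTICATED not in out:
        return out, False
    out["unauthenticated"] = out.pop(UNSTABLE_UNAUTHENTICATED)
    return out, True


def stabilize_unsigned(session_data: dict) -> dict:
    """Move the MAC to its stable name; the MAC value itself is unchanged."""
    out = dict(session_data)
    unsigned = out.get("unsigned")
    if isinstance(unsigned, dict) and UNSTABLE_BACKUP_MAC in unsigned:
        unsigned = dict(unsigned)
        unsigned["backup_mac"] = unsigned.pop(UNSTABLE_BACKUP_MAC)
        out["unsigned"] = unsigned
    return out
```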

## Dependencies

None