How to pass valid values into cleartext_keyset_json to create a Tink key


In Tink, cleartext keysets can be read and written as JSON. A non-working example is shown below:

{
  "primaryKeyId": 2800579,
  "key": [
    {
      "keyData": {
        "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
        "value": "ODA9eJX9wcAGwZocL0Jym==",
        "keyMaterialType": "SYMMETRIC"
      },
      "status": "ENABLED",
      "keyId": 2800579,
      "outputPrefixType": "TINK"
    }
  ]
}

My question is: is it possible to insert your own values into the various key/value pairs to get another valid keyset? I have experimented with this without much success, mainly because of the "value" key, which fails with INVALID_ARGUMENT: Could not parse key_data.value as key type 'type.googleapis.com/google.crypto.tink.AesGcmKey'. Any idea what a valid "value" would be?


There are 2 best solutions below

Topaco On BEST ANSWER

First of all, the Base64 string of the value field in the posted code snippet is invalid, possibly a copy/paste error.
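
As a quick illustration of why the posted value cannot be decoded, the following sketch checks it with Python's standard library. The helper name is_valid_base64 is hypothetical, and the second string is the value from the generated keyset shown further down, used here only for comparison:

```python
import base64
import binascii


def is_valid_base64(text: str) -> bool:
    """Return True if text is well-formed standard Base64 (hypothetical helper)."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except binascii.Error:
        return False


posted = "ODA9eJX9wcAGwZocL0Jym=="  # value from the question
generated = "GiD5ojApaIM2MRpPhGf5sVMhxeA6NE5KjdzUxsJ0ChH/JA=="  # value from a generated keyset

# Well-formed padded Base64 always has a length that is a multiple of 4.
print(len(posted) % 4, is_valid_base64(posted))
print(len(generated) % 4, is_valid_base64(generated))
```

The posted string has 23 characters, so it is rejected before Tink even gets to parse the key.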

The following Python code uses Tink version 1.5.0 and creates and displays a keyset for AES-256/GCM as JSON:

import io
from tink import aead
from tink import tink_config
from tink import JsonKeysetWriter
from tink import new_keyset_handle
from tink import cleartext_keyset_handle

tink_config.register()

key_template = aead.aead_key_templates.AES256_GCM
keyset_handle = new_keyset_handle(key_template)

string_out = io.StringIO()
writer = JsonKeysetWriter(string_out)
cleartext_keyset_handle.write(writer, keyset_handle)

serialized_keyset = string_out.getvalue()
print(serialized_keyset)

The result resembles the keyset you posted, for example:

{
  "primaryKeyId": 1794775293,
  "key": [
    {
      "keyData": {
        "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
        "value": "GiD5ojApaIM2MRpPhGf5sVMhxeA6NE5KjdzUxsJ0ChH/JA==",
        "keyMaterialType": "SYMMETRIC"
      },
      "status": "ENABLED",
      "keyId": 1794775293,
      "outputPrefixType": "TINK"
    }
  ]
}   

I haven't found documentation that describes the structure in general or the value field in particular, but comparing the keysets generated for different algorithms allows some conclusions. Base64 decoding the value and hex encoding the result gives:

1a20f9a23029688336311a4f8467f9b15321c5e03a344e4a8ddcd4c6c2740a11ff24

For AES-256/GCM the value has 34 bytes, of which the last 32 are the actual key. The first byte is characteristic of the algorithm and the second byte gives the key size: for example, the value starts with 0x1a10 for AES-128/GCM, 0x1a20 for AES-256/GCM and 0x1220 for ChaCha20Poly1305 (the layout can be more complex for other algorithms).
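
The split described above can be reproduced with a few lines of standard-library Python. This is only a sketch for the AES-256/GCM value shown earlier; the variable names are illustrative:

```python
import base64

# "value" field copied from the generated AES-256/GCM keyset above.
value_b64 = "GiD5ojApaIM2MRpPhGf5sVMhxeA6NE5KjdzUxsJ0ChH/JA=="

raw = base64.b64decode(value_b64)

# First two bytes: algorithm-specific marker and key length; the rest is the key.
header, key = raw[:2], raw[2:]

print(raw.hex())
print("total bytes:", len(raw))
print("header:", header.hex())
print("key length from header:", header[1])
print("actual key length:", len(key))
```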

To use a self-defined key for AES-256/GCM, e.g.

000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f 

prepend 0x1a20, Base64 encode the result:

GiAAAQIDBAUGBwgJCgsMDQ4PEBESExQVFhcYGRobHB0eHw==

and use this value in place of the old one in the keyset above.
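
The prepend-and-encode step can be sketched as follows. The function name aes_gcm_key_value is hypothetical, and the sketch assumes the simple two-byte header observed above, which only covers AES-128/GCM and AES-256/GCM keys:

```python
import base64


def aes_gcm_key_value(raw_key: bytes) -> str:
    """Build the Base64 "value" field for a raw AES-GCM key (hypothetical helper).

    Assumes the header observed in generated keysets: 0x1a followed by the
    key length in bytes (0x10 for AES-128, 0x20 for AES-256).
    """
    if len(raw_key) not in (16, 32):
        raise ValueError("expected a 16-byte or 32-byte AES key")
    header = bytes([0x1A, len(raw_key)])
    return base64.b64encode(header + raw_key).decode("ascii")


raw_key = bytes.fromhex(
    "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f"
)
value = aes_gcm_key_value(raw_key)
print(value)
```

A fixed, predictable key like this one is only suitable for experiments; real keys should come from a secure random source.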

The modified KeySet can be loaded and used for encryption as follows:

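from tink import aead
from tink import tink_config
from tink import JsonKeysetReader
from tink import cleartext_keyset_handle

# Needed when this snippet runs on its own; aead is used below for aead.Aead.
tink_config.register()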

serialized_keyset = '''
{
  "primaryKeyId": 1794775293,
  "key": [
    {
      "keyData": {
        "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
        "value": "GiAAAQIDBAUGBwgJCgsMDQ4PEBESExQVFhcYGRobHB0eHw==",
        "keyMaterialType": "SYMMETRIC"
      },
      "status": "ENABLED",
      "keyId": 1794775293,
      "outputPrefixType": "TINK"
    }
  ]
}   
'''
reader = JsonKeysetReader(serialized_keyset)
keyset_handle = cleartext_keyset_handle.read(reader)

plaintext = b'The quick brown fox jumps over the lazy dog'
aead_primitive = keyset_handle.primitive(aead.Aead)
tink_ciphertext = aead_primitive.encrypt(plaintext, b'')

The relationship between KeySet and the example key 0001...1e1f can be verified by decrypting the generated ciphertext using the example key without Tink, e.g. with PyCryptodome.

The format of the Tink ciphertext is described in Tink Wire Format, Crypto Formats. The first byte specifies the version and the next 4 bytes the key ID, followed by the actual data. For GCM the actual data has the format nonce (12 bytes) || ciphertext || tag (16 bytes). Decryption is then possible with PyCryptodome:

from Crypto.Cipher import AES

key = bytes.fromhex('000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f')

prefix = tink_ciphertext[:5]
nonce = tink_ciphertext[5:5 + 12]
ciphertext = tink_ciphertext[5 + 12:-16]
tag = tink_ciphertext[-16:]

cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
cipher.update(b'')
decryptedText = cipher.decrypt_and_verify(ciphertext, tag)

print(decryptedText.decode('utf-8')) # The quick brown fox jumps over the lazy dog

which proves that the example key 0001...1e1f was correctly integrated into the KeySet.
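
As an additional sanity check, the 5-byte prefix can be compared with the keyId in the keyset. The sketch below assumes the TINK output prefix layout from the wire format description (version byte 0x01, then the key ID as a 4-byte big-endian integer). It runs on a synthetic ciphertext so that it works without Tink; parse_tink_prefix is a hypothetical helper:

```python
import struct


def parse_tink_prefix(ciphertext: bytes):
    """Split off the 5-byte TINK prefix and return (version, key_id)."""
    if len(ciphertext) < 5:
        raise ValueError("ciphertext too short for a TINK prefix")
    version = ciphertext[0]
    key_id = int.from_bytes(ciphertext[1:5], "big")
    return version, key_id


# Synthetic stand-in for tink_ciphertext: prefix + 12-byte nonce + 16-byte tag.
keyset_key_id = 1794775293
sample = b"\x01" + struct.pack(">I", keyset_key_id) + bytes(12 + 16)

version, key_id = parse_tink_prefix(sample)
print(version, key_id, key_id == keyset_key_id)
```

With a real ciphertext, passing tink_ciphertext instead of sample should yield the keyId of the primary key.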

Sophie Schmieg On

While the other answer contains some really cool work, the actual answer is slightly different (and we should add it to the wire format description). The value stored here is a serialized proto of the type associated with the KeyTypeManager that has the corresponding typeUrl registered. In this case, it is the proto AesGcmKey. AES-GCM does not have any parameters other than the key size (which is implicitly defined when given an AES-GCM key), so the proto contains a single byte array. This explains the superfluous-looking first byte: it is the proto serialization of the version and a single bytes field. XChaCha20Poly1305 also has no additional parameters. I assume the different prefixes you are seeing are due to equivalent but unequal proto serializations: in theory the proto's name should not influence the serialization, and both use the same tags (1 for the version and 3 for the raw key material) and version 0.
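
To make the proto view concrete, here is a minimal hand-written decoder for the protobuf wire format, applied to the AES-256/GCM value from the other answer. It is a sketch that handles only varint and length-delimited fields, not a replacement for the generated proto classes; read_varint and parse_fields are hypothetical helpers:

```python
import base64


def read_varint(buf: bytes, pos: int):
    """Decode a protobuf varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes):
    """Return a list of (field_number, wire_type, value) tuples."""
    fields, pos = [], 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint, e.g. a uint32 version
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited, e.g. bytes
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


raw = base64.b64decode("GiD5ojApaIM2MRpPhGf5sVMhxeA6NE5KjdzUxsJ0ChH/JA==")
for field_number, wire_type, value in parse_fields(raw):
    print(field_number, wire_type, len(value))
```

Read this way, 0x1a is the tag for field 3 with wire type 2 (length-delimited) and 0x20 is the length of the key. No version field appears in the bytes, which is consistent with proto3 omitting fields that hold their default value (here, version 0). By the same reading, a value starting with 0x12 carries its key material in field 2.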

If you look at more complicated key types like ECIES, the corresponding value is far more involved and contains even more serialized protos. It's protos all the way down.

We are planning to overhaul our key management layer somewhat substantially in the nearish future, which will make it easier to export/import keys that aren't using Tink's key format, without having to depend on proto definitions (which are supposed to be an internal implementation detail).