Apply ASN.1 schema to provide interpretation of ASN.1 encoded data

I have written an ASN.1 parser in C++. It takes an encoded file (e.g. an X.509 certificate) and builds an internal tree structure.

The most obvious example is a sample X.509 certificate. I store it internally as a tree, which can be printed as follows:

SEQUENCE_OF (3 elements)
├───SEQUENCE_OF (8 elements)
|   ├───CONTEXT_SPECIFIC (3 bytes) (1 elements)
|   |   └───INTEGER (1 bytes): 0x02
|   ├───INTEGER (18 bytes): 0x03d415318e2c571d2905fc3e0527689d0d09
|   ├───SEQUENCE_OF (2 elements)
|   |   ├───OBJECTID: 1.2.840.113549.1.1.11 (PKCS #1 - sha256WithRSAEncryption)
|   |   └───NULL: null
|   ├───SEQUENCE_OF (3 elements)
|   |   ├───SET_OF (1 elements)
|   |   |   └───SEQUENCE_OF (2 elements)
|   |   |       ├───OBJECTID: 2.5.4.6 (X.520 DN component - countryName)
|   |   |       └───PRINTABLE_STRING (2 bytes): US
|   |   ├───SET_OF (1 elements)
|   |   |   └───SEQUENCE_OF (2 elements)
|   |   |       ├───OBJECTID: 2.5.4.10 (X.520 DN component - organizationName)
|   |   |       └───PRINTABLE_STRING (13 bytes): Let's Encrypt
|   |   └───SET_OF (1 elements)
|   |       └───SEQUENCE_OF (2 elements)
|   |           ├───OBJECTID: 2.5.4.3 (X.520 DN component - commonName)
|   |           └───PRINTABLE_STRING (26 bytes): Let's Encrypt Authority X3
|   ├───SEQUENCE_OF (2 elements)
|   |   ├───UTC_TIME: 19-09-29 16:33:36 UTC
|   |   └───UTC_TIME: 19-12-28 16:33:36 UTC
|   ├───SEQUENCE_OF (1 elements)
|   |   └───SET_OF (1 elements)
|   |       └───SEQUENCE_OF (2 elements)
|   |           ├───OBJECTID: 2.5.4.3 (X.520 DN component - commonName)
|   |           └───PRINTABLE_STRING (15 bytes): letsencrypt.org
|   ├───SEQUENCE_OF (2 elements)
|   |   ├───SEQUENCE_OF (2 elements)
|   |   |   ├───OBJECTID: 1.2.840.113549.1.1.1 (PKCS #1 - rsaEncryption)
|   |   |   └───NULL: null
|   |   └───BIT_STREAM (271 bytes): 0x00 30 82 01 0a 02 ...
|   └───CONTEXT_SPECIFIC (631 bytes) (1 elements)
|       └───SEQUENCE_OF (9 elements)
|           ├───SEQUENCE_OF (3 elements)
|           |   ├───OBJECTID: 2.5.29.15 (X.509 extension - keyUsage)
|           |   ├───BOOLEAN (1 bytes): true
|           |   └───OCTET_STREAM (4 bytes): 0x03 02 05 a0
|           |       └───BIT_STREAM (2 bytes): 0x05 a0
|           ├───SEQUENCE_OF (2 elements)
|           |   ├───OBJECTID: 2.5.29.37 (X.509 extension - extKeyUsage)
|           |   └───OCTET_STREAM (22 bytes): 0x30 14 06 08 2b 06 01 05 05 07 03 ...
|           |       └───SEQUENCE_OF (2 elements)
|           |           ├───OBJECTID: 1.3.6.1.5.5.7.3.1 (PKIX key purpose - serverAuth)
|           |           └───OBJECTID: 1.3.6.1.5.5.7.3.2 (PKIX key purpose - clientAuth)
|           ├───SEQUENCE_OF (3 elements)
|           |   ├───OBJECTID: 2.5.29.19 (X.509 extension - basicConstraints)
|           |   ├───BOOLEAN (1 bytes): true
|           |   └───OCTET_STREAM (2 bytes): 0x30 00
|           |       └───SEQUENCE_OF (0 elements)
|           ├───SEQUENCE_OF (2 elements)
|           |   ├───OBJECTID: 2.5.29.14 (X.509 extension - subjectKeyIdentifier)
|           |   └───OCTET_STREAM (22 bytes): 0x04 14 7c 2b a3 e7 3c 84 5f ...
|           |       └───OCTET_STREAM (20 bytes): 0x7c 2b a3 e7 3c 84 5f 38 ...
|           ├───SEQUENCE_OF (2 elements)
|           |   ├───OBJECTID: 2.5.29.35 (X.509 extension - authorityKeyIdentifier)
|           |   └───OCTET_STREAM (24 bytes): 0x30 16 80 14 a8 ...
|           |       └───SEQUENCE_OF (1 elements)
|           |           └───CONTEXT_SPECIFIC (20 bytes): 0xa8 4a 6a 63 04 7d dd ba e6 ...
|           ├───SEQUENCE_OF (2 elements)
|           |   ├───OBJECTID: 1.3.6.1.5.5.7.1.1 (PKIX private extension - authorityInfoAccess)
|           |   └───OCTET_STREAM (99 bytes): 0x30 61 30 2e 06 08 2b 06 ...
|           |       └───SEQUENCE_OF (2 elements)
|           |           ├───SEQUENCE_OF (2 elements)
|           |           |   ├───OBJECTID: 1.3.6.1.5.5.7.48.1 (PKIX OCSP - ocsp)
|           |           |   └───CONTEXT_SPECIFIC (34 bytes): http://ocsp.int-x3.letsencrypt.org
|           |           └───SEQUENCE_OF (2 elements)
|           |               ├───OBJECTID: 1.3.6.1.5.5.7.48.2 (PKIX subject/authority info access descriptor - caIssuers)
|           |               └───CONTEXT_SPECIFIC (35 bytes): http://cert.int-x3.letsencrypt.org/
|           ├───SEQUENCE_OF (2 elements)
|           |   ├───OBJECTID: 2.5.29.17 (X.509 extension - subjectAltName)
|           |   └───OCTET_STREAM (40 bytes): 0x30 26 82 0f 6c 65 74 73 65 6e 63 72 ...
|           |       └───SEQUENCE_OF (2 elements)
|           |           ├───CONTEXT_SPECIFIC (15 bytes): letsencrypt.org
|           |           └───CONTEXT_SPECIFIC (19 bytes): www.letsencrypt.org
|           ├───SEQUENCE_OF (2 elements)
|           |   ├───OBJECTID: 2.5.29.32 (X.509 extension - certificatePolicies)
|           |   └───OCTET_STREAM (69 bytes): 0x30 43 30 08 06 06 67 81 ...
|           |       └───SEQUENCE_OF (2 elements)
|           |           ├───SEQUENCE_OF (1 elements)
|           |           |   └───OBJECTID: 2.23.140.1.2.1 (CAB Certificate Policies - domainValidated)
|           |           └───SEQUENCE_OF (2 elements)
|           |               ├───OBJECTID: 1.3.6.1.4.1.44947.1.1.1
|           |               └───SEQUENCE_OF (1 elements)
|           |                   └───SEQUENCE_OF (2 elements)
|           |                       ├───OBJECTID: 1.3.6.1.5.5.7.2.1 (PKIX policy qualifier - cps)
|           |                       └───IA5_STRING (26 bytes): http://cps.letsencrypt.org
|           └───SEQUENCE_OF (2 elements)
|               ├───OBJECTID: 1.3.6.1.4.1.11129.2.4.2 (Google Certificate Transparency - googleSignedCertificateTimestamp)
|               └───OCTET_STREAM (243 bytes): 0x04 81 f0 00 ee 00 75 00 e2 69...
|                   └───OCTET_STREAM (240 bytes): 0x00 ee 00 75 00 e2 69 4b ae ...
├───SEQUENCE_OF (2 elements)
|   ├───OBJECTID: 1.2.840.113549.1.1.11 (PKCS #1 - sha256WithRSAEncryption)
|   └───NULL: null
└───BIT_STREAM (257 bytes): 0x00 16 97 ae c0 be ...

Although the above representation is useful, it does not convey everything about the data. My goal is to provide an interpretation for all these ASN.1 elements while keeping the application able to work with any kind of ASN.1 artifact. I want to let users supply their own ASN.1 schema to interpret their data. For an X.509 certificate, the user would naturally supply the schema described in RFC 5280.

To interpret the data, I see two options:

Option 1: I could use tree index paths to provide the interpretation. Example for the X.509 certificate:

version: 0/0/0/0
serialNumber: 0/0/1
signature: 0/0/2

However, with hard-coded paths the application is no longer generic: it can only interpret a few predefined schemas, which is not my goal.
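For illustration, here is a minimal sketch of such a path lookup. The `Node` type and the `resolve` function are hypothetical stand-ins, not the parser's actual tree, and the first path component is assumed to select the root itself, as in the paths listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal node type; the real parser's tree will differ.
struct Node {
    std::string label;
    std::vector<Node> children;
};

// Resolve an index path such as "0/0/1".
// Assumption: the first component addresses the root and must be 0.
// Returns nullptr if the path is empty, malformed, or out of range.
const Node* resolve(const Node& root, const std::string& path) {
    std::istringstream in(path);
    std::string part;
    const Node* cur = nullptr;
    while (std::getline(in, part, '/')) {
        std::size_t idx = 0;
        try {
            idx = std::stoul(part);
        } catch (...) {
            return nullptr;  // component is not a number
        }
        if (cur == nullptr) {
            if (idx != 0) return nullptr;  // only one root exists
            cur = &root;
        } else {
            if (idx >= cur->children.size()) return nullptr;
            cur = &cur->children[idx];
        }
    }
    return cur;
}
```

With a tree shaped like the certificate above, `resolve(root, "0/0/1")` would return the serial number node, and a path that runs past the available children returns `nullptr`.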

Option 2: Apply the ASN.1 schema from the RFC to the parsed elements. Here I can no longer connect the dots. For the above example, I would have to apply the schema given in Appendix A of RFC 5280 (https://www.rfc-editor.org/rfc/rfc5280#appendix-A) to provide the relevant interpretation, and this is where I am blocked. How should I do that? I guess I would need to write a modified version of the asn1c compiler, generate another tree from the schema, and overlay the two trees. Would this be a valid approach? Are there any better approaches? Do you have any other ideas on how to divide this problem into simple, achievable steps?

PS: I saw https://github.com/lapo-luchini/asn1js/blob/trunk/rfcdef.js and https://github.com/etingof/pyasn1-modules/blob/master/pyasn1_modules/rfc5280.py#L1551 as examples. It seems that neither of these loads the ASN.1 schema directly; instead, they convert it to a more convenient format and then use that.
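To make the "overlay the two trees" idea concrete, here is a rough sketch under strong assumptions. `Parsed`, `Schema` and `annotate` are hypothetical names. The schema tree is built by hand rather than generated from an ASN.1 module, tags are compared as single identifier octets, and only OPTIONAL/DEFAULT fields are handled (CHOICE, SET ordering, and ANY DEFINED BY are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parsed node: identifier octet plus children.
struct Parsed {
    int tag;                       // e.g. 0x30 SEQUENCE, 0x02 INTEGER, 0xA0 [0] constructed
    std::vector<Parsed> children;
    std::string name;              // filled in by annotate()
};

// Hypothetical schema node; a real tool would derive this from the ASN.1 module.
struct Schema {
    std::string name;
    int tag;                       // identifier octet expected in the encoding
    bool optional;                 // OPTIONAL or DEFAULT: may be absent
    std::vector<Schema> fields;    // empty: leaf, or subtree not described
};

// Walk both trees in parallel and copy schema names onto matching parsed nodes.
// Returns false on a mismatch; names already written on the way are not rolled back.
bool annotate(Parsed& node, const Schema& schema) {
    if (node.tag != schema.tag) return false;
    node.name = schema.name;
    if (schema.fields.empty()) return true;  // nothing more to check below this node
    std::size_t i = 0;
    for (const Schema& field : schema.fields) {
        if (i < node.children.size() && annotate(node.children[i], field)) {
            ++i;                              // field present, consume one child
        } else if (!field.optional) {
            return false;                     // mandatory field missing
        }
    }
    return i == node.children.size();         // no unexplained trailing children
}
```

Fed a schema whose first field is an optional `version` tagged `0xA0`, the same call labels a certificate body with or without the version wrapper, which is the case that fixed index paths cannot handle.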

Answer by AKha:

First of all, what you implemented is an ASN.1 DER parser (ASN.1 defines other encodings as well). DER can be parsed without an ASN.1 schema because it is a TLV (Tag/Length/Value) format. This is a "low-level" view of the "raw" data: you get the structure and the primitive types, but not the schema semantics.
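As a reminder of why no schema is needed at this level, here is a minimal sketch of reading one TLV header. The names `Tlv` and `readTlv` are hypothetical, and the sketch assumes single-octet tags (no high-tag-number form) and definite lengths only, which is all DER permits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Tlv {
    std::uint8_t tag;         // identifier octet
    std::size_t length;       // number of content octets
    std::size_t valueOffset;  // index of the first content octet
};

// Read the tag and length at position pos; returns false on malformed or truncated input.
bool readTlv(const std::vector<std::uint8_t>& buf, std::size_t pos, Tlv& out) {
    if (pos >= buf.size() || buf.size() - pos < 2) return false;
    const std::uint8_t tag = buf[pos];
    if ((tag & 0x1F) == 0x1F) return false;       // high-tag-number form: not handled here
    const std::uint8_t first = buf[pos + 1];
    std::size_t cursor = pos + 2;
    std::size_t length = 0;
    if (first < 0x80) {
        length = first;                            // short form: length fits in 7 bits
    } else {
        const std::size_t n = first & 0x7F;        // long form: n length octets follow
        if (n == 0 || n > sizeof(std::size_t)) return false;  // indefinite or too large
        if (buf.size() - cursor < n) return false;
        for (std::size_t i = 0; i < n; ++i) {
            length = (length << 8) | buf[cursor++];
        }
    }
    if (length > buf.size() - cursor) return false;  // value runs past the buffer
    out.tag = tag;
    out.length = length;
    out.valueOffset = cursor;
    return true;
}
```

Everything this returns is structural. Whether a given INTEGER is a version or a serial number cannot be recovered from the bytes alone, which is the gap the schema fills.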

Compiling your schema (e.g. with asn1c) will produce: 1) a "parser" (i.e. a decoder) and 2) the data structures (i.e. the bindings) for your specific schema. You then no longer parse the data generically; instead you decode it (using the generated code and referencing a schema type) and access the fields through their schema identifiers (say myCert.toBeSigned.version).