Why is a bitmask (0x1F) commonly ANDed to the character encoding bytes of NFC tag NDEF payloads?


I am writing an android app to write NFC tags, and I keep seeing examples like this:

private NdefRecord createTextRecord(String content){
    try {
        byte[] language;
        language = Locale.getDefault().getLanguage().getBytes("UTF-8");

        final byte[] text = content.getBytes("UTF-8");
        final int languageSize = language.length;
        final int textLength = text.length;
        final ByteArrayOutputStream payload = new ByteArrayOutputStream(1 + languageSize + textLength);

        payload.write((byte) (languageSize & 0x1F)); // <----- LOOK HERE
        payload.write(language, 0, languageSize);
        payload.write(text, 0, textLength);

        return new NdefRecord(NdefRecord.TNF_WELL_KNOWN, NdefRecord.RTD_TEXT, new byte[0], payload.toByteArray());
    }
    catch (UnsupportedEncodingException e){
        Log.e("createNdefMessage",e.getMessage());
    }
    return null;
}

Note the payload.write((byte) (languageSize & 0x1F)); part. What's up with that 0x1F bitmask? At first I thought the specification only allowed 5 bits to describe the length of the language code, but that doesn't make sense, because we're writing a whole byte anyway.

See here and here for descriptions of the NDEF spec, and here and here for more examples of this mysterious 0x1F mask in use.

Am I missing something?

EDIT: I have answered my own question below, but I'm not entirely sure I am correct. If anyone can provide a better explanation or more insight, I will select your answer instead.

Best answer:

An NDEF Text Record is a specialization of the generic NDEF Record structure, characterized by the Type-Name-Format (TNF field) code 1 (well-known record type names assigned by the NFC Forum) and the Type-Name (TYPE field) "T" (0x54).
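To make the TNF and TYPE fields concrete, here is a minimal Java sketch of how a single short NDEF record with TNF 1 and type "T" could be laid out as bytes. It assumes a one-record message (MB and ME both set), the Short Record layout (payload of at most 255 bytes, so PAYLOAD_LENGTH is one byte) and no ID field. The class and method names are hypothetical; on Android you would normally let android.nfc.NdefMessage do this encoding.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class NdefShortRecordSketch {
    // Flag bits in the first octet of an NDEF record header.
    static final int MB = 0x80;             // Message Begin
    static final int ME = 0x40;             // Message End
    static final int SR = 0x10;             // Short Record: PAYLOAD_LENGTH is a single byte
    static final int TNF_WELL_KNOWN = 0x01; // TNF 1: NFC Forum well-known type

    /** Hypothetical helper: one-record message, TNF 1, TYPE "T", no ID field. */
    static byte[] encodeWellKnownTextRecord(byte[] payload) {
        if (payload.length > 255) {
            throw new IllegalArgumentException("short record payload must be <= 255 bytes");
        }
        byte[] type = {0x54}; // "T"
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MB | ME | SR | TNF_WELL_KNOWN); // 0xD1
        out.write(type.length);                   // TYPE_LENGTH
        out.write(payload.length);                // PAYLOAD_LENGTH (one byte because SR is set)
        out.write(type, 0, type.length);          // TYPE
        out.write(payload, 0, payload.length);    // PAYLOAD, copied as-is
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = {0x02, 0x65, 0x6e, 0x48, 0x69}; // status byte, "en", "Hi"
        System.out.println(Arrays.toString(encodeWellKnownTextRecord(payload)));
    }
}
```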

For the NFC Forum Well-Known Type Name "T" the structure of the NDEF Record PAYLOAD is given by the "NFC Forum Text Record Type Definition" specification.

The Text Record payload consists of a status byte, followed by a variable-length language code and the actual UTF-8 or UTF-16 encoded text content. The most significant bit of the status byte is 0 for UTF-8 and 1 for UTF-16 encoding. The next bit is reserved. The 6 least significant bits (mask 0x3F) indicate the number of bytes occupied by the language code. The bit mask 0x1F corresponds to only the 5 least significant bits of a byte and does not match the specification text. Furthermore, the subsequent line writes languageSize bytes without applying the same mask, thus potentially creating an incorrect NDEF Text Record in which the tail of the language code becomes part of the text content.
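A sketch of the payload builder using the 6-bit mask described above: 0x3F keeps the length bits and leaves bit 7 (0 = UTF-8) and bit 6 (reserved) at 0. buildTextPayload is a hypothetical helper, and rejecting an over-long language code instead of masking it is a design choice made here so that the written length can never disagree with the bytes that follow.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class TextPayloadSketch {
    static final int LANG_LENGTH_MASK = 0x3F; // bits 5..0 of the status byte

    /** Hypothetical helper: builds a UTF-8 Text Record payload (status byte + language + text). */
    static byte[] buildTextPayload(String languageCode, String text) {
        byte[] lang = languageCode.getBytes(StandardCharsets.US_ASCII); // IANA language codes are ASCII
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        if (lang.length > LANG_LENGTH_MASK) {
            // Masking would silently corrupt the length, so refuse instead.
            throw new IllegalArgumentException("language code longer than 63 bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(1 + lang.length + body.length);
        out.write(lang.length & LANG_LENGTH_MASK); // status byte: bit 7 = 0 (UTF-8), bit 6 = 0 (reserved)
        out.write(lang, 0, lang.length);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : buildTextPayload("en", "Hello World")) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex); // 02656e48656c6c6f20576f726c64

        // Why the mask width matters: a length of 32 is lost under 0x1F but kept under 0x3F.
        int len = 32;
        System.out.println((len & 0x1F) + " vs " + (len & 0x3F)); // 0 vs 32
    }
}
```

On Android API level 21 and later, NdefRecord.createTextRecord(String languageCode, String text) builds this record for you, which may be simpler than assembling the payload by hand.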

As an example payload, the byte sequence 02656e48656c6c6f20576f726c64 starts with status byte 0x02 for the 2 byte language code "en" (0x65, 0x6e) followed by the UTF-8 encoded text "Hello World".
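The same layout can be read back. This hypothetical parser (plain Java, not an Android API) decodes the example payload above; it assumes the payload is well formed and does no bounds checking.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TextPayloadParser {
    /** Hypothetical helper: returns {languageCode, text} from a Text Record payload. */
    static String[] parse(byte[] payload) {
        int status = payload[0] & 0xFF;
        boolean utf16 = (status & 0x80) != 0; // bit 7: 0 = UTF-8, 1 = UTF-16
        int langLength = status & 0x3F;       // bits 5..0: language code length in bytes
        String language = new String(payload, 1, langLength, StandardCharsets.US_ASCII);
        // UTF_16 honors a byte order mark and otherwise decodes as big-endian.
        Charset cs = utf16 ? StandardCharsets.UTF_16 : StandardCharsets.UTF_8;
        int textStart = 1 + langLength;
        String text = new String(payload, textStart, payload.length - textStart, cs);
        return new String[] {language, text};
    }

    /** Converts a hex string such as "02656e" into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] parts = parse(fromHex("02656e48656c6c6f20576f726c64"));
        System.out.println(parts[0] + " / " + parts[1]); // en / Hello World
    }
}
```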

Another answer (by the asker):

Thanks to a comment in the code here ...

byte MASK = (byte) 0x1F;
if ((tagFirstOctet & MASK) == MASK) { // EMV book 3, Page 178 or Annex B1 (EMV4.3)
    // b5-b1 are all 1: the tag number continues in the subsequent byte(s)
}

... I was able to find a partial answer to my question on page 156 of EMV 4.3 Book 3.

It seems that the lower 5 bits are for a tag number, and the upper 3 bits describe the class and whether the data object is primitive or constructed, thusly:

b8 | b7 | b6 | b5 | b4 | b3 | b2 | b1 | Meaning
---------------------------------------------------------------
 0 |  0 |    |    |    |    |    |    | Universal class
 0 |  1 |    |    |    |    |    |    | Application class
 1 |  0 |    |    |    |    |    |    | Context-specific class
 1 |  1 |    |    |    |    |    |    | Private class
   |    |  0 |    |    |    |    |    | Primitive data object
   |    |  1 |    |    |    |    |    | Constructed data object
   |    |    |  1 |  1 |  1 |  1 |  1 | See subsequent bytes
   |    |    |   Any other value <31  | Tag number
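The first-byte table can be expressed directly as bit operations. This is a sketch with a hypothetical describe method, covering BER-TLV tags as laid out in the table; it is in this tag byte that b5-b1 (mask 0x1F) carry the tag number.

```java
class BerTagFirstByte {
    static final String[] CLASSES = {"Universal", "Application", "Context-specific", "Private"};

    /** Hypothetical helper: decodes the first byte of a BER-TLV tag per the table above. */
    static String describe(int firstByte) {
        int b = firstByte & 0xFF;
        String tagClass = CLASSES[(b >> 6) & 0x03]; // b8-b7: class
        boolean constructed = (b & 0x20) != 0;      // b6: primitive (0) or constructed (1)
        int low5 = b & 0x1F;                        // b5-b1: tag number, or 11111 = see subsequent bytes
        String number = (low5 == 0x1F) ? "see subsequent bytes" : String.valueOf(low5);
        return tagClass + ", " + (constructed ? "constructed" : "primitive") + ", tag number: " + number;
    }

    public static void main(String[] args) {
        System.out.println(describe(0x9F)); // first byte of the two-byte EMV tag 9F02
        System.out.println(describe(0x6F));
    }
}
```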

According to ISO/IEC 8825, Table 36 defines the coding rules of the 
subsequent bytes of a BER-TLV tag when tag numbers ≥ 31 are used
(that is, bits b5 - b1 of the first byte equal '11111').

b8 | b7 | b6 | b5 | b4 | b3 | b2 | b1 | Meaning
---------------------------------------------------------------
 1 |    |    |    |    |    |    |    | Another byte follows
 0 |    |    |    |    |    |    |    | Last tag byte
   |           Any value > 0          | (Part of) tag number
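Likewise, here is a sketch of reading a tag number that spills into subsequent bytes, following the second table: each subsequent byte contributes its low 7 bits, and b8 says whether another byte follows. readTagNumber is hypothetical and assumes well-formed input (no overflow or truncation checks).

```java
class BerTagNumber {
    /** Hypothetical helper: returns the tag number of the BER-TLV tag starting at offset. */
    static int readTagNumber(byte[] data, int offset) {
        int first = data[offset] & 0xFF;
        if ((first & 0x1F) != 0x1F) {
            return first & 0x1F; // tag number fits in b5-b1 of the first byte
        }
        int number = 0;
        int i = offset + 1;
        int b;
        do {
            b = data[i++] & 0xFF;
            number = (number << 7) | (b & 0x7F); // b7-b1: (part of) the tag number
        } while ((b & 0x80) != 0);                // b8 = 1: another byte follows
        return number;
    }

    public static void main(String[] args) {
        System.out.println(readTagNumber(new byte[] {(byte) 0x9F, 0x02}, 0));        // 2
        System.out.println(readTagNumber(new byte[] {0x5F, (byte) 0x81, 0x01}, 0)); // 129
    }
}
```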

So, it seems as if the suggestion to use (languageSize & 0x1F) is incorrect, for at least the following reasons:

  1. This value should represent a tag number, not the character encoding.
  2. It assumes every tag is a universal-class, primitive data object (the mask clears b8-b6).
  3. If the lower 5 bits are all 1 (i.e., the value is 31), the format will be incorrect, because the subsequent byte(s) should then encode the tag number.
