How can I combine nom parsers to get a more bit-oriented interface to the data?


I'm working on decoding AIS messages in Rust using nom.

AIS messages are made up of a bit vector; the various fields in each message are an arbitrary number of bits long, and they don't always align on byte boundaries.

This bit vector is then ASCII encoded, and embedded in an NMEA sentence.

From http://catb.org/gpsd/AIVDM.html:

The data payload is an ASCII-encoded bit vector. Each character represents six bits of data. To recover the six bits, subtract 48 from the ASCII character value; if the result is greater than 40 subtract 8. According to [IEC-PAS], the valid ASCII characters for this encoding begin with "0" (64) and end with "w" (87); however, the intermediate range "X" (88) to "_" (95) is not used.

Example

  • !AIVDM,1,1,,A,D03Ovk1T1N>5N8ffqMhNfp0,0*68 is the NMEA sentence
  • D03Ovk1T1N>5N8ffqMhNfp0 is the encoded AIS data
  • 010100000000000011011111111110110011000001100100000001011110001110000101011110001000101110101110111001011101110000011110101110111000000000 is the decoded AIS data as a bit vector
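A minimal sketch of the six-bit rule quoted above, in plain Rust without nom. The function names `decode_sextet` and `payload_to_bit_string` are made up for illustration, and the accepted character ranges ('0' to 'W' and '`' to 'w') are my reading of the unused "X" to "_" gap, so treat them as an assumption.

```rust
/// Decode one AIS payload character into its six-bit value:
/// subtract 48, then subtract another 8 if the result is greater than 40.
/// Returns None for bytes outside the assumed valid ranges.
fn decode_sextet(c: u8) -> Option<u8> {
    match c {
        // Assumption: '0'..='W' map to 0..=39 and '`'..='w' map to 40..=63.
        b'0'..=b'W' | b'`'..=b'w' => {
            let v = c - 48;
            Some(if v > 40 { v - 8 } else { v })
        }
        _ => None,
    }
}

/// Decode a whole payload into a string of '0'/'1' characters,
/// six per input character. Returns None on the first invalid character.
fn payload_to_bit_string(payload: &str) -> Option<String> {
    let mut bits = String::with_capacity(payload.len() * 6);
    for c in payload.bytes() {
        let v = decode_sextet(c)?;
        bits.push_str(&format!("{:06b}", v));
    }
    Some(bits)
}
```

Calling `payload_to_bit_string("D03Ovk1T1N>5N8ffqMhNfp0")` should yield a 138-character bit string (23 characters times 6 bits).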

Problems

I list these together because I think they may be related...

1. Decoding ASCII to bit vector

I can do this manually: iterate over the characters, subtract the appropriate values, and build up a byte array with a lot of bit shifting. That works, but it seems like I should be able to do this inside nom and chain it with the actual AIS bit parser, eliminating the intermediate byte array.
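For reference, the manual approach looks roughly like the sketch below. It packs already-decoded six-bit values into bytes, most significant bit first. `pack_sextets` is a hypothetical helper name, and zero-padding the final partial byte is my own choice, not something the AIS documentation prescribes.

```rust
/// Pack six-bit values into bytes, most significant bit first.
/// Returns the packed bytes and the number of meaningful bits,
/// since the last byte may be zero-padded on the right.
fn pack_sextets(sextets: &[u8]) -> (Vec<u8>, usize) {
    let mut out = Vec::with_capacity((sextets.len() * 6 + 7) / 8);
    let mut acc: u32 = 0; // bit accumulator
    let mut nbits: u32 = 0; // number of pending bits held in `acc`
    for &s in sextets {
        // Append the low six bits of `s` to the accumulator.
        acc = (acc << 6) | u32::from(s & 0x3f);
        nbits += 6;
        // Emit every complete byte that is now available.
        while nbits >= 8 {
            nbits -= 8;
            out.push((acc >> nbits) as u8);
        }
        // Keep only the bits that have not been emitted yet.
        acc &= (1 << nbits) - 1;
    }
    if nbits > 0 {
        // Left-align the remaining bits in a final, zero-padded byte.
        out.push((acc << (8 - nbits)) as u8);
    }
    (out, sextets.len() * 6)
}
```

This is exactly the interim byte array I would like to avoid: the result only exists so that a bit-level parser can read it back apart again.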

2. Reading arbitrary number of bits

It's possible to read, say, 3 bits from a byte array in nom, but each call to bits! seems to consume a full byte (when reading into a u8).

For example:

named!(take_3_bits<u8>, bits!(take_bits!(u8, 3)));

will read 3 bits into a u8. But if I run take_3_bits twice, I'll have consumed 16 bits of my stream rather than 6.

I can combine reads:

named!(get_field_1_and_2<(u8, u8)>, bits!(pair!(take_bits!(u8, 2), take_bits!(u8, 3))));

Calling get_field_1_and_2 will get me a (u8, u8) tuple, where the first item contains the first 2 bits, and the second item contains the next 3 bits, but nom will then still advance a full byte after that read.

I can use peek to prevent nom's read pointer from advancing, and then manage it manually, but again, that seems like unnecessary extra work.
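To make the behaviour I'm after concrete, here is a hand-rolled sketch in plain Rust, not nom's API. `BitCursor` and its `take_bits` method are hypothetical names. The cursor carries a byte slice plus a bit offset, so consecutive reads continue from the exact bit where the previous one stopped instead of skipping to the next byte. As far as I understand, nom's bit-level parsers track a similar (bytes, bit offset) pair while inside a single bits! call.

```rust
/// Hypothetical bit-level cursor: a byte slice plus an offset in bits.
struct BitCursor<'a> {
    data: &'a [u8],
    pos: usize, // current position, counted in bits from the start of `data`
}

impl<'a> BitCursor<'a> {
    fn new(data: &'a [u8]) -> Self {
        BitCursor { data, pos: 0 }
    }

    /// Read `count` bits (at most 32), most significant bit first,
    /// advancing the cursor by exactly `count` bits.
    /// Returns None, without advancing, if not enough bits remain.
    fn take_bits(&mut self, count: usize) -> Option<u32> {
        if count > 32 || self.pos + count > self.data.len() * 8 {
            return None;
        }
        let mut value = 0u32;
        for i in self.pos..self.pos + count {
            // Pick bit `i`, where bit 0 is the top bit of the first byte.
            let bit = (self.data[i / 8] >> (7 - i % 8)) & 1;
            value = (value << 1) | u32::from(bit);
        }
        self.pos += count;
        Some(value)
    }
}
```

With this, reading a 2-bit field, then a 3-bit field, then a 5-bit field consumes 10 bits in total, and the third read spans the boundary between the first and second byte.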
