We use struct.unpack to read a binary file created from a dump of C structure fields and their values (integers and strings). The unpacked tuples are used to build an intermediate dictionary representation of the fields and their values, which is later written to a text output file.
The text output file displays the strings as below:
ID = b'000194901137\x00\x00\x00\x00'
timestampGMT = 1489215906
timezoneDiff = -5
timestampPackage = 1489215902
version = 293
type = b'FULL\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
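For context, here is a minimal sketch of the kind of unpacking we do; the format string and record layout shown are illustrative assumptions, not our actual layout:

import struct

# Hypothetical layout: 16-byte ID, four 4-byte signed ints, 16-byte type string.
RECORD_FORMAT = '<16s4i16s'
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)
FIELD_NAMES = ('ID', 'timestampGMT', 'timezoneDiff',
               'timestampPackage', 'version', 'type')

def read_records(path):
    # Yield one dict per fixed-size record in the dump file.
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(RECORD_SIZE)
            if len(chunk) < RECORD_SIZE:
                break
            yield dict(zip(FIELD_NAMES, struct.unpack(RECORD_FORMAT, chunk)))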
The program was originally written in Python 2.6, where it worked fine. We used the lambda expression below to strip the unwanted non-printable characters while writing to the text file:
filtered_string = filter(lambda x: x in string.printable, line)
After moving to Python 3.5, this no longer works as-is: filter() now returns a lazy filter object rather than a string, so the result cannot be written out directly.
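For reference, joining the filter object back into a string does restore the old result, but only once the data has already been decoded from bytes to str:

import string

# Python 3: filter() returns a lazy iterator, so join it back into a str.
# This assumes `line` is already a str (i.e. the bytes have been decoded).
filtered_string = ''.join(filter(lambda x: x in string.printable, line))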
What is the Pythonic way to convert these bytes values to equivalent ASCII text (without the trailing NUL '\x00' padding), so that they are written as normal string values?
Also, since there are several thousand entries to process per file (and there are multiple files), we are looking for the most efficient approach in this context.
In Python 2 you could use the str type for both text and binary data interchangeably, and it worked fine. In Python 3, data read from a binary file is of type bytes, which no longer shares a common base class with str as it did in Python 2. Strings embedded in the binary file are therefore read in as bytes literals, which need to be decoded to the str (Unicode) type before they can be displayed or written to a file as normal strings. After I retrieve the tuple from struct.unpack(), I do the decoding as sketched below. For background, read https://docs.python.org/3/howto/pyporting.html#text-versus-binary-data
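A minimal sketch of that decoding step, assuming the embedded strings are ASCII (the function name and encoding are illustrative and may differ from your data):

def to_text(value):
    # Decode bytes fields to str and strip the trailing NUL padding;
    # pass other values (ints) through str() unchanged.
    if isinstance(value, bytes):
        # 'ascii' is an assumption; substitute the file's real encoding if it differs.
        return value.decode('ascii').rstrip('\x00')
    return str(value)

# Example: to_text(b'000194901137\x00\x00\x00\x00') returns '000194901137'

With this applied to each unpacked field, the intermediate dictionary holds plain str values and the output file no longer shows the b'...' prefixes or \x00 escapes.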