I'm working with an old piece of software that collects acoustic data from a manufacturing process. The file it generated is in an encoding unknown to both me and every application I've used to open it. I've also tried using Python to convert the file from latin1 to UTF-8... no luck.
Could anyone suggest an alternative way to convert this file to something sensible? Or, at the very least, help me confirm what encoding I'm dealing with? Much appreciated!
The output should just be numbers, ideally separated into columns and rows, but any advice is appreciated.
First, we need to determine the charset of the file. We can use either `chardet` (a Python library) or `file -bi $file` (the Linux `file` command) to do this. In practice, I have observed that `chardet` takes slightly longer to process a file than the `file` command does. Also, I'm executing my code in a Linux container, so I don't have to worry about whether the `file` command is available. For these reasons I'm using the `file` command, called through Python's `subprocess` library, to get the charset.
`run_cmd` runs the provided Linux command and returns a tuple of `stdout` and `stderr` as strings. The next step calls the `file` command and processes the result from `run_cmd`. For a file encoded with `us-ascii`, it returns `us-ascii` as the charset.
Once the charset is known, the file can be converted to `utf-8` or some other encoding. This produces a `utf-8` encoded file.
Note: I have used `log` to print messages. You can use `print` instead.
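The steps above (run a command, read the charset from `file -bi`, re-encode to UTF-8) can be sketched roughly as follows. This is a minimal sketch, assuming Python 3.7+ and a Linux system with the `file` command installed; it is not necessarily the original implementation. The helper names `parse_charset` and `convert_to_utf8` are hypothetical, and `run_cmd` here takes an argument list instead of a shell string, which avoids quoting problems with file names. `log` is a standard `logging` logger.

```python
import logging
import subprocess
from pathlib import Path

log = logging.getLogger(__name__)


def run_cmd(args):
    """Run a command given as an argument list; return (stdout, stderr) as strings."""
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout, result.stderr


def parse_charset(file_output):
    """Hypothetical helper: pull the charset out of `file -bi` output,
    e.g. 'text/plain; charset=us-ascii' -> 'us-ascii'. Returns None if absent."""
    for part in file_output.split(";"):
        key, _, value = part.strip().partition("=")
        if key == "charset":
            return value.strip()
    return None


def get_charset(path):
    """Ask the Linux `file` command (-b: brief, -i: MIME type) for the charset of `path`."""
    stdout, stderr = run_cmd(["file", "-bi", str(path)])
    if stderr:
        log.warning("file reported: %s", stderr.strip())
    charset = parse_charset(stdout)
    log.info("%s: detected charset %s", path, charset)
    return charset


def convert_to_utf8(src, dest, charset):
    """Hypothetical helper: decode `src` using `charset` and write it to `dest` as UTF-8."""
    # `file` reports 'binary' (or 'unknown-8bit') when the data is not text it
    # recognises; Python has no codec with those names, so stop here instead.
    if charset in (None, "binary", "unknown-8bit"):
        raise ValueError(f"{src} does not look like text (charset={charset!r})")
    text = Path(src).read_bytes().decode(charset)
    # Write bytes directly so line endings are preserved exactly.
    Path(dest).write_bytes(text.encode("utf-8"))
    log.info("wrote %s as utf-8", dest)


if __name__ == "__main__":
    import sys

    logging.basicConfig(level=logging.INFO)
    source, target = sys.argv[1], sys.argv[2]
    convert_to_utf8(source, target, get_charset(source))
```

Saved as, say, `convert.py`, it would be run as `python convert.py input.dat output.txt`. If `file` reports `charset=binary` for your data, it is probably not text in any encoding, so re-encoding alone will not make it readable.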