I need to find out the file type behind a file handle.
I need this because apache_log_parser failed to parse a line and the whole program crashed:
Traceback (most recent call last):
  File "VirtualEnvs/moslog/bin/mosloganalisys.py", line 108, in <module>
    totalines = count_agent(logfilehandler,agentcount,totalines)
  File "VirtualEnvs/moslog/bin/mosloganalisys.py", line 27, in count_agent
    log_line_data = line_parser(line)
  File "VirtualEnvs/moslog/lib/python2.7/site-packages/apache_log_parser/__init__.py", line 225, in parse
    raise LineDoesntMatchException(log_line=log_line, regex=self.log_line_regex.pattern)
The reason was that the file handle pointed to a gzipped file. Decompressing it with the gzip library did not help, because the file was double compressed (*.gz.gz), so the decompressed content was itself another gzipped file.
So I tried to use the python-magic library to find out the file type, but it seems to need a filename:
72 """
73 self._thread_check()
---> 74 if not os.path.exists(filename):
75 raise IOError("File does not exist: " + filename)
76
/usr/lib64/python2.7/genericpath.pyc in exists(path)
16 """Test whether a path exists. Returns False for broken symbolic links"""
17 try:
---> 18 os.stat(path)
19 except os.error:
20 return False
I have already wrapped the parsing in a try: / except: statement, but this doesn't really solve the problem, since the program still processes a lot of useless lines.
What do you suggest? Thanks
Looking more closely at the magic library, I found a way to do this:
You just need to open the file, read the first 1024 bytes, and pass them to the function magic.from_buffer.