Find the file type behind a file handle in Python


I need to find out the file type behind a file handle.

I need this because my apache_log_parser failed to parse a line and the whole program crashed:

Traceback (most recent call last):
  File "VirtualEnvs/moslog/bin/mosloganalisys.py", line 108, in <module>
    totalines = count_agent(logfilehandler,agentcount,totalines)
  File "VirtualEnvs/moslog/bin/mosloganalisys.py", line 27, in count_agent
    log_line_data = line_parser(line)
  File "VirtualEnvs/moslog/lib/python2.7/site-packages/apache_log_parser/__init__.py", line 225, in parse
    raise LineDoesntMatchException(log_line=log_line, regex=self.log_line_regex.pattern)

The cause was that the file handle pointed to a gzip file. Decompressing it with the gzip library did not help, because the file was double-compressed (*.gz.gz), so the decompressed output was itself another gzipped file.

So I tried to use the python-magic library to find out the file type, but it seems to require a filename:

     72         """
     73         self._thread_check()
---> 74         if not os.path.exists(filename):
     75             raise IOError("File does not exist: " + filename)
     76 

/usr/lib64/python2.7/genericpath.pyc in exists(path)
     16     """Test whether a path exists.  Returns False for broken symbolic links"""
     17     try:
---> 18         os.stat(path)
     19     except os.error:
     20         return False

I have already added a try: / except: statement, but that does not really solve the problem of processing a lot of useless lines.

What do you suggest to do? Thanks


There are 1 best solutions below


Looking more closely at the magic library, I found a way to do it:

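import magic

# Read only the first 1024 bytes, in binary mode so compressed data is not mangled.
with open('workspace/mosloganalysis/access.log.1429142400', 'rb') as f:
    header = f.read(1024)
print(magic.from_buffer(header))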

You just need to open the file, read its first 1024 bytes, and pass them to the function magic.from_buffer.
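Since the question starts from an already open file handle rather than a filename, the same idea can be applied to the handle directly, as long as the read position is restored afterwards. This is a minimal sketch; peek_header is a hypothetical helper name, and it assumes the handle is seekable and opened in binary mode.

```python
def peek_header(handle, size=1024):
    """Return up to `size` bytes from the current position of `handle`
    and put the read position back where it was.

    Hypothetical helper; `handle` must support tell() and seek().
    """
    pos = handle.tell()
    try:
        return handle.read(size)
    finally:
        handle.seek(pos)  # restore the position even if read() fails
```

The returned bytes can then be passed to magic.from_buffer, for example magic.from_buffer(peek_header(logfilehandler)), and the handle can still be read from the start of the data afterwards.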