Detect whether a file contains text or binary data


I am using Apache Tika to detect whether a given file is binary or text.

I'd like files with the following extensions to be detected as text files: ".txt", ".csv", ".log", ".bat", ".m", ".properties", ".inf", ".ini", ".java", ".c", ".cpp", ".h", ".vpp".

I am simply using the Tika.detect(file) method to do this. However, I notice that some of the above extensions, such as .inf (which is clearly text-based) and .vpp, are wrongly detected as 'application'.

Using a default-constructed javax.activation.MimetypesFileTypeMap, .vpp files are detected as application/octet-stream (binary). Using SVNAccessControl svn:mimetype, the type is reported as text.

Is there a way to correctly detect these files as text in a Java program using any of these third-party libraries?
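
To make the desired behavior concrete, here is a minimal sketch that uses only the JDK and none of the libraries above. It treats the listed extensions as text unconditionally and otherwise falls back to a crude content check (no NUL byte in the first few kilobytes). The class name `TextFileCheck`, its method names, and the 8192-byte sample size are hypothetical choices for illustration, and the NUL-byte test is an assumed heuristic, not what Tika or the other libraries do internally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: extension whitelist first, content heuristic second.
final class TextFileCheck {

    // Extensions from the question that should always count as text.
    private static final Set<String> TEXT_EXTENSIONS = Set.of(
            ".txt", ".csv", ".log", ".bat", ".m", ".properties", ".inf",
            ".ini", ".java", ".c", ".cpp", ".h", ".vpp");

    // Assumption: inspecting the first 8 KiB is enough for a rough guess.
    private static final int SAMPLE_SIZE = 8192;

    // True if the file name ends with one of the whitelisted extensions.
    static boolean hasTextExtension(Path file) {
        Path fileName = file.getFileName();
        if (fileName == null) {
            return false; // e.g. a root path with no name element
        }
        String name = fileName.toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && TEXT_EXTENSIONS.contains(name.substring(dot));
    }

    // Crude heuristic: a NUL byte in the sample suggests binary content.
    // Note that UTF-16 encoded text contains NUL bytes and fails this check.
    static boolean looksLikeText(Path file) throws IOException {
        byte[] sample = new byte[SAMPLE_SIZE];
        int read;
        try (InputStream in = Files.newInputStream(file)) {
            read = in.readNBytes(sample, 0, sample.length);
        }
        for (int i = 0; i < read; i++) {
            if (sample[i] == 0) {
                return false;
            }
        }
        return true;
    }

    // Whitelisted extensions win; everything else goes through the heuristic.
    static boolean isText(Path file) throws IOException {
        return hasTextExtension(file) || looksLikeText(file);
    }
}
```

With this sketch, `TextFileCheck.isText(Path.of("setup.inf"))` returns true from the extension alone, without reading the file. The extension check runs first precisely because the content heuristic can misjudge text in encodings such as UTF-16.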
