I am using Apache Tika to detect whether a given file is binary or text.
I'd like the following extensions (".txt", ".csv", ".log", ".bat", ".m", ".properties", ".inf", ".ini", ".java", ".c", ".cpp", ".h", ".vpp") to be detected as text files.
I am simply using the Tika.detect(file) method to do this, but I notice that some of the above extensions, such as .inf (which is clearly text-based) and .vpp, are wrongly detected as 'application'.
Using javax.activation.MimetypesFileTypeMap.MimetypesFileTypeMap(), .vpp files are detected as application/octet-stream (binary). Using SVNAccessControl svn:mimetype, we get the type as text.
Is there a way to correctly detect these files as text in a Java program using any of these third-party libraries?
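
To make the desired behaviour concrete, here is a minimal sketch of the result I am after, written as a plain extension allow-list that could run before handing the file to a detector. It uses only the JDK; the class name TextExtensionCheck and the method isTextByExtension are hypothetical names of my own and are not part of Tika or javax.activation. It assumes the extension alone can be trusted, which may not hold for every file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Tika or javax.activation).
class TextExtensionCheck {

    // Extensions from the list above that should always count as text.
    private static final Set<String> TEXT_EXTENSIONS = new HashSet<>(Arrays.asList(
            ".txt", ".csv", ".log", ".bat", ".m", ".properties", ".inf", ".ini",
            ".java", ".c", ".cpp", ".h", ".vpp"));

    // Returns true when the file name ends with one of the listed extensions,
    // compared case-insensitively.
    static boolean isTextByExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension: leave the decision to a real detector
        }
        String ext = fileName.substring(dot).toLowerCase(Locale.ROOT);
        return TEXT_EXTENSIONS.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(isTextByExtension("driver.inf"));
        System.out.println(isTextByExtension("model.VPP"));
        System.out.println(isTextByExtension("image.png"));
    }
}
```

Files that fail this check would still go through Tika.detect(file) as before; I would prefer a solution where the library itself reports these types as text rather than maintaining a list like this by hand.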