I am using Apache Tika to detect media file types (extensions), but whenever I send a .docx, .doc, or .xls file, it is detected as "application/x-tika-ooxml". Here is my method for determining file extensions:
    private static String determineFileExtension(String mimeType) {
        String fileExtension = "";
        if (mimeType != null) {
            if (mimeType.equalsIgnoreCase("application/pdf")) {
                fileExtension = ".pdf";
            } else if (mimeType.equalsIgnoreCase("application/vnd.ms-excel")) {
                fileExtension = ".xls";
            } else if (mimeType.equalsIgnoreCase("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")) {
                fileExtension = ".xlsx";
            } else if (mimeType.equalsIgnoreCase("image/jpeg")) {
                fileExtension = ".jpg";
            } else if (mimeType.equalsIgnoreCase("video/mpeg")) {
                fileExtension = ".mpeg";
            } else if (mimeType.equalsIgnoreCase("video/mp4")) {
                fileExtension = ".mp4";
            } else if (mimeType.equalsIgnoreCase("application/x-tika-ooxml")) {
                fileExtension = ".xlsx";
            } else if (mimeType.equalsIgnoreCase("application/x-tika-msoffice")) {
                fileExtension = ".xls";
            } else if (mimeType.equalsIgnoreCase("text/plain")) {
                fileExtension = ".txt";
            } else if (mimeType
                    .equalsIgnoreCase("application/vnd.openxmlformats-officedocument.wordprocessingml.document")) {
                fileExtension = ".docx";
            } else if (mimeType.equalsIgnoreCase("application/vnd.ms-powerpoint")) {
                fileExtension = ".ppt";
            } else if (mimeType
                    .equalsIgnoreCase("application/vnd.openxmlformats-officedocument.presentationml.presentation")) {
                fileExtension = ".pptx";
            } else if (mimeType.equalsIgnoreCase("video/avi")) {
                fileExtension = ".avi";
            } else if (mimeType.equalsIgnoreCase("application/x-zip-compressed")) {
                fileExtension = ".zip";
            } else if (mimeType.equalsIgnoreCase("image/png")) {
                fileExtension = ".png";
            } else if (mimeType.equalsIgnoreCase("application/msword")) {
                fileExtension = ".doc";
            } else {
                LOGGER.error("Unknown mimeType " + mimeType);
            }
        }
        return fileExtension;
    }
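For comparison, the same MIME-type-to-extension mapping can be expressed as a lookup table instead of an if/else chain. This is only a sketch of that idea, not a fix for the detection itself: the class name `MimeExtensions` and the `EXTENSIONS` map are hypothetical, `System.err` stands in for the `LOGGER` used above, and `Map.ofEntries` assumes Java 9 or later. The entries are copied from the method above, including the single `.xlsx` mapping for "application/x-tika-ooxml".

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; the name is an assumption for illustration.
public class MimeExtensions {

    // Same pairs as the if/else chain above, keyed by lower-case MIME type.
    private static final Map<String, String> EXTENSIONS = Map.ofEntries(
        Map.entry("application/pdf", ".pdf"),
        Map.entry("application/vnd.ms-excel", ".xls"),
        Map.entry("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", ".xlsx"),
        Map.entry("image/jpeg", ".jpg"),
        Map.entry("video/mpeg", ".mpeg"),
        Map.entry("video/mp4", ".mp4"),
        // Kept as in the original method: one extension for this type.
        Map.entry("application/x-tika-ooxml", ".xlsx"),
        Map.entry("application/x-tika-msoffice", ".xls"),
        Map.entry("text/plain", ".txt"),
        Map.entry("application/vnd.openxmlformats-officedocument.wordprocessingml.document", ".docx"),
        Map.entry("application/vnd.ms-powerpoint", ".ppt"),
        Map.entry("application/vnd.openxmlformats-officedocument.presentationml.presentation", ".pptx"),
        Map.entry("video/avi", ".avi"),
        Map.entry("application/x-zip-compressed", ".zip"),
        Map.entry("image/png", ".png"),
        Map.entry("application/msword", ".doc")
    );

    // Returns the mapped extension, or "" for null or unknown MIME types.
    static String determineFileExtension(String mimeType) {
        if (mimeType == null) {
            return "";
        }
        // Lower-casing the input mirrors the equalsIgnoreCase comparisons.
        String extension = EXTENSIONS.get(mimeType.toLowerCase(Locale.ROOT));
        if (extension == null) {
            // Stand-in for LOGGER.error(...) in the original method.
            System.err.println("Unknown mimeType " + mimeType);
            return "";
        }
        return extension;
    }
}
```

With this layout, changing the extension returned for a given MIME type means editing one map entry, but it still returns exactly one extension per detected type, so the result depends entirely on what the detector reports.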
I tried changing the file extension mapped to "application/x-tika-ooxml" in this branch

        } else if (mimeType.equalsIgnoreCase("application/x-tika-ooxml")) {
            fileExtension = ".xlsx";

from .xlsx to .docx and to .doc, but the files are still detected as "application/x-tika-ooxml".
I am using this dependency:

    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>2.9.0</version>
    </dependency>
Other file formats, such as .pdf, .xml, .mp4, .jpeg, .jpg, and .mp3, are detected correctly.