Get File Extension for special cases like tar.gz

I need to extract extensions from file names.

I know this can be done for single extensions like .gz or .tar by using filePath.lastIndexOf('.') or using utility methods like FilenameUtils.getExtension(filePath) from Apache commons-io.

But, what if I have a file with an extension like .tar.gz? How can I manage files with extensions that contain . characters?

There are 4 best solutions below

0
On BEST ANSWER

If you know what extensions are important, you can simply check for them explicitly. You would have a collection of known extensions, like this:

// Requires java.util.Arrays and java.util.List.
List<String> EXTS = Arrays.asList("tar.gz", "tgz", "gz", "zip");

You could get the (first) longest matching extension like this:

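// Returns the longest entry of EXTS that fileName ends with, or null if none matches.
String getExtension(String fileName) {
  String found = null;
  for (String ext : EXTS) {
    if (fileName.endsWith("." + ext)) {
      // Prefer the longer match, so "tar.gz" wins over "gz".
      if (found == null || found.length() < ext.length()) {
        found = ext;
      }
    }
  }
  return found;
}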

So calling getExtension("file.tar.gz") would return "tar.gz".

If you have mixed-case names, try changing the check inside the loop to fileName.toLowerCase().endsWith("." + ext).
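
As a rough, self-contained sketch of the known-extensions approach with the case-insensitive check folded in: the class name KnownExtensions is made up for illustration, the extension list is only an example, and using Locale.ROOT for lowercasing is my own choice rather than something the answer specifies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class KnownExtensions {
    // Example list only; add whichever extensions matter for your files.
    private static final List<String> EXTS = Arrays.asList("tar.gz", "tgz", "gz", "zip");

    // Returns the longest known extension that fileName ends with
    // (compared case-insensitively), or null if none matches.
    static String getExtension(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        String found = null;
        for (String ext : EXTS) {
            if (lower.endsWith("." + ext)
                    && (found == null || found.length() < ext.length())) {
                found = ext;
            }
        }
        return found;
    }
}
```

With this sketch, KnownExtensions.getExtension("Backup.TAR.GZ") gives "tar.gz", and a name with no known extension gives null, so callers need a null check.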

2
On

I found a simple way: use substring to isolate the file name, then use indexOf instead of lastIndexOf to find the first '.' and take everything after it as the extension.
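
A minimal sketch of this first-dot idea might look like the following. The class name FirstDotExtension is hypothetical, and two details are my own assumptions rather than part of the answer: both '/' and '\' are treated as path separators, and a leading dot (as in .bashrc) is not counted as the start of an extension.

```java
final class FirstDotExtension {
    // Returns everything after the first '.' in the file name, or "" if there is none.
    static String getExtension(String filePath) {
        // Drop any directory part; lastIndexOf returns -1 when no separator exists,
        // so substring(0) keeps the whole string in that case.
        int slash = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        String name = filePath.substring(slash + 1);

        // Start searching at index 1 so a leading dot is skipped (assumption).
        int dot = name.indexOf('.', 1);
        return dot < 0 ? "" : name.substring(dot + 1);
    }
}
```

Be aware that this treats everything after the first dot as the extension, so a name such as my.file.tar.gz yields file.tar.gz rather than tar.gz.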

0
On

You can get the filename part of the path, split on . and take the final 0, 1, or 2 elements in the array as the extension.

Of course if .tar.* (gz, bz2, etc.) is your only edge case it may be pragmatic to just build a solution that filters filenames for .tar. and use that as the point at which to extract the extension (to include the .tar portion).
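
One way to sketch the split idea, combined with the .tar.* special case, is shown below. The class name SplitExtension is hypothetical, and the code assumes the directory part of the path has already been removed and that "tar" is the only inner component worth keeping.

```java
final class SplitExtension {
    // Returns the last component after a '.', or the last two when the
    // second-to-last component is "tar". Returns "" if there is no dot.
    static String getExtension(String fileName) {
        // split takes a regex, so the dot must be escaped.
        String[] parts = fileName.split("\\.");
        int n = parts.length;
        if (n < 2) {
            return ""; // zero elements: no extension
        }
        if (n >= 3 && parts[n - 2].equalsIgnoreCase("tar")) {
            return parts[n - 2] + "." + parts[n - 1]; // two elements, e.g. tar.gz
        }
        return parts[n - 1]; // one element, e.g. zip
    }
}
```

Under these assumptions, file.tar.gz and file.tar.bz2 keep their tar prefix, while archive.tar on its own is reported as tar.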

3
On

A file can have only one extension!

If you have a file test.tar.gz,

  • .gz is the extension and
  • test.tar is the basename!

.tar in this case is part of the basename, not part of the extension!

If you want a file that is both tarred and gzipped, you should name it .tgz. Using .tar.gz is bad practice; if you need to handle these files, use a workaround such as renaming the file to test.tgz.