SimpleFileVisitor to walk a directory tree to find all .txt files except in two sub directories


I want to traverse a directory tree with many subdirectories. My goal is to print all .txt files except those inside the subdir and anotherdir subdirectories. I can achieve this with the code below.

public static void main(String[] args) throws IOException {
    Path path = Paths.get("C:\\Users\\bhapanda\\Documents\\target");
    Files.walkFileTree(path, new Search());
}

private static final class Search extends SimpleFileVisitor<Path> {

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\subdir");
        PathMatcher pm1 = FileSystems.getDefault().getPathMatcher("glob:**\\anotherdir");
        if (pm.matches(dir) || pm1.matches(dir)) {
            System.out.println("matching dir found. skipping it");
            return FileVisitResult.SKIP_SUBTREE;
        } else {
            return FileVisitResult.CONTINUE;
        }
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:*.txt");
        if (pm.matches(file.getFileName())) {
            System.out.println(file);
        }
        return FileVisitResult.CONTINUE;
    }
}

But when I try to combine the pm and pm1 PathMatchers with the code below, it does not work.

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\{subdir,anotherdir}");
if (pm.matches(dir)) {
    System.out.println("matching dir found. skipping it");
    return FileVisitResult.SKIP_SUBTREE;
} else {
    return FileVisitResult.CONTINUE;
}

Is there anything wrong with the glob syntax?


There is 1 best solution below

CryptoFool On

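Yes, there is something wrong with the glob syntax. The backslash is the escape character in glob patterns, so a literal backslash must itself be escaped. In a Java string literal that means writing four backslashes for each path separator: the compiler turns them into two, and the glob parser turns those two into one literal backslash.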

The first matcher:

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\subdir");

does not match against a path ending with \subdir. Rather, the double backslash in the Java source becomes a single backslash in the glob pattern, which escapes the 's' that follows it. Since an escaped 's' is just an 's', this matcher is equivalent to:

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**subdir");

which matches any path ending in subdir. So it matches the path xxx\subdir, but it also matches the paths xxx\xxxsubdir and xxxsubdir. That is why the original two-matcher code appeared to work even though the pattern was wrong.
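One way to check this is to run the matcher against a name that contains no separator at all. The sketch below is only an illustration: the class name GlobEscapeDemo, the helper matches, and the sample path are made up for the demo and are not part of the question.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class GlobEscapeDemo {

    // Hypothetical helper: builds a matcher on the default file system
    // and applies it to the given path string.
    static boolean matches(String syntaxAndPattern, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntaxAndPattern);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // The Java literal "glob:**\\subdir" is the glob **\subdir,
        // where \s is simply an escaped 's'.
        System.out.println(matches("glob:**\\subdir", "xxxsubdir"));
        // The same pattern written without any backslash.
        System.out.println(matches("glob:**subdir", "xxxsubdir"));
    }
}
```

Both calls should report a match, which suggests the backslash contributes nothing to the pattern.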

The combined matcher:

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\{subdir,anotherdir}");

has the same problem. What is being escaped in this case is the '{'. In a glob pattern, this means to treat '{' as a literal character rather than the beginning of a pattern group. So this matcher will not match the path xxx\subdir, but it will match the path xxx{subdir,anotherdir}.
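The same kind of check can be made for the combined pattern. Again, this is a small sketch with made-up names (BraceEscapeDemo, matches) and sample paths chosen for illustration; it assumes the file system accepts '{', ',' and '}' in a file name.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class BraceEscapeDemo {

    // Hypothetical helper: builds a matcher on the default file system
    // and applies it to the given path string.
    static boolean matches(String syntaxAndPattern, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntaxAndPattern);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String pattern = "glob:**\\{subdir,anotherdir}";
        // The escaped '{' is literal, so no group is opened and
        // neither directory name matches on its own.
        System.out.println(matches(pattern, "subdir"));
        // A name that literally contains the braces and the comma does match.
        System.out.println(matches(pattern, "xxx{subdir,anotherdir}"));
    }
}
```

The first call should report no match and the second a match, in line with the explanation above.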

These two matchers will do what is intended:

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\\\subdir");
PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\\\{subdir,anotherdir}");
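If the goal is only to skip directories by name, the escaping question can be avoided by comparing the last path element directly instead of using a glob. This is an alternative sketch, not the approach from the question: the class name NameBasedSearch, the shouldSkip helper, and the default start directory "." are assumptions for the example.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class NameBasedSearch extends SimpleFileVisitor<Path> {

    // Directory names to skip, taken from the question.
    private static final Set<String> SKIPPED =
            new HashSet<>(Arrays.asList("subdir", "anotherdir"));

    // True when the last element of the path is one of the skipped names.
    static boolean shouldSkip(Path dir) {
        Path name = dir.getFileName(); // null for a root such as C:\ or /
        return name != null && SKIPPED.contains(name.toString());
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        return shouldSkip(dir) ? FileVisitResult.SKIP_SUBTREE : FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        Path name = file.getFileName();
        if (name != null && name.toString().endsWith(".txt")) {
            System.out.println(file);
        }
        return FileVisitResult.CONTINUE;
    }

    public static void main(String[] args) throws IOException {
        // Start directory is an assumption: first argument, else the current directory.
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        Files.walkFileTree(start, new NameBasedSearch());
    }
}
```

Note that this comparison is case-sensitive and matches the directory name exactly, so a directory called xxxsubdir is still visited.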