SimpleFileVisitor to walk a directory tree to find all .txt files except in two sub directories


I want to traverse a directory tree with many subdirectories. My goal is to print every .txt file except those inside the subdir and anotherdir subdirectories. I can achieve this with the code below.

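import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class TxtFinder { // wrapper class; the name is arbitrary

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("C:\\Users\\bhapanda\\Documents\\target");
        Files.walkFileTree(path, new Search());
    }

    private static final class Search extends SimpleFileVisitor<Path> {

        // Skip the two excluded directories, descend into everything else.
        @Override
        public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
            PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\subdir");
            PathMatcher pm1 = FileSystems.getDefault().getPathMatcher("glob:**\\anotherdir");
            if (pm.matches(dir) || pm1.matches(dir)) {
                System.out.println("matching dir found. skipping it");
                return FileVisitResult.SKIP_SUBTREE;
            } else {
                return FileVisitResult.CONTINUE;
            }
        }

        // Print every file whose name ends in .txt.
        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
            PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:*.txt");
            if (pm.matches(file.getFileName())) {
                System.out.println(file);
            }
            return FileVisitResult.CONTINUE;
        }
    }
}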

But when I try to combine the pm and pm1 PathMatchers with the code below, it doesn't work.

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\{subdir,anotherdir}");
if (pm.matches(dir)) {
    System.out.println("matching dir found. skipping it");
    return FileVisitResult.SKIP_SUBTREE;
} else {
    return FileVisitResult.CONTINUE;
}

Is there anything wrong with the glob syntax?

There are 1 best solutions below

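Yes, there is something wrong with the glob syntax. Backslash is an escape character both in Java string literals and in glob patterns, so a literal backslash in the pattern has to be written as four backslashes in the Java source: the compiler reduces them to two, and the glob parser reads those two as one escaped backslash.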

The first matcher:

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\subdir");

does not match only paths ending with \subdir. The Java compiler turns the double backslash in the string literal into a single backslash in the glob pattern, and in a glob a single backslash escapes the character that follows it, here the 's'. Since an escaped 's' is just an 's', this matcher is equivalent to:

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**subdir");

which means it will match any path ending in subdir. So it will match the path xxx\subdir, but will also match the paths xxx\xxxsubdir and xxxsubdir.
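The equivalence can be checked with a small sketch like the one below. The class name SingleEscapeDemo and the helper matches are made up for illustration, and the test paths are written with forward slashes on the assumption that Paths.get converts them to the platform separator, so the check should behave the same on Windows and elsewhere.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical demo class; only the glob strings come from the question.
class SingleEscapeDemo {

    // Java literal "**\\subdir" is the glob **\subdir: the backslash escapes the 's'.
    static final PathMatcher ESCAPED_S =
            FileSystems.getDefault().getPathMatcher("glob:**\\subdir");

    // The same glob with the redundant escape removed.
    static final PathMatcher NO_SEPARATOR =
            FileSystems.getDefault().getPathMatcher("glob:**subdir");

    static boolean matches(PathMatcher matcher, String path) {
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // Print the verdict of both matchers for each sample path.
        for (String p : new String[] {"xxx/subdir", "xxx/xxxsubdir", "xxxsubdir"}) {
            System.out.println(p + " -> " + matches(ESCAPED_S, p) + " / " + matches(NO_SEPARATOR, p));
        }
    }
}
```

If the explanation above holds, both matchers should give the same answer for every path, including the unintended xxxsubdir cases.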

The combined matcher:

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\{subdir,anotherdir}");

has the same problem. What is being escaped in this case is the '{'. In a glob pattern, this means to treat '{' as a literal character rather than the beginning of a pattern group. So this matcher will not match the path xxx\subdir, but it will match the path xxx{subdir,anotherdir}.
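A similar sketch makes the escaped brace visible. Again, EscapedBraceDemo and its matches helper are illustrative names, and the sample paths use forward slashes on the assumption that Paths.get normalises them for the platform.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical demo class; only the glob string comes from the question.
class EscapedBraceDemo {

    // Java literal "**\\{subdir,anotherdir}" is the glob **\{subdir,anotherdir}:
    // the backslash escapes '{', so no pattern group is ever opened.
    static final PathMatcher ESCAPED_BRACE =
            FileSystems.getDefault().getPathMatcher("glob:**\\{subdir,anotherdir}");

    static boolean matches(String path) {
        return ESCAPED_BRACE.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // The intended directory is not matched, but the literal brace text is.
        System.out.println("xxx/subdir -> " + matches("xxx/subdir"));
        System.out.println("xxx{subdir,anotherdir} -> " + matches("xxx{subdir,anotherdir}"));
    }
}
```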

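With four backslashes in the Java source, each matcher does what was intended. The first matches only paths ending in \subdir, and the second (which replaces both pm and pm1) matches paths ending in \subdir or \anotherdir:

PathMatcher pm = FileSystems.getDefault().getPathMatcher("glob:**\\\\subdir");
PathMatcher combined = FileSystems.getDefault().getPathMatcher("glob:**\\\\{subdir,anotherdir}");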
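As an alternative to escaping the separator at all, the directory check can be done the same way the question already checks file names: match the glob against dir.getFileName() only. This is a different technique from the fix above, not a restatement of it, and the sketch below is only one way to arrange it. The names SkipDirsByName, findTxtFiles, and demo are invented for illustration, and demo builds a throwaway tree in the temp directory rather than using the asker's real path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class; a sketch of matching on the directory name alone.
class SkipDirsByName {

    // Matched against a single name element, so the patterns contain no separator.
    private static final PathMatcher SKIPPED_DIRS =
            FileSystems.getDefault().getPathMatcher("glob:{subdir,anotherdir}");
    private static final PathMatcher TXT_FILES =
            FileSystems.getDefault().getPathMatcher("glob:*.txt");

    /** Returns every .txt file under root, skipping directories named subdir or anotherdir. */
    static List<Path> findTxtFiles(Path root) throws IOException {
        List<Path> found = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                Path name = dir.getFileName(); // null for a file system root
                if (name != null && SKIPPED_DIRS.matches(name)) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (TXT_FILES.matches(file.getFileName())) {
                    found.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }

    /** Builds a small temporary tree and returns the sorted names of the .txt files found. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("target");
            Files.createDirectories(root.resolve("subdir"));
            Files.createDirectories(root.resolve("nested/anotherdir"));
            Files.createDirectories(root.resolve("xxxsubdir")); // must NOT be skipped
            Files.createFile(root.resolve("keep.txt"));
            Files.createFile(root.resolve("subdir/skipped.txt"));
            Files.createFile(root.resolve("nested/anotherdir/skipped.txt"));
            Files.createFile(root.resolve("xxxsubdir/also-kept.txt"));

            List<String> names = new ArrayList<>();
            for (Path p : findTxtFiles(root)) {
                names.add(p.getFileName().toString());
            }
            Collections.sort(names);
            return names; // the temporary tree is left behind for the OS to clean up
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the pattern never mentions a separator, the same source should work unchanged on Windows and on Unix-like systems, and a directory such as xxxsubdir is no longer skipped by accident.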