In Java, here is one of several ways to process a "snapshot" of the files in a particular directory:
String directory = "/path/to/directory";
// listFiles() returns null if the path is not a readable directory
File[] files = new File(directory).listFiles();
List<File> fileList = (files == null) ? Collections.emptyList() : Arrays.asList(files);
fileList.parallelStream().forEach(file -> {
    Path fileAsPath = file.toPath();
    // Assume the process method finishes by deleting the file or moving it to another directory
    process(fileAsPath);
});
And here is one of several ways to process files that are added to the directory:
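// Requires: import static java.nio.file.StandardWatchEventKinds.*;
WatchService watchService = FileSystems.getDefault().newWatchService();
Path directoryAsPath = Paths.get(directory);
WatchKey watchKey = directoryAsPath.register(watchService, ENTRY_CREATE);
while (true) {
    // Block until at least one event is available
    WatchKey key = watchService.take();
    for (WatchEvent<?> event : key.pollEvents()) {
        WatchEvent.Kind<?> kind = event.kind();
        if (kind == OVERFLOW) {
            continue;
        }
        // The event context is the file name relative to the watched directory
        Path filename = directoryAsPath.resolve((Path) event.context());
        // Again, assume the process method finishes by deleting the file or moving it
        // to another directory
        process(filename);
    }
    // Reset the key so further events are queued; stop if the directory is no longer accessible
    if (!key.reset()) {
        break;
    }
}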
What would be a fairly straightforward approach to process pre-existing files in the directory -- such as when the process starts -- and also process files that are subsequently added?
Each file should be processed exactly once. In this situation, the order in which files are processed does not matter.
I suppose one straightforward way would be to put the first block of logic in an infinite loop -- just have the listFiles() method take a new snapshot of the directory, perhaps with a brief delay between iterations -- but this seems clunky. Files can be on the order of tens of megabytes, so it would be nice not to have to wait for an entire "snapshot" of files to be fully processed before beginning another "snapshot" of files.
Using a database to track the files that have been processed seems overly complicated.
Thanks!
Use 2 directories.
First move the existing files out to a temp dir, then copy them back once the watch is registered. Both these files and any created later will then trigger the watch as new files.
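The move-out-and-back step might be sketched roughly as follows. The class name Requeue, the method requeueExisting and the stagingDir parameter are made up for illustration, and the sketch moves the files back rather than copying them so that nothing is left behind in the temp dir. It assumes the temp dir is on the same file system as the watched directory and that the watch has already been registered before the method is called.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class Requeue {

    /**
     * Moves every regular file in watchedDir to stagingDir and back again,
     * so that each one reappears in watchedDir as a newly created entry.
     * Returns the paths (inside watchedDir) that were moved back.
     */
    static List<Path> requeueExisting(Path watchedDir, Path stagingDir) throws IOException {
        // Take the snapshot first, so the directory is not modified while it is being listed
        List<Path> existing = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(watchedDir)) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    existing.add(file);
                }
            }
        }

        // Move the snapshot out to the staging directory
        List<Path> staged = new ArrayList<>();
        for (Path file : existing) {
            Path target = stagingDir.resolve(file.getFileName());
            staged.add(Files.move(file, target, StandardCopyOption.REPLACE_EXISTING));
        }

        // Move everything back into the watched directory
        List<Path> requeued = new ArrayList<>();
        for (Path file : staged) {
            requeued.add(Files.move(file, watchedDir.resolve(file.getFileName())));
        }
        return requeued;
    }
}
```

Calling Requeue.requeueExisting(directoryAsPath, someTempDir) just before entering the while (true) loop should leave the pre-existing files to be picked up by the same loop that handles new arrivals. A file that arrives in the watched directory under the same name while its namesake is still in the temp dir would make the move back fail, so this is only a starting point.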
If you're on Linux, you could instead try touch on each existing file (untested, but may be enough to trigger the watch).
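The same idea can be tried from inside the Java process instead of the shell. The sketch below is equally untested as a trigger: the class Touch and the method touchExisting are hypothetical names, and it only updates each file's last-modified time. Because that is a modification rather than a creation, a watch registered for ENTRY_CREATE alone may well not see it, so the sketch assumes the watch is also registered for ENTRY_MODIFY.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

// Hypothetical helper, not part of any library.
final class Touch {

    /**
     * Sets the last-modified time of every regular file in watchedDir to "now",
     * roughly what the touch command does. Returns the number of files touched.
     */
    static int touchExisting(Path watchedDir) throws IOException {
        int touched = 0;
        FileTime now = FileTime.from(Instant.now());
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(watchedDir)) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    Files.setLastModifiedTime(file, now);
                    touched++;
                }
            }
        }
        return touched;
    }
}
```

If the watch is widened to ENTRY_MODIFY, files that are still being written may produce several events each, so the exactly-once requirement would need extra care with this variant.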