I see examples where CSV files can be consumed using Jet, e.g.:
BatchSource<SalesRecordLine> source = Sources.filesBuilder(sourceDir)
.glob("*.csv")
.build(path -> Files.lines(path).skip(1).map(SalesRecordLine::parse));
In a multi-node setup, will all the nodes start picking up the same file (on, say, a shared NFS mount), or does Jet employ some smart locking (like Apache Camel's idempotent file consumer)? How does Jet know the file has been completely flushed to disk before reading it?
Thanks
If you are using NFS, then set the
sharedFileSystem property to true. From the method javadoc:
For the batch source, Jet assumes the files are not modified while they are read. If they are, the result is undefined.
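To make the effect of the sharedFileSystem flag more concrete, here is a minimal plain-Java sketch of the idea as I understand it: with the flag set, each file is handled by exactly one member, and without it every member reads everything it sees. This is not Jet's actual code. The class SharedFsSplitSketch, the helpers ownerOf and filesFor, and the hash-based assignment are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: NOT Jet's implementation. All names here are hypothetical.
public class SharedFsSplitSketch {

    // Hypothetical rule: pick one owning member per file name.
    // floorMod keeps the result in [0, memberCount) even for negative hash codes.
    static int ownerOf(String fileName, int memberCount) {
        return Math.floorMod(fileName.hashCode(), memberCount);
    }

    // Returns the files a given member would read.
    // sharedFileSystem == true:  members split the files between them.
    // sharedFileSystem == false: every member reads every file it sees,
    //                            which duplicates records on a shared mount.
    static List<String> filesFor(int memberIndex, int memberCount,
                                 List<String> visibleFiles, boolean sharedFileSystem) {
        List<String> mine = new ArrayList<>();
        for (String file : visibleFiles) {
            if (!sharedFileSystem || ownerOf(file, memberCount) == memberIndex) {
                mine.add(file);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("jan.csv", "feb.csv", "mar.csv", "apr.csv");
        int members = 3;
        for (int m = 0; m < members; m++) {
            System.out.println("member " + m
                    + " shared=true  -> " + filesFor(m, members, files, true));
            System.out.println("member " + m
                    + " shared=false -> " + filesFor(m, members, files, false));
        }
    }
}
```

Note that neither mode involves locking or any check that a file has been fully written. The sketch only shows why leaving the flag off on a shared mount would make every member emit the same records.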
If you want to monitor files as they are written to, use
FileSourceBuilder.buildWatcher() instead of build(). This will create a streaming job. But the watcher processes only lines appended since the job started. Again, if the files are modified in any other way than appending at the end, the result is undefined. For example, many text editors delete and rewrite the entire file, even when you just appended a line at the end - for testing it's easiest to use