I have Apache Flink job for parsing csv-files which works fine in from IntelliJ IDEA on Windows. But when I put my job (jar) in docker-container Apache Flink i have problems with permisson to file with class FileSource.forRecordStreamFormat(...). Inside the container i have file: /opt/flink/data/test2.csv. The permissions are ok (I can even changed file from my job). For fileName I used /opt/flink/data/test2.csv, //opt/flink/data/test2.csv, ///opt/flink/data/test2.csv.
Permissions:
# pwd
/opt/flink/data
# ls -ls
total 16088
1204 -rwxrwxrwx 1 root root 1231979 Jan 24 15:54 test2.csv
14876 -rwxrwxrwx 1 root root 15231523 Jan 22 19:24 test3.csv
8 -rwxrwxrwx 1 root root 6623 Jan 24 14:32 test_Home.xlsx
Docker-compose:
version: "2.2"
services:
jobmanager:
image: flink:1.16-java8
ports:
- "8081:8081"
command: jobmanager
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
volumes:
- /c/Users/MGubina/Desktop/data:/opt/flink/data
taskmanager:
image: flink:1.16-java8
depends_on:
- jobmanager
command: taskmanager
scale: 1
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 2
Part of job code:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
CsvReaderFormat<Product> csvFormat = CsvReaderFormat.forPojo(Product.class);
FileSource<Product> csvSource =
// FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(file)).build(); // firsrt version
FileSource.forRecordStreamFormat(csvFormat, new Path(fileName)).build(); // second version
DataStream<Product> csvInputStream = env.fromSource(csvSource, WatermarkStrategy.noWatermarks(), "csv-source");
...
Logs with exception:
Caused by: java.io.FileNotFoundException: File file:/opt/flink/data/test2.csv does not exist or the user running Flink ('flink') has insufficient permissions to access it.
at org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:106)
at org.apache.flink.connector.file.src.impl.StreamFormatAdapter.openStream(StreamFormatAdapter.java:157)
at org.apache.flink.connector.file.src.impl.StreamFormatAdapter.createReader(StreamFormatAdapter.java:70)
at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.checkSplitOrStartNext(FileSourceSplitReader.java:112)
at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:65)
at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:142)
I tried to use different ways of getting Path, but no luck this way.
As long as I have in exception File file:/opt/flink/data/test2.csv does not exist I think that problem might be that in local fyle system in Docker (Unix-like)is needed path like file:///.
What can I do? Maybe I miss something?
The issue seems to be that You are deploying two separate containers
taskmanagerandjobmanager, but the file is only available onjobmanagerand not ontaskmanager. Can You try to add the correct mounts to the task manager too and try again ?