Writing to a File in Apache Beam


I am running the WordCount program on Windows using Apache Beam with the DirectRunner. I can see output files being created in a temp folder (under src/main/resources/), but the write to the output file fails. Below is the code snippet:

p.apply("ReadMyFile", TextIO.read().from("src/main/resources/input.txt"))
                .apply(Regex.split(" "))
                .apply(Count.<String>perElement())
                .apply(ToString.elements())
                .apply(TextIO.write().to("src/main/resources/output.txt"));

Please let me know what format it expects for the output directory/file. Thanks in advance.

The error is as follows:

Caused by: java.lang.IllegalStateException: Unable to find registrar for i
    at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
    at org.apache.beam.sdk.io.FileSystems.matchResources(FileSystems.java:174)
    at org.apache.beam.sdk.io.FileSystems.delete(FileSystems.java:321)
    at org.apache.beam.sdk.io.FileBasedSink$Writer.cleanup(FileBasedSink.java:905)
    at org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles.processElement(WriteFiles.java:376)


There are 2 best solutions below


Beam currently doesn't handle Windows paths very well. See associated JIRAs, e.g. this one. Perhaps try specifying the absolute path with a file:// prefix?
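One plausible reading of the "Unable to find registrar for i" message is that a Windows drive letter is being treated as a URI scheme, so the lookup for a matching file system fails. The sketch below is only an illustration of that failure mode: the class name, the method, and the regex are hypothetical and are not taken from Beam's source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a hypothetical scheme parser, NOT Beam's actual code.
class SchemeSketch {
    // Assumed pattern: anything of the form "<letters>:<rest>" is read as "<scheme>:<rest>".
    private static final Pattern SCHEME =
            Pattern.compile("([a-zA-Z][-a-zA-Z0-9+.]*):.*");

    // Returns the lower-cased scheme, or "file" when the spec has no scheme prefix.
    static String parseScheme(String spec) {
        Matcher m = SCHEME.matcher(spec);
        return m.matches() ? m.group(1).toLowerCase() : "file";
    }
}
```

Under this assumed pattern, a path on drive I: would yield the scheme "i", while a path with no drive letter, such as "/myFile", would fall back to "file".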


Summary: you can use the "/" character as a stand-in for the drive the process is running on. For example, if your output file is located at

"C:/myFile"

write

TextIO.write().to("/myFile")

Longer answer:

Even after the issue mentioned in jkff's answer (this one) was resolved, I could only get the path formats it describes to work for input, not for output.

The javadoc in the LocalFileSystem class says

 * <p>Windows OS:
 *
 * <ul>
 *   <li>pom.xml
 *   <li>C:/Users/beam/Documents/pom.xml
 *   <li>C:\\Users\\beam\\Documents\\pom.xml
 *   <li>file:/C:/Users/beam/Documents/pom.xml
 *   <li>file:///C:/Users/beam/Documents/pom.xml
 * </ul>
 */

but none of these worked for the method

TextIO.write().to(String filenamePrefix)

However, using release 2.12.0, I was able to write to a file on the same drive by using "/" as the root directory: instead of "C:/myDirectory/myFile", I used "/myDirectory/myFile". This only lets you write to files on the drive the process is running on, but given that DirectRunner should only be used for testing, it may be good enough for many cases.
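The workaround above amounts to dropping the drive letter from the path before handing it to TextIO.write().to(...). A minimal sketch of that conversion, using only the JDK, might look like this; the class and method names are hypothetical helpers, not part of Beam.

```java
// Hypothetical helper (not part of Beam): turns "C:/dir/file" into "/dir/file".
class DrivePaths {
    // Strips a leading Windows drive letter so the path is rooted at "/".
    // Paths without a drive letter are returned with only the slashes normalized.
    static String toDriveRelative(String windowsPath) {
        // Normalize backslashes so "C:\dir\file" and "C:/dir/file" behave the same.
        String normalized = windowsPath.replace('\\', '/');
        if (normalized.matches("[A-Za-z]:/.*")) {
            // Drop the two-character "X:" prefix, keeping the leading "/".
            return normalized.substring(2);
        }
        return normalized;
    }
}
```

For example, TextIO.write().to(DrivePaths.toDriveRelative("C:/myDirectory/myFile")) would pass "/myDirectory/myFile". The result is only meaningful when the process is running on the same drive as the original path, since the drive information is discarded.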