How to read a resource file? (Google Cloud Dataflow)


My Dataflow pipeline needs to read a resource file GeoLite2-City.mmdb. I added it to my project and ran the pipeline. I confirmed that the project package zip file exists in the staging bucket on GCS.

However, when I try to read the resource file GeoLite2-City.mmdb, I get a FileNotFoundException. How can I fix this? This is my code:

String path = myClass.class.getResource("/GeoLite2-City.mmdb").getPath();
File database = new File(path);
try
{
  DatabaseReader reader = new DatabaseReader.Builder(database).build(); // <- this line throws a FileNotFoundException
}
catch (IOException e)
{
  LOG.info(e.toString());
}

My project package zip file is "classes-WOdCPQCHjW-hRNtrfrnZMw.zip". It contains the class files and GeoLite2-City.mmdb.

The path value is "file:/dataflow/packages/staging/classes-WOdCPQCHjW-hRNtrfrnZMw.zip!/GeoLite2-City.mmdb", but the file at that path cannot be opened.

These are the pipeline options:

--runner=BlockingDataflowPipelineRunner 
--project=peak-myproject 
--stagingLocation=gs://mybucket/staging 
--input=gs://mybucket_log/log.68599ca3.gz

The goal is to transform the log file on GCS and insert the transformed data into BigQuery. When I ran the pipeline locally, the import into BigQuery succeeded. I think the resource path is resolved differently on my local PC than on GCE.

Best answer

I think the issue might be that DatabaseReader does not support paths to resources located inside a .zip or .jar file.

If that's the case, then your program worked with DirectPipelineRunner not because the runner is direct, but because the resource was located on the local filesystem rather than inside the .zip file. As your comment says, the local path was C:/Users/Jennie/workspace/DataflowJavaSDK-master/eclipse/starter/target/classes/GeoLite2-City.mmdb, while on the service it was file:/dataflow/packages/staging/classes-WOdCPQCHjW-hRNtrfrnZMw.zip!/GeoLite2-City.mmdb.
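
To make the difference concrete, here is a minimal sketch using only the JDK (Java 9 or newer is assumed for readAllBytes). It packs a stand-in resource into a temporary .zip, loads it through a URLClassLoader, and compares the two access styles. The class name ArchiveResourceDemo, its methods, and the entry name demo-resource.bin are all hypothetical, and the temporary archive only imitates the staged package.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; none of these names come from the Dataflow SDK.
final class ArchiveResourceDemo {

    /** Writes a throwaway .zip with a single entry, standing in for the staged package. */
    static Path buildArchive(String entryName, byte[] content) {
        try {
            Path zip = Files.createTempFile("classes-demo", ".zip");
            zip.toFile().deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                out.putNextEntry(new ZipEntry(entryName));
                out.write(content);
                out.closeEntry();
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the resource URL's path names something java.io.File can see. */
    static boolean visibleAsFile(Path archive, String entryName) {
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] {archive.toUri().toURL()}, null)) {
            URL url = loader.getResource(entryName);
            // url.getPath() has the form "file:/.../classes-demo....zip!/demo-resource.bin",
            // which is not a real location on the filesystem.
            return new File(url.getPath()).exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the same resource through a stream, which also works inside an archive. */
    static byte[] readViaStream(Path archive, String entryName) {
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] {archive.toUri().toURL()}, null);
             InputStream in = loader.getResourceAsStream(entryName)) {
            return in.readAllBytes(); // requires Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "stand-in for GeoLite2-City.mmdb".getBytes(StandardCharsets.UTF_8);
        Path zip = buildArchive("demo-resource.bin", payload);
        System.out.println("visible as File: " + visibleAsFile(zip, "demo-resource.bin"));
        System.out.println("bytes via stream: " + readViaStream(zip, "demo-resource.bin").length);
    }
}
```

The File check fails for the archived entry while the stream read succeeds, which matches the behavior described in the question.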

I searched the web for the DatabaseReader class you might be using, and it seems to be https://github.com/maxmind/GeoIP2-java/blob/master/src/main/java/com/maxmind/geoip2/DatabaseReader.java .

In that case, there's a good chance that your code will work with the following minor change:

try
{
  InputStream stream = myClass.class.getResourceAsStream("/GeoLite2-City.mmdb");
  DatabaseReader reader = new DatabaseReader.Builder(stream).build();
}
catch (IOException e)
{
  ...
}
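
If a library only accepts a java.io.File, one possible workaround is to copy the resource stream to a temporary file first and pass that file along. The following is a rough sketch under that assumption, not something the question requires. ResourceToTempFile and extractToTempFile are hypothetical names, and the byte array in main only stands in for the real database.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; not part of GeoIP2-java or the Dataflow SDK.
final class ResourceToTempFile {

    /**
     * Copies a resource stream into a temporary file and returns that file.
     * The stream is closed afterwards. A null stream is rejected, because
     * getResourceAsStream returns null when the resource is missing.
     */
    static File extractToTempFile(InputStream resource, String suffix) {
        if (resource == null) {
            throw new IllegalArgumentException("resource not found on the classpath");
        }
        try (InputStream in = resource) {
            Path target = Files.createTempFile("resource-", suffix);
            target.toFile().deleteOnExit();
            // createTempFile already created the file, so it must be replaced.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in bytes; in the pipeline the stream would instead come from
        // myClass.class.getResourceAsStream("/GeoLite2-City.mmdb").
        File copy = extractToTempFile(new ByteArrayInputStream(new byte[] {1, 2, 3}), ".mmdb");
        System.out.println(copy.getAbsolutePath() + " (" + copy.length() + " bytes)");
    }
}
```

The stream-based builder shown above is simpler and avoids the extra disk copy, so this fallback is only worth considering when a real file is required.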