Java - file.resource.loader.path not working in AWS EMR


I need to make my Java/Spark application work both locally (on a Windows machine) and on an AWS EMR cluster.

This is my code:

readHDFSConfig.loadHDFslibrary(conf.get("velocity.template.path"), conf, sparkSession, conf.get("velocity.template.path"));

VelocityEngine ve = new VelocityEngine();
VelocityContext context = null;

// The three lines below make it work in local mode (on a Windows machine),
// where velocityPath is a Windows directory.
Properties p = new Properties();
p.setProperty("file.resource.loader.path", velocityPath);
ve.init(p);

context = new VelocityContext();
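As far as I understand, Velocity's `FileResourceLoader` resolves `file.resource.loader.path` against the local filesystem of the JVM that calls `ve.init(p)`, not against HDFS. A minimal stdlib sketch I use to confirm the initializing JVM can actually see the template directory (the `hasTemplate` helper and the temp directory are illustrative, not part of my application):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TemplatePathCheck {
    // Returns true only if dir exists on the LOCAL filesystem and
    // contains the named .vm file - the same view FileResourceLoader has.
    static boolean hasTemplate(String dir, String type) {
        Path base = Paths.get(dir);
        return Files.isDirectory(base)
                && Files.isRegularFile(base.resolve(type + ".vm"));
    }

    public static void main(String[] args) throws Exception {
        // Simulate a template directory with one template in it.
        Path dir = Files.createTempDirectory("vm-templates");
        Files.write(dir.resolve("alum.vm"), "alum: $alum".getBytes("UTF-8"));

        System.out.println(hasTemplate(dir.toString(), "alum"));    // true
        System.out.println(hasTemplate(dir.toString(), "missing")); // false
    }
}
```

Running this check with the EMR path before `ve.init(p)` would show whether the driver (or executor, in cluster deploy mode) can see `/usr/share/MPR_RESOURCES/velocityTemplate/` locally.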

This is how I'm calling the velocity templates:

String alum = rows.get(0).getAs(DeltaMembers.ALUM.toString());
context.put("alum", alum);
artifacts.append(getSnippet(ve, context, "alum"));

public String getSnippet(VelocityEngine ve, VelocityContext context, String type) throws DataProcessException {

        StringWriter indexSnippet = new StringWriter();
        try {
            Template t = ve.getTemplate(type + ".vm");
            t.merge(context, indexSnippet);
        } catch (ResourceNotFoundException e) {
            e.printStackTrace();
            throw new DataProcessException("ResourceNotFoundException occurred: " + e.getMessage(), e);
        } catch (ParseErrorException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return indexSnippet.toString();
    }

The readHDFSConfig.loadHDFslibrary function:

public void loadHDFslibrary(String hdfsPath , Configuration conf , SparkSession sparkSession , String defaultPath){
        boolean statusFlag = true;
        
        try {
            FileSystem fs  = FileSystem.get(conf);          
            LOGGER.info("HDFS Path is ..........."+hdfsPath);
            for(FileStatus eachPath: fs.listStatus(new Path(conf.get(hdfsPath, defaultPath)))){         
                LOGGER.info("FileName is ..........."+eachPath.getPath());
                sparkSession.sparkContext().addJar(eachPath.getPath().toString());
            }
            
        } catch (IllegalArgumentException | IOException e) {
            // FileNotFoundException is an IOException, so it is covered here.
            e.printStackTrace();
            statusFlag = false;
        }
        LOGGER.info("File read flag value is -> ..........."+statusFlag);       
    }

Issue:

This works on my local machine when velocityPath is a local Windows path (the resources folder). But on AWS EMR, when velocityPath is /usr/share/MPR_RESOURCES/velocityTemplate/ (which exists both on the local filesystem and in HDFS on EMR), the sections that should be populated by the .vm files are omitted, even though the job succeeds and I see no errors in the logs.

I tried hardcoding the path in the getSnippet function, e.g. Template t = ve.getTemplate(velocityPath + type + ".vm");, but the sections in the XML output are still omitted.
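One workaround I'm considering, since `FileResourceLoader` cannot read HDFS: read the template text myself (via the Hadoop `FileSystem` API on EMR) and hand the string to Velocity with `ve.evaluate(context, writer, "hdfs-template", templateText)`. To keep the sketch below runnable without cluster or Velocity dependencies, it reads from a local temp file and uses a naive `$name` substitution as a stand-in for Velocity's merge; both substitutions are assumptions noted in the comments:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class InlineTemplateSketch {
    // Stand-in for Velocity's merge: replaces each $key with its value.
    // With Velocity on the classpath this would instead be
    // ve.evaluate(context, writer, "hdfs-template", templateText).
    static String render(String templateText, Map<String, String> vars) {
        String out = templateText;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            out = out.replace("$" + e.getKey(), e.getValue());
        }
        return out;
    }

    // On EMR this read would go through the Hadoop FileSystem API
    // (fs.open(new Path(hdfsPath))) rather than java.nio.
    static String readTemplate(Path file) throws Exception {
        return new String(Files.readAllBytes(file), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("vm");
        Path tpl = dir.resolve("alum.vm");
        Files.write(tpl, "<alum>$alum</alum>".getBytes("UTF-8"));

        String text = readTemplate(tpl);
        System.out.println(render(text, Map.of("alum", "A123"))); // <alum>A123</alum>
    }
}
```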
