Hadoop YARN Map Task running out of physical and virtual memory


I have the following method, which I run from my map task in a multithreaded execution. It works fine in standalone mode, but when I run it on Hadoop YARN it runs out of the 1 GB of physical memory, and virtual memory usage also shoots up.

I need to know whether I am doing anything wrong from a programming perspective. I believe I am closing all the streams I open as soon as possible, so I see no reason for a memory leak. Please advise.

Thanks.

public static void manageTheCurrentURL(String url) {

logger.trace("Entering the method manageTheCurrentURL ");

InputStream stream = null;
InputStream is = null;
ByteArrayOutputStream out = null;
WebDriver driver = null;
try {

    if (StringUtils.isNotBlank(url)) {

        caps.setJavascriptEnabled(true); // not really needed: JS
                                            // enabled by default
        caps.setCapability(
                PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
                "/usr/local/bin/phantomjs");

        // Launch driver (will take care and ownership of the phantomjs
        // process)
        driver = new PhantomJSDriver(caps);
        driver.get(url);
        String htmlContent = driver.getPageSource();

        if (htmlContent != null) {

            is = new ByteArrayInputStream(htmlContent.getBytes());

            ByteArrayDocumentSource byteArrayDocumentSource = new ByteArrayDocumentSource(
                    is, url, "text/html");

            Any23 runner = new Any23();
            runner.setHTTPUserAgent("test-user-agent");

            out = new ByteArrayOutputStream();
            TripleHandler handler = new NTriplesWriter(out);

            try {
                runner.extract(byteArrayDocumentSource, handler);
            } catch (ExtractionException e) {
                // extraction errors are silently ignored here
            } finally {

                if (driver != null) {
                    driver.quit();
                    //driver.close();
                }

                try {
                    handler.close();

                } catch (TripleHandlerException e) {

                }
                if (is != null) {
                    try {
                        is.close();
                    } catch (IOException e) {
                    }
                }

            }

            if (out != null) {

                stream = new ByteArrayInputStream(out.toByteArray());
                Iterator<Node[]> it = new DeltaParser(stream);
                if (it != null) {

                    SolrCallbackForNXParser callback = new SolrCallbackForNXParser(
                            url);
                    callback.startStory();

                    while (it.hasNext()) {
                        Node[] abc = it.next();
                        callback.processStory(abc);
                    }

                    callback.endStory();
                }
            }

        }

    }

} catch (IOException e) {
    return;
}

finally {

    if (stream != null) {
        try {
            stream.close();
        } catch (IOException e) {
        }
    }
    if (out != null) {
        try {
            out.close();
        } catch (IOException e) {
        }

    }
}

logger.trace("Exiting the method manageTheCurrentURL ");

}
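
One thing that stands out in the method above: driver.quit() is only reached in the inner finally, so it runs only when getPageSource() returns a non-null value. If driver.get(url) throws, or the page source is null, the PhantomJS process started by that call is never quit. YARN's memory accounting covers the container's whole process tree rather than just the JVM heap, so leftover child processes from many threads could plausibly account for the growth, although nothing in the question confirms that this is the cause. Below is a minimal, self-contained sketch of moving the quit into the outermost finally. FakeDriver, liveProcesses, and process are hypothetical stand-ins invented for illustration so the example runs without Selenium; they are not part of any library.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DriverCleanupSketch {

    // Hypothetical counter standing in for "PhantomJS processes still alive".
    static final AtomicInteger liveProcesses = new AtomicInteger();

    // Hypothetical stand-in for a WebDriver that owns an external process.
    static class FakeDriver {
        FakeDriver() {
            liveProcesses.incrementAndGet(); // simulates spawning the browser process
        }

        String getPageSource(String url) {
            if (url.contains("bad")) {
                throw new IllegalStateException("simulated page load failure");
            }
            return url.contains("empty") ? null : "<html></html>";
        }

        void quit() {
            liveProcesses.decrementAndGet(); // simulates terminating the browser process
        }
    }

    // Quits the driver in the outermost finally, so every exit path releases it.
    static boolean process(String url) {
        FakeDriver driver = null;
        try {
            driver = new FakeDriver();
            String html = driver.getPageSource(url);
            if (html == null) {
                return false; // early exit: the finally block below still runs
            }
            // ... extraction and indexing would go here ...
            return true;
        } catch (IllegalStateException e) {
            return false; // failure path: the finally block below still runs
        } finally {
            if (driver != null) {
                driver.quit();
            }
        }
    }

    public static void main(String[] args) {
        process("http://example.com/ok");
        process("http://example.com/empty");
        process("http://example.com/bad");
        System.out.println("live processes: " + liveProcesses.get()); // prints "live processes: 0"
    }
}
```

Applied to the original method, the same idea would mean moving the driver.quit() call from the inner finally to the outer one, next to the stream.close() and out.close() calls.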
