I have the following method that I run from my map task in a multithreaded execution. It works fine in standalone mode, but when I run it in Hadoop YARN it exceeds the 1 GB physical memory limit, and the virtual memory usage also shoots up.
I need to know whether I am doing anything wrong from a programming perspective. I think I am closing all the streams I open as soon as possible, so I see no reason for a memory leak. Please advise.
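The ByteArrayInputStream and ByteArrayOutputStream objects in this method hold no OS resources (the JDK documents that closing them has no effect), so they are unlikely to be what leaks. The heavyweight resource is the phantomjs process that each PhantomJSDriver takes ownership of, and its quit() has to run on every path, including when get() throws. A minimal, self-contained sketch of that guarantee, using a hypothetical FakeBrowser class (not part of Selenium) that only counts live instances in place of a real process:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CleanupDemo {
    // Counts how many hypothetical "browser" instances are currently alive.
    static final AtomicInteger LIVE = new AtomicInteger();

    // Hypothetical stand-in for a driver that owns an external process.
    static final class FakeBrowser implements AutoCloseable {
        FakeBrowser() {
            LIVE.incrementAndGet();
        }

        String fetch(String url) {
            if (url.contains("bad")) {
                throw new IllegalStateException("page load failed: " + url);
            }
            return "<html>" + url + "</html>";
        }

        @Override
        public void close() {
            LIVE.decrementAndGet();
        }
    }

    // try-with-resources calls close() on every path, including when fetch() throws.
    static String fetchPage(String url) {
        try (FakeBrowser browser = new FakeBrowser()) {
            return browser.fetch(url);
        }
    }

    public static void main(String[] args) {
        for (String url : new String[] {"http://a", "http://bad", "http://c"}) {
            try {
                System.out.println(fetchPage(url));
            } catch (IllegalStateException e) {
                System.out.println("failed: " + e.getMessage());
            }
        }
        System.out.println("live browsers: " + LIVE.get());
    }
}
```

Running main ends with `live browsers: 0` even though the second URL fails. With the real driver, the equivalent is calling driver.quit() from a finally block that wraps driver.get().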
Thanks.
    public static void manageTheCurrentURL(String url) {
        logger.trace("Entering the method manageTheCurrentURL ");
        InputStream stream = null;
        InputStream is = null;
        ByteArrayOutputStream out = null;
        try {
            if (StringUtils.isNotBlank(url)) {
                // Build the capabilities per call: mutating one shared static
                // DesiredCapabilities from several threads is not thread-safe.
                DesiredCapabilities caps = new DesiredCapabilities();
                caps.setJavascriptEnabled(true); // not really needed: JS enabled by default
                caps.setCapability(
                        PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
                        "/usr/local/bin/phantomjs");
                // Launch driver (will take care and ownership of the phantomjs process)
                WebDriver driver = new PhantomJSDriver(caps);
                String htmlContent;
                try {
                    driver.get(url);
                    htmlContent = driver.getPageSource();
                } finally {
                    // Quit on every path, including when get() throws;
                    // otherwise each failure leaves a phantomjs process running.
                    driver.quit();
                }
                if (htmlContent != null) {
                    // Explicit charset (java.nio.charset.StandardCharsets)
                    // instead of the platform default.
                    is = new ByteArrayInputStream(htmlContent.getBytes(StandardCharsets.UTF_8));
                    ByteArrayDocumentSource byteArrayDocumentSource = new ByteArrayDocumentSource(
                            is, url, "text/html");
                    Any23 runner = new Any23();
                    runner.setHTTPUserAgent("test-user-agent");
                    out = new ByteArrayOutputStream();
                    TripleHandler handler = new NTriplesWriter(out);
                    try {
                        runner.extract(byteArrayDocumentSource, handler);
                    } catch (ExtractionException e) {
                        logger.warn("Extraction failed for " + url, e);
                    } finally {
                        // Closing the handler flushes the N-Triples into 'out'.
                        try {
                            handler.close();
                        } catch (TripleHandlerException e) {
                            logger.warn("Could not close the triple handler", e);
                        }
                        try {
                            is.close();
                        } catch (IOException e) {
                            // no-op for a ByteArrayInputStream
                        }
                    }
                    stream = new ByteArrayInputStream(out.toByteArray());
                    Iterator<Node[]> it = new DeltaParser(stream);
                    SolrCallbackForNXParser callback = new SolrCallbackForNXParser(url);
                    callback.startStory();
                    while (it.hasNext()) {
                        callback.processStory(it.next());
                    }
                    callback.endStory();
                }
            }
        } catch (IOException e) {
            logger.warn("I/O error while processing " + url, e);
        } finally {
            if (stream != null) {
                try {
                    stream.close();
                } catch (IOException e) {
                    // no-op for a ByteArrayInputStream
                }
            }
            if (out != null) {
                try {
                    out.close();
                } catch (IOException e) {
                    // no-op for a ByteArrayOutputStream
                }
            }
        }
        logger.trace("Exiting the method manageTheCurrentURL ");
    }
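Because the map task is multithreaded, every thread that is inside this method at the same time runs its own phantomjs process, and those child processes count toward the container's memory alongside the task JVM. One way to cap that, sketched here under assumptions: a java.util.concurrent.Semaphore that bounds how many browser sessions are alive at once. MAX_BROWSERS, process and runAll are hypothetical names, and the sleep stands in for a page load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedBrowsers {
    // Hypothetical limit: at most this many browser processes at once.
    static final int MAX_BROWSERS = 2;
    static final Semaphore PERMITS = new Semaphore(MAX_BROWSERS);
    static final AtomicInteger live = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Stand-in for the per-URL work: take a permit before starting a browser,
    // give it back in finally once the browser has been shut down.
    static void process(String url) throws InterruptedException {
        PERMITS.acquire();
        try {
            int now = live.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(20); // simulated page load for 'url'
        } finally {
            live.decrementAndGet();
            PERMITS.release();
        }
    }

    // Runs all URLs on a pool with more threads than permits and returns
    // the highest number of simultaneously live "browsers" observed.
    static int runAll(List<String> urls, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> {
                    process(url);
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate any task failure
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            urls.add("http://example.com/" + i);
        }
        System.out.println("peak concurrent browsers: " + runAll(urls, 8));
    }
}
```

The reported peak should never exceed MAX_BROWSERS, even with eight worker threads. A suitable limit would depend on how much memory one phantomjs instance actually uses on your nodes.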