Getting cryptic errors from Bluemix Document Conversion service


I am trying to convert this document: http://www.redbooks.ibm.com/redpapers/pdfs/redp5213.pdf to JSON answer units, but it (and many similar documents) won't process through the service. If I try it through the demo page at https://document-conversion-demo.mybluemix.net/ it either returns the error 'Missing required parameters: either params.file or params.document_id must be specified' or returns a blank result. If I try it through the REST API via Node.js and watson-developer-cloud, it returns error code 400 along with the message 'The input document failed to be converted because Exception while converting PDF to HTML'. (I have no idea why it's trying to convert to HTML. I've specified JSON answer units, and this code has worked fine with some other documents I've tried.)

Is there something unusual about these redpapers that I'm trying to convert, or is the document conversion service having issues?


There is 1 best solution below


I downloaded that [Redpaper][1] to my laptop, went to the Document Conversion Demo, clicked Choose your file, uploaded the PDF I had just downloaded, and then clicked Answer units JSON as the desired output format. At first, nothing seemed to happen. Clicking the download icon to the right of Output document gave me the converted JSON output as a download and also filled it in on the web page. After reloading the page, the conversion appeared on the demo page without my having to click the download icon.

I'm a newbie to Node.js. I got the following code to work (based on Document Conversion via Node) using the current watson-developer-cloud package, which is version 1.8.0.

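var watson = require('watson-developer-cloud');
var fs = require('fs');

// Replace 'username' and 'password' with your service credentials.
var document_conversion = watson.document_conversion({
  username:     'username',
  password:     'password',
  version:      'v1',
  version_date: '2015-12-15'
});

// Upload the local PDF and ask for answer units as the output format.
document_conversion.convert({
  file: fs.createReadStream('redp5213.pdf'),
  conversion_target: 'ANSWER_UNITS'
}, function (err, response) {
  if (err) {
    // Conversion failures (such as the 400 in the question) end up here.
    console.error(err);
  } else {
    // Pretty-print the answer units JSON.
    console.log(JSON.stringify(response, null, 2));
  }
});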

This took between ten and twenty seconds to run over coffee shop WiFi.
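If the callback in the code above does receive an error, it can help to print something more readable than the raw object. The following is only a rough sketch: describeConversionError is a hypothetical helper, and it assumes the error object carries a numeric code (the HTTP status) and a message or error text field, which may not match what your version of the package actually returns.

```javascript
// Hypothetical helper, not part of watson-developer-cloud.
// Assumption: `err.code` holds the HTTP status and `err.message` or
// `err.error` holds the text returned by the service.
function describeConversionError(err) {
  var status = err && err.code;
  var text = (err && (err.message || err.error)) || 'unknown error';
  if (status === 400) {
    // 400 is the status reported in the question for a failed conversion.
    return 'Rejected input (400): ' + text;
  }
  if (status === 401) {
    return 'Check credentials (401): ' + text;
  }
  return 'Conversion failed' + (status ? ' (' + status + ')' : '') + ': ' + text;
}

// Example with the error reported in the question.
console.log(describeConversionError({
  code: 400,
  message: 'The input document failed to be converted because Exception while converting PDF to HTML'
}));
```

Inside the convert callback, console.error(describeConversionError(err)) could stand in for console.error(err).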

Oh, and I forgot to address your question, "Why [is it] trying to convert to HTML?" The Document Conversion service always converts the input to HTML and then to normalized HTML. For answer units or plain text, it takes an additional step of converting the normalized HTML to the requested format. This is described in Document Conversion - Customizing (which strikes me as oddly out of the way for basic flow documentation).
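To make that flow concrete, here is a toy model of it in plain JavaScript. It is not the service's code: stagesFor, runPipeline, and the stage names are made up for illustration, and the only thing taken from the description above is the order of the steps. It shows why a failure in the first step produces an error that mentions HTML even when answer units were requested.

```javascript
// Toy model of the flow described above; all names here are hypothetical.
function stagesFor(target) {
  // Every conversion goes to HTML and then to normalized HTML.
  var stages = ['input to HTML', 'HTML to normalized HTML'];
  // Answer units and plain text need one more step.
  if (target === 'ANSWER_UNITS') {
    stages.push('normalized HTML to answer units');
  } else if (target === 'NORMALIZED_TEXT') {
    stages.push('normalized HTML to plain text');
  }
  return stages;
}

function runPipeline(target, failingStage) {
  var stages = stagesFor(target);
  for (var i = 0; i < stages.length; i++) {
    if (stages[i] === failingStage) {
      // The error names the stage that broke, not the requested target.
      return { ok: false, error: 'Exception while converting ' + stages[i] };
    }
  }
  return { ok: true, stagesRun: stages };
}

// A failure in the first stage is reported as an HTML conversion problem.
console.log(runPipeline('ANSWER_UNITS', 'input to HTML').error);
// prints "Exception while converting input to HTML"
```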

[1]: http://www.redbooks.ibm.com/redpapers/pdfs/redp5213.pdf Redpaper