We are trying to convert a PDF to XML using the following query:
xquery version "1.0-ml";
let $results := xdmp:pdf-convert(
        xdmp:document-get("d:\CFR-2010-title48-vol1.pdf"),
        "CFR-2010-title48-vol1.xml"),
    $manifest := $results[1]
return $results
But it didn't generate XML output for the PDF. It generated the following output files:
<parts xmlns="xdmp:pdf-convert">
  <part>CFR-2010-title48-vol1_xml.xhtml</part>
  <part>CFR-2010-title48-vol1_xml_parts/01_00.jpg</part>
  <part>CFR-2010-title48-vol1_xml_parts/01_01.jpg</part>
  <part>CFR-2010-title48-vol1_xml_parts/conv.css</part>
  <part>CFR-2010-title48-vol1_xml_parts/toc.txt</part>
</parts>
Can you please suggest how to generate XML output for the given PDF file?
Thanks
Venkat
The first document returned is XML.
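To make that concrete, here is a rough sketch of how you might pull the XHTML out of the result sequence. It assumes the first item returned is the parts manifest and that the remaining items follow in the same order as the part elements in that manifest; please check this against the function documentation before relying on it. The variable names are only illustrative.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumes $results[1] is the manifest and that the
   remaining items line up one-to-one with its part elements. :)
let $results := xdmp:pdf-convert(
        xdmp:document-get("d:\CFR-2010-title48-vol1.pdf"),
        "CFR-2010-title48-vol1.xml")
let $manifest := $results[1]
let $parts := subsequence($results, 2)
(: Walk the manifest and keep only the part whose name ends in .xhtml :)
for $name at $i in $manifest/*:part
where ends-with($name, ".xhtml")
return $parts[$i]
```

If that assumption holds, this returns just the XHTML document rather than the manifest plus the images, CSS, and table of contents.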
Were you looking to get the DocBook? For that you need to run the entire upconversion process, and the easiest way to do that is to run the document through the CPF conversion application, which runs through a series of steps and inferences to get to that point.
Or: Are you wondering why the name in the part doesn't match the name from the second parameter to xdmp:pdf-convert? The second parameter is just used to adjust the generated hrefs to images; it is not used for the conversion output itself.
Or: If you want to target XML of some other kind (not XHTML) directly from the format conversion of xdmp:pdf-convert, you can apply a different configuration file. See the documentation on that function for more details.