I am trying to convert an MS Word (.docx) file to PDF using the Microsoft Graph API. The file is stored in SharePoint (Office 365). The code below works:
var httpClient = await CreateAuthorizedHttpClient();
string path = $"{GraphEndpoint}sites/{SiteId}/drive/items/";
string requestUrl = $"{path}{fileId}/content?format={targetFormat}";
var response = await httpClient.GetAsync(requestUrl);
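For reference, here is a self-contained sketch of the same call. The class and method names (GraphPdfConverter, BuildConvertUrl, ConvertToPdfAsync) are hypothetical and only for illustration; they are not part of my project or of any SDK. It assumes the HttpClient already carries a valid bearer token and follows redirects (the default), and it surfaces the status code and response body when the request fails.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper; the names are illustrative only.
public static class GraphPdfConverter
{
    // Builds {endpoint}/sites/{siteId}/drive/items/{fileId}/content?format={format}.
    public static string BuildConvertUrl(
        string graphEndpoint, string siteId, string fileId, string targetFormat = "pdf")
    {
        // Trim a trailing slash so the result is the same with or without one.
        string baseUrl = graphEndpoint.TrimEnd('/');
        return $"{baseUrl}/sites/{siteId}/drive/items/{fileId}/content" +
               $"?format={Uri.EscapeDataString(targetFormat)}";
    }

    // Requests the converted file and returns its bytes.
    // Assumption: httpClient is already authorized and follows redirects.
    public static async Task<byte[]> ConvertToPdfAsync(HttpClient httpClient, string requestUrl)
    {
        using (HttpResponseMessage response = await httpClient.GetAsync(requestUrl))
        {
            if (!response.IsSuccessStatusCode)
            {
                // Include the response body, since it may carry an error description.
                string body = await response.Content.ReadAsStringAsync();
                throw new HttpRequestException(
                    $"Conversion failed with {(int)response.StatusCode} {response.ReasonPhrase}: {body}");
            }
            return await response.Content.ReadAsByteArrayAsync();
        }
    }
}
```

With this, the call would look like: byte[] pdf = await GraphPdfConverter.ConvertToPdfAsync(httpClient, GraphPdfConverter.BuildConvertUrl(GraphEndpoint, SiteId, fileId));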
However, the conversion fails when the .docx file contains HTML that was added with the code below:
string altChunkId = "myId123";

// Create an alternative format import part on the MainDocumentPart
AlternativeFormatImportPart altformatImportPart = wordDoc.MainDocumentPart
    .AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);

using (MemoryStream htmlMemoryStream = new MemoryStream(Encoding.UTF8.GetBytes($"<html><head></head><body>{value}</body></html>")))
{
    // Add the HTML data into the alternative format import part
    altformatImportPart.FeedData(htmlMemoryStream);

    // Create a new altChunk and link it to the id of the AlternativeFormatImportPart
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    //p.InsertAfterSelf(altChunk);
    documentBody.Append(altChunk);
    break;
}
I get a 406 Not Acceptable error when I try to convert this file using the Graph API. I also see that the file is not editable in the browser and opens in compatibility mode. If I try to open the document in edit mode, I get this error:
Sorry this document can't be opened because it contains objects that word doesn't support
I tried cutting the HTML part out of the document, pasting it into another document, and converting that document to PDF, which worked. When I looked at the XML of that document, I saw that the Word app had converted the HTML into Word-compatible XML tags.
Question 1: How can I convert the HTML to Word-compatible tags so that the document can be converted to PDF?
Also, if I use the Download as PDF option, the file is converted to PDF without any issue.
This option uses the API call below:
Question 2: Is there a way I can use this API to convert a .docx file to PDF? I saw that the access token's audience value is "wopi/{TenantName}@{TenantID}". If I can get an access token with the correct audience, I think I will be able to use the above API.