How to convert an HTML string into a Word document in C#

I have an HTML string that I convert to PDF with the OpenHtmlToPdf library, and that works correctly.

Now I want to convert the same HTML string into a Word document. I used the HtmlToOpenXml library, but the formatting in the resulting Word document differs from the PDF output.

I have looked at many other solutions, but they are too expensive.

using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml;

public static byte[] HtmlToWord(string html)
{
    using (var generatedDocument = new MemoryStream(10 * 1024))
    {
        using (WordprocessingDocument package = WordprocessingDocument.Create(
                   generatedDocument, WordprocessingDocumentType.Document))
        {
            // A newly created package has no main part yet, so add one with an empty body.
            MainDocumentPart mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new DocumentFormat.OpenXml.Wordprocessing.Document(new Body()).Save(mainPart);
            }

            // Convert the HTML and append the result to the document body.
            HtmlConverter converter = new HtmlConverter(mainPart);
            converter.ParseHtml(html);

            mainPart.Document.Save();
        }

        // The package is disposed at this point, so the stream holds the complete file.
        return generatedDocument.ToArray();
    }
}

Is there any solution for this issue?

There is 1 best solution below

For working with HTML, you can try HTML Agility Pack. With it, you can simply do this:

// requires: using HtmlAgilityPack;
string webUrl = "http://microsoft.com";

var page = new HtmlWeb();
var document = page.Load(webUrl);   // the original passed an undefined variable, url

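Once you have the parsed document, you can traverse and modify it however you like, and then save the result to a Word document. Note that HTML Agility Pack only parses HTML, so writing the .docx file itself still needs a library such as the Open XML SDK.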
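
Since the question starts from an HTML string rather than a URL, here is a minimal sketch of how the same idea might look with `HtmlDocument.LoadHtml`. The class `HtmlCleanup` and the method `CleanHtml` are hypothetical names made up for this illustration, and the choice to strip only `script` elements is an assumption about what is worth removing. This does not by itself guarantee the Word output will match the PDF.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class HtmlCleanup
{
    // Hypothetical helper: parses an HTML string, removes <script> elements
    // (which have no Word equivalent), and returns the remaining markup.
    public static string CleanHtml(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);   // parse from a string instead of loading a URL

        // SelectNodes returns null when nothing matches, so guard for that.
        var scripts = document.DocumentNode.SelectNodes("//script");
        if (scripts != null)
        {
            // Copy to a list first so removal does not disturb the iteration.
            foreach (var node in scripts.ToList())
            {
                node.Remove();
            }
        }

        return document.DocumentNode.OuterHtml;
    }
}
```

The cleaned string could then be passed to the `HtmlToWord` method from the question, for example `byte[] docx = HtmlToWord(HtmlCleanup.CleanHtml(html));`.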