Extract Contents code from PDF / Word File

239 Views Asked by At

I have to big files of MS Word & PDF which contains images, text fields, tables.

I need to insert text into these files dynamically at specific locations. I've tried Bookmarks method in Word but I can't use that method now. I've extracted data into byte array and tried to write in pdf but file gets corrupted. Here is the code:

 byte[] bytes = System.IO.File.ReadAllBytes("CDC.doc");
            FileStream fs = new FileStream("CDC.pdf", FileMode.OpenOrCreate);
            fs.Write(bytes, 0, bytes.Length);
            fs.Close();

Is there any way that I can convert these pdf/ word files to get PDF code for these files and then I can append data to specific locations in that code. Please advise. Thanks!

1

There are 1 best solutions below

0
On

If I understand you right, you would like to develop a code that would replace all placeholders in a Word document acting as a template with your application data. For placeholders you can use Bookmarks, but a better choice would be Content Controls. You can use Open XML SDK to parse such a template Word document and replace Content Controls with data. This approach uses a free MS library but is tedious.

A much easier approach would be using a ready-made library which can work with templates, which contain placeholders that will get replaced with your real app data at runtime. In your C# application you can prepare the data (as C# data objects or XML) and merge this data with the template. Output can be in docx, pdf or xps format. You can check out some of the examples here.