Rendering text with HTML tags to Formatted text in a Word table using VBA

2.7k Views Asked by At

I have a word document with html tags that I need converted to formatted text. For example I would like <strong>Hello</strong> to display as Hello instead.

I've never used VBA before, but I've been trying to piece something together that would allow me to copy the html text from a specific table cell in Word, use IE to display the formatted version of that text, copy the formatted text from IE, and then paste it back into that same Word table cell. I think I've been able to figure out some of the code, but I don't think I'm referring to the table cells correctly. Can anyone help? This is what I have so far:

Dim Ie As Object

Set Ie = CreateObject("InternetExplorer.Application")

With Ie
    .Visible = False

    .Navigate "about:blank"

    .Document.body.InnerHTML = ActiveDocument.Tables(1).Cell(2, 2)
    
    .Document.execCommand "SelectAll"
    
    .Document.execCommand "Copy"
    
    ActiveDocument.Paste Destination = ActiveDocument.Tables(1).Cell(2, 2)

    .Quit
End With
End Sub
2

There are 2 best solutions below

3
On BEST ANSWER

You need two different appraoches for the two uses of .cell(2,2).

To get text from the cell you need to amend the first line to read

.Document.body.InnerHTML = ActiveDocument.Tables(1).Cell(2, 2).range.text  

In the second case your terminology is incorrect. It should read

ActiveDocument.Tables(1).Cell(2, 2).range.paste

You can get help on individual keywords/properties quite easily. In the VBA IDE just place your cursor on a keyword/property and press F1. You will be taken to the MS help page for the keyword/property. Sometimes you will have an additional selection step when there is more than one alternative.

You should also be aware that the property .cell(row,column) is prone to failure as it relies on their being no merged cells in the Table. A more robust approach is to use the .cells(index) property.

It may be that you can take an alternative appraoch and use a wildcard search to find the tag and then replace the portion you need whilst applying a suitable linked style (You won't be able to use paragraph styles because you will be trying to format only part of a paragraph and character styles don't seem to work with find/replace).

An example of such code that removes HTML tags and formats the remaining text is below

Option Explicit

Sub replaceHTML_WithFormattedText()

' a comma seperated list of HTML tags
Const myTagsList                          As String = "strong,small,b,i,em"

' a list of linked styles chosen or designed for each tag
' Paragraph  styles cannot be used as we are replacing only part of a paragraph
' Character styles just don't seem to work
' The linked styles below were just chosen from the default Word styles as an example
Const myStylesList                        As String = "Heading 1,Heading 9,Comment Subject,Intense Quote,Message Header"

' <, > and / are special characters therefore need escaping with '\' to get the actual character
Const myFindTag                           As String = "(\<Tag\>)(*)(\<\/Tag\>)"
Const myReplaceStr                        As String = "\2"

Dim myTagsHTML()                        As String
Dim myTagsStyles()                      As String
Dim myIndex                             As Long

    myTagsHTML = Split(myTagsList, ",")
    myTagsStyles = Split(myStylesList, ",")

    If UBound(myTagsHTML) <> UBound(myTagsStyles) Then
        MsgBox "Different number of tags and Styles", vbOKOnly
        Exit Sub

    End If

    For myIndex = 0 To UBound(myTagsHTML)

        With ActiveDocument.StoryRanges(wdMainTextStory).Find
            .ClearFormatting
            .Format = True
            .Text = Replace(myFindTag, "Tag", Trim(myTagsHTML(myIndex)))
            .MatchWildcards = True
            .Replacement.Text = myReplaceStr
            .Replacement.Style = myTagsStyles(myIndex)
            .Execute Replace:=wdReplaceAll

        End With

    Next

End Sub
0
On

Try something along the lines of:

Sub ReformatHTML()
Application.ScreenUpdating = False
With ActiveDocument.Range.Find
  .ClearFormatting
  .Format = True
  .Forward = True
  .MatchWildcards = True
  .Wrap = wdFindContinue
  .Replacement.Text = "\2"
  .Replacement.ClearFormatting
  .Text = "\<(u\>)(*)\</\1"
  .Replacement.Font.Underline = True
  .Execute Replace:=wdReplaceAll
  .Replacement.ClearFormatting
  .Text = "\<(b\>)(*)\</\1"
  .Replacement.Font.Bold = True
  .Execute Replace:=wdReplaceAll
  .Replacement.ClearFormatting
  .Text = "\<(i\>)(*)\</\1"
  .Replacement.Font.Italic = True
  .Execute Replace:=wdReplaceAll
  .Replacement.ClearFormatting
  .Text = "\<(h\>)(*)\</\1"
  .Replacement.Highlight = True
  .Execute Replace:=wdReplaceAll
End With
Application.ScreenUpdating = True
End Sub

The above macro uses the 'normal' HTML codes for bold, italic, underlining, & highlighting.

As your document seems to be using a different convention (Style names, perhaps?), you could replace (b>) in the code with (strong>), for example. And, if it's intended to relate to Word's own 'Strong' Style, you would also change:

.Replacement.Font.Bold = True

to:

.Replacement.Style = "Strong"