OpenEdge: how to remove HTML tags from a string?

I have tried doing this:

REPLACE(string, "<*>", "").

but it doesn't seem to work.

Best answer:

REPLACE doesn't work like that. It matches the search string literally; there's no wildcard matching in it.
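To illustrate the literal matching, here is a minimal sketch (the variable names cText and cResult are made up for this example). Note also that REPLACE returns the new string rather than changing its first argument, so the result has to be assigned somewhere.

```abl
DEFINE VARIABLE cText   AS CHARACTER NO-UNDO INITIAL "x<*>y <b>z</b>".
DEFINE VARIABLE cResult AS CHARACTER NO-UNDO.

/* Only the exact three characters "<*>" are removed; the <b> tags stay. */
cResult = REPLACE(cText, "<*>", "").

MESSAGE cResult VIEW-AS ALERT-BOX.
```

Here cResult ends up as "xy <b>z</b>": the asterisk is treated as an ordinary character, not as a wildcard.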

I've included a simple way of doing this below. However, there are lots of cases where this won't work, such as HTML that isn't well formed. But perhaps you can start here and move forward by yourself.

What I do is look for < and > in the text and replace each tag, brackets included, with pipes (|). You could select any character, preferably something not present in the text, because every occurrence of it is removed in the last step. When that's done, all pipes are removed.

Again, this is a quick and dirty solution and not safe for production...

PROCEDURE cleanHtml:
    DEFINE INPUT  PARAMETER pcString  AS CHARACTER   NO-UNDO.
    DEFINE OUTPUT PARAMETER pcCleaned AS CHARACTER   NO-UNDO.

    DEFINE VARIABLE iHtmlTagBegins AS INTEGER     NO-UNDO.
    DEFINE VARIABLE iHtmlTagEnds   AS INTEGER     NO-UNDO.
    DEFINE VARIABLE lHtmlTagActive AS LOGICAL     NO-UNDO.

    DEFINE VARIABLE i AS INTEGER     NO-UNDO.

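    /* Walk the string one character at a time, tracking whether we are inside a tag. */
    DO i = 1 TO LENGTH(pcString):
        /* A "<" outside a tag marks the start of a new tag. */
        IF NOT lHtmlTagActive AND SUBSTRING(pcString, i, 1) = "<" THEN DO:
            iHtmlTagBegins = i.
            lHtmlTagActive = TRUE.
        END.

        /* The first ">" after that closes the tag. */
        IF lHtmlTagActive AND SUBSTRING(pcString, i, 1) = ">" THEN DO:
            iHtmlTagEnds = i.
            lHtmlTagActive = FALSE.

            /* Overwrite the whole tag with the same number of pipes so the string keeps
               its length and i stays in step. One pipe too few shifts the text left and
               makes the loop skip the "<" of a directly following tag. */
            SUBSTRING(pcString, iHtmlTagBegins, iHtmlTagEnds - iHtmlTagBegins + 1) = FILL("|", iHtmlTagEnds - iHtmlTagBegins + 1).
        END.
    END.

    /* Remove the placeholders. This also removes any pipes that were in the original text. */
    pcCleaned = REPLACE(pcString, "|", "").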

END PROCEDURE.

DEFINE VARIABLE c AS CHARACTER   NO-UNDO.

RUN cleanHtml("This is a <b>text</b> with a <i>little</i> bit of <strong>html</strong> in it!", OUTPUT c).

MESSAGE c VIEW-AS ALERT-BOX.
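
If the placeholder character is a concern (any pipe already in the text gets removed too), one possible variation is to copy only the text outside the tags into a new string. The following is a sketch under the same assumptions as above, not a production-ready HTML parser; the function name stripTags and its variable names are hypothetical.

```abl
FUNCTION stripTags RETURNS CHARACTER (INPUT pcString AS CHARACTER):
    DEFINE VARIABLE cResult AS CHARACTER NO-UNDO.
    DEFINE VARIABLE iPos    AS INTEGER   NO-UNDO INITIAL 1.
    DEFINE VARIABLE iOpen   AS INTEGER   NO-UNDO.
    DEFINE VARIABLE iClose  AS INTEGER   NO-UNDO.

    DO WHILE iPos <= LENGTH(pcString):
        /* Find the next "<" and the first ">" that follows it. */
        iOpen  = INDEX(pcString, "<", iPos).
        iClose = IF iOpen > 0 THEN INDEX(pcString, ">", iOpen) ELSE 0.

        /* No further complete tag: keep the rest of the text and stop. */
        IF iOpen = 0 OR iClose = 0 THEN DO:
            cResult = cResult + SUBSTRING(pcString, iPos).
            LEAVE.
        END.

        /* Keep the text in front of the tag, then continue right after it. */
        cResult = cResult + SUBSTRING(pcString, iPos, iOpen - iPos).
        iPos = iClose + 1.
    END.

    RETURN cResult.
END FUNCTION.

MESSAGE stripTags("Price: 5|6 <b>units</b>") VIEW-AS ALERT-BOX.
```

With this input the pipe in "5|6" survives and the result is "Price: 5|6 units". It shares the other limitations of the procedure above: a ">" inside an attribute value, comments, or script blocks will still confuse it.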