How do I convert HTML code to Confluence-style Wiki Markup?

Mylyn WikiText's API has functions to convert wiki markup to HTML, but I cannot find any to convert (parse) HTML into wiki markup. Class MarkupParser has the method parseToHtml, but where can I find the reverse?


As far as I know, there is no way to convert HTML to Confluence wiki markup. And since Atlassian stopped using wiki markup as the page storage format in Confluence 4.x, there is no need for such a conversion: the page format is XHTML.


I was able to convert HTML to Confluence-style wiki markup using DefaultWysiwygConverter from Atlassian's own Java libraries. Here's a simplified unit test:

    import com.atlassian.renderer.wysiwyg.converter.DefaultWysiwygConverter;
    import org.junit.Assert;
    import org.junit.Test;
    public class HtmlToWikiMarkupTest {
        @Test
        public void convertsInlineFormatting() {
            String htmlString = "This is <em>emphasized</em> and <b>bold</b>";
            DefaultWysiwygConverter converter = new DefaultWysiwygConverter();
            String wikiMarkupString = converter.convertXHtmlToWikiMarkup(htmlString);
            Assert.assertEquals("This is _emphasized_ and *bold*", wikiMarkupString);
        }
    }

The POM must include the Atlassian public repository and the atlassian-renderer dependency:

    <dependency>
        <groupId>com.atlassian.renderer</groupId>
        <artifactId>atlassian-renderer</artifactId>
        <version>8.0.5</version>
        <exclusions>
            <exclusion>
                <!-- Only needed where servlet-api conflicts with another
                     dependency, e.g. when using Spring Boot -->
                <groupId>javax.servlet</groupId>
                <artifactId>servlet-api</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <repositories>
        <repository>
            <!-- https://developer.atlassian.com/docs/advanced-topics/working-with-maven/atlassian-maven-repositories -->
            <id>atlassian-public</id>
            <url>https://packages.atlassian.com/maven/repository/public</url>
            <snapshots>
                <enabled>true</enabled>
                <updatePolicy>never</updatePolicy>
                <checksumPolicy>warn</checksumPolicy>
            </snapshots>
            <releases>
                <enabled>true</enabled>
                <checksumPolicy>warn</checksumPolicy>
            </releases>
        </repository>
    </repositories>

Here is how to do it in Mylyn with WikiText standalone. HtmlParser reads the HTML and passes what it finds to a DocumentBuilder, which writes the target markup, so substitute the DocumentBuilder for the wiki markup you want (check the API to see what's available; TextileDocumentBuilder also exists).

File ConvertToConfluence.java:

    package com.stackoverflow.mylyn;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.io.StringWriter;
    import java.nio.charset.StandardCharsets;

    // Package names are those of the WikiText standalone release this answer
    // was written against; other Mylyn WikiText releases may use different ones.
    import org.eclipse.mylyn.internal.wikitext.confluence.core.ConfluenceDocumentBuilder;
    import org.eclipse.mylyn.wikitext.core.parser.HtmlParser;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    public class ConvertToConfluence {

        // Parses an HTML file and returns it as Confluence wiki markup.
        // Errors are propagated to the caller instead of being swallowed, so a
        // missing file no longer causes a NullPointerException further down.
        public static String convertHTML(File htmlFile) throws IOException, SAXException {
            StringWriter writer = new StringWriter();

            // try-with-resources closes the file even if parsing fails.
            // UTF-8 is assumed here; the original used the platform default charset.
            try (Reader reader = new InputStreamReader(new FileInputStream(htmlFile), StandardCharsets.UTF_8)) {
                ConfluenceDocumentBuilder builder = new ConfluenceDocumentBuilder(writer);
                HtmlParser parser = new HtmlParser();
                parser.parse(new InputSource(reader), builder);
            }

            return writer.toString();
        }

        public static void main(String[] args) throws IOException, SAXException {
            File file = new File("c:\\filename.html");
            System.out.println(convertHTML(file));
        }
    }

File filename.html:

    <HTML>
    <BODY>
    <p>This is <b>bold text</b> and some <i>italic text</i>.<br/><br/>TEST!</p>
    </BODY>
    </HTML>

Produces Confluence output:

    This is *bold text* and some _italic text_.
    \\TEST!
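The Mylyn approach splits the work in two: a parser reads the HTML and reports elements and text as events, and a builder turns those events into the target markup. As a rough, dependency-free illustration of that idea (this is not Mylyn's HtmlParser or DocumentBuilder API), here is a toy that uses only the JDK's SAX parser. The class names ToyXhtmlToConfluence and ConfluenceHandler are hypothetical; the sketch assumes well-formed XHTML, handles only b/strong, i/em and br, and does not escape characters that are special in Confluence markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ToyXhtmlToConfluence {

    // Hypothetical stand-in for a DocumentBuilder: receives SAX events for
    // well-formed XHTML and appends Confluence wiki markup to a buffer.
    static class ConfluenceHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            switch (qName.toLowerCase(Locale.ROOT)) {
                case "b": case "strong": out.append('*'); break;
                case "i": case "em": out.append('_'); break;
                case "br": out.append("\\\\"); break; // Confluence forced line break: \\
                default: break; // all other tags are ignored in this sketch
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            switch (qName.toLowerCase(Locale.ROOT)) {
                case "b": case "strong": out.append('*'); break;
                case "i": case "em": out.append('_'); break;
                default: break;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text is copied through unescaped (a known limitation of this toy).
            out.append(ch, start, length);
        }
    }

    // Parses the XHTML string and returns the markup collected by the handler.
    public static String convert(String xhtml) {
        ConfluenceHandler handler = new ConfluenceHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XHTML: " + e.getMessage(), e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        // prints: This is *bold* and _emphasized_.
        System.out.println(convert("<p>This is <b>bold</b> and <em>emphasized</em>.</p>"));
    }
}
```

Replacing ConfluenceHandler with a handler that emits different markers is the toy equivalent of substituting another DocumentBuilder.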

Try Wikifier.

It doesn't do exactly what you want, but you might find it does enough, or is a useful starting point.

Wikifier converts snippets of the Confluence 4 XML storage format (that is, as presented by the Confluence Source Editor plugin, without a single document root element) into Confluence 3 wiki markup.

Why is this at all relevant to your question? The Confluence 4 XML storage format includes some elements and attributes that have the same names as XHTML elements and attributes.

For more information, click the Help link on the Wikifier web page.

Note: The XSLT stylesheet used by the Wikifier web page is slightly more recent than the XSLT stylesheet bundled with the related schema package.

Added later: Wikifier RT is even closer to what you want.
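Because storage-format snippets lack a single root element, they are not well-formed XML documents on their own. One way to run an XSLT stylesheet over such a snippet from Java is to wrap it in a root element first, as in the sketch below, which uses the JDK's built-in javax.xml.transform API. The inline stylesheet is a toy written for illustration, not the Wikifier stylesheet, and the snippet wrapper element and the XsltWikiSketch class are hypothetical names. Real storage-format snippets also use prefixed elements such as ac: and ri:, so the wrapper would need matching namespace declarations. The JDK's bundled processor implements XSLT 1.0 only; a stylesheet that requires XSLT 2.0 would need a processor such as Saxon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltWikiSketch {

    // Toy XSLT 1.0 stylesheet (NOT the Wikifier stylesheet): maps a few
    // XHTML-like elements to Confluence wiki markup and copies text through.
    public static final String TOY_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='p'><xsl:apply-templates/><xsl:text>&#10;</xsl:text></xsl:template>"
        + "<xsl:template match='strong|b'>*<xsl:apply-templates/>*</xsl:template>"
        + "<xsl:template match='em|i'>_<xsl:apply-templates/>_</xsl:template>"
        + "</xsl:stylesheet>";

    // Wraps a root-less snippet in a hypothetical <snippet> element so it is a
    // well-formed XML document, then applies the given stylesheet to it.
    public static String transform(String snippet, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<snippet>" + snippet + "</snippet>")),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String snippet = "<p>Some <strong>bold</strong> text</p><p>and <em>more</em></p>";
        System.out.print(transform(snippet, TOY_XSL));
    }
}
```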