Convert SPIP text to markdown (or HTML)

Question

341 Views Asked by Orange Lux At 01 July 2025 at 13:51

I have to update an old website built on SPIP (a French CMS with its own Markdown-like syntax).

I'd like to convert its database content to Markdown. I couldn't find any useful resource for converting SPIP syntax to HTML (which could then be turned into Markdown via league/html-to-markdown, for instance), and I can't work out which method in SPIP's own code performs that conversion.

Any help would be great.

There are 2 best solutions below

cFreed On 12 June 2016 at 23:23

Like you, I don't know of any such tool, so I wrote my own when I had to export SPIP data. But this tool:

- is intended to output XML instead of HTML
- is implemented as a SPIP plugin, so it must be installed first, then driven from the SPIP private area
- and, to be honest, I wrote it several years ago and don't remember much about it

So I can't realistically suggest that you use it.
On the other hand, if you want to write your own tool, you might take advantage of the following excerpt, which was the heart of mine:

$spip2xml_specifs = [
  'data_fields' => [
  # obj => [
  #   dest_field =>  src_field | [src_field,...]
  # ]
  # in src_field, initial "*" means: do not apply filters
    'rub' => [
      'titre' => '*titre',
      'body'  => ['descriptif','texte'],
    ],
    'art' => [
      'titre' => '*titre',
      'body'  => ['*surtitre','*soustitre','descriptif','chapo','texte','ps'],
    ],
  ],
  'str_replace' => [
    "\r\n"                                => "\n", # normalize Windows line endings to *nix
  ],
  # note: "¤" is the regex delimiter; PHP reads a delimiter as a single byte,
  # so this needs a single-byte source encoding (e.g. ISO-8859-1), or swap in
  # an ASCII delimiter such as "~"
  'preg_replace' => [
    '¤\n\n\n*¤'                           => "\n\n", # collapse runs of \n to at most 2
    #
    '¤{{{(.+)}}}¤msU'                     => '<h3>$1</h3>',
    '¤{{(.+)}}¤msU'                       => '<b>$1</b>',
    '¤{(.+)}¤msU'                         => '<i>$1</i>',
    # _  => <br />
    '¤^_ ¤ms'                             => '<br />',
    # ---- => <hr />
    '¤^(-{4,})(\n|$)¤ms'                  => '<hr />',
    /*
    # \n\n => <paragraph>
    '¤(\n\n)?(.+)((?=\n\n)|$)¤Us'         => '<p>$2</p>',
    '¤\n\n¤'                              => '', # drop left (why?) \n\n
    */
    # [...|...->...] => <a href... /a>
    '¤\[->(.*)\]¤msU'                     => '<a href="$1">$1</a>',
    '¤\[(.*)->(.*)\]¤msU'                 => '<a href="$2">$1</a>',
    '¤<a (.*)>(.*)\|(.*)</a>¤msU'         => '<a title="$3" $1>$2</a>',
    # <cadre>, <quote> => <blockquote>
    # (the tag name must be a capturing group for the \1 backreference to work)
    '¤<(cadre|quote)>(.*)</\1>¤imsU'      => '<blockquote>$2</blockquote>',
    # -* => <ul... /ul>
    '¤^-\*([^*].*)¤m'                     => '<li>$1</li>',
    # greedy: wraps everything from the first <li> to the last </li> in one <ul>
    '¤(<li>.*</li>)¤s'                    => '<ul>$1</ul>',
    # tables, notes, anchors...? unhandled models -> flag them?
    #
    # finally remove superfluous <p>
    '¤<p><(h[1-6r]|ul|table)(.*)>(.*)(</\1>)?</p>¤imsU'
                                          => '<$1$2>$3$4',
  ],
];

The data_fields array lists the fields that have to be processed for the two main data containers, rubrics ('rub', SPIP's sections) and articles ('art').
The str_replace and preg_replace arrays then list all the transformations, which must be executed in order on each field.
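
For anyone writing their own tool, here is a minimal sketch of how such a specification could be applied to one database row. The helper names spip_apply_rules and spip_build_field are hypothetical (they are not part of the original plugin), the rule set is a reduced copy of the one above, and the patterns use an ASCII "~" delimiter so the file can be saved as UTF-8.

```php
<?php
// Hypothetical helper: runs the str_replace rules, then every
// preg_replace rule in the order it was declared.
function spip_apply_rules(string $text, array $specs): string
{
    $text = str_replace(
        array_keys($specs['str_replace']),
        array_values($specs['str_replace']),
        $text
    );
    foreach ($specs['preg_replace'] as $pattern => $replacement) {
        // preg_replace() returns null on error; keep the previous text then.
        $text = preg_replace($pattern, $replacement, $text) ?? $text;
    }
    return $text;
}

// Hypothetical helper: builds one destination field from a row.
// A leading "*" on a source field name means "copy it unfiltered".
function spip_build_field(array $row, $sources, array $specs): string
{
    $parts = [];
    foreach ((array) $sources as $source) {
        $unfiltered = strncmp($source, '*', 1) === 0;
        $value = $row[ltrim($source, '*')] ?? '';
        if ($value === '') {
            continue; // skip empty or missing columns
        }
        $parts[] = $unfiltered ? $value : spip_apply_rules($value, $specs);
    }
    return implode("\n\n", $parts);
}

// Reduced rule set, for illustration only.
$specs = [
    'str_replace'  => ["\r\n" => "\n"],
    'preg_replace' => [
        '~\{\{\{(.+)\}\}\}~msU' => '<h3>$1</h3>',
        '~\{\{(.+)\}\}~msU'     => '<b>$1</b>',
        '~\{(.+)\}~msU'         => '<i>$1</i>',
        '~\[(.*)->(.*)\]~msU'   => '<a href="$2">$1</a>',
    ],
];

// Made-up row standing in for a record of the articles table.
$row = [
    'titre' => 'Hello {world}',
    'texte' => '{{Bold}} and [SPIP->https://www.spip.net]',
];

$title = spip_build_field($row, '*titre', $specs);           // left untouched
$body  = spip_build_field($row, ['chapo', 'texte'], $specs); // filtered
```

Rule order matters: the triple-brace rule has to run before the double-brace and single-brace ones, otherwise a heading would be consumed as bold or italic text.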

At least I can assert that these specifications are the right ones and work fine.

Feel free to ask for more information if needed.

Orange Lux On 19 June 2016 at 19:24 BEST ANSWER

I finally found a script that matches my needs: https://github.com/nhoizey/spip2markdown

It is intended to be used inside SPIP, but the main functions are easily adaptable.
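
To give an idea of what a standalone adaptation might look like, here is a rough sketch that maps a few SPIP shortcuts straight to Markdown. It does not reproduce the code in that repository: the function name spip_to_markdown_sketch and the rule list are assumptions made for illustration, and tables, notes, models and nested lists are not handled.

```php
<?php
// Hypothetical function; covers only a handful of SPIP shortcuts.
function spip_to_markdown_sketch(string $text): string
{
    $text = str_replace("\r\n", "\n", $text); // normalize line endings

    // Applied in order: triple braces before double before single.
    $rules = [
        '~\{\{\{(.+?)\}\}\}~s'      => '### $1',     // {{{heading}}}
        '~\{\{(.+?)\}\}~s'          => '**$1**',     // {{bold}}
        '~\{(.+?)\}~s'              => '*$1*',       // {italic}
        '~\[->([^\]]+)\]~'          => '<$1>',       // [->url]
        '~\[([^\]]*?)->([^\]]+)\]~' => '[$1]($2)',   // [label->url]
        '~^-\*[ \t]*~m'             => '- ',         // -* list item
        '~^-{4,}$~m'                => "\n---\n",    // ---- horizontal rule
        '~\n{3,}~'                  => "\n\n",       // at most one blank line
    ];
    foreach ($rules as $pattern => $replacement) {
        // preg_replace() returns null on error; keep the previous text then.
        $text = preg_replace($pattern, $replacement, $text) ?? $text;
    }
    return $text;
}

$spip = "{{{Title}}}\n\n{{Bold}} and {italic}, see [SPIP->https://www.spip.net]\n-* one\n-* two";
$markdown = spip_to_markdown_sketch($spip);
```

Going directly to Markdown avoids the intermediate HTML step, at the cost of having to cover every SPIP construct by hand.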
