PHP regex - capturing content between two strings (multiple results)

After spending over 2.5 hours on this, can someone help with the problem below?

I have an HTML file in a format like this:

Example 1

[[section_abc]]
<div>
several lines of html ...
</div>
[[/section_abc]]

Example 2

[[section_opq]]
<div>
several lines of html ...
</div>
[[/section_opq]]

Below is the desired output:

Example 1: group1: section_abc group2: content between [[section_abc]] and [[/section_abc]]

Example 2: group1: section_opq group2: content between [[section_opq]] and [[/section_opq]]

Here is my current test line:

preg_match_all("/(\[\[)([^}]+)(\]\])/", $input_lines, $output_array);
There are 4 best solutions below

Best answer

If the sections are not nested, try:

preg_match_all('~\[\[(\w+)]]((?>[^[]+|\[[^[])*)\[\[/\1]]~s', $str, $out)

See php demo at eval.in or regex demo at regex101
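
To see how that pattern behaves end to end, here is a minimal sketch run against the sample input from the question. The variable names ($str, $out, $sections) and the trim() call are illustrative choices rather than part of the answer, and PREG_SET_ORDER is used only so that each match arrives as one array.

```php
<?php
// Sample input modeled on the question (two non-nested sections).
$str = <<<'HTML'
[[section_abc]]
<div>
several lines of html ...
</div>
[[/section_abc]]

[[section_opq]]
<div>
several lines of html ...
</div>
[[/section_opq]]
HTML;

// Pattern from the answer: group 1 is the section name, group 2 the content.
// \1 forces the closing tag to carry the same name as the opening tag.
$pattern = '~\[\[(\w+)]]((?>[^[]+|\[[^[])*)\[\[/\1]]~s';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($pattern, $str, $out, PREG_SET_ORDER);

// Build a name => content map; trim() drops the newlines around the content.
$sections = [];
foreach ($out as $m) {
    $sections[$m[1]] = trim($m[2]);
}

print_r($sections);
```

Because the content part only allows a single [ followed by a non-[ character, a literal [[ inside a section would end the content early, which is why the answer restricts itself to the non-nested case.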

Answer

How about:

(\[\[[^\]]+\]\])([^\[]+)(\[\[[^\]]+\]\])

Group 1 will contain the opening tag
Group 2 will contain the data block
Group 3 will contain the closing tag
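
A rough sketch of how this pattern could be used from PHP follows. The / delimiters, the sample strings and the variable names are assumptions added for illustration. It also shows two limitations worth knowing about: the pattern does not check that the closing tag matches the opening one, and it fails as soon as the content contains a [ character.

```php
<?php
// The answer's pattern, wrapped in / delimiters (an assumption; any delimiter works).
$pattern = '/(\[\[[^\]]+\]\])([^\[]+)(\[\[[^\]]+\]\])/';

$input = "[[section_abc]]\n<div>a</div>\n[[/section_abc]]\n\n"
       . "[[section_opq]]\n<div>b</div>\n[[/section_opq]]";

$count = preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the whole opening tag, so strip the brackets to get the name.
    $name = trim($m[1], '[]');
    printf("%s => %s\n", $name, trim($m[2]));
}

// Limitation: a single "[" inside the content stops group 2 early,
// so the closing tag is never reached and nothing matches.
$withBracket = "[[section_x]]\n<a>[link]</a>\n[[/section_x]]";
$bracketCount = preg_match($pattern, $withBracket);
```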

Answer

This is what you're looking for:

/(?<=\[\[(section_\w{3})\]\])(.+)(?>\[\[\/\1\]\])/s

Breaking down the regex

  1. (?<=\[\[(section_\w{3})\]\]) provides lookbehind for matching the strings starting with [[section_foo]] without including the tag
  2. (.+) captures everything inside the tags
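  3. (?>\[\[\/\1\]\]) matches the closing [[/section_foo]] tag, where \1 is a backreference to the first captured group (the tag name). Note that (?>...) is an atomic group, not a lookahead, so the closing tag is consumed and appears in the full match (group 0), although it is not part of group 2. To keep the closing tag out of the full match as well, use a lookahead instead: (?=\[\[\/\1\]\])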
  4. /s makes dot . match newline (and note that in the current regex newline symbols after opening and before closing tags are included in the match)
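
The breakdown above can be checked with a short sketch. It compares the pattern as written with a variant that swaps (?>...) for a real lookahead (?=...); the variant, the sample string and the variable names are assumptions added here, not part of the answer. Both give the same capture groups and differ only in what the full match (index 0) contains.

```php
<?php
$html = "[[section_abc]]\n<div>abc</div>\n[[/section_abc]]\n\n"
      . "[[section_opq]]\n<div>opq</div>\n[[/section_opq]]";

// Pattern as posted: (?>...) is an atomic group, so the closing tag is consumed.
$atomic = '/(?<=\[\[(section_\w{3})\]\])(.+)(?>\[\[\/\1\]\])/s';

// Hypothetical variant: (?=...) only asserts that the closing tag follows.
$lookahead = '/(?<=\[\[(section_\w{3})\]\])(.+)(?=\[\[\/\1\]\])/s';

preg_match($atomic, $html, $a);
preg_match($lookahead, $html, $b);

// Groups 1 and 2 are identical; only the full match differs.
var_dump($a[1] === $b[1], $a[2] === $b[2]);
var_dump($a[0]); // includes the closing tag
var_dump($b[0]); // content only

// Both sections are found when matching globally.
$total = preg_match_all($lookahead, $html, $all, PREG_SET_ORDER);
```

The lookbehind works here only because section_\w{3} has a fixed length; section names of varying length would need a different approach, such as matching the opening tag normally.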

Results

Example 1:

Group 1: section_abc

Group 2:

<div>
several lines of html ...
</div>

Example 2:

Group 1: section_opq

Group 2:

<div>
several lines of html ...
</div>

Answer

It's possible this pattern could work (minimal):

\[{2}([^\W]+)\]{2}\n([^[]+)

Result:

Match 1

Group 1:

section_abc

Group 2:

<div>
several lines of html ...
<more><a href=""></a>
</div>

Match 2

Group 1:

section_opq

Group 2:

<div>
several lines of html ...
<more><a href=""></a>
</div>

Example:

https://regex101.com/r/lCX9FA/1
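
For completeness, a minimal sketch of how the pattern from this answer might be run in PHP. The / delimiters, the sample text and the variable names are assumptions. Two things to keep in mind: group 2 stops at the first [ it meets and keeps the trailing newline, and the literal \n in the pattern only matches LF line endings.

```php
<?php
// Sample text modeled on the question, using LF line endings.
$text = "[[section_abc]]\n<div>\nseveral lines of html ...\n</div>\n[[/section_abc]]\n";

// The answer's pattern with / delimiters added. [^\W] is equivalent to \w.
$pattern = '/\[{2}([^\W]+)\]{2}\n([^[]+)/';

preg_match($pattern, $text, $m);

$name    = $m[1];
$content = rtrim($m[2]); // drop the newline left before the closing tag

echo $name, "\n", $content, "\n";

// The closing tag is not matched as a section because "/" is not a word character.
$count = preg_match_all($pattern, $text, $all);
```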