How to capture/isolate a specific substring in text with repeated formatting?


I'm learning regular expressions and I'm not sure how to "take" something specific out of the output.

Example: I want to retrieve the value of a specific CSS property.

Here's a simplified example:

$source = 'foo { bar: Something }
           foo { bar: Else }
           foo { bar: Yay }';

I want to get this output from var_dump:

array(3) {
  [0]=>
  string(9) "Something"
  [1]=>
  string(4) "Else"
  [2]=>
  string(3) "Yay"
}

Here's my regex:

preg_match_all("/foo\s*{\s*bar:\s*[A-Za-z]*\s*}/", $source, $matches);

foreach ($matches as $example) {
    echo '<pre>';
    var_dump($example);
    echo '</pre>';
}

And I'm getting:

array(3) {
  [0]=>
  string(22) "foo { bar: Something }"
  [1]=>
  string(17) "foo { bar: Else }"
  [2]=>
  string(16) "foo { bar: Yay }"
}

How can I limit the output so that it contains only the value I want, not everything the regex matches?


There are 3 answers below.

Answer 1
preg_match_all("/foo\s*{\s*bar:\s*([A-Za-z]*)\s*}/",$source,$matches);
                                  ^----     ^----

Parentheses used this way create a "capturing group": the text matched inside them is returned separately, in $matches[1].

http://nz.php.net/manual/en/regexp.reference.subpatterns.php
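A minimal sketch of the question's script with the capturing group applied, assuming PHP 7 or later. The only real change is dumping $matches[1] (the group's captures) instead of looping over all of $matches; the variable name $values is just illustrative.

```php
<?php
$source = 'foo { bar: Something }
           foo { bar: Else }
           foo { bar: Yay }';

// The parentheses around [A-Za-z]* form capturing group 1.
preg_match_all('/foo\s*{\s*bar:\s*([A-Za-z]*)\s*}/', $source, $matches);

// $matches[0] holds the full matches; $matches[1] holds only what group 1 captured.
$values = $matches[1];

echo '<pre>';
var_dump($values);
echo '</pre>';
```

With the default PREG_PATTERN_ORDER layout, this var_dump prints the array(3) with "Something", "Else" and "Yay" that the question asks for.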

Answer 2

Use parentheses around the part you want to capture.

Answer 3

Try changing your regular expression into

/foo\s*{\s*bar:\s*([A-Za-z]*)\s*}/

and dump $matches again. Alongside the full matches you will now see a second array that contains only the text you want to extract.

By using ( and ) you create a group within your regular expression, and preg_match_all returns the text captured by each group as well, in addition to the complete matches.

Output array

An example:

$text = 'Here comes a number: 5, here comes a number: 3
          and here comes a number: 4';
preg_match_all( '/[Hh]ere comes a number: ([0-9])/', $text, $matches );

After running this code, $matches will now be:

array(
    array( 'Here comes a number: 5', 'here comes a number: 3', 'here comes a number: 4' ),
    array( '5', '3', '4' )
)

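As you can see, $matches contains one array for the complete matches plus one array per group. The first entry ($matches[0]) always holds the complete matched strings; the other indices ($matches[1], $matches[2] and so on) hold only the values captured by the corresponding groups, in order. If you would rather have one array per match (the full match followed by its groups), pass PREG_SET_ORDER as the fourth argument. If you specify an optional group (for example test([0-9])?) and it does not take part in a match, the associated entry contains an empty string, or null if you pass the PREG_UNMATCHED_AS_NULL flag (available since PHP 7.2).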
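To compare the two result layouts of preg_match_all, here is a small sketch using the number example. PREG_SET_ORDER is a standard flag of preg_match_all; the variable names $byGroup and $byMatch are illustrative.

```php
<?php
$text = 'Here comes a number: 5, here comes a number: 3
          and here comes a number: 4';
$pattern = '/[Hh]ere comes a number: ([0-9])/';

// Default (PREG_PATTERN_ORDER): index 0 = all full matches, index 1 = all group-1 captures.
preg_match_all($pattern, $text, $byGroup);

// PREG_SET_ORDER: one array per match, each holding [full match, group 1].
preg_match_all($pattern, $text, $byMatch, PREG_SET_ORDER);

foreach ($byMatch as $match) {
    echo $match[1], PHP_EOL; // the captured digit of this match
}
```

Pattern order is convenient when you want all values of one group at once; set order is convenient when you process each match together with its groups.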

Excluding groups from the output

Sometimes you want to specify a group but don't want to include it in the output array. For example:

$text = 'Here comes a number: 5, here comes another number: 3
          and here comes a number: 4';
preg_match_all( '/[Hh]ere comes a(nother)? number: ([0-9])/', $text, $matches );

I added a group for nother because I want it to be optional. Now $matches[1] contains "nother" or an empty string, and $matches[2] contains the actual number. Since I'm not interested in whether the text says "another" or "a", I would like to exclude this group from the output.
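A short sketch of what the capturing version returns, assuming a current PHP version (PREG_UNMATCHED_AS_NULL needs PHP 7.2 or later). It shows the empty strings left by the optional group, and the nulls you get instead when that flag is passed; $withNulls is an illustrative name.

```php
<?php
$text = 'Here comes a number: 5, here comes another number: 3
          and here comes a number: 4';
$pattern = '/[Hh]ere comes a(nother)? number: ([0-9])/';

// Group 1 is the optional "nother", group 2 is the digit.
preg_match_all($pattern, $text, $matches);
var_dump($matches[1]); // "" where "nother" was absent
var_dump($matches[2]); // the digits

// Assumes PHP 7.2+: groups that did not participate are reported as null.
preg_match_all($pattern, $text, $withNulls, PREG_UNMATCHED_AS_NULL);
var_dump($withNulls[1]);
```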

This can be done by starting the group with (?: instead of (, which makes it a non-capturing group. The resulting code:

$text = 'Here comes a number: 5, here comes another number: 3
          and here comes a number: 4';
preg_match_all( '/[Hh]ere comes a(?:nother)? number: ([0-9])/', $text, $matches );

The group (?:nother) is left out of the output, so $matches[1] now holds the numbers we are interested in.
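A quick check of the non-capturing version, as a minimal sketch: $matches now has only two entries, the full matches and the single capturing group.

```php
<?php
$text = 'Here comes a number: 5, here comes another number: 3
          and here comes a number: 4';

// (?:nother) groups the optional text for the ? quantifier without capturing it.
preg_match_all('/[Hh]ere comes a(?:nother)? number: ([0-9])/', $text, $matches);

var_dump(count($matches)); // full matches + one capturing group
var_dump($matches[1]);     // the digits only
```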