Parse a formatted string into arrays of arrays

+2-1+18*+7-21+3*-4-5+6x29

The above string is an example of the kind of string I'm trying to split into either a key => value array or something similar. The string is used to represent the layout of various classes on a three column page of an intranet site, which is editable by the user via drag and drop. This string is stored in a cookie to be used on the next visit.

The numbers represent the id of a class and -, + and x represent the state of the class (minimised, expanded or hidden), the * represents a column break.

I can split this into the columns easily using explode, which gives an array with 3 $key => $value associations.

eg.

$column_layout = array( 0 => '+2-1+18', 1 => '+7-21+3', 2 => '-4-5+6x29' );

I then need to split each column into its classes, keeping the status and id together. Because the classes, their statuses, and the number of classes per column vary from user to user, this all has to happen automatically.

$column1 = array(
    array( '+' , 2 ),
    array( '-' , 1 ),
    array( '+' , 18 )
);
$column2 = array(...

There are 2 best solutions below

4
On BEST ANSWER

First explode() the string with the delimiter *.

You could then use preg_match_all to match each item in the exploded array. Something like this works with your example input.

$layout = explode('*', $input);
$columns = array();
foreach ( $layout as $item ){
    $parts = array();

    //matches one of -, + or x followed by one or more digits
    //(the hyphen comes first in the class so it is a literal, not a range)
    preg_match_all('/([-+x])(\d+)/', $item, $matches, PREG_SET_ORDER);

    foreach ( $matches as $match){ 
        //$match[1] holds the symbol (+, - or x), $match[2] holds the digits
        $parts[] = array($match[1], $match[2]);
    }
    $columns[] = $parts;
}

The output from your example ends up like this:

array(
     array( array('+', '2'), array('-', '1'), array('+', '18') ),
     array( array('+', '7'), array('-', '21'), array('+', '3') ),
     //etc
);
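One detail worth checking in a pattern like this is where the hyphen sits inside the character class. Between two characters it denotes a range, so `[+-x]` covers everything from `+` to `x` in ASCII, digits included, whereas `[-+x]` matches only the three symbols. The sketch below compares the two on a deliberately malformed string (the input `'18+7'` is a made-up example, not taken from the question):

```php
<?php
// Hypothetical malformed column: the first id has lost its state symbol.
$item = '18+7';

// Hyphen between '+' and 'x' is a range (0x2B to 0x78), which includes digits,
// so the '1' is accepted as if it were a state symbol.
preg_match_all('/([+-x])(\d+)/', $item, $loose, PREG_SET_ORDER);

// Hyphen first in the class is a literal, so only -, + and x are accepted.
preg_match_all('/([-+x])(\d+)/', $item, $strict, PREG_SET_ORDER);

// $loose holds two matches: ('1', '8') and ('+', '7')
// $strict holds one match:  ('+', '7')
```

On well-formed input both patterns give the same result; the difference only shows up when a symbol is missing or unexpected characters appear.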

With PHP 5.3 or later you could use something like this (untested). The main difference is that the inner loop is replaced by array_map, which removes several lines of code. (array_map applies a function to every item in an array and returns the transformed array.) PHP 5.3 is required for the closure syntax.

$layout = explode('*', $input);
$columns = array();
foreach ( $layout as $item ){
    preg_match_all('/([-+x])(\d+)/', $item, $matches, PREG_SET_ORDER);
    $columns[] = array_map( function($a){ return array($a[1], $a[2]); },
                            $matches);
}

You could also remove the loops altogether:

$innerMatch = function($item){
    preg_match_all('/([-+x])(\d+)/', $item, $matches, PREG_SET_ORDER);
    return array_map( function($a){ return array($a[1], $a[2]); },
                      $matches);
};
$columns = array_map($innerMatch, explode('*', $input));

However, this version is harder to read for many PHP developers, which is why I wouldn't recommend using it.


More explanation

At the request of @Christopher Altman

The only new bit in the PHP 5.3 version is really this:

array_map(
          function($a){ return array($a[1], $a[2]); },
          $matches
);

Expanding and altering it a bit (as an example)

//bind an anonymous function to the variable $func
$func = function($a){
    return $a*$a; 
}; 
//$func() now calls the anonymous function we have just defined

//then we can call it like so:
$result = array_map($func, $myArray);

So if $myArray is defined as

array(1,2,3,4);

When it is run through the array map function you can think of it as converting it into

array($func(1), $func(2), $func(3), $func(4));

But as PHP isn't a lazy language, all the functions are evaluated as soon as they are encountered, so the array is returned from array_map as:

array(1, 4, 9, 16)

In the actual code, preg_match_all returns an array of matches (where the matches are arrays). So all I do is take the array and on every match apply a function that converts the match into a different array in the required format.
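Putting the pieces together, here is a minimal sketch of the loop version wrapped in a function. The name `parseLayout` is hypothetical, and the `(int)` cast is an addition (not in the code above) so that the ids come out as integers, as in the question's desired output. The hyphen is placed first in the character class so that it is treated as a literal.

```php
<?php
// parseLayout() is a hypothetical helper name used only for this sketch.
function parseLayout($input)
{
    $columns = array();

    // '*' separates the columns
    foreach (explode('*', $input) as $item) {
        // each class is one state symbol (-, + or x) followed by its numeric id
        preg_match_all('/([-+x])(\d+)/', $item, $matches, PREG_SET_ORDER);

        $parts = array();
        foreach ($matches as $match) {
            // keep the symbol as a string, cast the id to an integer
            $parts[] = array($match[1], (int) $match[2]);
        }
        $columns[] = $parts;
    }

    return $columns;
}

$columns = parseLayout('+2-1+18*+7-21+3*-4-5+6x29');
// $columns[0] is array( array('+', 2), array('-', 1), array('+', 18) )
```

Because the number of matches per column is not fixed, the third column of the example keeps all four of its entries, including `x29`.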

0
On

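Assuming that your strictly formatted input has a static number of segments and values per segment, there are some advantages to using sscanf() as a (verbose) direct way to parse the string instead of a preg_ technique. Note that the sample string does not quite meet that assumption: its third column has four items, so the fixed format below stops after +6 and ignores the trailing x29.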

  1. This is a direct single-function technique. No need to explode and then parse.
  2. Unlike preg_match(), this function does not generate a useless "fullstring match".
  3. You don't need to pick out what you need from a $matches array (like with preg_match())
  4. The numeric values are already cast as integers (if that is useful to you).

Code:

$layout = '+2-1+18*+7-21+3*-4-5+6x29';

sscanf(
    $layout,
    '%[-+x]%d%[-+x]%d%[-+x]%d*%[-+x]%d%[-+x]%d%[-+x]%d*%[-+x]%d%[-+x]%d%[-+x]%d',
    $column1[0][0], $column1[0][1], $column1[1][0], $column1[1][1], $column1[2][0], $column1[2][1],
    $column2[0][0], $column2[0][1], $column2[1][0], $column2[1][1], $column2[2][0], $column2[2][1],
    $column3[0][0], $column3[0][1], $column3[1][0], $column3[1][1], $column3[2][0], $column3[2][1]
);

var_export($column1);
echo "\n---\n";
var_export($column2);
echo "\n---\n";
var_export($column3);

Output:

array (
  0 => 
  array (
    0 => '+',
    1 => 2,
  ),
  1 => 
  array (
    0 => '-',
    1 => 1,
  ),
  2 => 
  array (
    0 => '+',
    1 => 18,
  ),
)
---
array (
  0 => 
  array (
    0 => '+',
    1 => 7,
  ),
  1 => 
  array (
    0 => '-',
    1 => 21,
  ),
  2 => 
  array (
    0 => '+',
    1 => 3,
  ),
)
---
array (
  0 => 
  array (
    0 => '-',
    1 => 4,
  ),
  1 => 
  array (
    0 => '-',
    1 => 5,
  ),
  2 => 
  array (
    0 => '+',
    1 => 6,
  ),
)

p.s.

  • If you wanted the results to be a single array with 3 first-level elements and those elements containing 3 pairs of symbol-number subarrays, this is achievable as well by modifying the reference variables in sscanf().
  • If you don't like the repetition in the format string, you could declare the repeated subpattern as a variable and programmatically repeat it instead (delimited by asterisks of course).
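Both of these notes can be sketched together. The snippet below is one possible way, under the same fixed-size assumption (3 columns of 3 items, held in the illustrative variables `$columnCount` and `$perColumn`): it builds the format string with str_repeat(), calls sscanf() with only two arguments so that the parsed values come back as a flat array, and regroups them with array_chunk().

```php
<?php
$layout = '+2-1+18*+7-21+3*-4-5+6x29';

// Assumed fixed dimensions; these variable names are illustrative only.
$columnCount = 3;
$perColumn   = 3;

// One symbol-number pair, repeated per column, columns joined by a literal '*'
$pair         = '%[-+x]%d';
$columnFormat = str_repeat($pair, $perColumn);
$format       = implode('*', array_fill(0, $columnCount, $columnFormat));

// With only two arguments, sscanf() returns the parsed values as a flat array
$values = sscanf($layout, $format);

// Regroup: first into symbol-number pairs, then into columns of pairs
$columns = array_chunk(array_chunk($values, 2), $perColumn);

var_export($columns[0]);
```

As with the hand-written format, anything past the ninth pair (here the trailing `x29`) is not captured.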