Parse string containing inline HTML attribute declarations in potentially 4 different formats


I need to parse strings like k1=v1, k2=v2, ... kn=vn, where the primary delimiter (,) and the key-value delimiter (=) may vary. The values may be quoted (" or ') and may contain the primary delimiter within the quotes.

Since the PHP docs mention no restrictions on this kind of usage, str_getcsv() seemed suitable.

But a first test showed that the enclosure character was ignored:

$ php -v
PHP 8.1.2-1ubuntu2.14 (cli) (built: Aug 18 2023 11:41:11) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.2, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.2-1ubuntu2.14, Copyright (c), by Zend Technologies
$ php -a
Interactive shell

php > $S2 = 'name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);"';
php > var_dump($S2);
string(60) "name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);""
php > var_dump(str_getcsv($S2));
array(5) {
  [0]=>
  string(9) "name=ABCD"
  [1]=>
  string(11) " value=17.3"
  [2]=>
  string(7) " autoID"
  [3]=>
  string(23) " onclick="getinfo($this"
  [4]=>
  string(6) " 99);""
}
php > $S3 = 'name=ABCD, value=17.3, autoID, onclick=|getinfo($this, 99);|';
php > var_dump($S3);
string(60) "name=ABCD, value=17.3, autoID, onclick=|getinfo($this, 99);|"
php > var_dump(str_getcsv($S3, ',', '|'));
array(5) {
  [0]=>
  string(9) "name=ABCD"
  [1]=>
  string(11) " value=17.3"
  [2]=>
  string(7) " autoID"
  [3]=>
  string(23) " onclick=|getinfo($this"
  [4]=>
  string(6) " 99);|"
}

The quoted string "getinfo($this, 99);" includes the primary delimiter (,) and is therefore split into two elements, ignoring the quotes. Using a different enclosure character does not change anything.

So I'm wondering whether this is a bug, wrong usage on my part, or whether my assumption that str_getcsv is suitable for this operation is wrong. Any help and comments are appreciated.

Tested on different PHP environments with the same result.


There are 2 best solutions below


It's more a case of where the enclosure characters are. They are meant to enclose the whole field and not just part of it.

So replacing it with

$S3 = 'name=ABCD, value=17.3, autoID, |onclick=getinfo($this, 99);|';

will give you the result

array(4) {
  [0]=>
  string(9) "name=ABCD"
  [1]=>
  string(11) " value=17.3"
  [2]=>
  string(7) " autoID"
  [3]=>
  string(27) "onclick=getinfo($this, 99);"
}
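
To connect this back to the key=value goal, here is a minimal sketch under the same assumption, namely that the enclosure wraps the whole field. It uses the default double-quote enclosure instead of |, and the input string and the variable names are only illustrative. Each field returned by str_getcsv() is split at its first "=" with explode():

```php
<?php
// Hypothetical input: the field containing the delimiter is enclosed as a whole.
$input = 'name=ABCD, value=17.3, autoID, "onclick=getinfo($this, 99);"';

$attributes = [];
foreach (str_getcsv($input, ',', '"', '\\') as $field) {
    // Split at the first "=" only, so the value may itself contain "=".
    $kv = array_map('trim', explode('=', $field, 2));
    // A key without a value is stored as a boolean flag.
    $attributes[$kv[0]] = $kv[1] ?? true;
}

print_r($attributes);
```

This only helps if you control how the input string is written; it does not handle quotes that start after the key.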

Thanks for the comments and advice. In conclusion, str_getcsv is not an adequate solution for parsing strings like the example (although at first glance it seemed to be).

So I had to write a small parser of my own:

//  --------------------------------------------------------------------------------
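//  Convert string with attributes into array
//  $options: 'delim1' separates the attributes, 'delim2' separates key and value,
//  'quotes' is one quote character or an opening/closing pair, 'escape' is the
//  escape character ('' for quotes or escape disables that feature)
    function string2attribute( string $attstring, array $options ): array {
        if ( $attstring === '' ) {return [];}
        // $Q[0] opens a quote, $Q[1] closes it; a single character does both
        $Q = [
            $options['quotes'][0] ?? '',
            $options['quotes'][1] ?? ($options['quotes'][0] ?? '')
        ];
        // Pass 1: split at delim1, but not inside quotes or after an escape
        $parts = []; $buf = ''; $inquote = 0; $inescape = false;
        foreach ( str_split( $attstring ) as $C ) {
            if ( $inescape ) {
                // Escaped character: take it literally
                $buf .= $C; $inescape = false;
            } elseif ( $C === $options['escape'] ) {
                $inescape = true;
            } elseif ( $C === $Q[1] && $inquote > 0 ) {
                $buf .= $C; $inquote -= 1;
            } elseif ( $C === $Q[0] ) {
                $buf .= $C; $inquote += 1;
            } elseif ( $C === $options['delim1'] && $inquote < 1 ) {
                $parts[] = trim( $buf ); $buf = '';
            } else {
                $buf .= $C;
            }
        }
        // Compare with '' because empty() would also drop a trailing "0"
        if ( $buf !== '' ) {$parts[] = trim( $buf );}

        // Pass 2: split each part into key and value
        $out = [];
        foreach ( $parts as $part ) {
            if ( $part === '' ) {continue;}  // e.g. repeated delimiters
            $kv = explode($options['delim2'], $part, 2);
            $kv = array_map( 'trim', $kv );
            if ( count( $kv ) == 2 ) {
                // Strip the surrounding quotes from the value
                $out[$kv[0]] = rtrim( ltrim( $kv[1], $Q[0] ), $Q[1] );
            } else {
                // A key without a value acts as a boolean flag
                $out[$kv[0]] = true;
            }
        }
        return $out;
    }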

//  --------------------------------------------------------------------------------

Test program for this function:

//  --------------------------------------------------------------------------------
    header( "Content-Type: text/plain; charset=UTF-8" );

    $T = 'Test 1: like style definition';
    $S = 'display: block; width: 320px; font-size: 16px; color: #123456ab; border: 1px solid #333';
    $O = ['delim1'=>';', 'delim2'=>':', 'quotes'=>'', 'escape'=>'\\'];
    test_result( $T, $S, $O );


    $T = 'Test 2: like tag attributes';
    $S = 'name=ABCD value=17.3 autoID onclick="getinfo($this, 99);"';
    $O = ['delim1'=>' ', 'delim2'=>'=', 'quotes'=>'"', 'escape'=>''];
    test_result( $T, $S, $O );


    $T = 'Test 3: comma as primary delimiter';
    $S = 'name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);"';
    $O = ['delim1'=>',', 'delim2'=>'=', 'quotes'=>'"', 'escape'=>'\\'];
    test_result( $T, $S, $O );


    $T = 'Test 4: with escape and different quotes';
    $S = 'name=ABCD, value=17\", autoID, onclick=<getinfo($this, 99);>';
    $O = ['delim1'=>',', 'delim2'=>'=', 'quotes'=>'<>', 'escape'=>'\\'];
    test_result( $T, $S, $O );

//  --------------------------------------------------------------------------------
    function test_result( $title, $teststring, $options ) {
        $NL = "\r\n";
        echo $NL . $title . $NL . 'Teststring:' . $NL . $teststring;
        echo $NL . 'Options:' . $NL;
        print_r( $options );
        echo $NL . 'string2attribute( $Teststring, $Options ):' . $NL;
        print_r( string2attribute( $teststring, $options ) );
    }
//  --------------------------------------------------------------------------------

Results of the test:

Test 1: like style definition
Teststring:
display: block; width: 320px; font-size: 16px; color: #123456ab; border: 1px solid #333
Options:
    [delim1] => ;
    [delim2] => :
    [quotes] => 
    [escape] => \

string2attribute( $Teststring, $Options ):
    [display] => block
    [width] => 320px
    [font-size] => 16px
    [color] => #123456ab
    [border] => 1px solid #333

Test 2: like tag attributes
Teststring:
name=ABCD value=17.3 autoID onclick="getinfo($this, 99);"
Options:
    [delim1] =>  
    [delim2] => =
    [quotes] => "
    [escape] => 

string2attribute( $Teststring, $Options ):
    [name] => ABCD
    [value] => 17.3
    [autoID] => 1
    [onclick] => getinfo($this, 99);

Test 3: comma as primary delimiter
Teststring:
name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);"
Options:
    [delim1] => ,
    [delim2] => =
    [quotes] => "
    [escape] => \

string2attribute( $Teststring, $Options ):
    [name] => ABCD
    [value] => 17.3
    [autoID] => 1
    [onclick] => getinfo($this, 99);

Test 4: with escape and different quotes
Teststring:
name=ABCD, value=17\", autoID, onclick=<getinfo($this, 99);>
Options:
    [delim1] => ,
    [delim2] => =
    [quotes] => <>
    [escape] => \

string2attribute( $Teststring, $Options ):
    [name] => ABCD
    [value] => 17"
    [autoID] => 1
    [onclick] => getinfo($this, 99);
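
For the narrower case of a fixed comma delimiter and double quotes only, a shorter alternative may be enough. The following is a minimal sketch and not the parser above: it swaps in preg_split() with a lookahead that accepts a comma only when an even number of quotes follows it. The function name split_outside_quotes is hypothetical, and the sketch assumes that quotes are balanced and never escaped.

```php
<?php
// Hypothetical helper: split at commas that are not inside double quotes.
// Assumes balanced, unescaped double quotes.
function split_outside_quotes(string $s): array {
    // A comma is a split point only if an even number of quotes follows it.
    $parts = preg_split('/,(?=(?:[^"]*"[^"]*")*[^"]*$)/', $s);
    return array_map('trim', $parts);
}

$parts = split_outside_quotes('name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);"');
print_r($parts);
```

Each resulting part can then be split at the first "=" with explode( '=', $part, 2 ), as in the second loop of string2attribute. Configurable delimiters, escapes and opening/closing quote pairs still need the character-by-character parser.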