PHP foreach loop returns an extra unwanted array (Wikipedia API)


I've been researching this all day and haven't found a solution. I'm also very new to PHP.

The purpose of my function is to take user input (Category1) of a Wikipedia article and return its categories. The basic function below does this without any problems.

function get_all_categories() {
    $url = $this->get_url('categories');
    $url .= 'titles=' . urlencode($_POST['Category1']);
    $url .= '&cllimit=500';
    $data = $this->get_result($url);

    // Decode the JSON response into a nested associative array
    $array = json_decode($data, true);
}

Example result for Urban planning:

Array
(
[batchcomplete] => 
[query] => Array
    (
        [pages] => Array
            (
                [46212943] => Array
                    (
                        [pageid] => 46212943
                        [ns] => 0
                        [title] => Urban planning
                        [categories] => Array
                            (
                                [0] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:All Wikipedia articles written in American English
                                    )

                                [1] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Commons category with local link same as on Wikidata
                                    )

                                [2] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Pages using ISBN magic links
                                    )

                                [3] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Urban planning
                                    )

                                [4] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Use American English from April 2015
                                    )

                                [5] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Use dmy dates from April 2015
                                    )

                                [6] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Wikipedia articles needing clarification from June 2015
                                    )

                                [7] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Wikipedia articles with GND identifiers
                                    )

                            )

                    )

            )

    )

)

My problem begins when I try to extract from this array only the title values. I've attempted to do this with a foreach loop which is the easiest solution I found for multidimensional arrays:

$array1 = new RecursiveIteratorIterator(
    new RecursiveArrayIterator($array),
    RecursiveIteratorIterator::SELF_FIRST);

foreach ($array1 as $key => $value) {
    if (is_array($value) && $key == 'categories') {
        $result = array_map(function($element){return $element['title'];}, $value);

        print_r($result);
    }
}

What I get with this code is two arrays: one with only the titles (what I wanted), and an unwanted array (which sometimes includes the first title) attached to the end:

Array
(
[0] => Category:All Wikipedia articles written in American English
[1] => Category:Commons category with local link same as on Wikidata
[2] => Category:Pages using ISBN magic links
[3] => Category:Urban planning
[4] => Category:Use American English from April 2015
[5] => Category:Use dmy dates from April 2015
[6] => Category:Wikipedia articles needing clarification from June 2015
[7] => Category:Wikipedia articles with GND identifiers
)
Array
(
[ns] => 
[title] => C
)

This extra array is what I don't understand. I think the problem is caused by the foreach loop. I tried unsetting $variable outside of the loop but it didn't help. The extra array becomes especially troublesome if I try to pass these results to another function. How can I prevent this from happening?

There are 3 best solutions below

1
On BEST ANSWER

For simplicity, you can traverse the array manually rather than using RecursiveIteratorIterator.

RecursiveIteratorIterator also adds overhead that can hurt performance on large arrays.

Change your extracting logic to this:

$result = array();
foreach($array['query']['pages'] as $k => $v)
{
    foreach($v['categories'] as $cat)
    {
        $result[] = $cat['title'];
    }
}
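
The same idea as a self-contained sketch, assuming PHP 7 or later and a decoded response shaped like the one in the question. The helper name extract_category_titles and the trimmed sample data are made up for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper: collects category titles from a decoded
// "categories" response shaped like the example in the question.
function extract_category_titles(array $response): array
{
    $titles = array();
    // Fall back to an empty list if the 'query'/'pages' level is missing.
    $pages = $response['query']['pages'] ?? array();
    foreach ($pages as $page) {
        // A page may come back without a 'categories' key.
        $categories = $page['categories'] ?? array();
        // array_column() pulls the 'title' field out of each category entry.
        $titles = array_merge($titles, array_column($categories, 'title'));
    }
    return $titles;
}

// Trimmed sample data modelled on the question's output.
$sample = array(
    'batchcomplete' => '',
    'query' => array(
        'pages' => array(
            46212943 => array(
                'pageid' => 46212943,
                'ns' => 0,
                'title' => 'Urban planning',
                'categories' => array(
                    array('ns' => 14, 'title' => 'Category:Pages using ISBN magic links'),
                    array('ns' => 14, 'title' => 'Category:Urban planning'),
                ),
            ),
        ),
    ),
);

print_r(extract_category_titles($sample));
```

With the sample above this prints a flat array holding the two category titles, and no extra array.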


0
On

As @samir mentions, it would be faster to do it manually, but if you require a searching mechanism that traverses unknown depth, you can also use a basic recursive function. It might be a little faster than an OOP-style RecursiveArrayIterator/RecursiveIteratorIterator:

function recurse($array, &$new)
{
    foreach ($array as $key => $value) {
        // A category entry has 'ns' and 'title' but no 'pageid'
        if ($key === 'title' && isset($array['ns'])) {
            if (!isset($array['pageid'])) {
                $new[] = $value;
            }
        } elseif (is_array($value)) {
            // Descend into nested arrays
            recurse($value, $new);
        }
    }
}

# Set up the storage array for the final titles
$new    =   array();
# Recurse your array
recurse($array,$new);
# Show stored values
print_r($new);

0
On

That's an interesting combination of PHP misfeatures:

  • $key == 'categories' is a non-type-safe comparison; numeric array keys are integers, and when comparing an integer with a string, PHP (before version 8) casts the string to an integer: roughly, it takes the longest prefix of the string that consists of digits. If the string does not start with a digit at all, the result of the string-to-integer conversion is 0.
    So your condition will be true twice: for the categories subarray and for its first child (the one with the key 0). Tip: always use === for comparison.
  • PHP allows using the [] (array index) operator on almost anything that's not an array (usually returning null). So when array_map tries to get $element['title'] for $element = 14 (the ns item of the first child of the categories subarray), that will succeed and result in null (which print_r just displays as emptiness).
  • Strings are slightly different: 'foo'[$n] is valid syntax for getting the $n-th character of the string. When the array index operator is used on a string with a non-integer index, the index is cast to an integer (and, as we have seen, that usually results in zero). So 'Category:...'['title'] will result in the string 'C'.
    You should always be distrustful when using array index syntax on arrays with an unknown or unreliable structure, and use isset or something similar to make sure the array field you are trying to get exists.
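
Putting both tips together, here is a minimal sketch of the question's iterator loop with a strict comparison and an isset guard. The sample data is a trimmed, made-up copy of the response shown in the question, and the titles are collected into $result rather than printed inside the loop.

```php
<?php
// Trimmed sample data modelled on the question's output.
$array = array(
    'batchcomplete' => '',
    'query' => array(
        'pages' => array(
            46212943 => array(
                'pageid' => 46212943,
                'ns' => 0,
                'title' => 'Urban planning',
                'categories' => array(
                    array('ns' => 14, 'title' => 'Category:Pages using ISBN magic links'),
                    array('ns' => 14, 'title' => 'Category:Urban planning'),
                ),
            ),
        ),
    ),
);

$iterator = new RecursiveIteratorIterator(
    new RecursiveArrayIterator($array),
    RecursiveIteratorIterator::SELF_FIRST
);

$result = array();
foreach ($iterator as $key => $value) {
    // === compares type as well as value, so the integer key 0
    // can never match the string 'categories'.
    if ($key === 'categories' && is_array($value)) {
        foreach ($value as $element) {
            // Only read 'title' when the element is an array that has it.
            if (is_array($element) && isset($element['title'])) {
                $result[] = $element['title'];
            }
        }
    }
}

print_r($result);
```

With this sample, only the two category titles end up in $result; the first child (key 0) no longer passes the condition, so the second, unwanted array never appears.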