PHP: str_word_count(åäöåäöåäöåäö) returns the integer value of 12


I am using special characters such as å, ä and ö on my website, which measures the length of different texts. I have noticed that PHP's str_word_count() counts each of "å", "ä" and "ö" as one word. So "åäö" counts as 3 words, and "åäöåäöåäöåäöåäö" counts as 15 words. This is clearly not correct, and I cannot find an answer to this problem anywhere. I'd be thankful for a useful answer!

4 answers

Answer (7 votes)

If there is a limited set of extra characters you need to treat as part of words, pass them to str_word_count() through its third parameter ($charlist):

$charlist = 'åäö';
echo str_word_count('åäöåäöåäöåäöåäö', 0, $charlist); // 1
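
A slightly fuller sketch of the charlist approach, assuming the source file is saved as UTF-8 and that Swedish letters are the only extra characters needed. str_word_count() compares single bytes, so listing 'åäöÅÄÖ' really adds the UTF-8 bytes of those letters; format 1 returns the words themselves. The last line illustrates the limitation: an accented letter missing from the list (é here) shares only its first byte with the listed letters, so it can split a word.

```php
<?php
// Use the "C" locale so only ASCII letters are alphabetic on their own
setlocale(LC_CTYPE, 'C');

// Extra word characters (assumption: Swedish letters are all we need)
$charlist = 'åäöÅÄÖ';

// Format 0: number of words
echo str_word_count('Åsa äter smörgåsbord', 0, $charlist), "\n"; // 3

// Format 1: array of the words themselves
print_r(str_word_count('Åsa äter smörgåsbord', 1, $charlist));

// "é" is not listed: its first byte (shared with "å") counts as a word
// character, its second byte does not, so "élan" is split into two words
echo str_word_count('élan', 0, $charlist), "\n"; // 2
```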

Alternatively, you can write your own Unicode-aware version of str_word_count(). One possible approach is to count the word-like substrings with a Unicode regular expression:

function mb_str_word_count($str) {
  return preg_match_all('#[\p{L}\p{N}][\p{L}\p{N}\'-]*#u', $str);
}

This function counts every substring that starts with a letter or number character, followed by any number (including zero) of letters, numbers, hyphens and single quotes. That closely follows the definition in the str_word_count() documentation (a word may contain, but not start with, ' and -), except that digits also count as word characters here.
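
A usage sketch of the mb_str_word_count() function from this answer, with two additions of my own (assumptions, not part of the original answer): \p{M} (combining marks) is allowed inside a word, so a decomposed "å" written as "a" plus U+030A does not split the word, and the false that preg_match_all() returns for invalid UTF-8 is mapped to 0.

```php
<?php
// Variant of mb_str_word_count(): the \p{M} class and the false check are additions
function mb_str_word_count(string $str): int
{
    // A word starts with a letter or digit, then letters, marks, digits, ' or -
    $count = preg_match_all('/[\p{L}\p{N}][\p{L}\p{M}\p{N}\'-]*/u', $str);

    // preg_match_all() returns false on invalid UTF-8 input
    return $count === false ? 0 : $count;
}

echo mb_str_word_count('åäöåäöåäöåäöåäö'), "\n";          // 1
echo mb_str_word_count('Wörk has to be done.'), "\n";      // 5
echo mb_str_word_count("Åsa's self-made smörgåsbord"), "\n"; // 3
echo mb_str_word_count("a\u{030A}sa"), "\n";               // 1 (decomposed "åsa")
```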

Answer (0 votes)

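You can try adding

setlocale(LC_ALL, 'en_US.utf8');

before your call to str_word_count() (note that str_word_count() still checks one byte at a time, so a UTF-8 locale may not make multibyte letters count as letters), or roll your own count. Assuming words are separated by single spaces, the number of words is the number of spaces plus one:

substr_count(trim($str), ' ') + 1;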
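
A minimal sketch of the roll-your-own count as a function. The name space_word_count() is hypothetical, and it assumes words are separated by exactly one space (no tabs, line breaks or runs of spaces). The one extra step is the empty-string guard: without it, an empty or all-blank string would report one word.

```php
<?php
// Hypothetical helper: words = spaces + 1, assuming single-space separation
function space_word_count(string $str): int
{
    $str = trim($str);

    // An empty (or all-whitespace) string has no words, not one
    if ($str === '') {
        return 0;
    }

    return substr_count($str, ' ') + 1;
}

echo space_word_count('Wörk has to be done.'), "\n"; // 5
echo space_word_count('åäöåäöåäöåäöåäö'), "\n";      // 1
```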

Answer (0 votes)

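This works for me; I hope it's useful.

When using str_word_count(), pass the accented characters you need in its third parameter ($charlist). Wrapping the result in utf8_decode(utf8_encode(...)) is not needed: that round trip returns the string unchanged, and both functions are deprecated as of PHP 8.2.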

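function cortar($str)
{
    // Extra characters that str_word_count() should treat as part of a word.
    // It compares single bytes, so each accented letter adds its UTF-8 bytes.
    $charlist = '.,-0123456789()+=?¿!"<>*ñÑáéíóúÁÉÍÓÚ@|/%$#¡';

    // Use the same charlist for counting and splitting so both agree
    $words = str_word_count($str, 1, $charlist);

    // 20 words or fewer: nothing to cut
    if (count($words) <= 20) {
        return $str;
    }

    // Otherwise keep the first 20 words, joined by single spaces
    return implode(' ', array_slice($words, 0, 20));
}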

The function cuts the text down to its first 20 words.
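
A sketch of a different way to cut text to N words without maintaining a byte-level character list: it reuses the Unicode regular expression idea from the first answer and cuts the original string right after the N-th word, so the spacing and punctuation between the kept words survive. truncate_words() is a hypothetical name, not a PHP built-in, and the \p{M} class is my own addition.

```php
<?php
// Hypothetical helper: cut $str right after its first $limit words
function truncate_words(string $str, int $limit): string
{
    if ($limit <= 0) {
        return '';
    }

    // Find every word together with its byte offset in $str
    $found = preg_match_all(
        '/[\p{L}\p{N}][\p{L}\p{M}\p{N}\'-]*/u',
        $str,
        $matches,
        PREG_OFFSET_CAPTURE
    );

    // Invalid UTF-8 (false) or already short enough: return the text unchanged
    if ($found === false || $found <= $limit) {
        return $str;
    }

    // Offsets are byte offsets, so plain substr() is the right tool here
    [$word, $offset] = $matches[0][$limit - 1];
    return substr($str, 0, $offset + strlen($word));
}

echo truncate_words('Åsa äter en stor smörgås, sedan vilar hon.', 3), "\n"; // Åsa äter en
```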

Answer (0 votes)

If the string contains no line breaks and the words are separated by single spaces, a simple workaround is to trim() the string and then count the spaces.

$string = "Wörk has to be done.";

// 1 space is 2 words, 2 spaces are 3 words etc.
if(substr_count(trim($string), ' ') > 2)
{
   // more than 3 words
   // ...
}
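
If the input may contain runs of spaces, tabs or line breaks, a sketch that relaxes this answer's assumptions is to split on any whitespace and count the non-empty pieces. whitespace_word_count() is a hypothetical name; the /u flag makes PCRE treat the subject as UTF-8, and preg_split() returns false on invalid UTF-8, which is mapped to 0 here.

```php
<?php
// Hypothetical helper: count whitespace-separated words, tolerating
// runs of spaces, tabs and line breaks
function whitespace_word_count(string $str): int
{
    $parts = preg_split('/\s+/u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on error (e.g. invalid UTF-8)
    return $parts === false ? 0 : count($parts);
}

$string = "Wörk  has\tto\nbe done.";

if (whitespace_word_count($string) > 3) {
    // more than 3 words
    echo "more than 3 words\n";
}
```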