My website measures the length of different texts, and those texts contain special characters such as å, ä and ö. I have noticed that PHP counts each of "å", "ä" and "ö" as one word: åäö counts as 3 words, and åäöåäöåäöåäöåäö counts as 15 words. This is clearly not correct, and I cannot find an answer to this problem anywhere. I'd be thankful for a useful answer, thank you!
PHP: str_word_count('åäöåäöåäöåäö') returns the integer 12
Asked by Alien13

You can try adding
setlocale(LC_ALL, 'en_US.utf8');
before your call to str_word_count, or roll your own by counting the single spaces and adding one:
substr_count(trim($str), ' ') + 1;

This works for me; I hope it's useful.
When using str_word_count you need to wrap the result in utf8_decode(utf8_encode(...)).
function cortar($str)
{
    // Extra characters that str_word_count should treat as part of a word.
    $extra = '.,-0123456789()+=?¿!"<>*ñÑáéíóúÁÉÍÓÚ@|/%$#¡';

    // Texts shorter than 20 words are returned unchanged.
    if (str_word_count($str) < 20) {
        return $str;
    }

    // Otherwise keep only the first 20 words, joined by single spaces.
    $words = array_slice(str_word_count($str, 1, $extra), 0, 20);
    return utf8_decode(utf8_encode(implode(' ', $words)));
}
The function returns the first 20 words.
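As a rough alternative that avoids str_word_count altogether, the same truncation can be sketched by splitting on whitespace with a Unicode-aware pattern. The name first_words and the default limit of 20 are illustrative only, and the sketch assumes valid UTF-8 input whose words are separated by whitespace. Note also that utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper (not from the answer above): return the first $limit
// whitespace-separated words of a UTF-8 string, joined by single spaces.
function first_words(string $text, int $limit = 20): string
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so multibyte
    // letters such as "ñ" or "å" are never split in the middle.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure, e.g. for invalid UTF-8 input.
    if ($words === false) {
        return $text;
    }

    return implode(' ', array_slice($words, 0, $limit));
}

echo first_words('El niño comió piñas en España', 3), PHP_EOL;
```

Unlike the charlist approach, this treats punctuation attached to a word as part of that word, which is usually what you want when shortening text for display.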

If the string has no line breaks and the words are separated by single spaces, a simple workaround is to trim() the string and then count the spaces.
$string = "Wörk has to be done.";
// 1 space is 2 words, 2 spaces are 3 words etc.
if(substr_count(trim($string), ' ') > 2)
{
// more than 3 words
// ...
}
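The space-counting idea above breaks down when words are separated by tabs, line breaks or repeated spaces. A slightly more tolerant sketch, assuming valid UTF-8 input, splits on any run of whitespace instead. count_words_by_whitespace is an illustrative name, not a PHP built-in.

```php
<?php
// Hypothetical helper: count whitespace-separated tokens in a UTF-8 string.
function count_words_by_whitespace(string $text): int
{
    // PREG_SPLIT_NO_EMPTY drops the empty pieces produced by leading or
    // trailing whitespace, so no separate trim() call is needed.
    $parts = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (for example invalid UTF-8).
    return $parts === false ? 0 : count($parts);
}

$string = "Wörk has to be done.";
if (count_words_by_whitespace($string) > 3) {
    echo "more than 3 words", PHP_EOL;
}
```

Because the string is never inspected byte by byte, "Wörk" is counted as one word regardless of how many bytes "ö" occupies.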

If there's a limited set of word characters that you need to take into account, just supply those to str_word_count with its third param (charlist).
Alternatively, you can write your own Unicode-ready str_word_count function. One possible approach is splitting the source string by non-word symbols, then counting the resulting array. Basically, such a function counts all the substrings in the target string that start with either a Letter or Number character, followed by any number (incl. zero) of Letters, Numbers, hyphens and single quote symbols (matching the description given in the str_word_count() docs).
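Both options might look roughly like the sketch below. It assumes that the source file and the input are UTF-8 encoded. unicode_word_count is an illustrative name, and the pattern is one reading of the description above rather than the original answer's code: it counts matches directly instead of splitting. Because str_word_count works on bytes, the charlist trick whitelists the individual bytes of the listed letters, so it may also accept other characters that happen to share those bytes.

```php
<?php
// Option 1: whitelist the extra letters through the third parameter.
// str_word_count() is byte-based, so this marks the UTF-8 bytes of these
// letters as word characters.
$viaCharlist = str_word_count('åäöåäö åäö', 0, 'åäöÅÄÖ');

// Option 2 (hypothetical helper): count substrings that start with a letter
// or number and continue with letters, numbers, hyphens or single quotes.
function unicode_word_count(string $text): int
{
    // \p{L} = any Unicode letter, \p{N} = any Unicode number; /u enables UTF-8.
    $count = preg_match_all('/[\p{L}\p{N}][\p{L}\p{N}\'-]*/u', $text);

    // preg_match_all() returns false on failure (for example invalid UTF-8).
    return $count === false ? 0 : $count;
}

echo $viaCharlist, PHP_EOL;
echo unicode_word_count('åäöåäöåäöåäöåäö'), PHP_EOL;
echo unicode_word_count('Wörk has to be done.'), PHP_EOL;
```

The regex version needs no list of special characters, so it is the more general of the two when the set of letters is not known in advance.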