I have the following function that transforms special accent characters (like ă) into a-zA-Z characters in a string:
function tradu($sir){
    $sir_aux = $sir;
    $diacritice = array("ă"=>"a", "â"=>"a", "î"=>"i", "Î"=>"I", "ș"=>"s", "ş"=>"s", "ţ"=>"t", "ț"=>"t");
    for($i=0; $i<strlen($sir_aux); $i++){
        foreach($diacritice as $key=>$value){
            if($sir_aux[$i]==$key)
                $sir_aux[$i]=$value;
        }
    }
    $sir_aux = strtr($sir, $diacritice);
    return $sir_aux;
}
Let's say a is the original string and a_translated is the translated string. When I use strpos(a, string_to_find) and strpos(a_translated, string_to_find), the returned values are different. I also checked strlen(a) and strlen(a_translated), and they give different results too.
Why is this happening?
I need this explanation because I have to check whether a string with accents contains a given plain string (without accents), but I must return the portion of the original string where I found it, accents included.
What I tried
I translate the original string and find the position where the searched_string starts, then I call substr(ORIGINAL_STRING, position). This is where I noticed the positions do not correspond.
Example:
ORIGINAL STRING: Universitatea a fost înființată în 2001 pentru a oferi...
SEARCHED STRING: infiintata
DESIRED RESULT: înființată în 2001 pentru a oferi...
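The position you get from strpos does not line up because your original string contains multi-byte characters, and strpos (like strlen and substr) counts bytes, not characters. In UTF-8 each accented letter such as ă or î takes two bytes while its replacement takes one, so every accent before the match shifts the byte offset in the original string by one. Try mb_strpos instead: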
mb_strpos(a,string_to_find,0,'UTF-8');
and
mb_strpos(a_translated,string_to_find,0,'UTF-8');
You will see that both calls return the same position.
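To make the byte versus character difference concrete, here is a minimal sketch. It assumes the script is saved as UTF-8 and that the mbstring extension is enabled. The variable names are only illustrative, and the sample sentence is a shortened version of the one from the question.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is loaded.
$diacritice = array("ă"=>"a", "â"=>"a", "î"=>"i", "Î"=>"I", "ș"=>"s", "ş"=>"s", "ţ"=>"t", "ț"=>"t");

$a = "Universitatea a fost înființată în 2001";
$a_translated = strtr($a, $diacritice);

// strlen() counts bytes. The five accented letters take two bytes each,
// so the original is five bytes longer than the translation.
$byteLenOriginal   = strlen($a);            // 44
$byteLenTranslated = strlen($a_translated); // 39

// mb_strlen() counts characters, so both strings have the same length.
$charLenOriginal   = mb_strlen($a, 'UTF-8');            // 39
$charLenTranslated = mb_strlen($a_translated, 'UTF-8'); // 39

// Byte offsets diverge as soon as an accent precedes the match...
$bytePosOriginal   = strpos($a, "2001");            // 40
$bytePosTranslated = strpos($a_translated, "2001"); // 35

// ...while character offsets stay aligned.
$charPosOriginal   = mb_strpos($a, "2001", 0, 'UTF-8');            // 35
$charPosTranslated = mb_strpos($a_translated, "2001", 0, 'UTF-8'); // 35
```

Note that searching for infiintata in this particular sentence happens to give the same byte offset in both strings, because no accented letter comes before the match. The mismatch only shows up for matches that follow at least one accent.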
This snippet on 3v4l demonstrates the difference between strpos (which can't handle multi-byte strings) and mb_strpos:
http://3v4l.org/ksYal
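For the use case in the question (search in the translated string, return the matching tail of the original), something along these lines should work. This is only a sketch: find_in_original is a hypothetical helper name, not a built-in. It assumes UTF-8 input, the mbstring extension, and that every entry in the map replaces exactly one character with exactly one character, so character offsets stay aligned between the two strings.

```php
<?php
// Hypothetical helper (name and signature are illustrative only).
// Returns the part of $original starting at the first match of $needle
// in the accent-stripped copy, or null when there is no match.
function find_in_original($original, $needle, array $map)
{
    // One-character-to-one-character replacements keep character offsets aligned.
    $translated = strtr($original, $map);

    // Character offset of the match in the translated string.
    $pos = mb_strpos($translated, $needle, 0, 'UTF-8');
    if ($pos === false) {
        return null;
    }

    // Cut by character offset, not byte offset, so accents are not split.
    return mb_substr($original, $pos, null, 'UTF-8');
}

$diacritice = array("ă"=>"a", "â"=>"a", "î"=>"i", "Î"=>"I", "ș"=>"s", "ş"=>"s", "ţ"=>"t", "ț"=>"t");
$text = "Universitatea a fost înființată în 2001 pentru a oferi...";

// Expected to start at "înființată în 2001 ..." with the accents preserved.
$found = find_in_original($text, "infiintata", $diacritice);
```

The important part is using mb_substr together with mb_strpos. Mixing a character offset from mb_strpos with the byte-based substr would reintroduce the same mismatch.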