strtr acting weird - removing diacritics from a string


I'm having a hard time removing diacritics from a $string. My code is:

<?php
$string = "Příliš žluťoučký kůň úpěl ďábelské ódy.";
$without_diacritics = strTr($string, "říšžťčýůúěďó", "risztcyuuedo");
echo $without_diacritics; 

The expected output is Prilis zlutoucky kun upel dabelske ody.

Instead, I get a very strange result:

Puiszliuc uuluueoudoks� ku�u� s�pd�l d�scbelsks� s�dy.

I thought it could be a problem with multi-byte characters, but I had read that strtr is multi-byte safe. Is that assumption wrong? What am I missing?

2

There are 2 best solutions below

0
On BEST ANSWER

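The problem is that the three-argument form of strtr() works on bytes, not characters. In UTF-8 each of your accented characters takes two bytes, so the "from" string is 24 bytes long while the "to" string is only 12. strtr() ignores the extra bytes in the longer string and maps the remaining bytes one to one, which splits the two-byte characters and produces invalid output. The two-argument form with a translation array replaces whole substrings, so it is the better choice here: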

$string = "Příliš žluťoučký kůň úpěl ďábelské ódy.";

echo strtr($string, [
  'ř' => 'r',
  'í' => 'i',
  'š' => 's',
  'ž' => 'z',
  'ť' => 't',
  'č' => 'c',
  'ý' => 'y',
  'ů' => 'u',
  'ú' => 'u',
  'ě' => 'e',
  'ď' => 'd',
  'ó' => 'o'
]);

Output:

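Prilis zlutoucky kuň upel dábelské ody.

Note that ň, á and é are left unchanged because the array (like the original translation string) has no entries for them. Adding 'ň' => 'n', 'á' => 'a' and 'é' => 'e' gives the expected output.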
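
To cover the characters that the array above does not list, the map can be wrapped in a small helper. This is only a sketch: the function name remove_diacritics is made up for illustration, the script is assumed to be saved as UTF-8, and the map contains just the lowercase accented letters that occur in the example sentence.

```php
<?php
// Hypothetical helper (not part of PHP): it maps only the accented letters
// that occur in the example sentence; extend the array for other input.
function remove_diacritics(string $text): string
{
    // With an array, strtr() replaces whole substrings, so multi-byte
    // UTF-8 characters are matched as units rather than byte by byte.
    return strtr($text, [
        'ř' => 'r', 'í' => 'i', 'š' => 's', 'ž' => 'z',
        'ť' => 't', 'č' => 'c', 'ý' => 'y', 'ů' => 'u',
        'ú' => 'u', 'ě' => 'e', 'ď' => 'd', 'ó' => 'o',
        // Entries missing from the array in the answer:
        'ň' => 'n', 'á' => 'a', 'é' => 'e',
    ]);
}

echo remove_diacritics("Příliš žluťoučký kůň úpěl ďábelské ódy."), "\n";
// Prilis zlutoucky kun upel dabelske ody.
```

Characters that are not in the map, including uppercase accented letters, pass through unchanged, so the array has to be extended for other text.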


3
On

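A simple, well-tried solution (based on this answer) uses iconv() to convert the string "from your given encoding to ASCII characters". Note that the result of //TRANSLIT depends on the system's iconv implementation and on the current locale, so check the output on your platform.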

$input = 'Příliš žluťoučký kůň úpěl ďábelské ódy.';
$input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
echo $input;



Explanation

The issue you're facing is caused by the encoding of the string. strtr() is not multibyte aware, as @ChrisForrence stated in his comment:

It may be because some of those characters are more than one byte, so it doesn't map properly.
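
The byte-level behaviour can be checked directly. The sketch below assumes the script is saved as UTF-8; the variable names are only illustrative, and preg_match_all() with the /u modifier is used here simply as one way to count characters without extra extensions.

```php
<?php
$from = "říšžťčýůúěďó";
$to   = "risztcyuuedo";

echo strlen($from), "\n";                 // 24 (bytes)
echo preg_match_all('/./u', $from), "\n"; // 12 (UTF-8 characters)
echo strlen($to), "\n";                   // 12 (bytes)
echo bin2hex("ř"), "\n";                  // c599: two bytes for one character

// Only the first 12 bytes of $from are used, each mapped to one byte of $to.
// The byte 0xC5 occurs several times in that prefix and the last mapping
// wins, so "ř" (0xC5 0x99) becomes "ui", as in the question's output.
echo strtr("Př", $from, $to), "\n";       // Pui
```

Bytes that have no mapping, such as the second byte of ý, are left as they are, which is where the invalid � sequences in the question's output come from.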