Is there a way in Perl to determine whether the encoding of a string is utf-8 or cp1252?
How to determine whether utf-8 or cp1252 encoding?
Asked by CJ7

The core module Encode::Guess should be up to the task for this†
use Encode::Guess;
my $enc = guess_encoding($data, qw(cp1252)); # utf8 among defaults
and then
ref($enc) or die "Can't guess: $enc"; # trap error this way
my $utf8 = $enc->decode($data);
(from docs).
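Put together as a self-contained script, the snippet above might look like the following sketch. The helper name guess_name and the sample bytes are made up for illustration; only guess_encoding and the name method come from Encode::Guess and Encode.

```perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical helper (not part of Encode::Guess): returns the name of the
# guessed encoding, or undef when guess_encoding() hands back an error
# string instead of an encoding object.
sub guess_name {
    my ($bytes) = @_;
    my $enc = guess_encoding($bytes, qw(cp1252));   # utf8 is among the defaults
    return ref($enc) ? $enc->name : undef;
}

# Sample input: "café au lait" as cp1252 bytes. A lone \xE9 followed by a
# space is not a valid UTF-8 sequence, so utf8 can be ruled out here.
my $sample = "caf\xE9 au lait";
my $name   = guess_name($sample);
print defined $name ? "guessed: $name\n" : "could not guess\n";
```

Returning undef on failure keeps the caller's check simple; the error string from guess_encoding could be returned or logged instead if the reason matters.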
To set the suspects explicitly instead of relying on the defaults ("ascii, utf8 and UTF-16/32 with BOM"), change the list first
Encode::Guess->set_suspects(qw(utf8 cp1252));
and then get the encoding
my $enc = guess_encoding($data);
Or, copied from docs
my $decoder = Encode::Guess->guess($data);
die $decoder unless ref($decoder);
my $utf8 = $decoder->decode($data);
See documentation for details.
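A guess may come back as ambiguous, since bytes that form valid UTF-8 generally also decode without error as cp1252. A common alternative, which is not what this answer uses, is to attempt a strict UTF-8 decode and fall back to cp1252 when that fails. The sketch below assumes the input is wholly one encoding or the other; the helper name decode_utf8_or_cp1252 is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns (encoding name, decoded text).
sub decode_utf8_or_cp1252 {
    my ($bytes) = @_;

    # Strict "UTF-8" (with the hyphen) rejects malformed sequences.
    # FB_CROAK makes decode() die on the first bad byte, and LEAVE_SRC
    # keeps $bytes untouched so it can be reused for the fallback.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return ('UTF-8', $text) if defined $text;

    # Not valid UTF-8, so treat the bytes as cp1252.
    return ('cp1252', decode('cp1252', $bytes));
}

my ($name, $text) = decode_utf8_or_cp1252("caf\xC3\xA9");
print "decoded as $name\n";
```

Pure ASCII input is reported as UTF-8 here, which is harmless because both encodings agree on that range.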
† There are plenty of differences between the two encodings; see the comment by tripleee and, for example, this post.
If you need to handle a string that contains a mix of both, see Fixing a file consisting of both UTF-8 and Windows-1252.