Is there a way in Perl to determine whether the encoding of a string is UTF-8 or cp1252?
How to determine whether utf-8 or cp1252 encoding?
Asked by CJ7 (3.2k views)
Answered by ikegami
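# utf8::decode works in place, so test a copy; it returns false if the bytes are not valid UTF-8.
my $could_be_utf8 = utf8::decode( my $tmp = $string );
# Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252.
my $could_be_cp1252 = $string !~ /[\x81\x8D\x8F\x90\x9D]/;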
If you need to handle a string that contains a mix of both, see Fixing a file consisting of both UTF-8 and Windows-1252.
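As a minimal sketch, the two checks above can be combined into one helper. The name classify_bytes and its return labels are hypothetical, chosen only for illustration; the input is assumed to be a string of undecoded bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a byte string using the two checks above.
sub classify_bytes {
    my ($bytes) = @_;

    # utf8::decode modifies its argument, so it is given a copy.
    my $could_be_utf8 = utf8::decode( my $tmp = $bytes );

    # These five bytes are unassigned in Windows-1252.
    my $could_be_cp1252 = $bytes !~ /[\x81\x8D\x8F\x90\x9D]/;

    return 'either' if $could_be_utf8 && $could_be_cp1252;
    return 'utf-8'  if $could_be_utf8;
    return 'cp1252' if $could_be_cp1252;
    return 'neither';
}

print classify_bytes("caf\xC3\xA9"), "\n";    # well-formed in both encodings
print classify_bytes("caf\xE9"),     "\n";    # a lone E9 is not valid UTF-8
print classify_bytes("\xC2\x81"),    "\n";    # valid UTF-8, but 81 is unassigned in cp1252
```

Both checks only rule an encoding out, so many inputs (including pure ASCII) come back as 'either'. A common heuristic, which is an assumption rather than a guarantee, is to prefer UTF-8 when non-ASCII bytes decode cleanly as UTF-8.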
The core Encode::Guess module should be up to the task for this.† Its documentation shows the basic usage: declare the suspect encodings, and then decode or guess from the data.
By default it also tries "ascii, utf8 and UTF-16/32 with BOM"; to avoid that, change the suspect list first and then get the encoding.
See the documentation for details.
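A minimal sketch of guessing with Encode::Guess, following the guess_encoding pattern from the module's documentation. The helper name describe_guess is hypothetical, and this is an illustration rather than the code originally posted in this answer. Here cp1252 is added to the module's default suspects.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding

# Hypothetical helper: returns the guessed encoding's name, or the
# module's error message when no single encoding can be picked.
sub describe_guess {
    my ($bytes) = @_;

    # cp1252 is tried in addition to the default suspects.
    my $enc = guess_encoding( $bytes, 'cp1252' );

    # On success an encoding object is returned; on failure, a plain string.
    return ref($enc) ? $enc->name : "no single guess: $enc";
}

print describe_guess("caf\xE9"),     "\n";    # not valid UTF-8
print describe_guess("caf\xC3\xA9"), "\n";    # valid as both UTF-8 and cp1252
```

When the bytes are valid in more than one suspect encoding, guess_encoding returns an error string instead of an object, so the ref() check is what separates a usable guess from an ambiguous one.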
† There are plenty of differences; see comment by tripleee and for example this post