Is there a way in Perl to determine whether a string is encoded as UTF-8 or cp1252?
How to determine whether utf-8 or cp1252 encoding?
3.2k views. Asked by CJ7.
There are 2 best solutions below
ikegami
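# True if the bytes are valid UTF-8 (a copy is decoded, so $string is untouched).
my $could_be_utf8 = utf8::decode( my $tmp = $string );
# True unless the string contains one of the five bytes that cp1252 leaves undefined.
my $could_be_cp1252 = $string !~ /[\x81\x8D\x8F\x90\x9D]/;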
If you need to handle a string that contains a mix of both, see Fixing a file consisting of both UTF-8 and Windows-1252.
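As a rough sketch of how the two checks above could be combined, the helper below labels a byte string with one of four outcomes. The name `classify_bytes` and the labels are hypothetical and not part of the original answer; it assumes the input is an undecoded byte string.

```perl
use strict;
use warnings;

# Hypothetical helper: combine the two checks into a single label.
sub classify_bytes {
    my ($bytes) = @_;

    # Decode a copy so the caller's string is left as-is.
    my $could_be_utf8   = utf8::decode( my $tmp = $bytes );
    my $could_be_cp1252 = $bytes !~ /[\x81\x8D\x8F\x90\x9D]/;

    return 'both'   if $could_be_utf8 && $could_be_cp1252;
    return 'utf8'   if $could_be_utf8;
    return 'cp1252' if $could_be_cp1252;
    return 'neither';
}

# Sample inputs (made up for illustration).
for my $sample ( "abc", "caf\xC3\xA9", "caf\xE9", "\xC2\x81", "\x81" ) {
    printf "%-8s %s\n", classify_bytes($sample), unpack( 'H*', $sample );
}
```

Note that many strings pass both checks: almost any valid UTF-8 byte sequence is also decodable as cp1252, so 'both' is a common result. When a string contains non-ASCII bytes and is valid UTF-8, treating it as UTF-8 is usually the safer guess, but that is a heuristic rather than a guarantee.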
The core Encode::Guess module should be up to the task for this†: tell it which encodings to suspect, and then ask it to guess the encoding of the data (from docs).
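A minimal sketch of that flow, based on the `guess_encoding` function that Encode::Guess exports by default. The sample data is made up, and passing `cp1252` as a per-call suspect is one possible choice, not necessarily what the original answer showed.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding by default

# Hypothetical sample: "café" as cp1252 bytes (a lone E9 is not valid UTF-8).
my $data = "caf\xE9";

# Suspects given here apply to this call only, in addition to the defaults.
my $decoder = guess_encoding( $data, 'cp1252' );

# On failure (no match, or more than one match) an error string is returned.
ref($decoder) or die "Can't guess: $decoder";

my $name = $decoder->name;             # name of the guessed encoding
my $text = $decoder->decode($data);    # decoded character string
print "$name\n";
```

One caveat: when the data is valid in more than one suspect encoding (for example, non-ASCII UTF-8 bytes that also decode as cp1252), the guess is ambiguous and you get the error string rather than an object, so the `ref` check matters.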
To avoid also using the default suspects ("ascii, utf8 and UTF-16/32 with BOM"), change the suspect list first, and then get the encoding.
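One way this might look, assuming the `set_suspects` class method described in the Encode::Guess documentation is what replaces the list. The sample data is hypothetical, and the change to the suspect list is global for the rest of the program.

```perl
use strict;
use warnings;
use Encode::Guess;

# Replace the internal suspect list rather than adding to it.
Encode::Guess->set_suspects(qw/cp1252 utf8/);

# Hypothetical sample: "café" as cp1252 bytes.
my $data = "caf\xE9";

# No per-call suspects: only the list set above is tried.
my $decoder = guess_encoding($data);

my $result = ref($decoder)
    ? $decoder->name                 # an encoding object on success
    : "Can't guess: $decoder";       # an error string otherwise
print "$result\n";
```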
Other variants are given in the docs.
See documentation for details.
† There are plenty of differences; see comment by tripleee and for example this post