I am trying to write a Java app that will run on a Linux server but will process files generated on legacy Windows machines that use cp-1252 as the character set. Is there any way to encode these files as utf-8 instead of the cp-1252 they are generated in?
Encoding cp-1252 as utf-8?
38.7k views. Asked by IAmYourFaja. There are 2 answers below.
Joni
If the file names as well as the content are a problem, the easiest fix is to set the locale on the Linux machine to one based on ISO-8859-1 rather than UTF-8. You can use locale -a to list the available locales. For example, if you have en_US.iso88591 you could use:
export LANG=en_US.iso88591
This way Java will use ISO-8859-1 for file names, which is probably good enough. To run the Java program you still have to set the file.encoding system property:
java -Dfile.encoding=cp1252 -cp foo.jar:bar.jar blablabla
If no ISO-8859-1 locale is available you can generate one with localedef. Installing it requires root access though. In fact, you could generate a locale that uses CP-1252, if it is available on your system. For example:
sudo localedef -f CP1252 -i en_US en_US.cp1252
export LANG=en_US.cp1252
This way Java should use CP1252 by default for all I/O, including file names.
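To check what the JVM actually picked up after changing the locale, a small sketch like the following can help. The class name `ShowEncodings` is made up for illustration, and `sun.jnu.encoding` is an implementation-specific property of OpenJDK-based JVMs (it may be absent elsewhere). Note also that since JDK 18 the default charset is UTF-8 regardless of the locale unless `file.encoding` is set explicitly, so the result depends on the Java version in use.

```java
import java.nio.charset.Charset;

// Hypothetical helper: prints the encodings the running JVM has settled on.
class ShowEncodings {

    static String describe() {
        return "default charset: " + Charset.defaultCharset().name()
            // System property consulted for the default charset of file content.
            + ", file.encoding: " + System.getProperty("file.encoding")
            // Implementation-specific (OpenJDK): encoding used for file names.
            + ", sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with and once without the `export LANG=...` line (and the `-Dfile.encoding` option) shows whether the settings took effect.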
Expanded further here: http://jonisalonen.com/2012/java-and-file-names-with-invalid-characters/
You can read and write text data in any encoding that you wish. Here's a quick code example:
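A minimal sketch of such a conversion, assuming Java 7 or later; the class name `Cp1252ToUtf8` and the default file names are hypothetical, and this is not necessarily the code the answer originally showed. It reads the input as windows-1252 and writes the same characters back out as UTF-8:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Cp1252ToUtf8 {

    // "windows-1252" is the Java charset name for CP-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decodes source as CP-1252 and re-encodes the characters into target as UTF-8.
    static void convert(Path source, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, CP1252);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int count;
            // Copy in chunks so line endings are preserved exactly as they were.
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file names; pass real paths as arguments.
        Path source = Paths.get(args.length > 0 ? args[0] : "input-cp1252.txt");
        Path target = Paths.get(args.length > 1 ? args[1] : "output-utf8.txt");
        convert(source, target);
    }
}
```

Because the charsets are passed explicitly, this does not depend on the server's locale or on the `file.encoding` property.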
If this still 'chokes' on read, verify that the original encoding is what you think it is. In this case I've specified windows-1252, which is the Java charset name for cp-1252.