I am trying to convert text from UTF-8 to windows-1251, but all my solutions work only with Latin letters. I want to change the encoding of a String that contains Cyrillic. How can I do it correctly?
All the solutions that create a new String from bytes fail to preserve the Cyrillic letters.
For example:
UTF-8: Some текст с кириллицей and latin
windows-1251: Some текст СЃ кириллицей and latin
Specify character encoding for writing
You can specify a character encoding with the Charset class.
The NIO.2 framework in modern Java makes easy work of writing text to a file, for example with Files.writeString.
This code works for me:
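The listing itself is not reproduced here. As a minimal sketch of what such code could look like (an assumption, not necessarily the exact code referred to above), the text can be encoded to bytes with an explicit Charset and the bytes written with Files.write. The class name, method name, and temp-file location are hypothetical. The sample string spells the Cyrillic word "текст" with Unicode escapes so that the result does not depend on how the source file itself is encoded.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteWindows1251Bytes {

    // Encodes the text as windows-1251 and writes the resulting bytes to the file.
    static void writeAsWindows1251(Path path, String text) throws IOException {
        Charset windows1251 = Charset.forName("windows-1251");
        byte[] bytes = text.getBytes(windows1251);
        Files.write(path, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Latin text with one Cyrillic word written as Unicode escapes.
        String text = "Some \u0442\u0435\u043a\u0441\u0442 and latin";
        Path path = Files.createTempFile("cyrillic-", ".txt"); // hypothetical location
        writeAsWindows1251(path, text);
        System.out.println("Wrote " + Files.size(path) + " bytes to " + path);
    }
}
```

Because windows-1251 is a single-byte encoding, the file should contain exactly one byte per character of this text.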
Or, this briefer code works too, per the Comment by Holger below.
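The briefer code is not shown here. A hedged sketch of one shorter form (an assumption, not necessarily the code from that comment) passes the Charset directly to Files.writeString, available since Java 11, so no intermediate byte array is needed. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteWindows1251String {

    // Writes the text to the file, encoding it as windows-1251 in one call.
    static void writeAsWindows1251(Path path, String text) throws IOException {
        Files.writeString(path, text, Charset.forName("windows-1251"));
    }

    public static void main(String[] args) throws IOException {
        // Latin text with one Cyrillic word written as Unicode escapes.
        String text = "Some \u0442\u0435\u043a\u0441\u0442 and latin";
        Path path = Files.createTempFile("cyrillic-", ".txt"); // hypothetical location
        writeAsWindows1251(path, text);
        System.out.println("Wrote " + Files.size(path) + " bytes to " + path);
    }
}
```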
I know nothing about Cyrillic text. I just read the Oracle tutorial first. Then I read the Writing byte[] to a File in Java page at Baeldung.com. And in the Javadoc for Charset, I found a mention that if a character set is supported in Java, we should be able to use its name as listed in the IANA Charset Registry. By following that link, I found the name "windows-1251".
Run that code to create the file.
Specify character encoding for reading
Open the file in a text editor of your choice. Be sure to tell the app to interpret the octets in the file as Windows-1251 encoding.
Here I chose to use the TextEdit app by Apple, bundled with macOS. In the File > Open dialog box for TextEdit, notice the Options button used to display a list of character encodings. Choose Cyrillic (Windows) there, as that seems to mean Windows-1251.
If the text is properly interpreted, we see the original Cyrillic characters.
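The same check can be done in Java rather than in a text editor. The following is a sketch under assumptions (hypothetical class and method names, a temp file standing in for the real one): it reads the file back with Files.readString and an explicit Charset, and also shows how decoding UTF-8 bytes as windows-1251 produces the kind of garbled output quoted in the question.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWindows1251 {

    static final Charset WINDOWS_1251 = Charset.forName("windows-1251");

    // Reads the whole file, interpreting its bytes as windows-1251.
    static String readAsWindows1251(Path path) throws IOException {
        return Files.readString(path, WINDOWS_1251);
    }

    // Encodes as UTF-8, then wrongly decodes those bytes as windows-1251.
    static String misreadUtf8AsWindows1251(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, WINDOWS_1251);
    }

    public static void main(String[] args) throws IOException {
        // Latin text with one Cyrillic word written as Unicode escapes.
        String text = "Some \u0442\u0435\u043a\u0441\u0442 and latin";
        Path path = Files.createTempFile("cyrillic-", ".txt"); // hypothetical location
        Files.writeString(path, text, WINDOWS_1251);

        String readBack = readAsWindows1251(path);
        System.out.println("Round trip preserved the text: " + text.equals(readBack));

        // Each Cyrillic letter takes two bytes in UTF-8, so the misread text is longer.
        String garbled = misreadUtf8AsWindows1251(text);
        System.out.println("Original length: " + text.length()
                + ", misread length: " + garbled.length());
    }
}
```

The second method illustrates the mismatch: the Cyrillic letter "с" is two bytes in UTF-8, and reading those two bytes as windows-1251 yields the two characters "СЃ".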
Defaults
Be aware that in Java 17 and earlier, for most purposes the Java runtime defaults to the character encoding native to the host OS. This default applies to writing and reading text files, among other things.
In Java 18 and later, the Java runtime defaults to UTF-8 character encoding for most purposes. This default applies across all host platforms (macOS, Linux, Windows, etc.). See JEP 400: UTF-8 by Default.
So when you need an alternate character encoding such as windows-1251, always specify the Charset explicitly.
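As a small illustration of that advice (a sketch with hypothetical class and method names), the snippet below prints the runtime's default charset and compares it with an explicit windows-1251 encoding. The explicit call gives the same bytes on any Java version or platform that supports the charset, whereas the no-argument getBytes() follows the default.

```java
import java.nio.charset.Charset;

public class ExplicitCharsetDemo {

    // Always encodes as windows-1251, regardless of the runtime default.
    static byte[] encodeExplicitly(String text) {
        return text.getBytes(Charset.forName("windows-1251"));
    }

    public static void main(String[] args) {
        // Latin text with one Cyrillic word written as Unicode escapes.
        String text = "Some \u0442\u0435\u043a\u0441\u0442 and latin";

        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("windows-1251 supported: " + Charset.isSupported("windows-1251"));

        // The no-argument getBytes() depends on the default charset.
        System.out.println("Bytes with default charset: " + text.getBytes().length);
        System.out.println("Bytes with windows-1251: " + encodeExplicitly(text).length);
    }
}
```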