Improper encoding in Java Read/Write file


I want to write to a CSV file in UTF-8 in Java.

After searching the internet, I am using:

    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("temp.csv"), Charset.forName("UTF-8").newEncoder()));

I am still getting illegal characters.

I want to write "Kürzlich" to my file and later read from and write to the same file again. When I do so, I get "Kürzlich"

This is how I get "Kürzlich": I parse an XML file using DOM.

    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    InputStream openstream = url.openStream();
    Document doc = dBuilder.parse(openstream);
    doc.getDocumentElement().normalize();

and then I extract my string.

I do not care how it is shown on the screen. I want to compare what I write to the file with another file that was converted correctly.

Is this happening because of the DOM structure? Is there a way around it?


There is 1 solution below


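You appear to be writing UTF-8, but I don't see how you are reading it back. Most likely you are reading with the platform default encoding, which is what a reader uses when no charset is given and which is not necessarily UTF-8.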
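To illustrate what a mismatched read does, here is a minimal sketch. The class `EncodingMismatchDemo` and its `roundTrip` helper are made up for this example, and ISO-8859-1 only stands in for whatever the default encoding happens to be on your machine. The string uses a `\u00fc` escape for "ü" so the encoding of the source file itself does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Hypothetical helper: encode text with one charset, decode the bytes with another.
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        byte[] bytes = text.getBytes(writeCharset);
        return new String(bytes, readCharset);
    }

    public static void main(String[] args) {
        String text = "K\u00fcrzlich"; // "Kürzlich"

        // Same charset on both sides: the text survives.
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // UTF-8 bytes decoded as ISO-8859-1: the two bytes of "ü" become two separate characters.
        String garbled = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(same.equals(text));    // true
        System.out.println(garbled.equals(text)); // false
        System.out.println(garbled.length());     // 9 instead of 8
    }
}
```

If the text you read back is one character longer per umlaut, a mismatch of this kind is a likely cause.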

Try wrapping openstream in an InputStreamReader that specifies the encoding you want.

I suggest you try this to show that you can write and read UTF-8:
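A rough sketch of that wrapping for the DOM case, assuming the XML really is UTF-8. The class `Utf8DomParse` and the method `rootText` are invented for the example, and a `ByteArrayInputStream` stands in for `url.openStream()`. Note that a parser given a raw `InputStream` normally detects the encoding from the XML declaration by itself, so forcing a charset like this should only matter when the declaration is missing or wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class Utf8DomParse {
    // Hypothetical helper: parse the stream as UTF-8 and return the root element's text.
    static String rootText(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // An InputSource built from a Reader makes the parser use our decoding.
            Document doc = dBuilder.parse(new InputSource(reader));
            doc.getDocumentElement().normalize();
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for url.openStream(): UTF-8 bytes of a tiny document.
        byte[] xml = "<word>K\u00fcrzlich</word>".getBytes(StandardCharsets.UTF_8);
        String text = rootText(new ByteArrayInputStream(xml));
        System.out.println(text.equals("K\u00fcrzlich")); // true
    }
}
```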

    String text = "Kürzlich";
    // Write the text, naming UTF-8 explicitly instead of relying on the default encoding.
    PrintWriter pw = new PrintWriter(new BufferedWriter(new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8")));
    pw.println(text);
    pw.close();

    // Read it back with the same explicit encoding.
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("test.txt"), "UTF-8"));
    String line = br.readLine();
    br.close();
    System.out.println("Text is the same is " + (line.equals(text)));

prints

    Text is the same is true
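The same round trip can also be sketched with `java.nio.file.Files` and try-with-resources, which closes the streams even if an exception is thrown. This is just one possible variant, not something the question requires; the class `Utf8FileRoundTrip`, the method `writeThenRead`, and the temp file name are made up for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8FileRoundTrip {
    // Hypothetical helper: write one line as UTF-8, then read it back as UTF-8.
    static String writeThenRead(Path file, String text) {
        try {
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                out.write(text);
                out.newLine();
            }
            try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("utf8-demo", ".csv");
        String text = "K\u00fcrzlich"; // "Kürzlich"
        System.out.println(writeThenRead(file, text).equals(text)); // true
        Files.delete(file);
    }
}
```

Passing the charset on both the write and the read side is the point here; which stream classes you use is secondary.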