Write to File without Encoding


I'm getting JSON from a web service with escaped characters: \u201c, etc. Parsing it works perfectly: the double quotes inside the texts arrive as escape sequences, while the structural double quotes are not escaped, so the parser sees the right JSON structure. The problem starts after I write it to a file and read it back: the JSON is spoiled. I no longer have \u201c, but literal “ characters inside the content texts.

  • If I encode it with utf-8, “ is changed to the File Separator character (0x1C, decimal 28) and – is changed to Device Control 3 (0x13), which results in a parsing exception.
  • If I encode it with ascii, “ is changed to a ? character.
  • If I encode it with iso-8859-1, “ stays decoded (it is not turned back into \u201c).
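To compare the symptoms in the list above directly, here is a minimal sketch (not from the original post; the class name EncodingProbe is made up) that prints the bytes each of the three encodings produces for U+201C and whether decoding them gives the character back. UTF-8 encodes it as the three bytes E2-80-9C and round-trips it; ASCII has no code for it and substitutes ?. What ISO-8859-1 substitutes depends on the runtime's encoder fallback, so no output is claimed for that line.

```csharp
using System;
using System.Text;

class EncodingProbe
{
    static void Main()
    {
        // U+201C LEFT DOUBLE QUOTATION MARK, the character behind the \u201c escape
        string quote = "\u201c";

        foreach (var name in new[] { "utf-8", "us-ascii", "iso-8859-1" })
        {
            Encoding enc = Encoding.GetEncoding(name);
            byte[] bytes = enc.GetBytes(quote);   // encode with this charset
            string back = enc.GetString(bytes);   // decode with the same charset
            Console.WriteLine(
                $"{name}: bytes={BitConverter.ToString(bytes)} roundTrips={back == quote}");
        }
    }
}
```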

Is there any way to preserve the unencoded characters after writing and reading?

SAMPLE:

I'm using Newtonsoft.Json.Linq

using System.IO;
using System.Net;
using System.Text;
using Newtonsoft.Json.Linq;

// webRequest and path are defined elsewhere
Encoding encoding = Encoding.GetEncoding("ISO-8859-1");
string responseString;
var webResponse = (HttpWebResponse)webRequest.GetResponse();
using (StreamReader streamReader = new StreamReader(webResponse.GetResponseStream(), encoding))
{
    responseString = streamReader.ReadToEnd();
}
JToken json = JObject.Parse(responseString);
// append: true, so the JSON is added after any existing file contents
using (StreamWriter stream = new StreamWriter(path, true, encoding))
{
    stream.Write(json.ToString());
}
string spoiledJsonString = File.ReadAllText(path, encoding);
JToken sureNotToBeCreated = JObject.Parse(spoiledJsonString); // EXCEPTION

BEST ANSWER:

If I write the test program,

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

class Program
{
    private static void Main()
    {
        var encoding = Encoding.GetEncoding("ISO-8859-1");
        // A one-character string: U+201C LEFT DOUBLE QUOTATION MARK
        var testString = new string(new[] { (char)0x201c });
        string roundTripped;

        using (var m = new MemoryStream())
        {
            using (var writer = new StreamWriter(m, encoding))
            {
                var reader = new StreamReader(m, encoding);
                writer.Write(testString);
                writer.Flush();              // push the encoded bytes into the stream
                m.Seek(0, SeekOrigin.Begin); // rewind so the reader starts at the first byte
                roundTripped = reader.ReadToEnd();
            }
        }

        // Compare the original string with what came back from the stream
        Debug.Assert(
            string.Equals(testString, roundTripped),
            "These strings should be equal.");
    }
}

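I reproduce your problem: the quote does not survive the round trip, because ISO-8859-1 has no code for U+201C and the writer substitutes a different character.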

If I change the encoding to Encoding.UTF8, the round trip succeeds.
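The question goes through a file rather than a MemoryStream. A minimal sketch of the same round trip through a temporary file, assuming UTF-8 is used for both the write and the read; the JSON payload and the class name FileRoundTrip are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using Newtonsoft.Json.Linq;

class FileRoundTrip
{
    static void Main()
    {
        // Hypothetical payload containing \u201c / \u201d escapes, as a web service might send it
        string responseString = "{\"text\":\"He said \\u201chi\\u201d\"}";
        JObject json = JObject.Parse(responseString); // escapes become real “ and ” characters

        string path = Path.GetTempFileName();
        try
        {
            // WriteAllText replaces the file contents; UTF-8 can represent every character
            File.WriteAllText(path, json.ToString(), Encoding.UTF8);
            string reread = File.ReadAllText(path, Encoding.UTF8);

            JObject roundTripped = JObject.Parse(reread);
            Console.WriteLine(JToken.DeepEquals(json, roundTripped)); // True
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Note that File.WriteAllText overwrites the file, whereas the question's StreamWriter(path, true, encoding) appends. If the file already holds output from an earlier run, the text read back contains more than one JSON object, which JObject.Parse may also reject, so that is worth ruling out alongside the encoding.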


ISO-8859-1 is not a Unicode charset: it can only represent the 256 code points U+0000 to U+00FF, so it is a bad choice for encoding Unicode text.

JSON text, on the other hand, is Unicode.

So ISO-8859-1 is a bad choice for encoding JSON strings.
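The substitution is silent by default. A minimal sketch (the class name Latin1Check is made up) that configures ISO-8859-1 with exception fallbacks, so that a character outside U+0000 to U+00FF makes encoding fail loudly instead of being replaced:

```csharp
using System;
using System.Text;

class Latin1Check
{
    static void Main()
    {
        // ISO-8859-1 set up to throw instead of substituting a replacement character
        Encoding strictLatin1 = Encoding.GetEncoding(
            "ISO-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        // U+00E9 (é) lies inside U+0000..U+00FF, so it encodes to a single byte
        Console.WriteLine(strictLatin1.GetBytes("\u00e9").Length); // 1

        try
        {
            // U+201C lies outside that range, so the encoder throws
            strictLatin1.GetBytes("\u201c");
            Console.WriteLine("encoded");
        }
        catch (EncoderFallbackException)
        {
            Console.WriteLine("U+201C cannot be represented in ISO-8859-1");
        }
    }
}
```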


The program,

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

using Newtonsoft.Json.Linq;

class Program
{
    private static void Main()
    {
        var encoding = Encoding.UTF8;
        // A JSON object whose only value is the character U+201C
        var testJson = new JObject
            {
                new JProperty(
                    "AQuote",
                    new string(new[] { (char)0x201c }))
            };

        JObject roundTripped;

        using (var m = new MemoryStream())
        {
            using (var writer = new StreamWriter(m, encoding))
            {
                var reader = new StreamReader(m, encoding);
                writer.Write(testJson.ToString());
                writer.Flush();              // push the encoded bytes into the stream
                m.Seek(0, SeekOrigin.Begin); // rewind before parsing them back
                roundTripped = JObject.Parse(reader.ReadToEnd());
            }
        }

        // Compare the property value before and after the round trip
        Debug.Assert(
            string.Equals(
                testJson["AQuote"].Value<string>(),
                roundTripped["AQuote"].Value<string>()),
            "These strings should be equal.");
    }
}

runs without an assertion failure, so I suspect the problem you see with UTF-8 comes from something other than the encoding itself.
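If the goal is the one stated in the question, keeping the \u201c escapes themselves in the file, Json.NET can be told to escape every non-ASCII character when writing. This is a different approach from the one above, since it makes the file contents independent of the charset. A minimal sketch using JsonTextWriter.StringEscapeHandling; the payload and the class name EscapeNonAsciiDemo are made up for illustration.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class EscapeNonAsciiDemo
{
    static void Main()
    {
        // Hypothetical payload containing U+201C / U+201D as \u escapes
        JObject json = JObject.Parse("{\"text\":\"\\u201chi\\u201d\"}");

        using (var sw = new StringWriter())
        using (var jw = new JsonTextWriter(sw))
        {
            // Write every non-ASCII character back out as a \uXXXX escape
            jw.StringEscapeHandling = StringEscapeHandling.EscapeNonAscii;
            json.WriteTo(jw);
            jw.Flush();
            Console.WriteLine(sw.ToString());
        }
    }
}
```

The resulting text is pure ASCII, so it is stored unchanged by any of the encodings from the question, including ISO-8859-1, and JObject.Parse turns the escapes back into the original characters when the file is read.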