
Result of encodeURIComponent not being decoded correctly on the server side


I'm struggling to correctly encode/decode a JSON string so I can send it in the query string of a GET request.

<html>
    <head>
        <script type="text/javascript">
            function executeRequest(applyUriEncode) {
                var json = '{"foo":"⚡&❤很久很久以前"}';
                var xmlhttp = new XMLHttpRequest();
                xmlhttp.open('GET', 'https://example.com/test.php?json='+(applyUriEncode ? encodeURIComponent(json) : json), false);
                
                xmlhttp.onreadystatechange = function() {
                    if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
                        console.log("applyUriEncode: "+(applyUriEncode ? "true\n" : "false\n"));
                        console.log(xmlhttp.responseText+"\n");
                    }
                };
                xmlhttp.send();
            }
        </script>
    </head>
    <body>
        <button onClick="executeRequest(true);">Submit encoded</button>
        <button onClick="executeRequest(false);">Submit unencoded</button>
    </body>
</html>

<?php // test.php
echo $_GET['json'];

Output after clicking "Submit encoded" and then "Submit unencoded":

applyUriEncode: true
{"foo":"💀ðŸ•⚡💎&ðŸŽâ¤å¾ˆä¹…很久以å‰"}

applyUriEncode: false
{"foo":"⚡

The desired output is:

{"foo":"⚡&❤很久很久以前"}

I need to encode the JSON because otherwise special characters such as & break the query string (the unencoded request above is cut off at the &). However, PHP doesn't seem to decode the result of encodeURIComponent correctly. I tried urldecode on the server side, but that didn't change anything (the output remains the same).
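The truncation of the unencoded request can be reproduced with the standard URLSearchParams parser, which is available in browsers and Node.js. Using it as a stand-in for the server's query-string parsing is an assumption, although PHP also splits the query string on & when it fills $_GET.

```javascript
// The JSON payload from the question.
const json = '{"foo":"⚡&❤很久很久以前"}';

// Unencoded: the raw "&" inside the JSON is read as a parameter separator,
// so the "json" parameter ends right before it.
const rawValue = new URLSearchParams("json=" + json).get("json");

// Encoded: encodeURIComponent turns "&" into "%26" (and non-ASCII characters
// into percent-encoded UTF-8 bytes), so the whole JSON survives as one value.
const encodedValue = new URLSearchParams("json=" + encodeURIComponent(json)).get("json");

console.log(rawValue);     // {"foo":"⚡
console.log(encodedValue); // {"foo":"⚡&❤很久很久以前"}
```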

I feel this is a fundamental question, and it should have an answer somewhere here on StackOverflow, but I couldn't find it. I found tons of questions with similar problems, but none led me to a solution for this specific problem.


Edit: Inspired by the apparently AI-generated reply posted by @Adarsh Pattnaik, I played around with ChatGPT a bit myself. After a few attempts, it suggested adding <meta charset="UTF-8"> to the HTML. This did indeed yield the correct output.

However, I don't understand why. The HTML file itself was always saved as UTF-8. The request to and response from test.php show (and always showed) a content type of text/html; charset=UTF-8 in Chrome's Network tab. The Content-Type of the HTML page itself is text/html (without charset=UTF-8), and this didn't change when I added the meta tag.

So what difference does <meta charset="UTF-8"> make that it now yields the correct result?
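A plausible explanation, assuming the browser fell back to windows-1252 (a common default in Western locales) because neither the page's Content-Type header nor the document declared a charset: the UTF-8 bytes of the string literal in the script are then read as windows-1252 characters before encodeURIComponent ever runs. encodeURIComponent correctly encodes those wrong characters as UTF-8, PHP correctly decodes them, and the UTF-8-labelled response displays the garbled text. The sketch below reproduces the first two steps. decodeWindows1252 is a hand-written helper, not a library API, used instead of TextDecoder so the result doesn't depend on the runtime's ICU build.

```javascript
// windows-1252 agrees with Latin-1 except for bytes 0x80-0x9F, listed here
// (undefined bytes map to the matching C1 control, as browsers do).
const CP1252_80_9F = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Hand-written helper: decode a byte array as windows-1252.
function decodeWindows1252(bytes) {
  let out = "";
  for (const b of bytes) {
    const codePoint = b >= 0x80 && b <= 0x9f ? CP1252_80_9F[b - 0x80] : b;
    out += String.fromCharCode(codePoint);
  }
  return out;
}

// The file on disk holds UTF-8 bytes; the browser (by assumption) reads them
// as windows-1252, so the script receives different characters.
const intended = "⚡很";
const seenByScript = decodeWindows1252(new TextEncoder().encode(intended));

console.log(seenByScript);                     // âš¡å¾ˆ
console.log(encodeURIComponent(intended));     // %E2%9A%A1%E5%BE%88
console.log(encodeURIComponent(seenByScript)); // %C3%A2%C5%A1%C2%A1%C3%A5%C2%BE%CB%86
```

If that assumption holds, it would also explain why "Submit unencoded" shows ⚡ correctly: when a windows-1252 document parses a URL, the query string is percent-encoded with the document's encoding, which turns the misread characters back into the original UTF-8 bytes. With <meta charset="UTF-8">, the script sees the intended characters from the start.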

Answer by Adarsh Pattnaik

The issue you're encountering is related to character encoding. When you use encodeURIComponent in JavaScript, it correctly percent-encodes the JSON string, including the Unicode characters. However, when PHP receives the query string, it does not automatically decode the percent-encoded Unicode characters back to their original form.

To fix this, you need to ensure that PHP is interpreting the incoming data as UTF-8 and then use urldecode to decode the percent-encoded string. Here's how you can modify your PHP code to achieve the desired output:

<?php // test.php

// Get the raw, percent-encoded JSON string from the query parameter
$encodedJson = $_GET['json'];

// Manually decode the percent-encoded string
$decodedJson = urldecode($encodedJson);

// Convert from UTF-8 to UTF-8; this replaces any invalid byte sequences so the result is valid UTF-8
$decodedJson = mb_convert_encoding($decodedJson, 'UTF-8', 'UTF-8');

// Output the decoded JSON string
echo $decodedJson;
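One thing worth checking before relying on the urldecode() step: the PHP manual notes that $_GET values are already URL-decoded and warns that calling urldecode() on them again can give unexpected results. For the string in the question a second pass happens to be harmless, since after the first decode it contains no % or + characters, which would match the question's observation that urldecode changed nothing. Other values can be damaged, because urldecode() also turns + into a space. The JavaScript sketch below imitates the two passes; phpLikeUrldecode is a hypothetical helper that approximates urldecode(), not a real API.

```javascript
// Hypothetical helper approximating PHP's urldecode(): "+" becomes a space and
// %XX escapes are decoded as UTF-8. (Unlike PHP, decodeURIComponent throws on
// malformed escapes; that difference doesn't matter for this example.)
function phpLikeUrldecode(s) {
  return decodeURIComponent(s.replace(/\+/g, " "));
}

const original = '{"sum":"1+1"}';
const onTheWire = encodeURIComponent(original); // %7B%22sum%22%3A%221%2B1%22%7D
const firstPass = phpLikeUrldecode(onTheWire);  // roughly what $_GET already holds
const secondPass = phpLikeUrldecode(firstPass); // the extra urldecode() call

console.log(firstPass);  // {"sum":"1+1"}
console.log(secondPass); // {"sum":"1 1"}
```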

This code snippet assumes that your PHP environment is configured to use UTF-8 as the default character encoding. If it's not, you might need to explicitly set the character encoding to UTF-8 using mb_internal_encoding('UTF-8') at the beginning of your script.

Additionally, it's important to note that when you're sending JSON data in a query string, you should always use encodeURIComponent to encode the JSON string. This is because the query string has certain reserved characters (like &, =, +, ?, etc.) that can break the structure of the URL if not encoded. The encodeURIComponent function ensures that these characters are safely encoded so that they do not interfere with the URL's format.
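The difference is easy to see by comparing encodeURIComponent with encodeURI, which is designed for complete URLs and therefore leaves these reserved characters as they are:

```javascript
const reserved = "&=+?#/";

// encodeURIComponent escapes all of these delimiters, so a value encoded with
// it cannot split the query string or cut the URL short.
const asComponent = encodeURIComponent(reserved);

// encodeURI leaves them untouched because, in a complete URL, these characters
// carry structure; it is not safe for encoding a single query value.
const asUri = encodeURI(reserved);

console.log(asComponent); // %26%3D%2B%3F%23%2F
console.log(asUri);       // &=+?#/
```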

On the client side, your JavaScript code is correct to use encodeURIComponent when applyUriEncode is true. Always send the encoded version in a query string to avoid problems with special characters.
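If you'd rather not build the query string by concatenation, the standard URL and URLSearchParams APIs (available in browsers and Node.js) can do the encoding. A minimal sketch: buildTestUrl is a hypothetical helper, and example.com is the placeholder host from the question. URLSearchParams writes spaces as + (form encoding), which PHP also decodes to a space when it fills $_GET. This only takes care of reserved characters; it doesn't change how the page's own source is decoded, so the charset declaration still matters.

```javascript
// Hypothetical helper: let the URL API percent-encode the value (as UTF-8)
// instead of concatenating strings by hand.
function buildTestUrl(json) {
  const url = new URL("https://example.com/test.php");
  url.searchParams.set("json", json);
  return url.toString();
}

const payload = '{"foo":"a b&c"}';
const requestUrl = buildTestUrl(payload);
console.log(requestUrl);
// https://example.com/test.php?json=%7B%22foo%22%3A%22a+b%26c%22%7D

// Parsing the URL back yields the original payload.
console.log(new URL(requestUrl).searchParams.get("json") === payload); // true
```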