Why does 'ServletContext#setRequestCharacterEncoding' not have an effect on 'HttpServletRequest#getReader'?


Since Servlet 4.0, we can set the default character encoding used for reading request bodies with ServletContext#setRequestCharacterEncoding.

I expected that the character encoding used by HttpServletRequest#getReader could also be set with ServletContext#setRequestCharacterEncoding(*).

But the reader returned by HttpServletRequest#getReader does not seem to decode characters using the encoding set by ServletContext#setRequestCharacterEncoding.

My questions are:

  • Why does ServletContext#setRequestCharacterEncoding not have an effect on HttpServletRequest#getReader (although it does have an effect on HttpServletRequest#getParameter)?
  • Is there any specification that describes this behavior of ServletContext#setRequestCharacterEncoding and HttpServletRequest#getReader?

(I read the Servlet Specification, Version 4.0, but I couldn't find anything describing this behavior.)

I have created a simple war application and tested ServletContext#setRequestCharacterEncoding.

[Env]

  • Tomcat 9.0.19 (default configuration, unchanged)
  • JDK 11
  • Windows 8.1

[index.html]

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
</head>
<body>
    <form action="/SimpleWarApp/app/simple" method="post">
        <!-- The value is Japanese character '\u3042' -->
        <input type="text" name="hello" value="あ"/>
        <input type="submit" value="submit!"/>
    </form>
    <button type="button" id="the_button">post</button>
    <script>
        document.getElementById('the_button').addEventListener('click', function() {
            var xhttp = new XMLHttpRequest();
            xhttp.open('POST', '/SimpleWarApp/app/simple');
            xhttp.setRequestHeader('Content-Type', 'text/plain');
            // The body content is the Japanese character '\u3042'
            xhttp.send('あ');
        });
    </script>
</body>
</html>

[InitServletContextListener.java]

@WebListener
public class InitServletContextListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
        sce.getServletContext().setRequestCharacterEncoding("UTF-8");
    }
}

[SimpleServlet.java]

@WebServlet("/app/simple")
@SuppressWarnings("serial")
public class SimpleServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        // req.setCharacterEncoding("UTF-8");
        System.out.println("requestCharacterEncoding : " + req.getServletContext().getRequestCharacterEncoding());
        System.out.println("req.getCharacterEncoding() : " + req.getCharacterEncoding());

        String hello = req.getParameter("hello");
        if (hello != null) {
            System.out.println("hello : " + hello);
        } else {
            System.out.println("body : " + req.getReader().readLine());
        }
    }
}

I don't have any servlet filters. The above three are all the components of this war application. (GitHub)

Case 1: When I submit the form with a parameter 'hello', the value of 'hello' is successfully decoded as follows.

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
hello : あ

Case 2: When I click 'post' and send text content, the request body is not decoded correctly, as shown below. (I confirmed that the request body is encoded in UTF-8: E3 81 82.)

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???
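One plausible reading of the three question marks in Case 2 is that the three UTF-8 bytes were decoded one byte per character, which is what a single-byte charset such as ISO-8859-1 does. The following is a minimal sketch using only the JDK; the class name BodyDecodingDemo and the decode helper are made up for illustration, and the choice of ISO-8859-1 as the fallback is an assumption, not something the output above proves.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes the request body bytes from Case 2
// with two different charsets and compares the results.
final class BodyDecodingDemo {

    // Decodes raw body bytes with the given charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of '\u3042', as observed on the wire.
        byte[] body = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};

        String asUtf8 = decode(body, StandardCharsets.UTF_8);
        // Assumed fallback charset; each byte becomes one character.
        String asLatin1 = decode(body, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 length      : " + asUtf8.length());   // 1
        System.out.println("ISO-8859-1 length : " + asLatin1.length()); // 3
        System.out.println("UTF-8 is U+3042   : " + asUtf8.equals("\u3042")); // true
    }
}
```

Three characters that the console cannot display would then be printed as three '?' characters, which matches the observed output.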

Case 3: When I also set the encoding with HttpServletRequest#setCharacterEncoding on the first line of the servlet's 'doPost' method, the request body is successfully decoded.

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ

Case 4: When I use xhttp.setRequestHeader('Content-Type', 'text/plain; charset=UTF-8'); in the JavaScript, the request body is successfully decoded.

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ
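The only difference between Case 2 and Case 4 is the charset parameter in the Content-Type header. As a rough illustration of what a per-request charset lookup involves, here is a simplified sketch using only the JDK. The class ContentTypeCharset and its charsetOf method are hypothetical helpers, not Tomcat's or the Servlet API's actual parsing code.

```java
import java.util.Locale;

// Hypothetical helper: extracts the charset parameter from a
// Content-Type header value. Simplified; not Tomcat's implementation.
final class ContentTypeCharset {

    // Returns the charset parameter value, or null if there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the rest are parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain"));                // null (Case 2)
        System.out.println(charsetOf("text/plain; charset=UTF-8")); // UTF-8 (Case 4)
    }
}
```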

Case 5: When I do not call req.getParameter("hello") before reading the body, the request body still is not decoded correctly.

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???

Case 6: When I do not call ServletContext#setRequestCharacterEncoding in InitServletContextListener.java, no character encoding is set.

requestCharacterEncoding : null
req.getCharacterEncoding() : null
body : ???

[NOTE]

  • (*)I think so because:

    • (1) The Javadoc of HttpServletRequest#getReader says

      "The reader translates the character data according to the character encoding used on the body".

    • (2) The Javadoc of HttpServletRequest#getCharacterEncoding says

      "Returns the name of the character encoding used in the body of this request".

    • (3) The Javadoc of HttpServletRequest#getCharacterEncoding also says

      "The following methods for specifying the request character encoding are consulted, in decreasing order of priority: per request, per web app (using ServletContext.setRequestCharacterEncoding, deployment descriptor)".

  • ServletContext#setResponseCharacterEncoding works fine. When I use it, the writer returned by HttpServletResponse#getWriter encodes the response body with the character encoding I set.
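The priority order quoted in note (3) can be sketched as a simple fallback chain. This is only an illustration of the documented order, with a made-up class and method name (RequestEncodingPriority, effectiveEncoding); it is not container code. Returning null when nothing is configured mirrors the output of Case 6.

```java
// Hypothetical sketch of the lookup order quoted from the Javadoc:
// the per-request encoding wins over the per-web-app default.
final class RequestEncodingPriority {

    // perRequest: e.g. from setCharacterEncoding or the Content-Type charset.
    // perWebApp:  e.g. from ServletContext#setRequestCharacterEncoding.
    static String effectiveEncoding(String perRequest, String perWebApp) {
        if (perRequest != null) {
            return perRequest;
        }
        // May be null when no encoding was configured at all.
        return perWebApp;
    }

    public static void main(String[] args) {
        System.out.println(effectiveEncoding(null, "UTF-8"));        // UTF-8
        System.out.println(effectiveEncoding("Shift_JIS", "UTF-8")); // Shift_JIS
        System.out.println(effectiveEncoding(null, null));           // null
    }
}
```

Under this reading, getReader would be expected to decode with the same value that getCharacterEncoding reports, which is what Cases 2 and 5 contradict.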

Best answer, by Mark Thomas

It is an Apache Tomcat bug (specific to getReader()) that will be fixed in 9.0.21 onwards thanks to your report on the Tomcat users mailing list.

For the curious, here is the fix.