We can set the default character encoding used for reading request bodies with ServletContext#setRequestCharacterEncoding (since Servlet 4.0).
I think that the character encoding for HttpServletRequest#getReader can be set using ServletContext#setRequestCharacterEncoding(*).
But the reader that HttpServletRequest#getReader returns does not seem to decode characters using the encoding set by ServletContext#setRequestCharacterEncoding.
My questions are:
- Why does ServletContext#setRequestCharacterEncoding not have an effect on HttpServletRequest#getReader (although it has an effect on HttpServletRequest#getParameter)?
- Is there any specification describing this behavior of ServletContext#setRequestCharacterEncoding and HttpServletRequest#getReader?
(I read Servlet Specification Version 4.0, but I can't find any spec about such behaviors.)
I have created a simple war application and tested ServletContext#setRequestCharacterEncoding.
[Env]
- Tomcat 9.0.19 (I have not changed any default configuration)
- JDK 11
- Windows 8.1
[index.html]
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<form action="/SimpleWarApp/app/simple" method="post">
<!-- The value is Japanese character '\u3042' -->
<input type="text" name="hello" value="あ"/>
<input type="submit" value="submit!"/>
</form>
<button type="button" id="the_button">post</button>
<script>
document.getElementById('the_button').addEventListener('click', function() {
var xhttp = new XMLHttpRequest();
xhttp.open('POST', '/SimpleWarApp/app/simple');
xhttp.setRequestHeader('Content-Type', 'text/plain');
// The body content is the Japanese character '\u3042'
xhttp.send('あ');
});
</script>
</body>
</html>
[InitServletContextListener.java]
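import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class InitServletContextListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Web-app-wide default encoding for request bodies (Servlet 4.0+)
        sce.getServletContext().setRequestCharacterEncoding("UTF-8");
    }
}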
[SimpleServlet.java]
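import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/app/simple")
@SuppressWarnings("serial")
public class SimpleServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        // req.setCharacterEncoding("UTF-8");
        System.out.println("requestCharacterEncoding : " + req.getServletContext().getRequestCharacterEncoding());
        System.out.println("req.getCharacterEncoding() : " + req.getCharacterEncoding());
        String hello = req.getParameter("hello");
        if (hello != null) {
            // Form submission: print the decoded parameter value
            System.out.println("hello : " + hello);
        } else {
            // Plain-text POST: print the first line of the body as decoded by getReader()
            System.out.println("body : " + req.getReader().readLine());
        }
    }
}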
I don't have any servlet filters. The above three are all the components of this war application. (GitHub)
Case 1: When I submit the form with a parameter 'hello', the value of 'hello' is successfully decoded as follows.
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
hello : あ
Case 2:
When I click 'post' and send text content, the request body is not decoded correctly, as shown below.
(I confirmed that the request body is encoded in UTF-8: E3 81 82.)
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???
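The three question marks are consistent with the three UTF-8 bytes being decoded one byte per character, as a single-byte charset such as ISO-8859-1 would do. That fallback is an assumption here; the output alone does not prove which charset the reader used. A minimal standalone sketch (the class name DecodeDemo is made up for illustration) compares the two decodings of the same bytes:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Decodes raw request-body bytes with the given charset.
    public static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of '\u3042', as observed in the request body: E3 81 82
        byte[] body = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};

        String asUtf8 = decode(body, StandardCharsets.UTF_8);
        String asLatin1 = decode(body, StandardCharsets.ISO_8859_1);

        // One character when decoded as UTF-8, three when decoded byte by byte.
        System.out.println("UTF-8 length      : " + asUtf8.length());
        System.out.println("ISO-8859-1 length : " + asLatin1.length());
    }
}
```

A console that cannot represent the three resulting characters would typically print each of them as '?'.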
Case 3:
When I also set the encoding using HttpServletRequest#setCharacterEncoding on the first line of the servlet's 'doPost' method, the request body is successfully decoded.
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ
Case 4:
When I use xhttp.setRequestHeader('Content-Type', 'text/plain; charset=UTF-8'); in the JavaScript, the request body is successfully decoded.
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ
Case 5:
When I do not call req.getParameter("hello"), the request body is not decoded correctly.
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???
Case 6:
When I do not call ServletContext#setRequestCharacterEncoding in InitServletContextListener.java, no character encoding is set.
requestCharacterEncoding : null
req.getCharacterEncoding() : null
body : ???
[NOTE]
(*) I think so because:
- (1) The Javadoc of HttpServletRequest#getReader says "The reader translates the character data according to the character encoding used on the body".
- (2) The Javadoc of HttpServletRequest#getCharacterEncoding says "Returns the name of the character encoding used in the body of this request".
- (3) The Javadoc of HttpServletRequest#getCharacterEncoding also says "The following methods for specifying the request character encoding are consulted, in decreasing order of priority: per request, per web app (using ServletContext.setRequestCharacterEncoding, deployment descriptor)".
- (4) ServletContext#setResponseCharacterEncoding works fine. When I use ServletContext#setResponseCharacterEncoding, the writer that HttpServletResponse#getWriter returns encodes the response body with the character encoding set by it.
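The priority order quoted in note (3) can be sketched as a small lookup. This is a simplified illustration, not Tomcat's implementation: the class RequestCharsetResolver and its methods are hypothetical, the Content-Type parsing is deliberately naive, and it assumes an explicit HttpServletRequest#setCharacterEncoding call takes precedence over the Content-Type header.

```java
import java.util.Locale;

public class RequestCharsetResolver {

    // Extracts the charset parameter from a Content-Type value, or returns null if absent.
    public static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; any parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Per-request settings first, then the web-app default; null if nothing is set.
    public static String resolve(String explicitEncoding, String contentType, String webAppDefault) {
        if (explicitEncoding != null) {
            return explicitEncoding;
        }
        String fromHeader = charsetFromContentType(contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        return webAppDefault;
    }

    public static void main(String[] args) {
        // Case 2: no per-request charset, web-app default set
        System.out.println(resolve(null, "text/plain", "UTF-8"));
        // Case 4: charset given in the Content-Type header
        System.out.println(resolve(null, "text/plain; charset=UTF-8", null));
        // Case 6: nothing set anywhere
        System.out.println(resolve(null, "text/plain", null));
    }
}
```

Under this reading, Case 2 should resolve to UTF-8 just as Cases 3 and 4 do, which is why the observed output looks inconsistent with the Javadoc.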
It is an Apache Tomcat bug (specific to getReader()) that will be fixed in 9.0.21 onwards, thanks to your report on the Tomcat users mailing list.
For the curious, here is the fix.
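On versions before the fix, Cases 3 and 4 above already show two workarounds: call HttpServletRequest#setCharacterEncoding, or send the charset in the Content-Type header. Another possible approach, assuming you know the body is UTF-8, is to skip getReader() and decode the raw bytes yourself. The sketch below is standalone: the class ExplicitBodyReader is hypothetical, and a ByteArrayInputStream stands in for what would be req.getInputStream() in a servlet.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitBodyReader {

    // Reads the first line of the body, decoding with an explicitly chosen charset.
    public static String readFirstLine(InputStream body, Charset charset) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(body, charset));
            return reader.readLine();
        } catch (IOException e) {
            // Wrapped so callers in this sketch need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for req.getInputStream(): the UTF-8 bytes E3 81 82 of '\u3042'
        InputStream body = new ByteArrayInputStream(
                new byte[] {(byte) 0xE3, (byte) 0x81, (byte) 0x82});

        String line = readFirstLine(body, StandardCharsets.UTF_8);
        System.out.println("decoded correctly : " + "\u3042".equals(line));
    }
}
```

Note that in a real servlet, getInputStream() and getReader() cannot both be used on the same request, so this approach replaces the getReader() call rather than supplementing it.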