We have recently come across a situation where iterating over a string (e.g. list(some_string)
) gives something completely different than printing some_string
directly. How can this happen?
Some background: We are running a python web application on IIS using wfastcgi, with the application server behind a load balancer. We had some issues with both the application server's internal and the load balancer's external host turning up in different parts of the application, so to narrow it down we wrote a small wsgi server to take a look at what exactly is passed inside.
This is the module, again run using wfastcgi on IIS:
# test-wsgi.py
def application(environ, start_response):
status = '200 OK'
headers = [('Content-type', 'text/plain; charset=utf-8')]
body = 'Host: {}\n\n'.format(environ['HTTP_HOST']).encode('utf-8')
chars = list(environ['HTTP_HOST'])
body += 'Host by char: {}\n\n'.format(chars).encode('utf-8')
start_response(status, headers)
return [body]
And, behold the madness, this is the response:
Host: pretty-domain.com
Host by char: ['i', 'n', 't', 'e', 'r', 'n', 'a', 'l', '.', 'h', 'o', 's', 't', '.', 'e', 'x', 'a', 'm', 'p', 'l', 'e', '.', 'c', 'o', 'm']
We get the same results with other methods of iterating over the string, like for loops or list comprehensions, or just using len()
.
Apart from the question of what causes these specific values to turn up in our setup - how can this happen in python at all?
This is on IIS 10, python 3.6.8 and wfastcgi 3.0.0.
Answering myself here: As it turns out, the issue was somewhere else entirely. All the variables above contained the same string, the
'internal.host.example.com'
one - it was the load balancer executing a rewrite rule that turned it into'pretty-domain.com'
wherever it was found in a response.This shall serve as a cautionary tale for me that load balancers can not only modify requests, but the responses as well.