Output in browser differs from output in terminal for characters > 128 (Python 3, Apache 2)


I am trying to print a euro sign in the browser. It prints successfully in the terminal but not in the browser. The behavior is the same in Python 2.7 and 3; I would prefer a Python 3.4 solution.

Browsers tested: Firefox and Opera, with the URL localhost/cgi-bin/test2.py. The browser's page information shows the correct encoding, so the header must be working. Perhaps there is some incompatibility with the decode instruction in Python. I can produce Chinese characters by deliberately mixing encodings, but I cannot get the encodings to match.

I am running the usual LAMP setup and have no issues using PHP. The script seems to find the correct binaries. I need to accept input in any language.

How can I isolate the issue?

Could someone please post correct minimal code for Python 3 that sends the headers and prints, say, a euro sign without using HTML entities? My current code is below.

#!/usr//bin/env python3
import cgi
#cgi.test()

import locale
import sys
import os
import io

import codecs

import cgitb
cgitb.enable() #this does not work properly either!!!


lf = chr(10)
cr  = chr(13)

h = "Content-Type: text/html; charset=utf-8 "
#h.encode("ascii")
print(h)
print(' Cache-Control: "no-cache, no-store, must-revalidate"'.encode('utf-8'))
#print(' Pragma: no-cache')
#print(' Expires: 0')
print(cr)
print(lf)

print()
print(lf)
print(cr)
print('<DOCTYPE! html>')
print('<meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">')
print('<html><body>')
hw = "Hello World!"
hw.encode('utf-8')
#hw.encode('utf-16le')
print(hw)

euro = "&euro;"
euro.encode('utf-8')
#euro.encode('utf-16')
print(euro) #THIS PRINTS OKAY


u = chr(8364)
u=u'This string includes a \u20AC sign'
u.encode('utf-8')
#u.encode('utf-16le')
print(u) #THIS PRINTS IN TERMINAL, BUT NOT IN BROWSER AND GENERATES FATAL ERROR 

end = "end"
end.encode('utf-8')
#end.encode('utf-16')
print(end)



Terminal output:
Content-Type: text/html; charset=utf-8 
b' Cache-Control: "no-cache, no-store, must-revalidate"'

<DOCTYPE! html>
<meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
<html><body>
Hello World!
&euro;
This string includes a € sign
end


Python 3.4.0 (default, Apr 11 2014, 13:05:18) 
[GCC 4.8.2] on linux

There are 2 best solutions below

1
On BEST ANSWER

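Python 3 strings are Unicode by default, but the stream you print to has to be able to encode them too. For example, print("€") works in a Linux terminal but not on the Windows command line. Apparently Apache has a similar problem: a CGI script usually runs without your shell's locale settings, so sys.stdout may fall back to ASCII, and print() then fails on any character above 127. You can try to send the bytes directly: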

#!/usr/bin/python3

import sys
import cgitb
cgitb.enable()

print("Content-Type: text/html;charset=utf-8")
print()
sys.stdout.flush()
print(
    "<!DOCTYPE html>"
    "<html>"
    "<body>")
sys.stdout.flush()  # flush the text layer first so the raw bytes below stay in order
sys.stdout.buffer.write(bytes("€", "utf-8"))
print(
    "</body>"
    "</html>")

Or you could just use print("&euro;"):

#!/usr/bin/python3

import cgitb
cgitb.enable()

print("Content-Type: text/html;charset=utf-8")
print()
print(
    "<!DOCTYPE html>"
    "<html>"
    "<body>"
    "&euro;"
    "</body>"
    "</html>")

This is much saner.
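
To check whether the output encoding is what differs between the terminal and Apache, it may help to have the script report what Python thinks it is writing to. The following is only a rough diagnostic sketch; describe_encodings and can_encode are hypothetical helper names, not part of any library. Run it once from the shell and once through the browser, then compare the two reports:

```python
import locale
import sys


def describe_encodings(stream=sys.stdout):
    """Collect the encodings Python will use for text output."""
    return {
        "stream encoding": getattr(stream, "encoding", None),
        "preferred locale encoding": locale.getpreferredencoding(False),
        "default str.encode encoding": sys.getdefaultencoding(),
    }


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


if __name__ == "__main__":
    # Emit ASCII-only output so the report itself cannot fail.
    print("Content-Type: text/plain; charset=utf-8")
    print()
    for name, value in describe_encodings().items():
        print("{}: {}".format(name, value))
    print("euro sign encodable on stdout:",
          can_encode("\u20ac", sys.stdout.encoding or "ascii"))
```

If the stream encoding reported through Apache is an ASCII variant while the terminal reports UTF-8, the difference in behavior comes from the environment rather than from the headers.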

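You don't need the encode calls from your script: str.encode() returns a new bytes object and leaves the string itself unchanged, so calling it without using the result does nothing. Of course the entity won't look right in the terminal, but your browser will display it correctly.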
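
A small illustration of why the bare encode calls in the question have no effect (the variable names are only for this example):

```python
text = "This string includes a \u20ac sign"

# str.encode() returns a new bytes object. The result is discarded here,
# so this line has no effect on what print(text) would send later.
text.encode("utf-8")
unchanged = text  # still the same str object

# Keep the return value if the bytes are actually needed.
encoded = text.encode("utf-8")

print(type(unchanged).__name__, type(encoded).__name__)  # prints: str bytes
```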

Keep in mind that you have to print an empty line to separate the headers from the body. After that you just print regular HTML.
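
If stdout under Apache does fall back to ASCII (an assumption worth checking on your own setup), one alternative to calling sys.stdout.buffer.write for every non-ASCII string is to wrap the binary stream once in a UTF-8 text writer. This is a minimal sketch, not a definitive implementation; make_utf8_writer and render_page are hypothetical helper names:

```python
import io
import sys


def make_utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


def render_page(out):
    """Write a minimal CGI response containing a literal euro sign."""
    out.write("Content-Type: text/html; charset=utf-8\n")
    out.write("\n")  # the blank line ends the header block
    out.write("<!DOCTYPE html>\n<html><body>\n")
    out.write("This string includes a \u20ac sign\n")
    out.write("</body></html>\n")
    out.flush()


if __name__ == "__main__":
    # Keep a reference to the wrapper: when it is garbage-collected it
    # closes the underlying stream.
    utf8_out = make_utf8_writer(sys.stdout.buffer)
    render_page(utf8_out)
```

Because all output goes through one writer, the ordering problem of mixing print() with raw byte writes does not arise. Setting the PYTHONIOENCODING environment variable to utf-8 for the CGI process is another option that may be worth trying.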

0
On

Probably not the best solution but the following at least works:

u = chr(8364)
#u='This string includes a \u20AC sign'
u=u+'This string includes a \u673A sign'  
out = ''

for ch in u:
    out = out+'&#'+str(ord(ch))+';' 
print(out)
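
The same idea can probably be written more compactly with the standard "xmlcharrefreplace" error handler, which replaces only the characters that ASCII cannot represent and leaves plain text readable. A small sketch; to_char_refs is a hypothetical helper name:

```python
def to_char_refs(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    sample = "\u20ac This string includes a \u673a sign"
    print(to_char_refs(sample))
    # prints: &#8364; This string includes a &#26426; sign
```

Note that this does not escape &, < or >; html.escape from the standard library handles those if the text comes from user input.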