Converting HTML4 websites to Haskell with blaze-from-html

I am trying to convert websites into the HTML data structure given by blaze.

curl -S http://jaspervdj.be/blaze | blaze-from-html

This example is taken from the end of the blaze-html tutorial. curl itself works, but blaze-from-html fails partway through converting the HTML:

html $ do
    H.head $ H.title "301 Moved Permanently"
blaze-from-html: Attribute bgcolor is illegal in html5

Indeed, bgcolor has been deprecated and is not accepted as HTML5. How do I get blaze-from-html to work with HTML4?


curl -S http://jaspervdj.be/blaze | blaze-from-html -v html4-transitional

As suggested in the comments, I passed the html4-transitional variant. The conversion now succeeds, but the page being converted is a 301 response. Is this page being redirected?

html $ do
    H.head $ H.title "301 Moved Permanently"
    body ! bgcolor "white" $ do
        center $ h1 "301 Moved Permanently"
        hr
        center "nginx/1.2.1"

However, wget http://jaspervdj.be/blaze returns the HTML content of the page.

Best answer

This works for me:

curl -S http://jaspervdj.be/blaze | blaze-from-html -v html4-transitional

This is what the documentation you linked suggests.

As for why you get a nearly empty page saying it has been moved: it appears that the website treats http://jaspervdj.be/blaze and http://jaspervdj.be/blaze/ differently and answers the first with a 301 redirect. curl does not follow redirects by default, so it passes the body of the 301 response down the pipe, while wget follows the redirect automatically, as my browser does. I would suggest contacting the website's author and asking him to fix this behavior.
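
If you want to check the redirect yourself, or have curl follow it the way wget does, something like the following should work. This is only a sketch: the function names show_redirect and fetch_page are made up for illustration, while -L, -o, -w and the %{http_code} and %{redirect_url} write-out variables are standard curl options. I am assuming the server still answers the URL without the trailing slash with a redirect.

```shell
#!/bin/sh
# Hypothetical helper names; only the curl options themselves are real.

# Print the HTTP status code and, if the server sent one, the redirect target.
# -s hides the progress meter, -S still reports errors,
# -o /dev/null discards the response body.
show_redirect() {
    curl -sS -o /dev/null -w '%{http_code} %{redirect_url}\n' "$1"
}

# Fetch a page and follow any redirects (-L), so the final HTML is printed
# instead of the body of the 301 response.
fetch_page() {
    curl -sSL "$1"
}

# Usage (needs network access and blaze-from-html on the PATH):
#   show_redirect http://jaspervdj.be/blaze
#   fetch_page http://jaspervdj.be/blaze | blaze-from-html -v html4-transitional
```

Following the redirect in curl keeps the rest of the pipeline unchanged, so it should not matter whether the URL is written with or without the trailing slash.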