As a new user of w3m I am trying to do something basic like:
w3m -dump_source nytimes.com > nytimes.html
The output produced gives crazy characters and symbols. However, when I browse using w3m nytimes, it loads properly, and I can even view the HTML using v.
Further, when I try:
w3m -dump_extra nytimes.com > nytimes.html
I get all the extra info associated with the site perfectly, except for the HTML source.
Any help would be appreciated.
By default, w3m requests compressed output from the server by sending the following HTTP header:
The value of the header may vary depending on the version of w3m, but recent versions of the program request compressed output from the host using the Accept-Encoding header. You can find out the exact headers with the following command:
The request and response headers will be logged to the ~/.w3m/request.log file.
You can request an uncompressed version by overriding the header as follows:
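A minimal sketch of such an override, assuming a w3m build that supports the -header and -reqlog options (check w3m -help on your system). The function names are hypothetical helpers, and the identity value is the standard HTTP way of asking for no content encoding; a server is still free to ignore it.

```shell
# Hypothetical helper: ask the server for an uncompressed body.
# "identity" in Accept-Encoding means "no content encoding".
dump_plain_source() {
    w3m -header 'Accept-Encoding: identity' -dump_source "$1"
}

# Hypothetical helper: same request with request logging enabled
# (assumes -reqlog is available), so the headers sent and received
# can be inspected afterwards in ~/.w3m/request.log.
dump_plain_source_logged() {
    w3m -reqlog -header 'Accept-Encoding: identity' -dump_source "$1"
}

# Usage (requires w3m and network access):
#   dump_plain_source nytimes.com > nytimes.html
#   dump_plain_source_logged nytimes.com > nytimes.html
```

If the saved file still looks like binary data, the request log should show which Content-Encoding the server actually used.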
Or even
Alternatively, decompress the output via pipe:
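A sketch of the pipe approach. The real-world form is shown as a comment because it needs w3m and network access; the rest simulates both kinds of server response locally with gzip, so it can be run anywhere. The sample_html value is made up for illustration, and -c (the short form of --stdout) is passed explicitly to be safe.

```shell
# Real-world form (needs w3m and network access):
#   w3m -dump_source nytimes.com | gunzip -cf > nytimes.html

# Local simulation: sample_html stands in for the page source.
sample_html='<html><body>hello</body></html>'

# Case 1: the server sent a gzip-compressed body; gunzip decompresses it.
from_gzip=$(printf '%s\n' "$sample_html" | gzip -c | gunzip -cf)

# Case 2: the server sent the body uncompressed; -f copies it through as is.
from_plain=$(printf '%s\n' "$sample_html" | gunzip -cf)

printf '%s\n' "$from_gzip" "$from_plain"
```

Both variables should end up holding the same HTML, so the same pipe can be used whether or not the server compressed the response.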
The -f option causes gunzip to copy the input data unchanged to standard output if the input is not in a format gunzip recognizes. According to the documentation, you should also pass the --stdout option, but the piped command should print the result to standard output even without it.
Note that the server may respond with content compressed with bzip2. In this case, you can pipe the output through the bunzip2 -f command.