Wget changing the quote characters around the download location


I have a small Perl script that I updated to download images from TVRage, but I have a problem. This is the line that causes it:

system "wget -P '/home/user/script/cache/posters' $imgurl";

It usually works fine, but from time to time it fails with the same error:

HTTP request sent, awaiting response... 200 OK
Length: 16758 (16K) [image/jpeg]
Saving to: â/home/user/script/cache/posters/28386.jpgâ
ERROR! Wide character in syswrite at IO/Handle.pm line 207.
ERROR! Wide character in syswrite at IO/Handle.pm line 207.
Compilation failed in require.
Wide character in syswrite at IO/

I have narrowed the problem down to wget changing ‘ and ’ to â:

â/home/user/script/cache/posters/28386.jpgâ

All successful downloads show ‘ and ’ around the path:

HTTP request sent, awaiting response... 200 OK
Length: 28218 (28K) [image/jpeg]
Saving to: ‘/home/user/script/cache/posters/6597.jpg’ 

I just tried adding this option:

system "wget --restrict-file-names=nocontrol -P '/home/tup/tuper4/cache/posters' $imgurl";

I hoped it would work better, and so far it has not failed, but I suspect it is not the real issue and would like some guidance if possible.

Should I try this instead?

system "cd /location/ && wget $imgurl";

Would it make any difference?

My real question is: what could cause wget to change ‘ and ’ to â?

Thank you in advance for any help!

The output of locale is:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

And the images are also UTF-8

I did suspect it had to do with the encoding, which is why I added

--restrict-file-names=nocontrol

It remains to be seen whether it will work.

Edit: Several days later I have not seen the error again, so it looks like "nocontrol" helped.


Answer:

On

It's not wget changing the character.
The character encoding seems to be set incorrectly.

When the text really is UTF-8, as it probably is, but is interpreted as some other encoding, showing the quote as the character â is a typical symptom. Sometimes it is followed by more characters.
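The byte-level effect can be reproduced with Perl's core Encode module. This is a minimal sketch: it encodes ‘ and ’ (U+2018 and U+2019, the quotes wget prints in a UTF-8 locale, as the successful downloads show) to UTF-8, then decodes the same bytes as ISO-8859-1 and as Windows-1252. These two are only assumed examples of "something else"; which encoding a given terminal or log viewer actually applies is not known from the question.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# List the code points of a string, one per character
sub codes { join ' ', map { sprintf 'U+%04X', ord } split //, shift }

my %view;
for my $quote ("\x{2018}", "\x{2019}") {          # ‘ and ’
    my $bytes = encode('UTF-8', $quote);           # three bytes each: E2 80 98 / E2 80 99
    $view{$quote} = {
        bytes  => unpack('H*', $bytes),
        latin1 => decode('ISO-8859-1', $bytes),    # one character per byte
        cp1252 => decode('cp1252', $bytes),        # Windows-1252 view of the same bytes
    };
    printf "%s -> bytes %s | Latin-1 %s | cp1252 %s\n",
        codes($quote), $view{$quote}{bytes},
        codes($view{$quote}{latin1}), codes($view{$quote}{cp1252});
}
```

Both quotes start with the byte E2, which is â in both single-byte encodings, so ‘ and ’ both turn into â. In ISO-8859-1 the two following bytes are invisible C1 control characters, which is consistent with the bare â in the failing log; in Windows-1252 they show up as €˜ and €™, the extra characters that sometimes follow.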

So it should work if you set the encoding to UTF-8.
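The "Wide character in syswrite at IO/Handle.pm" lines come from Perl rather than from wget: Perl reports this when syswrite is handed a string containing characters above U+00FF. The question does not show where the script calls syswrite, so the message below is a hypothetical stand-in. The sketch only illustrates encoding such a string to UTF-8 bytes before writing it.

```perl
use strict;
use warnings;
use Encode qw(encode);
use IO::Handle;

# Hypothetical string containing U+2018/U+2019, i.e. characters above U+00FF
my $message = "Saving to: \x{2018}/tmp/28386.jpg\x{2019}\n";

# syswrite() works on bytes, so turn the characters into UTF-8 bytes first
my $bytes = encode('UTF-8', $message);

# Same IO::Handle method named in the error; now it only sees bytes
defined(STDOUT->syswrite($bytes)) or die "syswrite failed: $!";
```

Encoding at the point of output keeps the rest of the script working with characters, while the handle only ever receives bytes.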

--

What is the output of the command locale?
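The locale a script sees comes from the environment it was started in, which is not necessarily the one an interactive shell reports (a cron job, for instance, often has no LANG set). A minimal sketch for printing the same information from inside the Perl script, using the core POSIX and I18N::Langinfo modules; programs started with system(), such as wget, inherit these environment variables.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# Take the character-type locale from the environment (LC_ALL, LC_CTYPE, LANG)
setlocale(LC_CTYPE, '');

# Character encoding of that locale, e.g. "UTF-8"
my $codeset = langinfo(CODESET);
print "codeset: $codeset\n";

# The variables a child process such as wget would inherit
for my $var (qw(LANG LC_CTYPE LC_ALL)) {
    printf "%s=%s\n", $var, $ENV{$var} // '(unset)';
}
```

Logging this from the script itself, rather than running locale by hand, shows what the script saw on the runs that failed.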

Background info:
http://askleo.com/why_do_i_get_odd_characters_instead_of_quotes_in_my_documents/

Googling "â quote" gives some good results.