Java: url = new URL(...) throws MalformedURLException


I am trying to build a crawler that collects the HTML source code of websites listed in a .csv file. Everything works fine when I put the link directly in

url = new URL ("http://example.com")

but whenever I put the link in a variable ("text" in this example), I get a MalformedURLException.

Here is my code:

String text ="http://stackoverflow.com/questions/9827143/continuing-execution-after-an-exception-is-thrown-in-java";

// get the sourcecode of the link you just grabbed
url = new URL(text);
PrintWriter writer = new PrintWriter("sourcecode.txt", "UTF-8");

Best answer

You have hidden characters in your string. You probably copied the URL from a Word file or from a text file that was converted on Windows. There is a byte order mark (BOM, U+FEFF) at the beginning of the string. When I do this:

System.out.println( Arrays.toString(text.getBytes(StandardCharsets.UTF_16BE)));

This is the output I get:

[-2, -1, 0, 104, 0, 116, 0, 116, 0, 112, 0, 58, 0, 47, 0, 47, 0, 115, 0, 116, 0, 97, 0, 99, 0, 107, 0, 111, 0, 118, 0, 101, 0, 114, 0, 102, 0, 108, 0, 111, 0, 119, 0, 46, 0, 99, 0, 111, 0, 109, 0, 47, 0, 113, 0, 117, 0, 101, 0, 115, 0, 116, 0, 105, 0, 111, 0, 110, 0, 115, 0, 47, 0, 57, 0, 56, 0, 50, 0, 55, 0, 49, 0, 52, 0, 51, 0, 47, 0, 99, 0, 111, 0, 110, 0, 116, 0, 105, 0, 110, 0, 117, 0, 105, 0, 110, 0, 103, 0, 45, 0, 101, 0, 120, 0, 101, 0, 99, 0, 117, 0, 116, 0, 105, 0, 111, 0, 110, 0, 45, 0, 97, 0, 102, 0, 116, 0, 101, 0, 114, 0, 45, 0, 97, 0, 110, 0, 45, 0, 101, 0, 120, 0, 99, 0, 101, 0, 112, 0, 116, 0, 105, 0, 111, 0, 110, 0, 45, 0, 105, 0, 115, 0, 45, 0, 116, 0, 104, 0, 114, 0, 111, 0, 119, 0, 110, 0, 45, 0, 105, 0, 110, 0, 45, 0, 106, 0, 97, 0, 118, 0, 97]

The first two bytes (-2, -1, that is 0xFE 0xFF) are the Unicode BOM character U+FEFF. Be careful where you get your strings from. If you export your CSV from Excel, and the file contains only URLs, try to export it as ASCII only.
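If changing the export is not an option, the string can also be cleaned before it reaches the URL constructor. The following is only a minimal sketch: the class name UrlSanitizer and the helper sanitize are hypothetical, and it assumes the only unwanted characters are a leading BOM (U+FEFF) or zero-width space (U+200B) plus ordinary surrounding whitespace.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class UrlSanitizer {

    // Hypothetical helper: skips any leading BOM (U+FEFF) or zero-width
    // space (U+200B), then trims ordinary whitespace from both ends.
    static String sanitize(String raw) {
        int start = 0;
        while (start < raw.length()
                && (raw.charAt(start) == '\uFEFF' || raw.charAt(start) == '\u200B')) {
            start++;
        }
        return raw.substring(start).trim();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Example input with a BOM in front, as in the question (assumed value).
        String text = "\uFEFFhttp://example.com/page";
        URL url = new URL(sanitize(text));
        System.out.println(url.getHost());
    }
}
```

In a crawler that reads one URL per CSV line, a helper like this would be applied to each line right after it is read, so a BOM on the first line of the file does not break the first request.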

Answer

There's a problem with your double quote.

I pasted your "text" line into Eclipse and tried to save. Eclipse reported an invalid character at the start of your "text" string: a character that could not be saved in Cp1252 encoding.

I deleted the first double quote you had and retyped it. Then I ran:

import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;

// Inside a method:
String text = "http://stackoverflow.com/questions/9827143/continuing-execution-after-an-exception-is-thrown-in-java";

try {
    // Throws MalformedURLException if the string still starts with a hidden character
    URL url = new URL(text);
    PrintWriter writer = new PrintWriter("sourcecode.txt", "UTF-8");
    writer.close(); // release the file handle
    System.out.println("all good");
} catch (FileNotFoundException | UnsupportedEncodingException | MalformedURLException e) {
    // All three are handled the same way, so one multi-catch is enough
    e.printStackTrace();
}

And it worked.

Answer

You have a special character in your text variable. I just tried your link in a browser and it did not work because of this.

Copy the following and try again:

String text = "http://stackoverflow.com/questions/9827143/continuing-execution-after-an-exception-is-thrown-in-java";
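To confirm whether a pasted string still carries a special character, and where it sits, the characters can be checked in code rather than by eye. This is a rough sketch under stated assumptions: HiddenCharFinder and findSuspicious are hypothetical names, and "suspicious" is taken to mean anything outside printable ASCII (0x20 to 0x7E), which suits plain URLs but would also flag legitimate non-ASCII text.

```java
import java.util.ArrayList;
import java.util.List;

final class HiddenCharFinder {

    // Hypothetical helper: reports every character outside printable
    // ASCII (0x20..0x7E) together with its index and code unit in hex.
    static List<String> findSuspicious(String s) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                found.add(String.format("index %d: U+%04X", i, (int) c));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Example input with a BOM in front (assumed value).
        String text = "\uFEFFhttp://example.com";
        System.out.println(findSuspicious(text)); // prints [index 0: U+FEFF]
    }
}
```

An empty list means the string contains only printable ASCII, so a MalformedURLException would then have some other cause.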