I'm trying to figure out how to determine the base of a URL on a remote application or server. For example, the URL could be: http://www.server.com/app/something/else/page.html
Now the absolute base could be www.server.com, www.server.com/app, or any other prefix short of page.html. The base depends on how the application is configured in web.xml and whether it is proxied through Apache, for example.
I need to know this because I'm reading the URL content as a client, and I need to know how to handle the various relative links found on the page.
Any hint would be appreciated...
This is impossible to determine as a client, because the only thing you know about the server is the URL. The server could be configured internally in any number of ways that have nothing to do with the HTML content returned for your request.
If you need to crawl the site like a browser would, you should follow the same rules a browser does when it encounters a relative link. As a client, you can't assume anything about the server that the server does not tell you.
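To make the browser rule concrete: a relative link is resolved against the URL of the page it appears on, unless the page declares a `<base href="...">`, in which case that value is the base instead. Here is a minimal sketch in Python using the standard library's `urllib.parse.urljoin`. The helper name `resolve_link` and the sample links are made up for illustration, and it assumes you have already pulled the `href` values (and any `<base href>`) out of the HTML yourself.

```python
from urllib.parse import urljoin


def resolve_link(page_url, href, base_href=None):
    """Resolve href roughly the way a browser would (hypothetical helper).

    page_url  -- URL the page was actually fetched from (after redirects)
    href      -- the link exactly as written in the page
    base_href -- value of the page's <base href="...">, if it has one
    """
    # The base is the page's own URL unless the page declares <base href>.
    # A relative <base href> is itself resolved against the page URL.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


page = "http://www.server.com/app/something/else/page.html"

print(resolve_link(page, "img/logo.png"))
# http://www.server.com/app/something/else/img/logo.png
print(resolve_link(page, "../other.html"))
# http://www.server.com/app/something/other.html
print(resolve_link(page, "/css/site.css"))
# http://www.server.com/css/site.css
print(resolve_link(page, "img/logo.png", base_href="/app/"))
# http://www.server.com/app/img/logo.png
```

Note that nothing here needs to know where the application is mounted on the server. The page URL, plus whatever the page itself declares, is enough to resolve every link.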