I'm working on a web data mining project that extracts information directly from HTML by crawling server pages. My effort is concentrated on one specific website, which runs a Java web server with Caucho Resin installed.
Parameters are passed as name/value pairs in the URL, like www.xxxxxx.com/jm/search?act=see&id=909&...
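For reference, here is a minimal sketch of how a query string of that shape can be split into name/value pairs in Java. The class name QueryParams and its parse method are hypothetical helpers, not part of any library, and the input uses only the two pairs visible in the example URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParams {

    // Splits "a=1&b=2" into an ordered map of decoded names and values.
    // If a name repeats, the last value wins (a simplification).
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as a trailing "&"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // Reverses URL encoding ("%20", "+") using UTF-8.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("act=see&id=909")); // prints {act=see, id=909}
    }
}
```

A map like this makes it easier to record which names and values have already been tried.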
I have decoded many parameters by trial and error, but of course the results are coming very slowly.
My question is: do you Java gurus know how to get all the valid parameters of this kind of server? Is it possible?
I don't have access to the server and I don't know anything about Caucho Resin. I'm coding a utility in Java to do the job.
Unless the server you're communicating with publishes a complete API, there can be any number of parameters. Consider this: a web form may not post every parameter the server responds to. Some parameters may exist only for internal use, for example.
Since parameter handling is implemented away from "public" eyes, on the server side, it is opaque to the outside world.
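To make that concrete, here is a rough sketch of what server-side handling might look like, written as plain Java rather than a real servlet. Everything in it is hypothetical: the class OpaqueHandler, the "dbg" parameter, and the response strings are invented for illustration and say nothing about how the site in question actually works.

```java
import java.util.HashMap;
import java.util.Map;

public class OpaqueHandler {

    // Hypothetical handler: only the code itself knows which names it reads.
    static String handle(Map<String, String> params) {
        String act = params.getOrDefault("act", "list");
        String id = params.get("id");

        // An undocumented switch that no form or link ever sends.
        boolean debug = "1".equals(params.get("dbg"));

        String body = ("see".equals(act) && id != null) ? "item " + id : "listing";
        return debug ? body + " [debug]" : body;
    }

    public static void main(String[] args) {
        Map<String, String> known = new HashMap<>();
        known.put("act", "see");
        known.put("id", "909");
        System.out.println(handle(known)); // prints item 909

        // A name the handler never reads is silently ignored, so a wrong
        // guess looks the same as sending no parameter at all.
        Map<String, String> guess = new HashMap<>();
        guess.put("foo", "bar");
        System.out.println(handle(guess)); // prints listing
    }
}
```

From the outside, the only way to find "dbg" here would be to guess it, and unrecognized names usually give no feedback at all.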
If you're referring to the possible values of the parameters, the answer is basically the same. For example, how many valid product SKUs does Amazon have?
(Also note that it might be better to call these "request parameters", as servlets also have "init parameters", which is an entirely different question :)