How can I correctly configure crawl-beans.cxml for my crawler?


When I started my crawl, I realized that it was taking much longer than it should have, and it still has not finished.

I tried checking on the process (by its PID) from another terminal to see what was going on, but the output was not clear to me. Every line had this form:

REMOVED by Not SEED, Prod or Cat **** https://(url of a wanted to be crawled page)

If someone understands these lines, it would be great to let me know! I suspect the cause is the crawl configuration (crawl-beans.cxml). If someone knows how to deal with it, please let me know.
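To get a clearer picture of what these lines are saying, one option is to tally them by reason and host. The snippet below is only a rough sketch: it assumes every line has exactly the shape shown above (`REMOVED by <reason> **** <url>`), and the function name `tally_removed` and the `example.com` URLs are made up for illustration. It does not say anything about what the crawler itself does with those URLs.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed line shape, copied from the sample output above:
#   REMOVED by <reason> **** <url>
LINE_RE = re.compile(r"^REMOVED by (?P<reason>.+?) \*{4} (?P<url>\S+)\s*$")


def tally_removed(lines):
    """Count 'REMOVED by ...' lines per (reason, host) pair.

    Lines that do not match the assumed shape are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        host = urlsplit(match.group("url")).netloc
        counts[(match.group("reason"), host)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; replace with the real captured output.
    sample = [
        "REMOVED by Not SEED, Prod or Cat **** https://example.com/page-a",
        "REMOVED by Not SEED, Prod or Cat **** https://example.com/page-b",
        "some unrelated line",
    ]
    for (reason, host), n in tally_removed(sample).most_common():
        print(f"{n:6d}  {reason}  {host}")
```

If most of the removed URLs turn out to be pages that were supposed to be crawled, that would point at the scope rules in the configuration rather than at the site being slow.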


There is 1 best solution below


Going a bit deeper into it, I think I was being stupid: it was a PHP site, so it should be taking time. So the thing is, there's no problem at all. So if