When I started my crawl, I realized that it was taking much longer than it should have, and it still has not finished.
I tried checking the process by its PID from another terminal to see what was going on, but the output was not clear to me. Every line had this form:
REMOVED by Not SEED, Prod or Cat **** https://(url of a wanted to be crawled page)
If someone understands these messages, it would be great to let me know! I highly doubt the cause is the crawl configuration (crawl-beans.cxml), but if someone knows how to deal with it, please let me know.
Going a bit deeper into it, I think I was being stupid: it is a PHP site, so it should be taking time. So the thing is, there's no problem at all. So if