I have working code that traverses one level of URLs. I need some help implementing two or three levels of link traversal to detect 404s.
    driver().navigate().to(URL);
    driver().manage().window().maximize();
    String orgWindow = driver().getWindowHandle();
    List<WebElement> linksList = driver().findElements(By.tagName("a"));
    for (WebElement linkElement : linksList) {
        System.out.println("================ At First Level =================");
        String link = linkElement.getAttribute("href");
        if (link != null && link.contains("test")) {
            verifyLinkActive(link); // This method uses an HTTP URL connection to detect 404s
            // Second-level traversal
            driver().navigate().to(link);
            driver().manage().window().maximize();
            List<WebElement> SecondLinkList = driver().findElements(By.tagName("a"));
            for (WebElement linkSecondElement : SecondLinkList) {
                System.out.println("================ At Second Level =================");
                String Secondlink = linkSecondElement.getAttribute("href");
                if (Secondlink != null && Secondlink.contains("test")) {
                    verifyLinkActive(Secondlink);
                } // second if
            } // second for
        } // if
        driver().switchTo().window(orgWindow); // Switching back to the original window
    } // for
My questions: 1) Is this the right way to implement a second or third level of iteration to find 404s? 2) Is there a way to ignore certain links that fall under specific tags or IDs? These standard links are repetitive and appear on every page, so I would like to skip them if possible.
Looking forward to some input!
If you mean how to structure the program itself, maybe the easiest way is to keep a list of URLs to check (to-check-urls), and a set of already checked URLs (checked-urls).
When your program starts, to-check-urls contains only the first page to visit, and checked-urls is empty.
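Then you have a single loop that repeats until the list of URLs to check is empty. Each pass takes the next URL off to-check-urls, skips it if it is already in checked-urls, and otherwise checks it, records it in checked-urls, and adds the links found on that page to to-check-urls.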
The code is mostly there; you just need to arrange it in a loop using the two collections. This way you never check a URL twice, and you don't need to care whether a link is at the second, third, or fourth level. A site is a graph, not a tree, so no matter how many levels you hard-code there could still be more.
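A minimal sketch of that loop, to make the two-collection idea concrete. The names LinkCrawler, findBrokenLinks, isActive and linksOn are hypothetical and not from the question. The sketch assumes verifyLinkActive can be adapted to return a boolean, and it replaces the Selenium calls with a function parameter so that it runs on its own. The "test" filter is carried over from the question's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

class LinkCrawler {

    /**
     * Visits every matching URL reachable from startUrl at most once.
     *
     * isActive: hypothetical stand-in for verifyLinkActive(url), assumed to
     *           return false for a broken link (for example a 404).
     * linksOn:  hypothetical stand-in for "open the page and return the href
     *           of every <a> element on it" as plain strings.
     *
     * Returns the URLs that failed the check, in the order they were visited.
     */
    static List<String> findBrokenLinks(String startUrl,
                                        Predicate<String> isActive,
                                        Function<String, List<String>> linksOn) {
        Deque<String> toCheck = new ArrayDeque<>(); // to-check-urls
        Set<String> checked = new HashSet<>();      // checked-urls
        List<String> broken = new ArrayList<>();
        toCheck.add(startUrl);

        while (!toCheck.isEmpty()) {
            String url = toCheck.poll();
            if (!checked.add(url)) {
                continue; // add() returned false: this URL was already checked
            }
            if (!isActive.test(url)) {
                broken.add(url);
                continue; // nothing to follow on a broken page
            }
            for (String link : linksOn.apply(url)) {
                // Same filter as in the question: only follow "test" links.
                if (link != null && link.contains("test") && !checked.contains(link)) {
                    toCheck.add(link);
                }
            }
        }
        return broken;
    }
}
```

With Selenium, linksOn would navigate to the URL and copy each href into a list of strings before returning. WebElement references taken from one page go stale once the driver navigates elsewhere, so the queue should hold URL strings rather than elements.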