How can I find the sitemap for each domain and subdomain using Python? Some examples:
abcd.com/sitemap.xml
abcd.com/sitemap.html
sub.abcd.com/sitemap.xml
And so on.
What are the most probable sitemap names, locations, and extensions?
Please take a look at the robots.txt file first. That's what I usually do. Some domains offer more than one sitemap, and there are cases with more than 200 XML files.
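
As a rough sketch of that robots.txt check (not a definitive implementation): the code below pulls every "Sitemap:" line out of a robots.txt body. The function names `extract_sitemaps` and `fetch_sitemaps_from_robots` are made up for this example, and it assumes robots.txt is served as UTF-8 at the root of the host.

```python
import urllib.request


def extract_sitemaps(robots_text):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_text.splitlines():
        # Drop trailing comments, then split 'Field: value' on the first
        # colon only, because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def fetch_sitemaps_from_robots(base_url, timeout=10):
    """Download base_url/robots.txt and return its sitemap URLs (may be empty).

    Hypothetical helper: raises urllib.error.URLError / HTTPError if the
    file is missing or the host is unreachable.
    """
    robots_url = base_url.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(robots_url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return extract_sitemaps(text)
```

Since a site can list many sitemaps, the result is a list; you would call something like `fetch_sitemaps_from_robots("https://abcd.com")` once per domain and subdomain. If you are on Python 3.8 or later, the standard library's `urllib.robotparser.RobotFileParser.site_maps()` should give you the same information.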
Please remember that according to the FAQ on sitemaps.org, a sitemap file can be gzipped. Consequently, you might want to consider sitemap.xml.gz too!
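
If robots.txt lists nothing, one fallback is to guess. Here is a minimal sketch, assuming the only candidates are the names mentioned in this thread (the list is illustrative, not exhaustive, and `candidate_urls` / `decode_sitemap` are hypothetical helper names). It also shows one way to handle a gzipped sitemap once you have downloaded the bytes.

```python
import gzip

# Guesses taken from this thread; real sites may use other names.
CANDIDATE_PATHS = ["/sitemap.xml", "/sitemap.xml.gz", "/sitemap.html"]


def candidate_urls(host, scheme="https"):
    """Build the sitemap URLs worth probing for one domain or subdomain."""
    return [f"{scheme}://{host}{path}" for path in CANDIDATE_PATHS]


def decode_sitemap(raw):
    """Return sitemap bytes as text, gunzipping first if needed."""
    # Gzip streams start with the two magic bytes 0x1f 0x8b, so this
    # check works regardless of the file extension in the URL.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")
```

You would request each URL from `candidate_urls("sub.abcd.com")`, skip the ones that return an error status, and pass the response body through `decode_sitemap` before parsing it.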