On an existing .NET MVC3 site, we implemented paging where the URL looks something like www.mysite.com/someterm/anotherterm/_p/89/10, where 89 is the page number and 10 is the number of results per page.
Unfortunately, rel="nofollow" was missing from page-number links greater than 3, and those pages were also missing <meta name="robots" content="noindex,nofollow" />.
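For reference, this is roughly the kind of view code that was missing, assuming a Razor view whose model exposes a hypothetical PageNumber property:

@* Sketch only: emit noindex,nofollow on deep pages (Model.PageNumber is a hypothetical property) *@
@if (Model.PageNumber > 3)
{
    <meta name="robots" content="noindex,nofollow" />
}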
The problem is that Google and a few other search engines have now indexed those pages and are attempting to crawl all of them, quite frequently, which, as we found, started having a drastic impact on the production DB server. We don't want all of those additional thousands of pages crawled, only the first few.
For now I have reverted the code to a version of the site that does not include paging, so that our DB server won't be hit so hard. While the search engines will get 404 errors for all of those pages, I want to know whether this is the best thing to do, since after a while I will reintroduce the paging.
I could add the following to web.config to have all 404s redirected to the home page:
<httpErrors errorMode="Custom">
  <remove statusCode="404" />
  <error statusCode="404" path="/" responseMode="ExecuteURL" />
</httpErrors>
But I'm thinking that doing this would be treated as "duplicate content" for all of those paginated URLs.
Is the best idea here to just let those 404s continue for a week or two and then reintroduce the paging site?
Another option may be to release the paging site again with some code added to reject crawlers on pages greater than 3, roughly as sketched below. Suggestions?
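Something along these lines is what I have in mind, using Request.Browser.Crawler for the crawler check; the action name, parameters and the threshold of 3 are only illustrative:

using System.Web.Mvc;

public class SearchController : Controller
{
    // Illustrative paging action: refuse deep pages to crawlers so they stop hammering the DB.
    public ActionResult Results(string someterm, string anotherterm, int page = 1, int pageSize = 10)
    {
        if (page > 3 && Request.Browser.Crawler)
        {
            // Tell crawlers these deep pages are off limits instead of querying the database.
            return new HttpStatusCodeResult(410, "Gone");
        }

        // ... normal paged query and view rendering ...
        return View();
    }
}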
Is there a quicker way of getting those pages out of the indices so they won't be crawled?
Thanks.
Simply leaving the pages as 404s wouldn't do, because a 404 does not signal a permanent removal. Looking at RFC 2616, Hypertext Transfer Protocol – HTTP/1.1, chapter 10 Status Code Definitions: a 404 (Not Found) gives no indication of whether the condition is temporary or permanent, whereas a 410 (Gone) is expected to be considered permanent. Returning 410 is therefore the right way to tell crawlers that these URLs are gone for good.
I simply added a new ActionResult method:
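A minimal sketch of such a method, assuming it returns 410 Gone via MVC 3's HttpStatusCodeResult (the controller and method names are placeholders):

using System.Web.Mvc;

public class CrawlerCleanupController : Controller
{
    // Returns 410 Gone so crawlers treat the old paged URLs as permanently removed.
    public ActionResult RemovePageFromIndex()
    {
        return new HttpStatusCodeResult(410, "Gone");
    }
}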
and created new routes for matching "_p":
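Roughly like the following, registered in RegisterRoutes in Global.asax.cs before the default route; the route name, segment names and controller/action names are assumptions matching the sketch above:

using System.Web;
using System.Web.Mvc;
using System.Web.Routing;

public class MvcApplication : HttpApplication
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        // Catch paged URLs such as /someterm/anotherterm/_p/89/10 and send them to the
        // 410 action above. Registered before the default route so it wins the match.
        routes.MapRoute(
            "RemovePagedUrls",
            "{someterm}/{anotherterm}/_p/{page}/{pageSize}",
            new { controller = "CrawlerCleanup", action = "RemovePageFromIndex" });

        // ... existing route registrations, including the default route ...
    }
}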