My site got hacked recently and now has over 3 million pages indexed when it really only has 30 (see screenshot).
How do I implement the correct 410 response in .htaccess?
I think the best tactic is to 410 all pages that contain a number, .htm, or .html, as none of the real pages have these in the URL. For example:
https://example.com/cixc-20050gsakuramar/-b00006.htm
https://example.com/sfumato.php?nzlw-21833vetidm4
https://example.com/bzmt-5694ceti.html
https://example.com/pfks-14602sjp/ucqksti.htm
https://example.com/admv-15974mitem/318
Would this code work?
Redirect 410 /*0*
Redirect 410 /*1*
Redirect 410 /*2*
Redirect 410 /*3*
Redirect 410 /*4*
Redirect 410 /*5*
Redirect 410 /*6*
Redirect 410 /*7*
Redirect 410 /*8*
Redirect 410 /*9*
Redirect 410 /*.html*
Redirect 410 /*.htm*
I've also pieced together a rewrite rule that might work:
RewriteRule ^([0-9]+)$ - [G,L]
I am also thinking of adding Disallow rules to robots.txt like this:
Disallow: /*0*
Disallow: /*1*
Disallow: /*2*
Disallow: /*3*
Disallow: /*4*
Disallow: /*5*
Disallow: /*6*
Disallow: /*7*
Disallow: /*8*
Disallow: /*9*
Disallow: /*.htm
Disallow: /*.html

The Redirect directive of mod_alias doesn't support wildcards, so rules such as Redirect 410 /*0* would not do what you expect. You could make them into RedirectMatch directives, which support regular expressions. I'd combine all the numbers into one rule and the .htm/.html suffixes into another.
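Something along these lines should work; the exact patterns here are a sketch, so check them against your 30 real URLs before relying on them:

# Return 410 Gone for any URL path that contains a digit
RedirectMatch 410 [0-9]
# Return 410 Gone for any URL path that ends in .htm or .html
RedirectMatch 410 \.html?$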
From your Google Search Console screenshot, it looks like some of the URLs have query strings in them with a ?. mod_alias doesn't consult the query string at all when matching the URL, so if the .html appears in the query string and not in the URL path, RedirectMatch won't be able to match it.
I'd recommend going with mod_rewrite rules, which can match the query string. Another reason to prefer mod_rewrite is that if you have other rewrite rules in your .htaccess, additional rewrite rules would be less likely to conflict than mod_alias rules. I've added a condition to skip wp-content URLs because, in the comments, you say you actually have some CSS files with numbers in them.
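Here is a sketch of what that could look like in .htaccess (the wp-content exclusion assumes WordPress is installed at the domain root, and the patterns are illustrative, so adjust them to your site):

RewriteEngine On
# Leave legitimate asset URLs alone (some of your CSS files contain numbers)
RewriteCond %{REQUEST_URI} !^/wp-content/
# 410 if the path contains a digit or ends in .htm or .html ...
RewriteCond %{REQUEST_URI} [0-9]|\.html?$ [OR]
# ... or if the query string contains a digit or .htm/.html
RewriteCond %{QUERY_STRING} [0-9]|\.html?
RewriteRule ^ - [G,L]

If this is a WordPress site, these lines would go above the standard # BEGIN WordPress block so they run first.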
I wouldn't recommend using a Disallow in robots.txt because Google sometimes indexes disallowed URLs anyway, even if it can't crawl them.