How do I redirect nonexistent pages to the 404 error page with .htaccess?

Apparently Bingbot is getting caught in an infinite loop on my site. It downloads pages like http://www.htmlcodetutorial.com/quicklist.html/applets/applets/applets/applets/applets/applets/applets/applets/applets/applets/applets/applets/applets/applets/sounds/forms/linking/frames/document/linking/images/_AREA_onMouseOver.html . Since I set my server to interpret .html as PHP the page is simply a copy of http://www.htmlcodetutorial.com/quicklist.html . How do I stop Bingbot from looking for these bogus copies?

Why is Bingbot looking for those pages to begin with?

I'd like to do something like the last line of the .htaccess file shown below (like at "Redirect to Apache built-in 404 page with mod_rewrite?"), but when I try RewriteRule ^.*\.html\/.*$ - [R=404] the entire site shows a 500 error.

Even when I use the last line below, it redirects to http://www.htmlcodetutorial.com/home/htmlcode/public_html/help.html, which is not what I wanted.

AddType application/x-httpd-php .php .html

RewriteEngine on 
Options +FollowSymlinks

RewriteRule ^help\/.* help.html [L]

RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://www.htmlcodetutorial.com/$1 [R=301,L]

ErrorDocument 404 /404.html

RewriteRule ^.*\.html\/.*$ help.html [R=301]

P.S. I know the site is way out of date.

There are 2 best solutions below

Change your last rule to this:

RewriteRule ^(.+?\.html)/.+$ - [R=404,L,NC]
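
If a redirect to help.html is still preferred over a 404, the filesystem path in the redirect target reported in the question (/home/htmlcode/public_html/help.html) is what mod_rewrite typically produces when a relative substitution is combined with [R] in .htaccess and no RewriteBase is set. A minimal sketch of a workaround, assuming the .htaccess file sits in the document root so that /help.html is the right URL path:

```apache
RewriteEngine On

# Leading slash: the target is treated as a URL path, so the client is
# sent to /help.html rather than to the on-disk directory path.
# Assumption: help.html is served from the site root.
RewriteRule ^.+?\.html/.+$ /help.html [R=301,L]
```

Setting RewriteBase / and keeping the relative target should have the same effect. A 404 is still the clearer signal to a crawler that the URL does not exist.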

The problem here is that either MultiViews is turned on, or Apache is treating requests like /quicklist.html/blah/blah as PATH_INFO-style requests, which it accepts as valid.

So turn off MultiViews by changing your Options line to:

Options +FollowSymlinks -Multiviews

Then replace your last rule with:

RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^ - [L,R=404]
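
Since the diagnosis above points at PATH_INFO handling, another option (not part of this answer, so treat it as an untested alternative) is Apache's core AcceptPathInfo directive, which rejects trailing path info before any rewrite rule is needed. A minimal sketch, assuming the host allows FileInfo overrides in .htaccess and that PHP runs as an Apache handler:

```apache
# With Off, a request is accepted only if it maps to a path that exists,
# so /quicklist.html/applets/applets/... gets a 404 instead of
# being served as a copy of /quicklist.html.
AcceptPathInfo Off

# From the question's config: the page shown for those 404 responses.
ErrorDocument 404 /404.html
```

Scripts that rely on PATH_INFO would stop working under this setting, so it only fits sites that do not use it.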