RewriteCond REMOTE_ADDR does not capture an IP

I run a home server on Apache 2.4. My problem is that some Bing robots do not respect robots.txt, so I want to make them read robots.txt whenever they enter certain directories. My idea is this:

Case1.

RewriteCond %{REMOTE_ADDR} ^40\.77\.167\.[0-255]$
RewriteCond %{REQUEST_URI} /forum/ [NC]
RewriteRule ^(.*)$ /robots.txt [L]

It did not work: 40.77.167.52 still gets into the forum (e.g. /forum/foo.cgi), which is disallowed by robots.txt. Then I tried the REMOTE_ADDR condition on its own, without the REQUEST_URI condition:

Case2.

RewriteCond %{REMOTE_ADDR} ^40\.77\.167\.[0-255]$
RewriteRule .* - [F,NC,L]

This did not work either. So I conclude that the line "RewriteCond %{REMOTE_ADDR} ^40\.77\.167\.[0-255]$" is somehow wrong, but I cannot see what is wrong with it.

"LoadModule rewrite_module" is active and RewriteEngine is On. Some other rewrite rules do work.

What I ultimately want is something like this:

Case3.

RewriteCond %{REMOTE_ADDR} (^40\.77\.167\.)|(^157\.55\.39\.)|(^207\.46\.13\.)|(^65\.55\.210\.)
RewriteCond %{REQUEST_URI} (/forum/)|(/Picture)|(/Lib/PhotoLib)|(/cgi-bin)|(/Lib/jso/) [NC]
RewriteRule .* /robots.txt [L]

Could any guru help me, please? I want to know what is wrong with my httpd.conf configuration.

Answer by Obsidian:

You probably want to test your rules with your own address first, instead of Bing's, to make sure the rules work and that you get the expected robots.txt content.
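One thing worth checking while you test: in a regular expression, `[0-255]` is a character class, not a numeric range. It matches exactly one character out of 0, 1, 2 or 5, so an address ending in `.52` can never match it. A minimal sketch of a test setup follows, assuming the rules sit in the server or virtual host configuration as in the question. The address 203.0.113.5 is only a documentation placeholder for your own client address.

```apache
RewriteEngine On

# Step 1 (temporary test): replace 203.0.113.5 with your own client address
# and request /forum/foo.cgi; you should receive the content of robots.txt.
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.5$
RewriteCond %{REQUEST_URI} ^/forum/ [NC]
RewriteRule ^ /robots.txt [L]

# Step 2: once the test works, match the whole 40.77.167.x block.
# [0-9]{1,3} accepts a last octet of one to three digits, unlike [0-255].
RewriteCond %{REMOTE_ADDR} ^40\.77\.167\.[0-9]{1,3}$
RewriteCond %{REQUEST_URI} ^/forum/ [NC]
RewriteRule ^ /robots.txt [L]
```

The prefix-only patterns in Case3 (for example `^40\.77\.167\.`) avoid the problem entirely, since they do not try to match the last octet at all.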

In any case, an internal rewrite is transparent to the client, so whatever the client is (a spider bot or a regular browser), it will never know that what it receives is actually the content of robots.txt rather than that of the URL it requested.

Moreover, be aware that this file is only advisory. Bots are not forced to respect it, even though every serious search engine does anyway.

What you were probably aiming for is a redirection using a 30x HTTP status code, but even then, this is not the approach you want to follow.
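For comparison only, the difference between the internal rewrite used in the question and an external redirect comes down to the `R` flag. The sketch below reuses the /forum/ path from the question. The two variants are alternatives and are not meant to be enabled together, and neither one makes a bot obey robots.txt.

```apache
# Variant A, internal rewrite: the server silently serves robots.txt,
# and the client still believes it received /forum/...
RewriteCond %{REQUEST_URI} ^/forum/ [NC]
RewriteRule ^ /robots.txt [L]

# Variant B, external redirect: the client receives a 302 response
# pointing to /robots.txt and must issue a second request to fetch it.
RewriteCond %{REQUEST_URI} ^/forum/ [NC]
RewriteRule ^ /robots.txt [R=302,L]
```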

First check that robots.txt is in the correct place and is readable from outside (including from the bot's point of view), then make sure the file's format is correct too. In particular, note that the paths in its rules are case-sensitive.

robots.txt at Google Search Central