Spam Traffic from Google Hosted IPs


I have been facing a serious issue on my website: what looks like spam traffic originating from Google-hosted IP addresses. Here are two examples:

Example 1: IP: 34.77.98.119 | User Agent: newspaper/0.2.8 | Hostname: 119.98.77.34.bc.googleusercontent.com

Example 2: IP: 34.170.179.100 | User Agent: go-http-client/2.0 | Hostname: 100.179.170.34.bc.googleusercontent.com

As you can see above, the hostname contains the IP address with its octets reversed, and the UA is cryptic and not mentioned in Google's official docs such as [1] https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers and [2] https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot.

I need to ensure my website remains safe and user-friendly. I also do not want to mistakenly block legitimate Google crawlers while addressing this issue.

I request the community's guidance on how to tell legitimate traffic apart from malicious traffic coming from Google-hosted IPs. (By legitimate, I primarily mean Google crawlers and services; for everyone else, I will run a security profile and decide whether we consider them malicious for us or not.)

The lists in [1] and [2] seem to be incomplete. When I trigger a hit from the Google PageSpeed Insights tool, the IP is 66.249.82.64, the UA is "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4590.2 Safari/537.36 Chrome-Lighthouse", and the hostname maps to google-proxy-66-249-82-64.google.com, but neither the UA nor the hostname appears in [1] or [2], which list genuine UAs and crawlers, including "user-triggered-fetchers". Similarly, in the two examples above, the hostnames end in bc.googleusercontent.com, and this hostname is not listed among the genuine Google crawlers either.

I look forward to understanding how, based on the UA and IP combination, we can separate genuine Google-triggered traffic from malicious traffic that also comes from Google servers, such as Google Cloud / Compute Engine VMs, which anyone in the world can "rent".

Answer by NoCommandLine:

The second document you linked to shows how to manually verify a Google crawler. Following the steps in that section for your first IP, you'd run the command

$ host 34.77.98.119

and this gives

119.98.77.34.in-addr.arpa domain name pointer 119.98.77.34.bc.googleusercontent.com.

and then running

$ host 119.98.77.34.bc.googleusercontent.com

gives

119.98.77.34.bc.googleusercontent.com has address 34.77.98.119

The forward lookup returns the original IP, so from the above I'd say that the IP is from Google.
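
The two lookups above (reverse, then forward) can be wrapped into one helper. This is only a minimal sketch, assuming `host` and `awk` are installed and that the address is IPv4; the function name `fcrdns_check` is made up for illustration and is not part of any Google tooling.

```shell
# fcrdns_check is a hypothetical helper name, not a standard tool.
# Assumes IPv4 and the output format of `host` shown in this answer.
# Prints the confirmed hostname and returns 0 when the reverse (PTR)
# name resolves forward to the same IP; returns 1 otherwise.
fcrdns_check() {
    local ip="$1" name
    # Reverse lookup: the hostname is the last field of the PTR line.
    name="$(host "$ip" | awk '/domain name pointer/ {print $NF; exit}')"
    [ -n "$name" ] || return 1
    # Drop the trailing dot of the fully qualified name.
    name="${name%.}"
    # Forward lookup: some "has address" line must end in the original IP.
    if host "$name" | awk -v ip="$ip" \
        '/has address/ && $NF == ip {found = 1} END {exit !found}'; then
        printf '%s\n' "$name"
        return 0
    fi
    return 1
}
```

You could then call `fcrdns_check 34.77.98.119` and inspect the hostname it prints. A match only shows that the reverse and forward records agree; it does not say who operates the machine.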

The same section also says to verify that the domain name is either googlebot.com, google.com, or googleusercontent.com; here it is googleusercontent.com.
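
The domain check can be scripted as well. The sketch below is an assumption-laden illustration: `has_google_suffix` is a hypothetical name, and the three domains are simply the ones quoted above. Passing it shows only that the name falls under one of those domains; as the question points out, rented Google Cloud VMs also sit on Google servers, so this alone may not separate crawlers from other traffic.

```shell
# has_google_suffix is a hypothetical helper name.
# Returns 0 if the hostname equals, or is a subdomain of, one of the
# three domains quoted above; returns 1 otherwise.
has_google_suffix() {
    local name
    # Strip a trailing dot and lower-case (DNS names are case-insensitive).
    name="$(printf '%s' "${1%.}" | tr 'A-Z' 'a-z')"
    case "$name" in
        googlebot.com|*.googlebot.com) return 0 ;;
        google.com|*.google.com) return 0 ;;
        googleusercontent.com|*.googleusercontent.com) return 0 ;;
        *) return 1 ;;
    esac
}
```

Matching on `*.google.com` rather than `*google.com` matters: the leading dot keeps a lookalike such as `evilgoogle.com` from passing.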