How to Block Spam Referrers like darodar.com from Accessing Website?

I have several websites that get around 5% of their daily visits from spam referrers. There is one strange thing I noticed about these referrers: they show up in Google Analytics, but I cannot see them in my custom table, where I log every visitor to the site. So I think they only manipulate the GA code and never reach the site itself.

If you follow their link, they redirect you to an affiliate link.

I don't know whether they have an impact on my SEO/SERP, but I would like to get rid of them. Can I do that via the .htaccess file?

One peculiar aspect is that I get visitors from different forum-like pages, e.g. forum.topic221122.darodar.com, forum.topic125512.darodar.com, etc., so I would like to block the entire darodar.com domain.

Besides darodar.com, there are also econom.co and iloveitaly.co polluting my stats. Can I block them all from .htaccess?

14

There are 14 best solutions below

2
On BEST ANSWER

This blog post suggests that the spam referrers manipulate Google Analytics and never actually visit your site, so blocking them is pointless. Google Analytics offers filtering if you want to mitigate fake site hits.

0
On

You can restrict access using .htaccess, or filter ALL robot visits out of Google Analytics tracking. If that doesn't work, set up Google Analytics filtering. More details on how to do that can be found here: http://www.wiyre.com/google-analytics-darodar-forum-spam-what-is-it/

They are based in Russia but route their spiders through China and the Philippines. Since they have multiple subdomains, it may be best to block them by IP address at this point.

0
On

I used these mod_rewrite methods for semalt:

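RewriteEngine On
# Return 403 Forbidden when the referrer matches any of the semalt patterns
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)?semalt\.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http(s)?://(.*\.)?semalt\..*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC]
RewriteRule ^ - [F]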

or, also in .htaccess, with the Apache module mod_setenvif:

SetEnvIfNoCase Referer semalt.com spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.11\.15" spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.7\.144" spambot=yes

Order allow,deny
Allow from all
Deny from env=spambot

I also created an Apache, Nginx & Varnish blacklist, plus a Google Analytics segment, to prevent referrer spam traffic. You can find it here:

https://github.com/Stevie-Ray/referrer-spam-blocker/

0
On

Apparently, the spammer does this by communicating directly with Google Analytics using your website's account ID. They effectively tell Google Analytics that they visited your page when in fact they never did. They identify themselves to Analytics by means of a URL which THEY WANT YOU TO VISIT. You see their traffic in Google Analytics and go check them out. They will have, for example, an Amazon affiliate account hooked up, and they attempt to get a commission from your Amazon purchases.

So .htaccess did nothing for me when I was fighting this one; you need to create a GA filter that filters out patterns like (.*)\.darodar\.com
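Before saving a pattern like this as a GA filter, it can help to try it against a few referrer strings. Below is a minimal sketch in JavaScript, assuming the pattern is meant to use backslash escapes for the dots. The names `subdomainOnly`, `anyDarodar` and `matches` are made up for illustration, and GA's regex dialect may differ slightly from JavaScript's.

```javascript
// Pattern from the answer: requires at least one dot before "darodar",
// so it only matches subdomains of darodar.com.
const subdomainOnly = /(.*)\.darodar\.com/i;

// Hypothetical variant that also matches the bare domain,
// but not unrelated names that merely end in "darodar.com".
const anyDarodar = /(^|\.)darodar\.com$/i;

function matches(pattern, source) {
  return pattern.test(source);
}

console.log(matches(subdomainOnly, "forum.topic221122.darodar.com")); // true
console.log(matches(subdomainOnly, "darodar.com"));                   // false
console.log(matches(anyDarodar, "darodar.com"));                      // true
console.log(matches(anyDarodar, "www.example.com"));                  // false
```

Note that the pattern as posted would miss a referrer reported as plain darodar.com, which is why the second variant is shown.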

The really bad effect I have found is that it invalidates my website statistics.

4
On

Yes, you can block them with .htaccess, and you actually should.

Your .htaccess file could look like this:

<IfModule mod_setenvif.c>
# Set spammers referral as spambot
SetEnvIfNoCase Referer darodar.com spambot=yes
SetEnvIfNoCase Referer 7makemoneyonline.com spambot=yes
## add as many as you find

Order allow,deny
Allow from all
Deny from env=spambot
</IfModule>

When traffic comes from these sites, they are blocked with this .htaccess, so the HTML is never loaded and therefore GA script is not fired up (from these sites).

They are trying to collect traffic from you: once you see the incoming traffic in Google Analytics, you go to that URL to find out what the source is. It is harmless to your site, except that your statistics fill up with junk data.

Google Analytics should prevent this, the same way GMail prevents spam email.

0
On

According to this entry, they never visit your site; they fake HTTP requests to GA using your UA code. So it seems pointless to block them using .htaccess or any other method, because they never actually reach your site. They only send fake "visit" data to Google.

1
On

We have found that using .htaccess is a good way to stop this spam. I have implemented the solution below on my client's site, and it has worked really well so far. The best way is to stop them with a "contains" clause, e.g. for the spam domain priceg.com, check for priceg in the referrer URL.

This is because many of these sites create subdomains and hit you again, and when they tweak the URL, hard-coded conditions fail.

RewriteEngine On
RewriteCond %{HTTP_REFERER} (priceg) [NC,OR]
RewriteCond %{HTTP_REFERER} (darodar) [NC]
RewriteRule ^ - [F]

It is explained in detail here
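The same "contains" logic can be sketched outside Apache to see what it does and does not catch. This is only an illustration in JavaScript; `spamKeywords` and `refererContainsSpamKeyword` are hypothetical names, not part of any library.

```javascript
// Keywords taken from the RewriteCond lines above.
const spamKeywords = ["priceg", "darodar"];

// Case-insensitive substring check, mirroring the [NC] flag.
function refererContainsSpamKeyword(referer) {
  const lower = String(referer || "").toLowerCase();
  return spamKeywords.some((keyword) => lower.includes(keyword));
}

console.log(refererContainsSpamKeyword("http://forum.topic221122.darodar.com/")); // true
console.log(refererContainsSpamKeyword("https://www.example.com/"));              // false
// Trade-off: any referrer that happens to contain a keyword is blocked too.
console.log(refererContainsSpamKeyword("http://www.pricegrabber.com/"));          // true
```

The last line shows the cost of a contains clause: short keywords can also match legitimate referrers, so keep them as specific as you can.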

3
On

Lunametrics posted a nice article to solve this issue using Google Tag Manager: http://www.lunametrics.com/blog/2014/03/11/goodbye-to-exclude-filters-google-analytics/

0
On

Blocking bots at the web server level makes no sense, because the spammers send fake requests straight to the Google Analytics servers. All they need to know is the website's domain name and the Google Analytics ID linked to it. So you have to mask your Google Analytics ID in the website code. For example, you can do it like this in the Google Analytics JS code:

ga('create', 'UA-X' + 'XXXXX' + 'XX-X', 'auto');

After this change, a spammer's bot would need to execute JS code to extract your Google Analytics ID, and not many bots are able to do that.

https://nobodyonsecurity.com/security/fighting-google-analytics-referrer-spam

0
On

You can filter future and historical GA spam of all types using the guide linked below. Hostname filtering is particularly easy.

https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/

0
On

.htaccess is not the best way. On my site I use GA: under Tracking Info, I add the domains to the Referral Exclusion List.

Regards!

10
On

Most of the spam in Google Analytics never accesses your site, so you can't block it with any server-side solution.

Ghost spam hits GA directly and usually shows up for only a few days before disappearing. That's why some people think they blocked it from the .htaccess file, but it is just a coincidence.

This type of spam is easy to spot, since it uses either a fake hostname or one that is not set. (See image below)

The other type, crawlers like semalt, actually access your site and can be blocked from the .htaccess file. However, there are only a few of them.

So in summary, to stop spam in Google Analytics:

  • Crawlers: server-side solutions or filters in GA
  • Ghosts: ONLY filters in GA

The only efficient solution to prevent being hit by ghost spam is by making an include filter with all your valid hostnames.

First you need to make a REGEX with all the valid hostnames, something like this (you can find them on the network report)

yoursite\.com|shoppingcart\.com|translateservice\.net

These are some examples; you might have more or fewer hostnames. Once you have the REGEX, create the filter as follows:

  • Go to the Admin tab in Google Analytics
  • Select Filters under the View column > New Filter
  • Filter type: Custom > Include > Filter Field: Hostname
  • Filter Pattern: paste the hostname expression you built
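To avoid typos when the list of hostnames grows, the expression can be generated and tested before pasting it into the filter. Here is a rough sketch in JavaScript; the hostnames are the example ones from above, and `escapeRegex`, `hostnameExpression` and `isValidHostname` are illustrative names only. GA filter patterns match partially, which `RegExp.test` imitates here.

```javascript
// Example hostnames from the expression above; replace them with your own.
const validHostnames = ["yoursite.com", "shoppingcart.com", "translateservice.net"];

// Escape regex metacharacters so that "." matches a literal dot.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Join the escaped hostnames into one alternation.
const hostnameExpression = validHostnames.map(escapeRegex).join("|");
console.log(hostnameExpression);
// yoursite\.com|shoppingcart\.com|translateservice\.net

const includeFilter = new RegExp(hostnameExpression, "i");

// A hit is kept only if its hostname matches one of the valid hostnames.
function isValidHostname(hostname) {
  return includeFilter.test(hostname || "");
}

console.log(isValidHostname("www.yoursite.com")); // true
console.log(isValidHostname("(not set)"));        // false
console.log(isValidHostname("darodar.com"));      // false
```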

For crawlers, you will have to create a different filter, building an expression with all the spammers:

spammer1|spammer2|spammer3|spammer4|spammer5
  • Filter type: Custom > Exclude > Filter Field: Campaign Source
  • Filter Pattern: paste the referral expression

Every time you work with filters, it is important to keep an unfiltered view.

If you need detailed steps for these solutions, you can check this complete guide about spam in Google Analytics:

Guide to stop and remove All the spam in Google Analytics

Hope it helps.

[Image: hostname report with example valid hostnames]

0
On

I think the most effective way to avoid ghost spam is to add a custom dimension that lets you know the site was actually visited, because, as we know, the spammers never visit the site.

ga('set', 'dimension1', "Hey I'm really here!!");
ga('send', 'pageview');

Simply add these lines to your pages, and then add a filter that includes a hit only when the dimension has the expected value ("Hey I'm really here!!" in this case).
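What the include filter then does can be pictured with a small simulation. This is only a sketch of the idea in JavaScript, not how GA is implemented; the hit objects, `EXPECTED_MARKER` and `keepRealHits` are made up for illustration.

```javascript
// The value set on dimension1 by the page snippet above.
const EXPECTED_MARKER = "Hey I'm really here!!";

// Keep only hits that carry the marker; ghost hits sent straight to GA lack it.
function keepRealHits(hits) {
  return hits.filter((hit) => hit.dimension1 === EXPECTED_MARKER);
}

// Hypothetical sample data: one real pageview and one ghost hit.
const sampleHits = [
  { referrer: "google.com", dimension1: EXPECTED_MARKER },
  { referrer: "forum.topic221122.darodar.com" },
];

console.log(keepRealHits(sampleHits).length); // 1
```

This only holds as long as the spammer does not copy the marker value from your page source, so treat it as raising the bar rather than a guarantee.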

0
On

2019 update

I may have a solution to this problem as I find none of the other solutions to be effective.

Let me address the problems of the existing solutions first

  1. Add a filter for each referrer spam domain.
     • How many domains will you add?
     • Most of these referrer spam domains exist for some time and then disappear
  2. Maintain a blacklist of referrer spam domains.
     • This gets even more complicated as they are basically endless in number.
     • You would have to keep updating the blacklist.
     • Also, the bigger the blacklist, the more time you need to scan it
  3. Anything else, such as maintaining a manual .htaccess, will require manual intervention, which will not scale as your site becomes more popular
  4. Anything automatic, such as using AI to detect patterns in how referrer spam domains appear, will be hit or miss

How do these bots work?

First, it is crucial to understand how these bots work

  1. At a minimum, they use regex patterns such as /UA-\d{6}/ to harvest tracking IDs from the pages they crawl recursively, starting from a seed website
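To see why hiding the literal ID helps against this kind of scraping, here is a small sketch in JavaScript. It assumes the bot only runs a regex over the raw page source and does not execute scripts; the tracking ID UA-1234567-1 and the page snippets are made up.

```javascript
// The kind of pattern a scraper might run over raw page source.
const scraperPattern = /UA-\d{6}/;

// Hypothetical page sources: plain ID, split string, and char code array.
const plainSource = "ga('create', 'UA-1234567-1', 'auto');";
const splitSource = "ga('create', 'UA-1' + '23456' + '7-1', 'auto');";
const charCodeSource = "var a = [85, 65, 45, 49, 50, 51, 52, 53, 54, 55, 45, 49];";

console.log(scraperPattern.test(plainSource));    // true
console.log(scraperPattern.test(splitSource));    // false
console.log(scraperPattern.test(charCodeSource)); // false
```

A bot that actually executes the page's JavaScript would still see the assembled ID, so this only filters out the simpler scrapers.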

I believe I have a solution that offers the following advantages

  1. No need to maintain whitelists or blacklists
  2. Will work against 99% of them easily and can always be modified to take it to 100%
  3. Requires almost NO manual intervention
  4. The idea is to NOT have a tracking ID at all in the script

Here is an example

script.
      //- Google Analytics ID
      var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];

      var newScript = document.createElement("script");
      newScript.type = "text/javascript";
      newScript.setAttribute("async", "true");
      newScript.setAttribute("src", "https://www.googletagmanager.com/gtag/js?id=" + a.map(i => String.fromCharCode(i)).join(""));
      document.documentElement.firstChild.appendChild(newScript);

      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', a.map(i => String.fromCharCode(i)).join(""), { 'send_page_view': false });
      // Feature detects Navigation Timing API support.
      if (window.performance) {
        // Gets the number of milliseconds since page load
        // (and rounds the result since the value must be an integer).
        var timeSincePageLoad = Math.round(performance.now());
        console.log(timeSincePageLoad)
        // Sends the timing event to Google Analytics.
        gtag('event', 'timing_complete', {
          'name': 'load',
          'value': timeSincePageLoad,
          'event_category': '#{title}'
        });
      }
  1. We take a very simple approach: break a tracking ID of the form 'UA-1111111-1' into a char code array

  2. We then construct the tracking ID dynamically from the char code array wherever we need a reference to it

  3. The approach can be made arbitrarily more complex: encoding the numbers in base 8 or hexadecimal, adding a fixed offset or a random offset on each run, or RSA-encrypting the tracking ID with a private key on the server and decrypting it with a public key. But the basic approach is REALLY fast, as arrays in JS are really fast, and it can easily beat 99% of the bots
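As one illustration of the "fixed offset" variation mentioned in point 3, here is a minimal sketch in JavaScript. The offset value and the function names `encodeTrackingId` and `decodeTrackingId` are assumptions made for the example; the encoding step would normally run once, offline, and only the encoded array and the decoder would be shipped in the page.

```javascript
// Arbitrary offset chosen for this example.
const OFFSET = 7;

// Turn the tracking ID into shifted char codes (done ahead of time).
function encodeTrackingId(id, offset) {
  return Array.from(id, (ch) => ch.charCodeAt(0) + offset);
}

// Rebuild the tracking ID in the browser by undoing the shift.
function decodeTrackingId(codes, offset) {
  return codes.map((code) => String.fromCharCode(code - offset)).join("");
}

const encoded = encodeTrackingId("UA-1111111-1", OFFSET);
console.log(encoded);                           // array of shifted char codes
console.log(decodeTrackingId(encoded, OFFSET)); // UA-1111111-1
```

With the offset applied, even a bot that naively converts the array back with String.fromCharCode does not get a string starting with "UA-".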