Does it matter if the Disallow path is different from Drupal’s directory?


I'm looking to NOINDEX all my tag pages, e.g.

http://example.com/tags/tabs
http://example.com/tags/people

etc.

If I add the following to my robots.txt file (see: http://jsfiddle.net/psac2uzy/)

Disallow: /tags/
Disallow: /tags/*

will this stop Google from indexing all my tag pages?

Even though those paths don't match Drupal's actual directory structure (since Drupal keeps content in the database)?

2 Answers

Answer 1

Add a User-Agent line before your rules, e.g.:

User-Agent: *
Crawl-Delay: 10
Disallow: /tags

(You might also cover Drupal's non-clean URLs: Disallow: /?q=tags)
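
If you want to sanity-check these rules before deploying them, Python's standard-library robotparser can parse them directly. A quick sketch; the test URLs are just the ones from the question, plus a made-up /node/1 path as a control:

import urllib.robotparser

# Parse the proposed rules directly, without fetching anything.
rules = """\
User-Agent: *
Crawl-Delay: 10
Disallow: /tags
Disallow: /?q=tags
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# The tag URLs (clean and non-clean) should report as blocked,
# while the unrelated path stays crawlable.
for url in ("http://example.com/tags/tabs",
            "http://example.com/?q=tags/people",
            "http://example.com/node/1"):
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")

print("Crawl delay for *:", rp.crawl_delay("*"))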

Check this page for more information.

Hope that helps

Answer 2

Note: You can't disallow indexing with robots.txt; you can only disallow crawling (related answer).
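
To actually keep a page out of the index, the page itself has to send a noindex signal: either a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex response header, and the page must stay crawlable so the bot can see that signal. Here is a rough sketch, assuming the tag page from the question is reachable, that checks whether a URL already sends either signal:

import urllib.request

# Hypothetical tag page from the question; swap in a real URL to test.
url = "http://example.com/tags/tabs"

with urllib.request.urlopen(url) as resp:
    header = resp.headers.get("X-Robots-Tag")
    # Crude check of the first 64 KB of markup; a real check would parse
    # the HTML and handle attribute order and quoting properly.
    body = resp.read(65536).decode("utf-8", errors="replace").lower()

print("X-Robots-Tag header:", header)
print("meta robots noindex:",
      '<meta name="robots"' in body and "noindex" in body)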

What matters are the actual URLs that your users, search engines among them, see. They don't have access to your backend, so they don't even know how your site works internally.

The line Disallow: /tags/ (no need for the other one with *) means that all URLs whose paths start with /tags/ should not be crawled. So, assuming that the robots.txt is at http://example.com/robots.txt, this would block for example:

  • http://example.com/tags/
  • http://example.com/tags/foo
  • http://example.com/tags/foo/bar

If your tags are available under a different URL (for example, Drupal’s default /taxonomy/term/…), and a bot finds these alternative URLs, it may of course crawl them. So it’s generally a good idea to always redirect to the one canonical URL you want to use.
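
A small sketch with Python's robotparser (hypothetical URLs) illustrates both points: the rule catches everything whose path starts with /tags/, but Drupal's default /taxonomy/term/… paths sail straight through:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /tags/"])

# Prefix matching: only paths that start with /tags/ are blocked.
for url in ("http://example.com/tags/",
            "http://example.com/tags/foo",
            "http://example.com/tags/foo/bar",
            "http://example.com/tags",              # no trailing slash: allowed
            "http://example.com/taxonomy/term/1"):  # alternative URL: allowed
    print(url, "->", "blocked" if not rp.can_fetch("*", url) else "allowed")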