I am looking for a solution to deindex all the URLs with the query string `?te=` from Google. For example, I want to deindex every URL of the form `https://example.com/?te=`.
Google has currently indexed 21k URLs with this query string, and I want them all deindexed. Should I use the `X-Robots-Tag` header to do so? What are the possible solutions?
I have tried blocking them in `robots.txt` using the directive

```
Disallow: /*?te=
```

but it didn't help me out.
Your `robots.txt` solution would mostly work if you gave it enough time. Google usually stops indexing URLs it can't crawl. However, Google occasionally indexes such URLs based on external links, without indexing the contents of the page.
Using the `X-Robots-Tag` header is a much better idea. It will prevent Google from indexing the pages. You will need to remove your `Disallow` rule from `robots.txt`, or Googlebot won't be able to crawl your URLs and see the `X-Robots-Tag`.
You'll also need to give Googlebot time to crawl all the pages. Some pages will start getting de-indexed in a few days, but it could take months for Googlebot to get through all of them.

If you are using Apache 2.4 or later, you can do this in `.htaccess` using Apache's built-in expressions.
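A minimal sketch (assuming `mod_headers` is enabled and that `te=` can appear anywhere in the query string; adjust the regex if the parameter is always first):

```apache
# Apache 2.4+: match any request whose query string contains te=
<If "%{QUERY_STRING} =~ /(^|&)te=/">
    # Ask search engines not to index the response
    Header set X-Robots-Tag "noindex"
</If>
```

If you are still on Apache 2.2 or earlier, you'll have to use a rewrite rule and an environment variable to achieve the same effect. A sketch under the same assumptions (`mod_rewrite` is also required here, and the variable name `NOINDEX` is arbitrary):

```apache
RewriteEngine On
# Set an environment variable when the query string contains te=
RewriteCond %{QUERY_STRING} (^|&)te=
RewriteRule .* - [E=NOINDEX:1]
# Send the noindex header only for flagged requests
Header set X-Robots-Tag "noindex" env=NOINDEX
```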
I recommend testing to see if it is working using `curl` on the command line. Fetching a URL without the query string (using the example domain from your question; `curl -I` prints just the response headers):
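```
curl -I "https://example.com/"
```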
should NOT show a line that says `X-Robots-Tag: noindex`, but the following command should show it:
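```
curl -I "https://example.com/?te=test"
```

(The value after `te=` is arbitrary; any URL that matches your rule should return the header.)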