To index or not to index? What is Google CSE doing? Alternatives?


I am trying to understand what Google CSE (Custom Search Engine) is doing. I use the free version and have submitted a sitemap.php.
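The sitemap.php itself is not shown here. For reference, under the sitemaps.org protocol a sitemap is an XML `<urlset>` containing one `<url><loc>…</loc></url>` entry per page, and a single file may list at most 50,000 URLs. The following is a minimal Python sketch of generating such a document; the function name `build_sitemap` and the example URLs are hypothetical, and a real sitemap.php would emit the same XML from the site's own data.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit defined by the protocol


def build_sitemap(urls):
    """Return a sitemap XML document (as a str) listing the given URLs."""
    urls = list(urls)
    if len(urls) > MAX_URLS:
        raise ValueError("split into several sitemaps plus a sitemap index")
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc")
        loc.text = url  # ElementTree escapes &, < and > automatically
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical page URLs, for illustration only.
    pages = [f"https://example.org/item.php?id={i}" for i in range(1, 4)]
    print(build_sitemap(pages))
```

With 2,500 pages, one sitemap file is well within the limit.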

Google CSE takes this and indexes 200 of the 2,500 pages. I did this some time ago and am starting to wonder whether it will ever index the rest.

In Google Webmaster Tools, the dashboard for the site in question says 200 pages are indexed.

In Google Webmaster Tools, Index Status tells me that 0 pages are indexed. That looks incorrect to me. I guess 200 is correct at the moment, but I really do not know.

I suspect the discrepancy is because Google already knew about the website before I submitted the sitemap. However, the sitemap.php points to pages that Google cannot find without this file.

I am starting to wonder whether this will work at all. In the past, Google CSE has sometimes returned 0 hits and sometimes a lot. I have not been able to understand what is going on, which is why I am adding this sitemap. The sitemap presents the pages in question in a new way that I think is better for Google. (The same pages are also available in a different form on http://zotero.org/.)

Any suggestions for what I can do to get this search working? (I am considering using OpenSearchEngine, but at the moment I do not have a web host where I can run Java. And this is a free project done in my spare time, so I do not have much money for it. Maybe I can get Apache Lucy to work, but I am unsure. I tried to compile it under Cygwin, but it failed due to a problem with the gcc-4-link, which is fixed in Perl 5.18, but Cygwin only has 5.14. My web host of course runs Linux, but it seems a bit early to rely on Lucy. Maybe I am wrong?)


Every free Custom Search Engine is assigned a quota of 200 pages for immediate indexing: https://support.google.com/customsearch/answer/115958?hl=en

However, I don't think on-demand indexing is what you need: you simply want your 2,500 URLs to be searchable by CSE, not crawled as soon as possible. And this could be the problem: "If I look in Google Webmaster Tools, Index Status it tells me that 0 pages are indexed".

If your site is not indexed by Google, so that it doesn't appear in www.google.com results, then you probably can't search it with CSE (yet). You can see how many of your pages are indexed using the site: operator (https://www.google.com/webhp#q=site%3Azotero.org) and in Google Webmaster Tools, Index Status, as you said.
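If you want to run that check for another domain, the query is just `site:` followed by the domain, URL-encoded. A small Python sketch, assuming the standard `https://www.google.com/search?q=…` URL form rather than the `webhp#q=` form in the link above; the helper name `site_query_url` is hypothetical:

```python
from urllib.parse import urlencode


def site_query_url(domain):
    """Build a google.com search URL that applies the site: operator to a domain."""
    # urlencode percent-encodes ':' as %3A, as in the link above.
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


print(site_query_url("zotero.org"))  # https://www.google.com/search?q=site%3Azotero.org
```

The result count shown on that page is only an estimate, so treat it as a rough indicator alongside Index Status.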

I think you should submit the sitemap in Webmaster Tools and make sure your site is easy to crawl: pages load fine, they are interlinked, and navigation is "hard coded" in plain HTML rather than generated by JavaScript (or you provide AJAX HTML snapshots), etc. Also make sure there are no technical issues, such as an invalid robots.txt file. Once you see your 2,500 pages in a site:your-domain.com search on www.google.com, they will automatically appear in your CSE, too.
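One of those technical issues, a robots.txt that blocks the crawler from the sitemap URLs, can be checked locally with Python's standard `urllib.robotparser`. This is a minimal sketch, assuming you have the robots.txt contents and the sitemap URLs at hand; the helper name `blocked_urls`, the robots.txt text and the URLs are hypothetical. Note that `urllib.robotparser` implements the basic exclusion rules (prefix matching) and may not match Googlebot's interpretation in every edge case, such as wildcards inside paths.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    # Hypothetical robots.txt and URLs, for illustration only.
    robots = "User-agent: *\nDisallow: /private/\n"
    urls = [
        "https://example.org/item.php?id=1",
        "https://example.org/private/x",
    ]
    print(blocked_urls(robots, urls))  # ['https://example.org/private/x']
```

An empty result means none of the listed URLs is excluded by robots.txt; it says nothing about whether Google has actually crawled them yet.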