Full-text search with Amazon Services


I would like to move my application to Amazon SimpleDB, since I don't want to maintain a database service on my own. The application is under heavy load, with a lot of reads/writes per second. I don't need consistency or atomicity, and I want to keep things as simple as possible, so SimpleDB seems like a good choice.

The problem is that I need full-text search capabilities, and I don't know how best to do that with Amazon SimpleDB. I previously implemented a hand-written full-text search on top of MongoDB: I had to split text into words in my application layer and build my own index. It was not hard, but I don't want to do it again with SimpleDB.
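For reference, the hand-written approach described here (split text into words in the application layer, then maintain your own word index) is essentially an inverted index. The following is a minimal in-memory sketch in Python, not the original MongoDB implementation. The tokenizer (lowercase, split on non-word characters), the class and method names, and the AND semantics for multi-word queries are all assumptions. A real deployment would also need to persist the postings somewhere, for example in a separate SimpleDB domain.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase the text and keep runs of word characters.
_WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return _WORD_RE.findall(text.lower())


class InvertedIndex:
    """Hypothetical in-memory inverted index: word -> set of document ids."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)
        self._doc_words: dict[str, set[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        """Index a document, replacing any previous version of it."""
        self.remove(doc_id)
        words = set(tokenize(text))
        self._doc_words[doc_id] = words
        for word in words:
            self._postings[word].add(doc_id)

    def remove(self, doc_id: str) -> None:
        """Drop a document and clean up postings that become empty."""
        for word in self._doc_words.pop(doc_id, set()):
            postings = self._postings.get(word)
            if postings is not None:
                postings.discard(doc_id)
                if not postings:
                    del self._postings[word]

    def search(self, query: str) -> set[str]:
        """Return ids of documents containing every query word (AND semantics)."""
        words = tokenize(query)
        if not words:
            return set()
        # Copy the first posting set so intersection does not mutate the index.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result
```

For example, after `add("a", "Full text search with SimpleDB")` and `add("b", "Search with Lucene")`, `search("search")` returns `{"a", "b"}` and `search("simpledb search")` returns `{"a"}`.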

I found an interesting article: http://codingthriller.blogspot.com/2008/04/simpledb-full-text-search-or-how-to.html

But I would rather not implement it myself; I'm looking for a ready-made solution.

What are the options?

Is it better to use Amazon RDS + Lucene?

Or are there out-of-the-box solutions for SimpleDB?

Requirements are:

  • Ability to handle a lot of concurrent requests
  • Full-text search (text size would not exceed 1MB, a SimpleDB restriction)
  • Preferably not something I have to administer myself.


Best answer:

Lucene or something similar is usually the way people do it, but without knowing what platform you're working with, it's hard to suggest anything in particular. Simol is a .NET object-persistence framework for SimpleDB which can use Lucene.NET for indexing. I've also looked at some basic Lucene.NET examples, which aren't too bad. If you're looking for a hosted indexing service, you could take a look at this question.

For your indexing to do its job well, you're more than likely going to have to tailor it to your application.

Answer:

It looks like Amazon will announce something search-related on Jan 18, 2012: http://pandodaily.com/2012/01/17/good-news-for-ec2-customers-amazon-may-launch-new-cloud-search-tomorrow/

SimpleDB is not great for full-text search. For instance, using the like operator with a %term% pattern, it will not search more than about 300,000 documents on a single field. It takes about two or three tries, roughly 15 seconds, to run through only a hundred MB of text looking for a match. I think it's too slow, as do others; see the AWS forums.
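To make the %like% point concrete, here is a Python sketch of such a "contains" query. It assumes boto3's `sdb` client, whose `select` call takes a `SelectExpression` and an optional `NextToken`. One plausible reason a single search needs several tries is that a SimpleDB Select request has a limited execution time and returns a `NextToken` when it stops early, so the caller has to keep re-issuing the request. A pattern with a leading `%` generally cannot take advantage of the attribute index, which is why this tends to become a scan. The helper names, the domain and attribute names, and the assumption that the term contains no `%` characters are illustrative only.

```python
from typing import Iterator, Optional


def quote_sdb_value(value: str) -> str:
    """Quote a string literal for a SimpleDB select expression.

    Embedded single quotes are escaped by doubling them.
    """
    return "'" + value.replace("'", "''") + "'"


def like_expression(domain: str, attribute: str, term: str) -> str:
    """Build a substring query of the form: attribute like '%term%'.

    Assumes domain/attribute names contain no backticks and term contains no '%'.
    """
    pattern = quote_sdb_value(f"%{term}%")
    return f"select * from `{domain}` where `{attribute}` like {pattern}"


def select_all(client, expression: str) -> Iterator[dict]:
    """Yield every matching item, re-issuing the request while NextToken is returned.

    client is any object with a boto3-style
    select(SelectExpression=..., NextToken=...) method, e.g. boto3.client("sdb").
    """
    next_token: Optional[str] = None
    while True:
        kwargs = {"SelectExpression": expression}
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.select(**kwargs)
        yield from page.get("Items", [])
        next_token = page.get("NextToken")
        if not next_token:
            break
```

Passing the client in keeps the paging logic testable without AWS; with a real client, each loop iteration is one of the "tries" that together make up a single search.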

Answer:

Amazon CloudSearch has since been released, but there is no easy way to move your data from SimpleDB to CloudSearch without writing code yourself.

The API, however, is fairly simple, and you could probably get it up and running in a week or two depending on your needs (if you use the existing SDKs). If you're using a programming language without an SDK, it will take longer.

http://aws.amazon.com/cloudsearch/
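As a rough idea of the code involved, this Python sketch converts items returned by a SimpleDB select (each with a `Name` and a list of `Name`/`Value` attributes, as in boto3's `sdb` response) into CloudSearch document batches. It uses the JSON batch format of the 2013-01-01 API, where each operation looks like `{"type": "add", "id": ..., "fields": {...}}`. Assumptions: item names are valid CloudSearch document IDs, attribute names start with a letter, the field-name mapping (lowercase, other characters replaced by `_`) is a hypothetical choice, and multi-valued attributes map to array-type fields in your CloudSearch schema. The 5 MB default reflects CloudSearch's batch size limit; verify it against the current documentation.

```python
import json
import re
from collections import defaultdict


def item_to_add_op(item: dict) -> dict:
    """Convert one SimpleDB item into a CloudSearch 'add' operation."""
    fields: dict[str, list[str]] = defaultdict(list)
    for attr in item.get("Attributes", []):
        # Hypothetical mapping: CloudSearch field names use a-z, 0-9 and '_'.
        name = re.sub(r"[^a-z0-9_]", "_", attr["Name"].lower())
        # SimpleDB attributes can repeat (multi-valued), so collect all values.
        fields[name].append(attr["Value"])
    return {
        "type": "add",
        "id": item["Name"],
        # Single values become scalars; repeated ones stay lists (array fields).
        "fields": {k: v[0] if len(v) == 1 else v for k, v in fields.items()},
    }


def build_batches(items, max_bytes: int = 5 * 1024 * 1024) -> list[bytes]:
    """Group add operations into JSON array batches of at most max_bytes.

    A single operation larger than max_bytes still gets its own batch.
    """
    batches: list[bytes] = []
    current: list[bytes] = []
    size = 2  # bytes for the enclosing "[" and "]"
    for item in items:
        op = json.dumps(item_to_add_op(item)).encode("utf-8")
        extra = len(op) + (1 if current else 0)  # +1 for the comma separator
        if current and size + extra > max_bytes:
            batches.append(b"[" + b",".join(current) + b"]")
            current, size = [], 2
            extra = len(op)
        current.append(op)
        size += extra
    if current:
        batches.append(b"[" + b",".join(current) + b"]")
    return batches
```

Each batch could then be sent with boto3's `cloudsearchdomain` client, for example `upload_documents(documents=batch, contentType="application/json")`, with your search domain's document endpoint passed as `endpoint_url` when creating the client.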