Suggestions for a multi-faceted search software stack

I need to create a search facility as part of a new project for a client. The records will be things that happen on one or more specific dates. It would be great to get SO's advice on what tools would be best used for the following requirements:

  1. Needs to (multi-faceted) search tens of thousands of records (based on fields such as category, date, price etc)
  2. Needs to search on multi-value fields (i.e. tags)
  3. Needs to be able to order by static factors (such as price, distance etc)
  4. Needs to be able to order by dynamic / frequently changing factors (such as user engagement / traffic etc)
  5. Needs to be able to only return records for which there has been activity in the user's own social network (i.e. 'only show me results my friends have engaged with').
  6. Will be deployed in EC2

My current thoughts are:

  1. Use a hybrid of something like Amazon CloudSearch and Redis
  2. Tens of thousands is not actually that many records. Perhaps do the bulk of the work in an RDBMS, with CloudSearch for full-text searching?
  3. Use Redis to maintain a set of recently interacted-with records for each user, then union those sets to get the records in a user's network.
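
The third idea could look roughly like the sketch below. Plain Python sets stand in for Redis sets so the example runs without a server; with Redis the same union can be computed server-side with SUNION. The key naming, user names, and record IDs are all made up for illustration.

```python
# Hypothetical data: one set of recently engaged record IDs per user.
# In Redis each of these would be a set stored under a key such as
# "engaged:<user>" (that key naming is an assumption for this sketch).
engaged = {
    "alice": {101, 102, 103},
    "bob": {103, 104},
    "carol": {105},
}

# Hypothetical friend lists.
friends = {"dave": ["alice", "bob"]}


def network_record_ids(user):
    """Union the engaged-record sets of all of a user's friends."""
    result = set()
    for friend in friends.get(user, []):
        result |= engaged.get(friend, set())
    return result


print(sorted(network_record_ids("dave")))  # [101, 102, 103, 104]
```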

My main concern is the latency of pulling back perhaps many thousands of IDs from various services (Redis/CloudSearch) and then having to union them in the client code. However, perhaps this is unfounded.
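
To get a feel for what the client-side step involves, here is a minimal sketch that filters a ranked ID list from the search service against a set of network IDs. Both inputs are invented. The point is only that one set lookup per ID keeps the work linear in the number of results, so a few thousand IDs is little work for the client. Network transfer time is a separate matter and is not measured here.

```python
def filter_by_network(ranked_ids, network_ids):
    """Keep search results that appear in the network set, preserving rank order."""
    network = set(network_ids)  # constant-time membership checks on average
    return [record_id for record_id in ranked_ids if record_id in network]


# Hypothetical inputs: IDs as returned (already ranked) by the search service,
# and the union of the friends' engaged-record sets.
ranked_ids = [104, 900, 101, 555, 103]
network_ids = {101, 102, 103, 104}

print(filter_by_network(ranked_ids, network_ids))  # [104, 101, 103]
```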

I'm hoping that there is perhaps a technology stack out there which I have missed that can solve a lot of this for me. I don't want to go reinventing the wheel.

Any suggestions welcome!

Best answer

I recommend Amazon CloudSearch for your requirements:

  • Needs to (multi-faceted) search tens of thousands of records (based on fields such as category, date, price etc)

CloudSearch is really good at multi-faceted search. It is widely used on Amazon's own website, and it is very fast: the search index is kept in memory so that requests can be served at very high rates.
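
For readers unfamiliar with the term, the sketch below shows what a faceted query computes: filter the records, then count the matches per value of some field. It is plain Python over invented records, not the CloudSearch API, and the field names are assumptions.

```python
from collections import Counter

# Hypothetical records; "tags" is a multi-value field.
records = [
    {"id": 1, "category": "music", "price": 20, "tags": ["live", "outdoor"]},
    {"id": 2, "category": "music", "price": 35, "tags": ["live"]},
    {"id": 3, "category": "theatre", "price": 50, "tags": ["indoor"]},
    {"id": 4, "category": "sport", "price": 15, "tags": ["outdoor"]},
]


def faceted_search(records, max_price=None, tag=None):
    """Filter records, then count the matches per category (the facet)."""
    hits = [
        r for r in records
        if (max_price is None or r["price"] <= max_price)
        and (tag is None or tag in r["tags"])
    ]
    facets = Counter(r["category"] for r in hits)
    return hits, facets


hits, facets = faceted_search(records, max_price=40)
print([r["id"] for r in hits])  # [1, 2, 4]
print(dict(facets))             # {'music': 2, 'sport': 1}
```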

  • Needs to search on multi-value fields (i.e. tags)

No problem (for any search engine)

  • Needs to be able to order by static factors (such as price, distance etc)

No problem (for any search engine)

  • Needs to be able to order by dynamic / frequently changing factors (such as user engagement / traffic etc)

You can define a "formula" in CloudSearch that makes results rank higher or lower. It is usually used for:

  - providing "fresh" content by boosting the result ranking based on the published date
  - boosting popular results

CloudSearch is really good at this task, so it looks like it will fit you well.
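
As an illustration of the idea only, the sketch below scores records by combining a base relevance score with a popularity boost and a freshness boost, then sorts by that score. It is plain Python, not CloudSearch expression syntax, and the weights, inputs, and half-life decay are assumptions chosen for the example.

```python
import math


def rank_score(relevance, engagement, age_days,
               popularity_weight=1.0, half_life_days=30.0):
    """Combine text relevance with popularity and freshness boosts."""
    popularity = popularity_weight * math.log1p(engagement)  # dampen large counts
    freshness = 0.5 ** (age_days / half_life_days)           # halves every half-life
    return relevance + popularity + freshness


# Hypothetical records: (id, relevance, engagement count, age in days).
records = [
    ("old-popular", 1.0, 500, 90),
    ("new-quiet", 1.0, 2, 1),
    ("old-quiet", 1.0, 2, 90),
]

ranked = sorted(records, key=lambda r: rank_score(r[1], r[2], r[3]), reverse=True)
print([r[0] for r in ranked])  # ['old-popular', 'new-quiet', 'old-quiet']
```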

  • Needs to be able to only return records for which there has been activity in the user's own social network (i.e. 'only show me results my friends have engaged with').

I guess that is no problem either.

  • Will be deployed in EC2

A win for CloudSearch. Your requests will stay inside Amazon's network, which makes them much faster than going over the internet.

My main concern is the latency of pulling back perhaps many thousands of IDs from various services (Redis/CloudSearch)

CloudSearch should not slow down under load. Depending on the load, it may automatically move to bigger hardware (a larger instance), start new instances, or split the data across instances.

Maybe you could switch to Amazon SimpleDB instead of Redis? It would let you scale up easily. That said, it is not unusual to use another database alongside Amazon CloudSearch (or any search engine).

Perhaps do the bulk of the work in an RDBMS, with CloudSearch for full-text searching?

Maybe. But be careful: an RDBMS does not scale up as easily as CloudSearch.

By the way, I'm the creator of Amazing Cloud Search, but I don't work for Amazon CloudSearch :-) I just feel the technology is really great (when it fits your needs).

Hope this helps, and I hope it's not too messy.