Fast and dynamic autosuggest using JQuery and flat files in JSON format

882 Views Asked by At

I've been busting my head on this one so i finally decided to put it on Stackoverflow hoping someone could help me with this.

Let me sketch the problem:

  • The most basic usage of an autosuggest feature would be to query the database on each keystroke.

The advantage of this method is that you always get the most recent results.

The disadvantages are that it can be slow and that basically it is overkill to address your database on each keystroke. Suppose you want to autosuggest a country table, searching for something like 'america' would generate 7 SELECT query's on the database ! This method is therefor not an option.

  • The second option (the one i am currently using) is to collect all the possible results and save it into a flat file in JSON format. So basically what you would have is a single file e.g. countries.json which contains all the countries. Instead of querying the database i just point to the countries json file and each keystroke will just search within the loaded json file.

This method is fast and leverages the database. In the example of the countries this would be a good solution since the content is not very dynamic and it does not contain that much data so the json flat file will be quite small so it will download fast.

But the problem arises when the json file contains dynamic data and has a large dataset. What if you want to build an autosuggest search containing all the movies and actors? Putting all the results inside one json file would be ridiculous since the file would probably be too large to download. Let's take IMDB (www.imdb.com) as an example. They have an autosuggest feature on their website. Searching on a movie title or actor is lightning fast and it searches within hundreds of thousands of records. Looking deeper into their method i notice the following:

If you use the Google Chrome debug window (or Filezilla Firefox Firebug) : Look inside the network tab for scripts. As you type say for example for "star wars" you notice that by typing each letter a separate json file is loaded. first it's s.json, then st.json, sta.json, etc... It stops at star_w.json Apparently it always stops at the 6th letter.

Each json file it loads contains 8 results matching the letters. There seem to be 36 folders matching each letter of the alphabet including numbers. E.g the link for star_w.json is http://sg.media-imdb.com/suggests/s/star_w.json

My question is how would you build a similar intelligent system? Do they have every possible combination of 6 letters as a json file? How would you generate the files? Also notice it does not only look at start of the title to make a match. So if you would search for "wars" you will also find "star wars" inside the search results.

Any help in figuring out building a similar system would be much appreciated. I think the system is very neat and fast and it can be used for many purposes!

EDIT

Ok, worked it out. My solution is a combination of querying the DB and caching the results in flat json files. Basically when the user types e.g. "star wars", with each keystroke i will first check if there is a json file for the searchstring. If it exists i'll just display the contents of the json file. If the file does not exist or if the file is older than a certain timelimit you set (e.g older than 12hrs) you query the database and echo the output in json. At the same time you write the output to a json file. The next time you make the search it will just get it's content from the flat files and not from the DB (until ofcourse the TTL is reached). I've limited the length of letters to 6 letters so if the search string is larger than 6 letters nothing happens, you just continue to look within the search results of the last found results, in the example of "star wars" that would be star_w

With this method the collection of cached json results is automatically populated based on the user input. You could also write a script that loops your table and generates the json files...

I know it's not the perfect solution but this way at least you can take some load of the DB by caching results in flat files.

1

There are 1 best solutions below

1
On

I would use some search engine software that creates indexes, better suited for these scenarios than what the RDMS can provide.

Take a look at Boolware from Softbool: http://www.softbool.se/ (click the In English) link...

Horrible web page, great product.