How to search values in real time on a badly designed database?

I have a collection named Company which has the following structure:

{
    "_id" : ObjectId("57336ea1a7454c0100d889e4"),
    "currentMonth" : 62,
    "variables1": { ... },
    ...
    "variables61": { ... },
    "variables62" : {
        "name" : "Test",
        "email": "[email protected]",
         ...
    },
    "country" : "US",
}

I need to search for companies by name against up-to-date data. I don't have permission to change this data structure because many applications still use it. So far I haven't found a way to index these variables with this structure, which makes the search slow.

Today each of these documents can be several megabytes in size and there are over 20,000 of them in this collection.

The system I want to implement uses a search engine to index the names of companies, but for that it needs to be able to detect changes in the collection.

MongoDB change streams seem like a viable option, but I'm not sure how to make them scalable and efficient.

Do you have any suggestions for solving this problem, or for the steps needed to set up the system described above?

There are 2 answers below.

Best answer

Using the change detection pattern with monstache, I was able to synchronise MongoDB with Elasticsearch in real time. I applied a filter based on the current month, then mapped the resulting variables to the fields to be indexed.
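To make the filter and map steps concrete, here is a minimal sketch of the logic in Python. It is not monstache's own middleware API, only an illustration of the idea. It assumes the up-to-date name lives in `variables<currentMonth>` (as the example document suggests), that events have the shape of MongoDB change events (`operationType`, `documentKey`, `fullDocument`), and it uses a plain dict as a stand-in for the search engine. The names `to_search_doc`, `handle_change` and `index` are hypothetical.

```python
from typing import Optional


def to_search_doc(company: dict) -> Optional[dict]:
    """Filter + map: keep only the current month's variables and reduce
    them to the few fields the search engine should index."""
    month = company.get("currentMonth")
    current = company.get(f"variables{month}")
    # Filter: skip documents that have no usable current-month name.
    if not isinstance(current, dict) or not current.get("name"):
        return None
    # Map: emit a small, flat document instead of the multi-megabyte original.
    return {
        "id": str(company["_id"]),
        "name": current["name"],
        "country": company.get("country"),
    }


def handle_change(event: dict, index: dict) -> None:
    """Apply one change event to `index`, a plain dict standing in for the
    search engine (assumption: a real client call would go here instead)."""
    key = str(event["documentKey"]["_id"])
    if event["operationType"] == "delete":
        index.pop(key, None)
        return
    full = event.get("fullDocument")
    search_doc = to_search_doc(full) if full else None
    if search_doc is None:
        index.pop(key, None)  # nothing indexable: make sure no stale entry remains
    else:
        index[key] = search_doc


# Example with a hand-written event shaped like a change stream "update" event.
index = {}
event = {
    "operationType": "update",
    "documentKey": {"_id": "57336ea1a7454c0100d889e4"},
    "fullDocument": {
        "_id": "57336ea1a7454c0100d889e4",
        "currentMonth": 62,
        "variables61": {"name": "Old name"},
        "variables62": {"name": "Test"},
        "country": "US",
    },
}
handle_change(event, index)
```

If you read the change stream yourself with PyMongo rather than through monstache, events of this shape come from `collection.watch(full_document="updateLookup")`. With documents of several megabytes, that lookup re-reads the whole document on every update, so it is worth measuring before relying on it.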

Second answer

Usually with MongoDB you can add new fields to documents and existing applications would simply ignore the extra fields (though they naturally would not be populated by old code). Therefore:

  1. Create a task that is regularly executed which goes through all documents in your collection, figures out the name for each document from its fields, then writes the name into a top-level field.
  2. Add an index on that field.
  3. In your search code, look up by the values of that field.
  4. For each search result, compare the stored top-level name to the source-of-truth name in the document's fields. If they differ, discard that result, because the stored name is stale.

If names don't change once set, step 1 only needs to go through documents that are missing the top-level name and step 4 is not needed.
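The steps above can be sketched roughly as follows. This is an in-memory illustration on plain dicts, not code against a live database. It assumes the source-of-truth name is `variables<currentMonth>.name`, and `currentName` is a hypothetical name for the new top-level field.

```python
from typing import Optional

NAME_FIELD = "currentName"  # hypothetical name for the new top-level field


def source_name(doc: dict) -> Optional[str]:
    """Return the source-of-truth name from the current month's variables."""
    current = doc.get(f"variables{doc.get('currentMonth')}")
    return current.get("name") if isinstance(current, dict) else None


def refresh(docs: list) -> int:
    """Step 1: copy the current name into the top-level field wherever it is
    missing or stale. Returns the number of documents changed."""
    changed = 0
    for doc in docs:
        name = source_name(doc)
        if doc.get(NAME_FIELD) != name:
            doc[NAME_FIELD] = name
            changed += 1
    return changed


def search(docs: list, name: str) -> list:
    """Steps 3 and 4: look up by the top-level field, then drop stale hits."""
    hits = [d for d in docs if d.get(NAME_FIELD) == name]  # step 3
    return [d for d in hits if source_name(d) == name]     # step 4


companies = [
    {"_id": 1, "currentMonth": 2,
     "variables1": {"name": "Old"}, "variables2": {"name": "Test"}},
    {"_id": 2, "currentMonth": 1, "variables1": {"name": "Other"}},
]
refresh(companies)

# Simulate changes that happen between two runs of the periodic task.
companies[0]["variables2"]["name"] = "Renamed"
companies[1]["currentMonth"] = 2
companies[1]["variables2"] = {"name": "Test"}
```

Until the task runs again, a search for "Test" returns nothing: company 1 is discarded by step 4 because its stored name is stale, and company 2 is not yet findable under its new name. That delay is the cost of a periodic task compared with reacting to each change. Against a real collection with PyMongo, step 1 would become an `update_one` with `{"$set": {NAME_FIELD: name}}` and step 2 a `create_index(NAME_FIELD)` call.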