Can someone suggest a simple way to solve the search problem below? Is there an algorithm for this, or will full-text search suffice?
The items are classified as follows:
| ItemCategory | ItemCluster | ItemSubCluster | SubCluster | Items |
|---|---|---|---|---|
| Vegetable | Root vegetables | Root | WithOutSkin | potato, sweet potato, yam |
| Vegetable | Root vegetables | Root | WithSkin | onion, garlic, shallot |
| Vegetable | Greens | Leafy green | Leaf | lettuce, spinach, silverbeet |
| Vegetable | Greens | Cruciferous | Flower | cabbage, cauliflower, Brussels sprouts, broccoli |
| Vegetable | Greens | Edible plant stem | Stem | celery, asparagus |
The inputs will be something like:
sweet potato, yam
Yam, Potato
garlik, onion
lettuce, spinach, silverbeet
lettuce, silverbeet
lettuce, silverbeet, spinach
For each input, I want to find which ItemCategory, ItemCluster, ItemSubCluster, and SubCluster its items belong to.
Any help will be much appreciated.
You are nearly following the right approach.
You don't need full text searching here.
What you can create here is a kind of inverted index as follows:
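If we take the example of `potato`, create a map for `potato` storing its ItemCategory, ItemCluster, ItemSubCluster, and SubCluster. For example, going by the table in the question, `potato` maps to ItemCategory = Vegetable, ItemCluster = Root vegetables, ItemSubCluster = Root, SubCluster = WithOutSkin.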
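A minimal Python sketch of such an inverted index, assuming the table from the question is available as in-memory rows. The names `ROWS`, `FIELDS`, `build_index`, and `classify` are hypothetical, and lowercasing the input is an assumption made so that `Yam, Potato` matches:

```python
# Rows copied from the classification table in the question.
ROWS = [
    ("Vegetable", "Root vegetables", "Root", "WithOutSkin",
     ["potato", "sweet potato", "yam"]),
    ("Vegetable", "Root vegetables", "Root", "WithSkin",
     ["onion", "garlic", "shallot"]),
    ("Vegetable", "Greens", "Leafy green", "Leaf",
     ["lettuce", "spinach", "silverbeet"]),
    ("Vegetable", "Greens", "Cruciferous", "Flower",
     ["cabbage", "cauliflower", "Brussels sprouts", "broccoli"]),
    ("Vegetable", "Greens", "Edible plant stem", "Stem",
     ["celery", "asparagus"]),
]
FIELDS = ("ItemCategory", "ItemCluster", "ItemSubCluster", "SubCluster")


def build_index(rows):
    """Map each lowercased item name to its four classification labels."""
    index = {}
    for *labels, items in rows:
        for item in items:
            index[item.lower()] = dict(zip(FIELDS, labels))
    return index


def classify(query, index):
    """Split a comma-separated input and look up each item (None if unknown)."""
    names = [part.strip().lower() for part in query.split(",")]
    return {name: index.get(name) for name in names}


index = build_index(ROWS)
print(classify("Yam, Potato", index))
```

With exact-match lookup like this, a misspelled input such as `garlik` comes back as `None`; handling typos would need a separate fuzzy-matching step, which this sketch does not attempt.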
Now, to store this kind of data for each vegetable would be expensive.
You can optimise the storage by using an encoding scheme:
For example, let `ItemCategory` be denoted by 0, `ItemCluster` by 1, `ItemSubCluster` by 2, and `SubCluster` by 3, and let the values be denoted by a similar encoding scheme: `Vegetable` by 0, `Root vegetables` by 1, `Root` by 2, and `WithOutSkin` by 3. Now the mapping for `potato` becomes {0: 0, 1: 1, 2: 2, 3: 3}.
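The encoding can be sketched in Python roughly as follows. The code tables use the numbers given above; `FIELD_CODES`, `VALUE_CODES`, `encode`, and `decode` are hypothetical names, and only the values needed for `potato` are listed:

```python
# Integer codes for the field names and for the values, as described above.
FIELD_CODES = {"ItemCategory": 0, "ItemCluster": 1, "ItemSubCluster": 2, "SubCluster": 3}
VALUE_CODES = {"Vegetable": 0, "Root vegetables": 1, "Root": 2, "WithOutSkin": 3}


def encode(record):
    """Replace field names and values with their integer codes."""
    return {FIELD_CODES[field]: VALUE_CODES[value] for field, value in record.items()}


def decode(encoded):
    """Invert the code tables to recover the readable record."""
    fields = {code: name for name, code in FIELD_CODES.items()}
    values = {code: name for name, code in VALUE_CODES.items()}
    return {fields[f]: values[v] for f, v in encoded.items()}


potato = {
    "ItemCategory": "Vegetable",
    "ItemCluster": "Root vegetables",
    "ItemSubCluster": "Root",
    "SubCluster": "WithOutSkin",
}
encoded = encode(potato)
print(encoded)          # {0: 0, 1: 1, 2: 2, 3: 3}
print(decode(encoded))  # the original readable record
```

Each label string is now stored once in the code tables instead of once per item, which is where the storage saving comes from.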
To optimise this further, you can also maintain an index of vegetables. For example, `potato` can be denoted by 0, so the final index entry becomes 0: {0: 0, 1: 1, 2: 2, 3: 3}.
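Putting the pieces together, a lookup against the fully encoded index might look like the sketch below. It covers `potato` only; `ITEM_IDS`, `ENCODED_INDEX`, `FIELD_NAMES`, `VALUE_NAMES`, and `lookup` are hypothetical names, and the lists are positioned so that a code doubles as a list index:

```python
# Item name -> item id, and item id -> encoded classification.
ITEM_IDS = {"potato": 0}
ENCODED_INDEX = {0: {0: 0, 1: 1, 2: 2, 3: 3}}

# Position in each list is the code, so decoding is a plain list lookup.
FIELD_NAMES = ["ItemCategory", "ItemCluster", "ItemSubCluster", "SubCluster"]
VALUE_NAMES = ["Vegetable", "Root vegetables", "Root", "WithOutSkin"]


def lookup(name):
    """Resolve an item name to its readable classification, or None if unknown."""
    item_id = ITEM_IDS.get(name.strip().lower())
    if item_id is None:
        return None
    encoded = ENCODED_INDEX[item_id]
    return {FIELD_NAMES[f]: VALUE_NAMES[v] for f, v in encoded.items()}


print(lookup("Potato"))
print(lookup("garlik"))  # None: not in the index
```

Since every item has the same four fields, the inner dictionary could also be a plain tuple of value codes; the dictionary form is kept here only to mirror the mapping described above.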