I am currently trying to prototype a product recommendation system using the Elasticsearch Significant Terms aggregation. So far, I haven't found a good example that deals with "flat" JSON structures of sales (here: the itemId) coming from a relational database, such as mine:
Document 1
{
"lineItemId": 1,
"lineNo": 1,
"itemId": 1,
"productId": 1234,
"userId": 4711,
"salesQuantity": 2,
"productPrice": 0.99,
"salesGross": 1.98,
"salesTimestamp": 1234567890
}
Document 2
{
"lineItemId": 1,
"lineNo": 2,
"itemId": 1,
"productId": 1235,
"userId": 4711,
"salesQuantity": 1,
"productPrice": 5.99,
"salesGross": 5.99,
"salesTimestamp": 1234567890
}
I have around 1.5 million of these documents in my Elasticsearch index. A lineItem is a part of a sale (identified by itemId), which can consist of one or more lineItems.
What I would like to receive is, say, the 5 most uncommonly common products that were bought in conjunction with the sale of one specific productId.
The MovieLens example (https://www.elastic.co/guide/en/elasticsearch/guide/current/_significant_terms_demo.html) deals with data in the structure of
{
"movie": [122,185,231,292,
316,329,355,356,362,364,370,377,420,
466,480,520,539,586,588,589,594,616
],
"user": 1
}
so it's unfortunately not really useful to me. I'd be very glad for an example or a suggestion using my "flat" structures. Thanks a lot in advance.
Since I don't have the amount of data that you do, try this:
Get the itemIds for bundles that contain a certain productId that you want to find "stuff" for:
Then
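The answer is cut off at this point, so what follows is only a minimal sketch of how a two-step lookup along these lines might be expressed. It is not the original author's queries. It assumes Elasticsearch 2.0 or later (for the `filter` clause inside `bool`) and that `itemId` and `productId` are indexed as exact values. The function names, the `bundles` and `recommended` aggregation names, and the sample response are hypothetical. The code only builds the request bodies as plain Python dicts, so sending them to a cluster is left to whichever client you use.

```python
import json


def bundle_ids_query(product_id, max_bundles=1000):
    """Step 1: collect the itemIds (sales) that contain the given productId."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"term": {"productId": product_id}},
        "aggs": {
            # hypothetical aggregation name "bundles"
            "bundles": {"terms": {"field": "itemId", "size": max_bundles}}
        },
    }


def extract_item_ids(step1_response):
    """Pull the itemId keys out of the step 1 terms-aggregation buckets."""
    buckets = step1_response["aggregations"]["bundles"]["buckets"]
    return [bucket["key"] for bucket in buckets]


def significant_products_query(item_ids, product_id, top_n=5):
    """Step 2: significant_terms on productId over the line items of those sales."""
    return {
        "size": 0,
        "query": {
            "bool": {
                # foreground set: every line item belonging to the matching sales
                "filter": [{"terms": {"itemId": item_ids}}],
                # leave out the product we started from
                "must_not": [{"term": {"productId": product_id}}],
            }
        },
        "aggs": {
            # hypothetical aggregation name "recommended"
            "recommended": {
                "significant_terms": {"field": "productId", "size": top_n}
            }
        },
    }


# Hypothetical step 1 response, shaped like a terms-aggregation result.
sample_response = {
    "aggregations": {"bundles": {"buckets": [{"key": 1, "doc_count": 1}]}}
}
item_ids = extract_item_ids(sample_response)
print(json.dumps(significant_products_query(item_ids, 1234), indent=2))
```

With 1.5 million line items, the list of itemIds from step 1 can get long, and a very long `terms` list may run into limits on the cluster side. Capping `max_bundles` or sampling the bundles is one way to keep the second request small.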