I have the following documents:
{
"url": "/some/listing/url",
"title": "HOTELS WITH RIVER VIEW IN SALZBURG",
"pageType": "LISTING_PAGE",
"pageMetaData": {
"$placeId": 123,
"hotels" : "1,2,3",
"priority": 1
}
},
{
"url": "/some/listing/url",
"title": "HOTELS FOR FAMILY IN SALZBURG",
"pageType": "LISTING_PAGE",
"pageMetaData": {
"$placeId": 123,
"hotels" : "1,2",
"priority": 2
}
},
{
"url": "/some/listing/url",
"title": "HOTELS in AUSTRIA",
"pageType": "LISTING_PAGE",
"pageMetaData": {
"$placeId": 1,
"hotels" : "1,2,3",
"priority": 1
}
},
{
"url": "/some/listing/url",
"title": "HOTELS with MOUNTAIN VIEW IN SALZBURG",
"pageType": "LISTING_PAGE",
"pageMetaData": {
"$placeId": 123,
"hotels" : "1,2,3",
"priority": 2
}
},
{
"url": "/some/listing/url",
"title": "HOTELS WITH RIVER VIEW IN HALLSTATT",
"pageType": "LISTING_PAGE",
"pageMetaData": {
"$placeId": 34,
"hotels" : "5,6",
"priority": 1
}
},
{
"url": "/some/listing/url",
"title": "HOTELS FOR FAMILY IN HALLSTATT",
"pageType": "LISTING_PAGE",
"pageMetaData": {
"$placeId": 34,
"hotels" : "5,6",
"priority": 2
}
},
{
"url": "/some/external/url",
"title": "HOTELS FOR FAMILY IN HALLSTATT",
"pageType": "EXTERNAL_PAGE",
"pageMetaData": {
"$placeId": 34,
"hotels" : "5,6",
"priority": 3
}
}
// Just a note: hotels is a string and is already sorted, so if two pages have the same set of hotels, the strings will be identical.
I am trying to find all the docs that are duplicates of each other within a place (same $placeId and same hotels), and sort them by priority for a given page type.
For example, for the LISTING_PAGE type the result, sorted by priority ascending, would be:
"HOTELS WITH RIVER VIEW IN SALZBURG" // has priority 1 and has duplicate hotels
"HOTELS WITH RIVER VIEW IN HALLSTATT" // has priority 1 and has duplicate hotels
"HOTELS with MOUNTAIN VIEW IN SALZBURG" // has priority 2 and has duplicate hotels
"HOTELS FOR FAMILY IN HALLSTATT" // has priority 2 and has duplicate hotels
since the hotels of these pages duplicate those of another page in the same place. Note that even though the Austria page has the same hotels ("1,2,3"), it belongs to a different place, so it is not considered a duplicate.
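To make the expected result concrete, here is a minimal sketch of the logic in plain Python rather than MongoDB. It only illustrates the semantics I am after, not a query I can actually use; the helper name `duplicate_pages` is made up for this example, and the `url` fields are left out for brevity.

```python
from collections import defaultdict

# Sample pages copied from the documents above (url fields omitted).
pages = [
    {"title": "HOTELS WITH RIVER VIEW IN SALZBURG", "pageType": "LISTING_PAGE",
     "pageMetaData": {"$placeId": 123, "hotels": "1,2,3", "priority": 1}},
    {"title": "HOTELS FOR FAMILY IN SALZBURG", "pageType": "LISTING_PAGE",
     "pageMetaData": {"$placeId": 123, "hotels": "1,2", "priority": 2}},
    {"title": "HOTELS in AUSTRIA", "pageType": "LISTING_PAGE",
     "pageMetaData": {"$placeId": 1, "hotels": "1,2,3", "priority": 1}},
    {"title": "HOTELS with MOUNTAIN VIEW IN SALZBURG", "pageType": "LISTING_PAGE",
     "pageMetaData": {"$placeId": 123, "hotels": "1,2,3", "priority": 2}},
    {"title": "HOTELS WITH RIVER VIEW IN HALLSTATT", "pageType": "LISTING_PAGE",
     "pageMetaData": {"$placeId": 34, "hotels": "5,6", "priority": 1}},
    {"title": "HOTELS FOR FAMILY IN HALLSTATT", "pageType": "LISTING_PAGE",
     "pageMetaData": {"$placeId": 34, "hotels": "5,6", "priority": 2}},
    {"title": "HOTELS FOR FAMILY IN HALLSTATT", "pageType": "EXTERNAL_PAGE",
     "pageMetaData": {"$placeId": 34, "hotels": "5,6", "priority": 3}},
]


def duplicate_pages(pages, page_type):
    """Hypothetical helper: pages of one type that share (place, hotels)
    with at least one other page of that type, sorted by priority."""
    groups = defaultdict(list)
    for page in pages:
        if page["pageType"] != page_type:
            continue
        meta = page["pageMetaData"]
        # Same place and same (already sorted) hotels string means duplicate.
        groups[(meta["$placeId"], meta["hotels"])].append(page)

    # Keep every page of every group that has more than one member.
    duplicates = [p for group in groups.values() if len(group) > 1 for p in group]
    return sorted(duplicates, key=lambda p: p["pageMetaData"]["priority"])


for page in duplicate_pages(pages, "LISTING_PAGE"):
    print(page["pageMetaData"]["priority"], page["title"])
```

Note that the grouping here is only an intermediate step: every page of a duplicate group is kept in the output, which is the behaviour I need from the MongoDB query.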
I know how to write a SQL query that uses "NOT EXISTS" to get the duplicates, but as far as I can see MongoDB does not have anything like NOT EXISTS. I also don't want to do a group by, as that would lose the individual pages; I want all pages with duplicate hotels, sorted by priority.