I have two Phonograph objects, each with millions of rows, which I have linked using the Search Around methods.
In the example below, I filter to an Object Set of Flights based on the departure code, then Search Around to the Passengers on those flights, and then filter again based on an attribute of the Passengers object.
const passengersDepartingFromAirport = Objects.search()
    .flights()
    .filter(flight => flight.departureAirportCode.exactMatch(airportCode))
    .searchAroundPassengers()
    .filter(passenger => passenger.passengerAttribute.exactMatch(value));
The result of the above code is:
LOG [2022-04-19T14:25:58.182Z] { osp: {},
objectSet:
{ objectSetProvider: '[Circular]',
objectSet: { type: 'FILTERED', filter: [Object], objectSet: [Object] } },
objectTypeIds: [ 'passengers' ],
emptyOrderByStep:
{ objectSet: '[Circular]',
orderableProperties:
{ attributeA: [Object],
attributeB: [Object],
attributeB: [Object],
...
Now, when I try to use take() or takeAsync(), or to aggregate the result using groupBy(), I receive the error below:
RemoteError: INVALID_ARGUMENT ObjectSet:ObjectSetTooLargeForSearchAround with instance ID xxx.
Error Parameters: {
"RemoteError.type": "STATUS",
"objectSetSize": "2160870",
"maxAllowedSize": "100000",
"relationSide": "TARGET",
"relationId": "flights-passengers"
}
SafeError: RemoteError: INVALID_ARGUMENT ObjectSet:ObjectSetTooLargeForSearchAround with instance ID xxx
How can I aggregate or reduce the result of the above ObjectSet?
The current object storage infrastructure has a limit of 100,000 objects on the size of the "left side" or "starting object set" for a search around.
You can define an object set that uses a search around, which is what you're seeing as the result when you execute the Function before attempting any further manipulations.
Using take() or groupBy() "forces" the resolution of the object set definition. That is, a pointer to the objects is no longer enough: you need to actually materialize some data from each individual object to do that operation.
It's in this materialization step that the limit comes into play: the object sets are resolved and, if the object set at the search around step is larger than 100,000 objects, the request will fail with the above message.
There is ongoing work on Object Storage v2, which will eventually support much larger search-around requests, but for now it's necessary to create a query pattern that results in fewer than 100,000 objects before making a search around.
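To make the difference between defining an object set and resolving it more concrete, here is a rough toy model in TypeScript. It is not the Foundry Functions API: LazyObjectSet, resolveSize, keepOneIn and linksPerObject are hypothetical names invented for this sketch, and the sizes are made up. It only assumes what is described above, namely that building the definition is free and that the 100,000 limit is checked at the search around step once something forces resolution.

```typescript
// Toy model only: hypothetical names, NOT the Foundry Functions API.
const MAX_SEARCH_AROUND_SIZE = 100_000; // maxAllowedSize from the error above

type Step =
  | { kind: "filter"; keepOneIn: number }
  | { kind: "searchAround"; linksPerObject: number };

class LazyObjectSet {
  readonly baseSize: number;
  readonly steps: readonly Step[];

  constructor(baseSize: number, steps: readonly Step[] = []) {
    this.baseSize = baseSize;
    this.steps = steps;
  }

  // Adding steps only extends the definition; nothing is resolved here.
  filter(keepOneIn: number): LazyObjectSet {
    return new LazyObjectSet(this.baseSize, [...this.steps, { kind: "filter", keepOneIn }]);
  }

  searchAround(linksPerObject: number): LazyObjectSet {
    return new LazyObjectSet(this.baseSize, [...this.steps, { kind: "searchAround", linksPerObject }]);
  }

  // Stand-in for take() / groupBy(): walks the steps and enforces the limit
  // on the set that feeds each search around.
  resolveSize(): number {
    let size = this.baseSize;
    for (const step of this.steps) {
      if (step.kind === "filter") {
        size = Math.floor(size / step.keepOneIn);
      } else {
        if (size > MAX_SEARCH_AROUND_SIZE) {
          throw new Error(
            `ObjectSetTooLargeForSearchAround: objectSetSize=${size}, maxAllowedSize=${MAX_SEARCH_AROUND_SIZE}`
          );
        }
        size = size * step.linksPerObject;
      }
    }
    return size;
  }
}

// Both definitions build without error; only resolving the first one fails.
const tooBroad = new LazyObjectSet(5_000_000).filter(2).searchAround(3).filter(10);
const narrowEnough = new LazyObjectSet(5_000_000).filter(100).searchAround(3).filter(10);
```

In this model tooBroad can be constructed and logged without any problem, but resolving it fails because 2,500,000 objects reach the search around. narrowEnough resolves because its first filter leaves 50,000 objects on the starting side.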
In some cases it's possible to create an "intermediate" object type that represents a different level of granularity in your data, or to invert the direction of your search around, to find a way to address these limits.
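As a small illustration of the "invert the direction" idea, the sketch below compares how many objects would sit on the starting side of the search around in each direction, using plain in-memory arrays. The Flight and Passenger interfaces, the flightId field, and the helper names leftSideSizes and pickStartingSide are assumptions made for this example, not part of any Foundry API.

```typescript
// Hypothetical in-memory stand-ins for the two object types.
interface Flight {
  id: string;
  departureAirportCode: string;
}

interface Passenger {
  id: string;
  flightId: string; // assumed link to Flight for this sketch
  passengerAttribute: string;
}

interface LeftSideSizes {
  fromFlights: number;
  fromPassengers: number;
}

// Size of the starting set for each direction of the search around.
function leftSideSizes(
  flights: Flight[],
  passengers: Passenger[],
  airportCode: string,
  value: string
): LeftSideSizes {
  return {
    fromFlights: flights.filter(f => f.departureAirportCode === airportCode).length,
    fromPassengers: passengers.filter(p => p.passengerAttribute === value).length,
  };
}

// Prefer the smaller starting side, provided it fits under the limit.
function pickStartingSide(
  sizes: LeftSideSizes,
  maxAllowedSize: number = 100_000
): "flights" | "passengers" | "neither" {
  const smaller: "flights" | "passengers" =
    sizes.fromFlights <= sizes.fromPassengers ? "flights" : "passengers";
  const smallerSize = Math.min(sizes.fromFlights, sizes.fromPassengers);
  return smallerSize <= maxAllowedSize ? smaller : "neither";
}

const sampleFlights: Flight[] = [
  { id: "F1", departureAirportCode: "LHR" },
  { id: "F2", departureAirportCode: "LHR" },
  { id: "F3", departureAirportCode: "JFK" },
];

const samplePassengers: Passenger[] = [
  { id: "P1", flightId: "F1", passengerAttribute: "gold" },
  { id: "P2", flightId: "F1", passengerAttribute: "silver" },
  { id: "P3", flightId: "F2", passengerAttribute: "silver" },
  { id: "P4", flightId: "F3", passengerAttribute: "silver" },
];

const sampleSizes = leftSideSizes(sampleFlights, samplePassengers, "LHR", "gold");
const startingSide = pickStartingSide(sampleSizes);
```

Keep in mind that starting from the other side changes which object type the final object set contains, so any aggregation afterwards has to be expressed on that type. If neither direction fits under the limit, an intermediate object type is the other option mentioned above.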