I am trying to build a query to match two columns and I have tried the following:
obj= obj.filter(e => e.colOne.exactMatch(e.colTwo))
I am not able to get this working. Is there any way to filter by comparing the contents of two columns?
The filter() method can't dynamically grab the value to filter on from each object, but it can be used to filter on a static value.
You can filter a smaller object set (<100K rows) named myUnfilteredObjects of type ObjectType this way:
const myFilteredObjects = new Set<ObjectType>();
for (const unfilteredObj of myUnfilteredObjects) {
    // Keep only the objects whose two columns hold the same value.
    if (unfilteredObj.colOne === unfilteredObj.colTwo) {
        myFilteredObjects.add(unfilteredObj);
    }
}
Edit: updating with a solution for larger-scale object sets:
You can create a new boolean column in your object's underlying dataset that is true if colOne and colTwo match, and false otherwise. Filtering on this new column via the filter() method will then work as you expect.
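To make the idea concrete, here is a minimal sketch in plain TypeScript that works on in-memory rows rather than on a real object set. The Row type, the colsMatch column name, and both helper functions are assumptions made up for illustration. In practice the column would be computed in the dataset's pipeline, and the Function would only filter on it.

```typescript
// Hypothetical row shape; stands in for your object type.
interface Row {
  id: number;
  colOne: string;
  colTwo: string;
}

// The same row after the precomputed boolean column has been added.
interface RowWithMatch extends Row {
  colsMatch: boolean;
}

// Stand-in for the pipeline step: derive colsMatch once per row.
function addMatchColumn(rows: Row[]): RowWithMatch[] {
  return rows.map((row) => ({ ...row, colsMatch: row.colOne === row.colTwo }));
}

// Stand-in for the Function: the filter compares against a static value only.
function filterMatching(rows: RowWithMatch[]): RowWithMatch[] {
  return rows.filter((row) => row.colsMatch === true);
}

const rows: Row[] = [
  { id: 1, colOne: "a", colTwo: "a" },
  { id: 2, colOne: "a", colTwo: "b" },
  { id: 3, colOne: "c", colTwo: "c" },
];

// Only rows 1 and 3 have equal columns.
const matchingIds = filterMatching(addMatchColumn(rows)).map((row) => row.id);
```

The point of the split is that the per-row comparison happens once, upstream, so the filter itself never has to read a second column.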
It is not possible to compare two columns against each other when writing Functions. A recommended strategy here is to create a new column that captures the equality. For example, in your PySpark pipeline, right before you generate the end objects that get indexed:
And then filter on that new column: