I need to combine 2 pipes that have the same set of fields, i.e. ('id, 'groupName, 'name), the same way SQL UNION works. How can this be done in Twitter Scalding?
SQL Union equivalent in Twitter Scalding
1.2k Views Asked by victor.sarapin At

To join two pipes on three fields, you first need to know which pipe operates on the smaller dataset:
largerPipe1.joinWithSmaller(('id1, 'groupName1, 'name1) -> ('id2, 'groupName2, 'name2), smallerPipe2)
Notice that the field names do not need to be the same; they just have to be listed in the same order. The result will contain only the Symbol names in largerPipe1.
Note on the comment below: the ++ concatenate operation merely appends the data from one pipe to another. This is not a join.
def ++[U >: T](other: TypedPipe[U]): TypedPipe[U]
Merge two TypedPipes (no order is guaranteed). This is only realized when a group (or join) is performed.
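Since ++ only appends rows, it behaves like SQL UNION ALL; SQL UNION additionally removes duplicate rows. A minimal sketch of both with the typed API follows. The job name, the argument names (inputA, inputB, output) and the (Long, String, String) row type are assumptions made for illustration, not taken from the question.

```scala
import com.twitter.scalding._

// Hypothetical job: argument names and the row type are illustrative assumptions.
class UnionJob(args: Args) extends Job(args) {
  // One row per record: (id, groupName, name)
  type Row = (Long, String, String)

  val pipeA: TypedPipe[Row] = TypedPipe.from(TypedTsv[Row](args("inputA")))
  val pipeB: TypedPipe[Row] = TypedPipe.from(TypedTsv[Row](args("inputB")))

  // ++ keeps every row from both pipes, duplicates included (like UNION ALL).
  val unionAll: TypedPipe[Row] = pipeA ++ pipeB

  // distinct drops duplicate rows, which gives UNION semantics.
  // It requires an Ordering on Row; Scala supplies one for tuples of ordered types.
  unionAll.distinct.write(TypedTsv[Row](args("output")))
}
```

If duplicates are acceptable, writing unionAll directly avoids the extra grouping step that distinct triggers.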