SQL Union equivalent in Twitter Scalding

I need to combine two pipes that have the same set of fields, i.e. ('id, 'groupName, 'name), the same way SQL UNION works. How can this be done in Twitter Scalding?

There are 3 best solutions below

0
On

def ++[U >: T](other: TypedPipe[U]): TypedPipe[U]

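Merges two TypedPipes (no order is guaranteed). This is only realized when a group (or join) is performed. Note that ++ keeps duplicate rows, so it behaves like SQL UNION ALL; to get UNION semantics, remove the duplicates afterwards (for example with distinct).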
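As a minimal sketch of the above, assuming the typed API and in-memory pipes for illustration: the object name UnionSketch, the Row alias, and the sample rows are hypothetical, and the local Execution run in main is only there so the example can be tried without input files.

```scala
import com.twitter.scalding._

object UnionSketch {
  // Hypothetical row shape matching the question: (id, groupName, name)
  type Row = (Int, String, String)

  // UNION ALL: ++ merges the two pipes and keeps duplicate rows
  def unionAll(a: TypedPipe[Row], b: TypedPipe[Row]): TypedPipe[Row] =
    a ++ b

  // UNION: merge, then drop duplicate rows (distinct needs an Ordering on Row,
  // which the standard library provides for tuples of Int and String)
  def union(a: TypedPipe[Row], b: TypedPipe[Row]): TypedPipe[Row] =
    (a ++ b).distinct

  def main(args: Array[String]): Unit = {
    val left  = TypedPipe.from(List((1, "g1", "alice"), (2, "g1", "bob")))
    val right = TypedPipe.from(List((2, "g1", "bob"), (3, "g2", "carol")))

    // Run in local mode and collect the result in memory
    val rows = union(left, right)
      .toIterableExecution
      .waitFor(Config.default, Local(true))
      .get

    rows.toList.sorted.foreach(println)
  }
}
```

Since no order is guaranteed after a merge, the result is sorted here only to make the printed output stable.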

1
On

To join two pipes on three fields, you first need to know which pipe operates on the smaller dataset:

  largerPipe1.joinWithSmaller(('id1, 'groupName1, 'name1) -> ('id2, 'groupName2, 'name2), smallerPipe2)

Notice that the field names do not need to be the same; they just have to be listed in the same order. The result will contain only the Symbol names from largerPipe1.

Note on the comment below: the ++ concatenate operation merely appends the data from one pipe to another. This is not a join.

1
On

Use ++ to concatenate the pipes, then use project to get rid of the id field.
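A rough sketch of the concatenation with the fields-based API might look like the following. The job name UnionFieldsJob and the argument names first, second, and output are made up for illustration, and the unique step is an addition that assumes you want SQL UNION rather than UNION ALL behaviour. Keep in mind that project keeps only the fields you name, so dropping id would mean projecting onto the remaining fields.

```scala
import com.twitter.scalding._

// Hypothetical job: merges two TSV inputs that share the same three fields
class UnionFieldsJob(args: Args) extends Job(args) {
  val schema = ('id, 'groupName, 'name)

  val first  = Tsv(args("first"), schema).read
  val second = Tsv(args("second"), schema).read

  (first ++ second)            // UNION ALL: append one pipe to the other
    .unique(schema)            // drop duplicate rows to get UNION semantics
    .write(Tsv(args("output")))
}
```

Leaving out the unique call gives plain concatenation, which is cheaper because it avoids the grouping step.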

If this answer is too concise, let me know and I'll try to expand.