How to build a parent-child relationship in PySpark or Python?

I have (key, value) pairs like (1,2), (3,4), (5,6), (7,8), (9,10), (2,11), (4,12), (6,13), (8,14), (14,19).

My input is (1,2), (3,4), (5,6), (7,8), (9,10), (2,11), (4,12), (6,13), (8,14).

Here I need to chain the relations 1 --> 2 and 2 --> 11, so the final output for key 1 is (1,11). That is, in the first tuple the key is 1 and the value is 2, and in another of the given tuples 2 is the key and 11 is the value. This is a parent, child, and grandchild relation, and I want the output as (parent, grandchild).

My final output should be: (1,11), (3,12), (5,13), (7,19), (9,10)
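
To make the rule concrete, here is a minimal plain-Python sketch of the chaining described above. It assumes every key appears at most once and uses the first list, the one that includes (14,19). The function name `last_descendants` is only illustrative.

```python
def last_descendants(pairs):
    """Return (root, last descendant) for every chain found in `pairs`."""
    child_of = dict(pairs)                  # key -> value lookup
    all_children = set(child_of.values())   # every number that is someone's value
    result = []
    for key in child_of:
        if key in all_children:
            continue                        # not a root: reached via another chain
        seen = {key}
        node = child_of[key]
        # Walk down the chain; `seen` guards against accidental cycles.
        while node in child_of and node not in seen:
            seen.add(node)
            node = child_of[node]
        result.append((key, node))
    return result


pairs = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10),
         (2, 11), (4, 12), (6, 13), (8, 14), (14, 19)]
print(last_descendants(pairs))
# [(1, 11), (3, 12), (5, 13), (7, 19), (9, 10)]
```

Only keys that never occur as a value are treated as parents, which is why pairs such as (2,11) do not show up in the result.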

Suppose I have a DataFrame like the one below:

key   value
 1     2
 3     4
 5     6
 7     8
 9     10
 2     11
 4     12
 6     13
 8     14
14     19
19     23
13     17

My expected output is a new DataFrame:

key  value
1    11
3    12
5    17
7    19
9    10

How can I implement this in Python / PySpark?

There is 1 best solution below

Not tested, but something like this should do the trick:

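s = [(1,2),(3,4),(5,6),(7,8),(9,10),(2,11),(4,12),(6,13),(8,14)]

lookup = dict(s)                  # key -> value, for direct lookups
children = set(lookup.values())   # every number that appears as a value

for parent, child in s:
    if parent in children:
        continue                  # skip pairs whose key is itself a child, e.g. (2, 11)
    # Use the grandchild if the child has one; otherwise keep the child.
    print((parent, lookup.get(child, child)))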

OUTPUT:

(1, 11)
(3, 12)
(5, 13)
(7, 14)
(9, 10)
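
For the DataFrame in the question, where some chains are longer than two steps, one option is to advance every parent by one step per round until nothing changes. The sketch below does this in plain Python. `resolve_by_rounds`, `max_rounds`, and `resolve_df` are hypothetical names. `resolve_df` is untested glue that assumes `spark` is a SparkSession, `df` has `key` and `value` columns, and the edge list is small enough to collect to the driver. For larger data, each round would roughly correspond to a left self-join of the current result on value == key.

```python
def resolve_by_rounds(edges, max_rounds=None):
    """Advance each root one step per round until no value changes.

    edges: iterable of (key, value) pairs with unique keys.
    max_rounds: optional cap on the number of rounds (hypothetical knob).
    """
    lookup = dict(edges)
    child_keys = set(lookup.values())
    # Start from the roots: keys that never appear as a value.
    current = {k: v for k, v in lookup.items() if k not in child_keys}
    # len(lookup) rounds suffice for acyclic data and also bound cyclic data.
    limit = len(lookup) if max_rounds is None else max_rounds
    for _ in range(limit):
        advanced = {k: lookup.get(v, v) for k, v in current.items()}
        if advanced == current:
            break
        current = advanced
    return sorted(current.items())


def resolve_df(spark, df):
    """Hypothetical PySpark glue: collect a small edge list, resolve, rebuild."""
    edges = [(row["key"], row["value"])
             for row in df.select("key", "value").collect()]
    return spark.createDataFrame(resolve_by_rounds(edges), ["key", "value"])


rows = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (2, 11), (4, 12),
        (6, 13), (8, 14), (14, 19), (19, 23), (13, 17)]
full = resolve_by_rounds(rows)
capped = resolve_by_rounds(rows, max_rounds=2)
print(full)    # [(1, 11), (3, 12), (5, 17), (7, 23), (9, 10)]
print(capped)  # [(1, 11), (3, 12), (5, 17), (7, 19), (9, 10)]
```

With all twelve rows, the chain from 7 continues through 19 to 23, so full resolution gives (7, 23). Capping at two rounds reproduces the expected table from the question, which lists (7, 19).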