My problem is the following: I am parsing user interactions, and each time an interaction is detected I emit ((user1,user2),((date1,0),(0,1))). The zeros are there to encode the direction of the interaction.
I cannot figure out why I cannot reduce this output with the following reduce function:
def myFunc2(x1,x2):
    return (min(x1[0][0],x2[0][0]),max(x1[0][0],x2[0][0]),min(x1[0][1],x2[0][1]),max(x1[0][1],x2[0][1]),x1[1][0]+x2[1][0],x1[1][1]+x2[1][1])
The output of my mapper (flatMap(myFunc)) is correct:
((7401899, 5678002), ((1403185440.0, 0), (1, 0)))
((82628194, 22251869), ((0, 1403185452.0), (0, 1)))
((2162276, 98056200), ((1403185451.0, 0), (1, 0)))
((0509420, 4827510), ((1403185449.0, 0), (1, 0)))
((7974923, 9235930), ((1403185450.0, 0), (1, 0)))
((250259, 6876774), ((0, 1403185450.0), (0, 1)))
((642369, 6876774), ((0, 1403185450.0), (0, 1)))
((82628194, 22251869), ((0, 1403185452.0), (0, 1)))
((2162276, 98056200), ((1403185451.0, 0), (1, 0)))
But running
lines.flatMap(myFunc) \
.map(lambda x: (x[0], x[1])) \
.reduceByKey(myFunc2)
Gives me the error
return (min(x1[0][0],x2[0][0]),max(x1[0][0],x2[0][0]),min(x1[0][1],x2[0][1]),max(x1[0][1],x2[0][1]),x1[1][0]+x2[1][0],x1[1][1]+x2[1][1])
TypeError: 'int' object has no attribute '__getitem__'
I guess I am messing something up in my keys, but I don't know why (I tried recasting the key to a tuple, as suggested here, but I get the same error).
Any ideas? Thanks a lot.
Okay, I think the problem here is that you are indexing too deep in items that don't go as deep as you think.
Let's examine myFunc2.
Given your question above, the input data will look like this:
((467401899, 485678002), ((1403185440.0, 0), (1, 0)))
Let's go ahead and assign that data row equal to a variable.
x = ((467401899, 485678002), ((1403185440.0, 0), (1, 0)))
What happens when we run x[0]? We get (467401899, 485678002). When we run x[1]? We get ((1403185440.0, 0), (1, 0)). That's what your map statement is doing, I believe.
Okay. That's clear.
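The indexing above can be checked in a plain Python session, without Spark. This is only a minimal sketch: the names key, value, first_date and error are made up for illustration, and the try/except is there just to show what happens when you index one level further than the data goes.

```python
# Row taken from the walkthrough above; no Spark needed to inspect it.
x = ((467401899, 485678002), ((1403185440.0, 0), (1, 0)))

key = x[0]      # (467401899, 485678002)
value = x[1]    # ((1403185440.0, 0), (1, 0))

# The value is nested two levels deep, so double indexing works.
first_date = value[0][0]   # 1403185440.0

# The key is nested only one level deep: key[0] is a plain int.
try:
    key[0][0]
    error = None
except TypeError as exc:
    # Python 2 complains about a missing '__getitem__';
    # Python 3 says the int is "not subscriptable".
    error = exc

print(key, value, first_date, error)
```

Printing the pieces like this before writing the reduce function makes it easier to see how many index levels each argument really supports.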
In your function myFunc2, you have two parameters, x1 and x2. Those correspond to the variables above: x1 = x[0] = (467401899, 485678002) and x2 = x[1] = ((1403185440.0, 0), (1, 0))
Now let's examine just the first part of your return statement in your function.
min(x1[0][0], x2[0][0])
So, x1 = (467401899, 485678002). Cool. Now, what's x1[0]? Well, that's 467401899. Obviously. But wait! What's x1[0][0]? You're trying to get the zeroth index of the item at x1[0], but the item at x1[0] isn't a list or a tuple, it's just an int. And objects of <type 'int'> don't have a method called __getitem__.
To summarize: you're digging too deep into objects that are not nested that deeply. Think carefully about what you are passing into myFunc2, and how deep your objects are.
I think the first part of the return statement for myFunc2 should look like:
return min(x1[0], x2[0][0])
You can index deeper on x2 because x2 has more deeply nested tuples!
When I run the following, it works just fine: