Hadoop and Cassandra to Compare 2 Rows


I have two rows in a Cassandra column family and want to compare the values of the columns that share the same column name, e.g.:

CF: User

Key: Columns:
......................................................

K1: {Col1: "Andy" V1: "100"} {Col2: "Tom" V2: "100"}

K2: {Col1: "Andy" V1: "120"} {Col2: "Tom" V2: "90"}

Now I want to compute the difference between the K2 columns and the K1 columns to get this result in Cassandra:

Key: Columns:
.........................................................................

K1: {Col1: "Andy" V1: "100"} {Col2: "Tom" V2: "100"}

K2: {Col1: "Andy" V1: "120" Diff: 20} {Col2: "Tom" V2: "90" Diff: -10}

At first I wanted to code this with Hadoop, but I see a problem: I can't define two keys for a map process?

Hadoop was the choice because it must be a scalable solution.

I hope someone has a tip.

BG, Danny

Best answer

I don't understand which row is the base of the subtraction: K1[V1] - K2[V1], or vice versa?

OK, let's say the row with the most recent timestamp is the base.

Your map step should emit the following (K => V):

// each value is a WritableComparable object to allow sorting by timestamp

"Andy" => {"key":K1, "value":100, timestamp1} 
"Tom"  => {"key":K1, "value":100, timestamp2} 
"Andy" => {"key":K2, "value":120, timestamp3} 
"Tom"  => {"key":K2, "value":90,  timestamp4} 

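To make the map step concrete, here is a minimal sketch in plain Python. It only simulates the idea and does not use the Hadoop or Cassandra APIs. The `ROWS` dictionary, the `map_row` function, and the integer timestamps 1 to 4 are assumptions made for illustration. Each map call sees a single row, and the column name becomes the output key, so the mapper never needs two row keys at once.

```python
# Hypothetical in-memory stand-in for the User column family:
# row key -> {column name -> (value, write timestamp)}.
# The timestamps 1..4 are invented for this example.
ROWS = {
    "K1": {"Andy": (100, 1), "Tom": (100, 2)},
    "K2": {"Andy": (120, 3), "Tom": (90, 4)},
}


def map_row(row_key, columns):
    """Emit one (column name, record) pair for every column of one row."""
    for name, (value, timestamp) in columns.items():
        yield name, {"key": row_key, "value": value, "timestamp": timestamp}


# Run the mapper once per row, as a MapReduce framework would.
emitted = [pair for row_key, cols in ROWS.items() for pair in map_row(row_key, cols)]

for name, record in emitted:
    print(name, "=>", record)
```

In a real job the record would be a Writable type rather than a dictionary, but the shape of the emitted pairs is the same.
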
The reduce step will receive, for each key, an array of values sorted by timestamp:

"Andy" => [ {"key":K1, "value":100, timestamp1},
            {"key":K2, "value":120, timestamp3} ]

"Tom"  => [ {"key":K1, "value":100, timestamp2},
            {"key":K2, "value":90,  timestamp4} ]

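The grouping can be sketched in plain Python as follows. This is a simulation of the shuffle phase, not Hadoop code, and the `shuffle` function and the integer timestamps are assumptions. Hadoop groups values by key on its own, but it does not order the values inside one reduce call unless a secondary sort is configured, so the sketch sorts explicitly.

```python
from collections import defaultdict

# Pairs as a mapper might emit them; the timestamps 1..4 are hypothetical.
emitted = [
    ("Andy", {"key": "K1", "value": 100, "timestamp": 1}),
    ("Tom", {"key": "K1", "value": 100, "timestamp": 2}),
    ("Andy", {"key": "K2", "value": 120, "timestamp": 3}),
    ("Tom", {"key": "K2", "value": 90, "timestamp": 4}),
]


def shuffle(pairs):
    """Group records by column name and sort each group by timestamp."""
    groups = defaultdict(list)
    for name, record in pairs:
        groups[name].append(record)
    for records in groups.values():
        records.sort(key=lambda r: r["timestamp"])
    return dict(groups)


grouped = shuffle(emitted)
```

After this step `grouped["Andy"]` holds the K1 record followed by the K2 record, which is the input the reducer expects.
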
Now in the reduce step you can easily perform the subtraction and write the necessary columns, such as "diff", to the database.
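
A minimal sketch of that reduce logic in plain Python, under a few assumptions: the function name `reduce_column` is hypothetical, the difference is taken as the newer value minus the older one (which matches the 20 and -10 expected in the question), and the result is attributed to the newer row. The write back to Cassandra is left out; the function only returns what would be written.

```python
def reduce_column(name, records):
    """Return (row key, column name, diff) tuples for one column name.

    Each diff is the newer value minus the previous one and is attached
    to the newer row. Sorting here keeps the function correct even if
    the framework delivers the values unordered.
    """
    ordered = sorted(records, key=lambda r: r["timestamp"])
    writes = []
    for older, newer in zip(ordered, ordered[1:]):
        writes.append((newer["key"], name, newer["value"] - older["value"]))
    return writes


# Hypothetical reducer input for the two column names in the question.
andy = [
    {"key": "K1", "value": 100, "timestamp": 1},
    {"key": "K2", "value": 120, "timestamp": 3},
]
tom = [
    {"key": "K1", "value": 100, "timestamp": 2},
    {"key": "K2", "value": 90, "timestamp": 4},
]

print(reduce_column("Andy", andy))
print(reduce_column("Tom", tom))
```

With more than two rows per column name, the same loop yields one diff per consecutive pair, so the approach extends beyond K1 and K2.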