Combine two array's data using inner join

6.6k Views Asked by At

I've two data sets in array:

arr1 = [
  ['2011-10-10', 1, 1],
  ['2007-08-09', 5, 3],
  ...
]

arr2 = [
  ['2011-10-10', 3, 4],
  ['2007-09-05', 1, 1],
  ...
]

I want to combine them into one array like this:

arr3 = [
  ['2011-10-10', 1, 1, 3, 4],
  ...
]

I mean, just combine those lines with the same date column.

Just for clarification, I don't need those lines which not appear in both array, just drop them.

6

There are 6 best solutions below

4
On BEST ANSWER

Organize your data differently (you can easily convert what you already have to two dicts):

d1 = { '2011-10-10': [1, 1],
       '2007-08-09': [5, 3]
     }
d2 = { '2011-10-10': [3, 4],
       '2007-09-05': [1, 1]
     }

Then:

d3 = { k : d1[k] + d2[k] for k in d1 if k in d2 }
0
On

Unless both are very large lists, I'd use a dictionary:

arr1 = [
  ['2011-10-10', 1, 1],
  ['2007-08-09', 5, 3]
]

arr2 = [
  ['2011-10-10', 3, 4],
  ['2007-09-05', 1, 1]
]

table_1 = dict((tup[0], tup[1:]) for tup in arr1)
table_2 = dict((tup[0], tup[1:]) for tup in arr2)
merged = {}
for key, value in table_1.items():
    other = table_2.get(key)
    if other:
        merged[key] = value + other

Otherwise, it would be more efficient to sort each, and then do a merge that way. But I imagine for most purposes this would be fast enough.

0
On

A single dictionary approach:

tmp = {}
# add as many as you like into the outermost array.
for outer in [arr1,arr2]:
    for inner in outer:
        start, rest = inner[0], inner[1:]
        # the list if key exists, else create a new list. Append to the result
        tmp[start] = tmp.get(start,[]) + rest

output = []

for k,v in tmp.iteritems():
   output.append([k] + v)

That would be the equivalent of a full outer join (returns data from both sides even if one side is null). If you wanted an inner join, you might change it to this:

tmp = {}
keys_with_dupes = []

for outer in [arr1,arr2]:
    for inner in outer:
        start, rest = inner[0], inner[1:]
        original = tmp.get(start,[])
        tmp[start] = original + rest
        if original:
            keys_with_dupes.append(start)

output = []

for k in keys_with_dupes:
   v = tmp[k]
   output.append([k] + v)
1
On

You can convert the arrays to a dict, and back again.

d1 = dict((x[0],x[1:]) for x in arr1)
d2 = dict((x[0],x[1:]) for x in arr2)
keys = set(d1).union(d2)
n = []
result = dict((k, d1.get(k, n) + d2.get(k, n)) for k in keys)
1
On

Generator function approach, skipping corresponding elements whose dates don't match:

import itertools
def gen(a1, a2):
    for x,y in itertools.izip(a1, a2):
        if x[0] == y[0]:
            ret = list(x)
            ret.extend(y[1:])
            yield ret
        else:
            continue

>>print list(gen(arr1, arr2))
[['2011-10-10', 1, 1, 3, 4]]

But yeah, if possible, organize your data differently.

0
On

It may be worth mentioning set data types. as their methods align to the type of problem. The set operators allow you to join sets easily and flexibly with full, inner, outer, left, right joins. As with dictionaries, sets do not retain order, but if you cast a set back into a list, you may then apply an order on the result join. Alternatively, you could use an ordered dictionary.

set1 = set(x[0] for x in arr1)
set2 = set(x[0] for x in arr2)
resultset = (set1 & set2)

This only gets you the union of dates in the original lists, in order to reconstruct arr3 you would need to append the [1:] data in arr1 and arr2 where the dates are in the result set. This reconstruction would not be as neat as using the dictionary solutions above, but using sets is worthy of consideration for similar problems.