Combine two arrays' data using an inner join

I have two data sets in arrays:

arr1 = [
  ['2011-10-10', 1, 1],
  ['2007-08-09', 5, 3],
  ...
]

arr2 = [
  ['2011-10-10', 3, 4],
  ['2007-09-05', 1, 1],
  ...
]

I want to combine them into one array like this:

arr3 = [
  ['2011-10-10', 1, 1, 3, 4],
  ...
]

That is, combine only the rows that share the same date in the first column.

To clarify: I don't need rows that don't appear in both arrays; just drop them.

Answer by jason (best answer, 4 votes)

Organize your data differently (you can easily convert what you already have into two dicts):

d1 = { '2011-10-10': [1, 1],
       '2007-08-09': [5, 3]
     }
d2 = { '2011-10-10': [3, 4],
       '2007-09-05': [1, 1]
     }

Then:

d3 = {k: d1[k] + d2[k] for k in d1 if k in d2}
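
A minimal sketch of the conversion mentioned above, assuming each date appears at most once per array; the helper name `inner_join_rows` is hypothetical. It builds the two dicts from the original row lists, joins them with the same comprehension, and flattens the result back into rows shaped like arr3, sorted by date for a stable order:

```python
def inner_join_rows(arr1, arr2):
    # Build {date: [values...]} lookups; a repeated date overwrites the earlier row.
    d1 = {row[0]: list(row[1:]) for row in arr1}
    d2 = {row[0]: list(row[1:]) for row in arr2}
    # Keep only dates present in both dicts (the inner join).
    d3 = {k: d1[k] + d2[k] for k in d1 if k in d2}
    # Flatten back into [date, values...] rows, ordered by date.
    return [[k] + v for k, v in sorted(d3.items())]


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]
print(inner_join_rows(arr1, arr2))  # [['2011-10-10', 1, 1, 3, 4]]
```

The `sorted` call is optional: on Python 3.7+ a dict keeps insertion order, so without it the rows would simply follow arr1's order.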

Answer by Dan Lecocq (0 votes)

Unless both are very large lists, I'd use a dictionary:

arr1 = [
  ['2011-10-10', 1, 1],
  ['2007-08-09', 5, 3]
]

arr2 = [
  ['2011-10-10', 3, 4],
  ['2007-09-05', 1, 1]
]

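# Map each date to the rest of its row.
table_1 = dict((tup[0], tup[1:]) for tup in arr1)
table_2 = dict((tup[0], tup[1:]) for tup in arr2)
merged = {}
for key, value in table_1.items():
    other = table_2.get(key)
    # Keep only dates that also appear in arr2 (inner join).
    if other is not None:
        merged[key] = value + other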

Otherwise, you could sort both lists and then merge them in a single pass (a sort-merge join), which avoids building the lookup dictionaries. But I imagine for most purposes the dictionary approach is fast enough.
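
A rough sketch of the sort-and-merge alternative, assuming dates are unique within each list; `merge_join` is a hypothetical name. Because the dates are 'YYYY-MM-DD' strings, plain string comparison already orders them chronologically, so no date parsing is needed:

```python
def merge_join(arr1, arr2):
    """Inner join two lists of [key, values...] rows by sorting and merging.

    Assumes keys are unique within each list and mutually comparable.
    """
    left = sorted(arr1, key=lambda row: row[0])
    right = sorted(arr2, key=lambda row: row[0])
    i = j = 0
    result = []
    # Walk both sorted lists in step, advancing whichever side has the smaller key.
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey == rkey:
            result.append([lkey] + list(left[i][1:]) + list(right[j][1:]))
            i += 1
            j += 1
        elif lkey < rkey:
            i += 1
        else:
            j += 1
    return result


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]
print(merge_join(arr1, arr2))  # [['2011-10-10', 1, 1, 3, 4]]
```

The two sorts cost O(n log n), so this mainly pays off when the inputs are already sorted or too large to index comfortably in memory.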

Answer by cwallenpoole (0 votes)

A single dictionary approach:

tmp = {}
# Add as many arrays as you like to the outer list.
for outer in [arr1, arr2]:
    for inner in outer:
        start, rest = inner[0], inner[1:]
        # Fetch the existing list for this date (or an empty one) and append the new values.
        tmp[start] = tmp.get(start, []) + rest

output = []

for k, v in tmp.items():  # tmp.iteritems() in Python 2
    output.append([k] + v)

That would be the equivalent of a full outer join (it returns data from both sides even when a date is missing on one side; such rows end up shorter, and you can't tell which array their values came from). If you wanted an inner join, you might change it to this:

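tmp = {}
keys_with_dupes = []

# Assumes exactly two arrays, each with unique dates; otherwise
# "seen before" does not mean "present in both".
for outer in [arr1, arr2]:
    for inner in outer:
        start, rest = inner[0], inner[1:]
        original = tmp.get(start, [])
        tmp[start] = original + rest
        # A non-empty original means this date was already seen in the first array.
        if original:
            keys_with_dupes.append(start)

output = []

for k in keys_with_dupes:
    v = tmp[k]
    output.append([k] + v)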
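
A sketch that generalizes the inner join to any number of arrays while keeping only dates present in every one of them; `inner_join_all` is a hypothetical name, and dates are assumed unique within each array. Instead of counting repeats, it intersects the key sets, so a date repeated inside a single array is not mistaken for a match:

```python
def inner_join_all(*arrays):
    """Inner join any number of [key, values...] row lists on their first column.

    Assumes each key appears at most once per array; a later duplicate
    within the same array overwrites the earlier one.
    """
    if not arrays:
        return []
    tables = [{row[0]: list(row[1:]) for row in arr} for arr in arrays]
    # Keys present in every table.
    common = set(tables[0]).intersection(*tables[1:])
    output = []
    for key in sorted(common):
        merged = [key]
        for table in tables:
            merged.extend(table[key])
        output.append(merged)
    return output


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]
arr4 = [['2011-10-10', 9], ['2007-08-09', 0]]  # hypothetical third array
print(inner_join_all(arr1, arr2, arr4))  # [['2011-10-10', 1, 1, 3, 4, 9]]
```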

Answer by jh314 (1 vote)

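You can convert the arrays to dicts, and back again. Note that taking the union of the keys, as below, produces a full outer join; for the inner join asked for, intersect the keys instead.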

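# Map each date to the rest of its row.
d1 = dict((x[0], x[1:]) for x in arr1)
d2 = dict((x[0], x[1:]) for x in arr2)
# Union of the dates: every date that appears in either array.
keys = set(d1).union(d2)
n = []  # default for a date missing from one side
result = dict((k, d1.get(k, n) + d2.get(k, n)) for k in keys)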
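
A minimal sketch of the inner-join variant together with the "back again" step, assuming the two-row sample arrays from the question. The only change to the join itself is intersecting the key sets with & instead of taking their union:

```python
arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]

d1 = dict((x[0], x[1:]) for x in arr1)
d2 = dict((x[0], x[1:]) for x in arr2)

# Intersecting the key sets keeps only dates found in both arrays (inner join).
keys = set(d1) & set(d2)
result = dict((k, d1[k] + d2[k]) for k in keys)

# "Back again": turn the dict into [date, values...] rows, ordered by date.
arr3 = [[k] + v for k, v in sorted(result.items())]
print(arr3)  # [['2011-10-10', 1, 1, 3, 4]]
```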

Answer by Rao (1 vote)

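Generator function approach, skipping corresponding elements whose dates don't match. Note that this compares rows position by position, so it only finds matches that sit at the same index in both arrays: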

def gen(a1, a2):
    # zip pairs rows by position (use itertools.izip in Python 2).
    for x, y in zip(a1, a2):
        if x[0] == y[0]:
            ret = list(x)
            ret.extend(y[1:])
            yield ret

>>> print(list(gen(arr1, arr2)))
[['2011-10-10', 1, 1, 3, 4]]

But if possible, organize your data differently.

Answer by user3212761 (0 votes)

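It may be worth mentioning set data types, as their methods fit this type of problem. Set operations on the keys correspond to the usual joins: intersection (&) gives the keys of an inner join, union (|) the keys of a full outer join, and difference (-) the keys present on only one side, which is what a left or right join adds to the inner join. Sets do not retain order (and dictionaries only guarantee insertion order from Python 3.7 on), but if you convert a set back into a list, you can then apply an order to the joined result. Alternatively, you could use an ordered dictionary (collections.OrderedDict).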

set1 = set(x[0] for x in arr1)
set2 = set(x[0] for x in arr2)
resultset = (set1 & set2)

This only gets you the intersection of the dates in the original lists; to reconstruct arr3 you would need to append the [1:] data from arr1 and arr2 for each date in the result set. This reconstruction would not be as neat as the dictionary solutions above, but sets are worth considering for similar problems.
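
A sketch of that reconstruction using only the set and plain lists (no dictionary), assuming the two-row sample arrays from the question. For each arr1 row whose date is in resultset, it scans arr2 for rows with the same date; that inner scan makes it quadratic in the worst case, which is part of why the dictionary answers are neater. If a date repeated within one array, every matching pair would be emitted, as in a SQL inner join:

```python
arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]

set1 = set(x[0] for x in arr1)
set2 = set(x[0] for x in arr2)
resultset = set1 & set2  # dates present in both arrays

# Keep arr1's row order; append the [1:] data of each arr2 row with the same date.
arr3 = [x + y[1:] for x in arr1 if x[0] in resultset
        for y in arr2 if y[0] == x[0]]
print(arr3)  # [['2011-10-10', 1, 1, 3, 4]]
```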