How to effectively leave outer join with two sorted lists
I have two lists already sorted. I need the outside to join them. The following code does the job:
left_sorted_list = [1, 2, 3, 4, 5]
right_sorted_list = [[2, 21], [4, 45], [6, 67]]
right_dict = {r[0]: r[1] for r in right_sorted_list}
left_outer_join = [[l, right_dict[l] if l in right_dict.keys() else None]
for l in left_sorted_list]
print(left_outer_join)
[[1, None], [2, 21], [3, None], [4, 45], [5, None]]
However, I'm not sure if this approach is very efficient. Is there a better way to take advantage of the fact that the correct list is already sorted without writing loops?
Edit: The keys I'm connecting to are unique in both the left and right list.
source to share
This answer directly depends on two comments that Mamilson made in the OP's question.
It's not more efficient than what you have, but it's more pythonic.
left_sorted_list = [1, 2, 3, 4, 5]
right_sorted_list = [[2, 21], [4, 45]]
right_dict = dict(right_sorted_list)
left_outer_join = [[l, right_dict.get(l)] for l in left_sorted_list]
As for the time complexity, it left_sorted_list
also right_sorted_list
repeats once each time, so they are both O (N). For dictionary searches, the average search is O (1) , so finding all keys is also O (N). Your time complexity won't be much better than what you already have.
source to share
Here's a performance-efficient version that generates a key / value pair at the same time:
def left_outer_join(keys, pairs, default=None):
right = iter(pairs)
right_key = float('-inf') # sentinel: any left key must be larger than it
for left_key in keys:
if left_key == right_key: # *keys* and *right* are in sync
value = right_value # from previous iteration
elif left_key < right_key: # *keys* is behind *right*
value = default
else: # left_key > right_key: *keys* is ahead of *right*
for right_key, right_value in right: # catch up with *keys*
if left_key <= right_key: # drop while left_key > right_key
break
value = right_value if left_key == right_key else default
yield left_key, value
This is a single pass algorithm O(n+m)
.
Example:
left_sorted_list = [1, 2, 3, 4, 5]
right_sorted_list = [[2, 21], [4, 45], [6, 67]]
print(list(left_outer_join(left_sorted_list, right_sorted_list)))
# -> [(1, None), (2, 21), (3, None), (4, 45), (5, None)]
keys
and pairs
can be unlimited sorted iterators (eg function generated heapq.merge()
) of keys and key / value pairs, respectively.
source to share
I ended up using tuples as a result, so fewer square brackets;)
left_sorted_list = [1, 2, 3, 4, 5]
right_sorted_list = [[2, 21], [4, 45]]
d = dict(right_sorted_list) # if you have a list of pairs, just pass it to dict()
print [(x, d[x] if x in d else None) for x in left_sorted_list]
## -- End pasted text --
[(1, None), (2, 21), (3, None), (4, 45), (5, None)]
source to share