How to effectively leave outer join with two sorted lists

I have two lists already sorted. I need the outside to join them. The following code does the job:

left_sorted_list = [1, 2, 3, 4, 5]
right_sorted_list = [[2, 21], [4, 45], [6, 67]]
right_dict = {r[0]: r[1] for r in right_sorted_list}
left_outer_join = [[l, right_dict[l] if l in right_dict.keys() else None]
                   for l in left_sorted_list]
print(left_outer_join)
[[1, None], [2, 21], [3, None], [4, 45], [5, None]]

      

However, I'm not sure if this approach is very efficient. Is there a better way to take advantage of the fact that the correct list is already sorted without writing loops?

Edit: The keys I'm connecting to are unique in both the left and right list.

+3


source to share


3 answers


This answer directly depends on two comments that Mamilson made in the OP's question.

It's not more efficient than what you have, but it's more pythonic.



left_sorted_list = [1, 2, 3, 4, 5]
right_sorted_list = [[2, 21], [4, 45]]

right_dict = dict(right_sorted_list)
left_outer_join = [[l, right_dict.get(l)] for l in left_sorted_list] 

      

As for the time complexity, it left_sorted_list

also right_sorted_list

repeats once each time, so they are both O (N). For dictionary searches, the average search is O (1) , so finding all keys is also O (N). Your time complexity won't be much better than what you already have.

+3


source


Here's a performance-efficient version that generates a key / value pair at the same time:

def left_outer_join(keys, pairs, default=None):
    right = iter(pairs)
    right_key = float('-inf') # sentinel: any left key must be larger than it
    for left_key in keys:
        if left_key == right_key: # *keys* and *right* are in sync
            value = right_value  # from previous iteration
        elif left_key < right_key: # *keys* is behind *right*
            value = default
        else: # left_key > right_key: *keys* is ahead of *right*
            for right_key, right_value in right: # catch up with *keys*
                if left_key <= right_key: # drop while left_key > right_key
                    break
            value = right_value if left_key == right_key else default
        yield left_key, value

      

This is a single pass algorithm O(n+m)

.



Example:

left_sorted_list = [1, 2, 3, 4, 5]
right_sorted_list = [[2, 21], [4, 45], [6, 67]]
print(list(left_outer_join(left_sorted_list, right_sorted_list)))
# -> [(1, None), (2, 21), (3, None), (4, 45), (5, None)]

      

keys

and pairs

can be unlimited sorted iterators (eg function generated heapq.merge()

) of keys and key / value pairs, respectively.

+4


source


I ended up using tuples as a result, so fewer square brackets;)

left_sorted_list = [1, 2, 3, 4, 5]
right_sorted_list = [[2, 21], [4, 45]]

d  = dict(right_sorted_list) # if you have a list of pairs, just pass it to dict()
print [(x, d[x] if x in d else None) for x in left_sorted_list]

## -- End pasted text --
[(1, None), (2, 21), (3, None), (4, 45), (5, None)]

      

+1


source







All Articles