Getting longest link in a link list in python

Question

Getting longest link in a link list in python

I have a list of edges as follows:

edges_1 = ['a::b::c', 'a::b', 'a', 'a::d','a::d::e', 'a::d::e::f']
edges_2 = ['a', 'b', 'c']

I am trying to write a function that returns the longest links .. which for the above cases will return

['a::b::c','a::d::e::f']

and for the second case

['a', 'b', 'c']

My thoughts are using networkx and plotting and then going through the longest sequence. But I was wondering if there is another data structure or approach that I am missing to solve this case.

+3

python

Fraz May 25 '17 at 1:13

source to share

2 answers

I think this problem has to do with the concept of top set .

Here's a solution that will convert your structures to edgeset lists. Then it is found out if each of the sets has a different member of the list, which is the top set of itself.

def has_upset(down, set_list):
    return any(down.intersection(s) == down for s in set_list if s != down)

def filter_downsets(set_list):
    return filter(lambda d: not has_upset(d, set_list), set_list)

And here's the usage (including getting your structures in a set-form).

edges_1 = ['a::b::c', 'a::b', 'a', 'a::d','a::d::e', 'a::d::e::f']
edges_2 = ['a', 'b', 'c']

edge_sets_1 = [set(e.split('::')) for e in edges_1]
print filter_downsets(edge_sets_1)
# [set(['a', 'c', 'b']), set(['a', 'e', 'd', 'f'])]

edge_sets_2 = [set(e.split('::')) for e in edges_2]
print filter_downsets(edge_sets_2)
# [set(['a']), set(['b']), set(['c'])]

+2

Jared goguen May 25 '17 at 3:06

source to share

Nathaniel Saul · Accepted Answer · 2017-05-25T01:21:16+0000

Based on your comments in the comments, I changed the code.

This assumes the graphs are DAGs and the goal is to exclude all sequences that are subsequences of others on the list.

Instead of looking at the length of each sequence, we can simply discard any sequences that are subsets of other sequences. For this we can use the subset operator str

.

print(f"Initial list: {edges_1}")
keepers = []

for edge in edges_1:
    other_edges = edges_1[:]
    other_edges.remove(edge)
    print(f"Test {edge} against {other_edges}")
    for rest in other_edges:
        if edge in rest:
            # Discard any edges that are subsets an an edge in `other_edges`
            print(f"Found match {edge} -> {rest}")
            break
    else:
        # If the edge hasn't been discarded, it is a "longest edge"
        keepers.append(edge)
        print(f"No greater matches found for {edge}")

print(f"The longest edges: {keepers}")

This code is not the most efficient and cleanest, but highlights the basic mechanics of solving the problem.

Old answer: Solved without description from OP

If you can always assume that the edge between vertices is denoted by "::", you can use some simple string processing to get the length of each sequence.

This solution assumes that you always want the longest two sequences.

edges_1 = ['a::b::c', 'a::b', 'a', 'a::d','a::d::e', 'a::d::e::f']
edges_2 = ['a', 'b']

def print_longest_two_sequences(edges):
    links = [edge.split("::") for edge in edges]
    links.sort(key=len, reverse=True)
    links = ["::".join(edge) for edge in links]
    print(links[:2])

print_longest_two_sequences(edges_1)
print_longest_two_sequences(edges_2)

Getting longest link in a link list in python

More articles: