Creating lists of anagrams from a list of words

Question

Creating lists of anagrams from a list of words

I want to find a list of lists of anagrams from a list of words. Should I use a different loop in my code or recursion?

some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

new_list = [some_list[0]]
i = 0
while i+1 < len(some_list):
    if (''.join(sorted(some_list[0]))) == (''.join(sorted(some_list[i+1]))):
        new_list.append(some_list[i+1])
        i = i+1
    else:
        i = i+1

print(new_list)

My conclusion ['bad', 'dab', 'bda', 'dba']

. But I also want more lists of other anagrams from some_list

.

I want the result to be: - ['app', 'ppa']

- ['bad', 'dab', 'bda', 'dba']

-['sad', 'das']

+3

python list anagram

pk. May 14 '15 at 19:58

source to share

6 answers

Stefan Pochmann · Answer 1 · 2015-05-14T20:23:31+0000

I recommend that you write Python, not Java or any other language that you imitate. Here's your main Python code, with a regular loop and no extra stuff:

new_list = [some_list[0]]
for word in some_list[1:]:
    if sorted(some_list[0]) == sorted(word):
        new_list.append(word)

I don't see any use for recursion, but yes, you could wrap an outer loop around this to find other groups of anagrams.

Although I would do it using the helpful itertools.groupby :

for _, group in groupby(sorted(some_list, key=sorted), sorted):
    group = list(group)
    if len(group) > 1:
        print(group)

What prints:

['bad', 'dab', 'bda', 'dba']
['sad', 'das']
['app', 'ppa']

Alternative solution for modified question when sorting groups:

groups = (list(group) for _, group in groupby(sorted(some_list, key=sorted), sorted))
print([group for group in sorted(groups) if len(group) > 1])

Output:

[['app', 'ppa'], ['bad', 'dab', 'bda', 'dba'], ['sad', 'das']]

Kasramvd · Answer 2 · 2015-05-14T20:21:59+0000

Your problem is that you are looping once over your list as you need to loop over all the words.

But I suggest another way for this task, you can use the itertools.groupby

sorted function as well with operator.itemgetter

:

some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

from operator import itemgetter
from itertools import groupby 
s=sorted([(i,''.join(sorted(j))) for i,j in enumerate(some_list)],key=itemgetter(1))
inds= [zip(*g)[0] for _,g in groupby(s,itemgetter(1))]
print [itemgetter(*i)(some_list) for i in inds]

Result:

[('bad', 'dab', 'bda', 'dba'), 'mad', ('sad', 'das'), ('app', 'ppa'), 'pge']

All I have done here is create a list of sorted words with these indices using sorted

and enumerate

:

sorted([(i,''.join(sorted(j))) for i,j in enumerate(some_list)],key=itemgetter(1))
[(0, 'abd'), (4, 'abd'), (6, 'abd'), (9, 'abd'), (3, 'adm'), (2, 'ads'), (8, 'ads'), (1, 'app'), (7, 'app'), (5, 'egp')]

then we need to group these pairs based on the second element and get the first element (indices), so we will have the following list of tuples:

[(0, 4, 6, 9), (3,), (2, 8), (1, 7), (5,)]

that each tuple contains word indices, that these sorted views are the same.

and finally, all you need is to collect the elements of the main list based on the previous indices:

[itemgetter(*i)(some_list) for i in inds]
[('bad', 'dab', 'bda', 'dba'), 'mad', ('sad', 'das'), ('app', 'ppa'), 'pge']

Eugene Sh. · Answer 3 · 2015-05-14T20:01:31+0000

1) Create a function anagrams(word)

that returns a list of anagrams for one word like your code.
2) a map

function above your wordlist.

dermen · Answer 4 · 2015-05-14T20:08:51+0000

Here's the solution:

from itertools import groupby
some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']
some_list_ordered = map( lambda x : "".join( sorted( x) ), some_list )
some_lists = sorted(zip( some_list_ordered, some_list ) )
anagrams = filter( lambda x : len( x ) > 1, [ zip(*v)[1]  for k,v in groupby( some_lists, lambda x : x[0] ) ] )    

for a in anagrams:
    print a

#('bad', 'bda', 'dab', 'dba')
#('das', 'sad')
#('app', 'ppa')

xnx · Answer 5 · 2015-05-14T20:30:15+0000

A natural way to do it, if you can afford the extra Slovak memory overhead, it seems to me:

words = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

anagrams = {}
for word in words:
    sword = ''.join(sorted(word))
    try:
        anagrams[sword].append(word)
    except KeyError:
        anagrams[sword] = [word]

anagrams_list = [v for v in anagrams.values() if len(v) > 1]
print anagrams_list

Output:

[['app', 'ppa'], ['bad', 'dab', 'bda', 'dba'], ['sad', 'das']]

EDIT: As mentioned in the comment below, you can replace the block try...except

with a method dict

setdefault

if the syntax doesn't bother you:

words = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']

anagrams = {}
for word in words:
    sword = ''.join(sorted(word))
    anagrams.setdefault(sword, []).append(word)

anagrams_list = [v for v in anagrams.values() if len(v) > 1]
print anagrams_list

Padraic cunningham · Answer 6 · 2015-05-14T20:47:22+0000

You can group words in a dict using the sorted word as a key, filtering out words whose values do not have at least two elements, using OrderedDict to preserve order:

some_list = ['bad', 'app', 'sad', 'mad', 'dab','pge', 'bda', 'ppa', 'das', 'dba']


from collections import OrderedDict

od = OrderedDict()
for ele in some_list:
    srt = "".join(sorted(ele))
    od.setdefault(srt,[]).append(ele)

print(filter(lambda x: len(x) > 1, od.values()))


[['bad', 'dab', 'bda', 'dba'], ['app', 'ppa'], ['sad', 'das']]

Or using a loop and adding to the list, using the temp list to collect common words:

new_list = []
from collections import OrderedDict
for ele in OrderedDict.fromkeys("".join(sorted(ele)) for ele in some_list):
    temp = []
    for s in some_list:
        if ele == ''.join(sorted(s)):
            temp.append(s)
    if len(temp) > 1:
        new_list.append(temp)

If the order doesn't matter the defaultdict will be more efficient:

from collections import defaultdict

d = defaultdict(list)
for ele in some_list:
    d[''.join(sorted(ele))].append(ele)

print(filter(lambda x: len(x) > 1, d.values()))
[['app', 'ppa'], ['bad', 'dab', 'bda', 'dba'], ['sad', 'das']]

Creating lists of anagrams from a list of words

More articles: