Remove list items that are not the same length as most of the entries

I know how to remove an item in a list if it is not of a certain size, for example:

x = [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2],[1,2,3],[1,2,3],[1,2,3,4]]
y = [s for s in x if len(s) == len(x[0])]

Here x is the original list and y is the new list. As you can see, x contains one entry that is shorter than the others and one that is longer.

I want to remove every item whose length differs from that of most of the items in the list. The approach shown works only as long as the first item in the list has the same length as most of the items.
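To make the limitation concrete, here is a small sketch with a made-up input (not from the original data) in which the first sublist is the odd one out. Comparing against len(x[0]) then keeps exactly the wrong item:

```python
# Hypothetical input: the first sublist does NOT have the majority length
x = [[1, 2], [1, 2, 3], [1, 2, 3], [1, 2, 3]]

# Same filter as above: compare every length with the first item's length
y = [s for s in x if len(s) == len(x[0])]

print(y)  # [[1, 2]]
```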

So the question is: how do I find the most common length among the elements, preferably without looping over the lengths repeatedly? The average will not work, because it gives the mean length rather than the most frequent one (e.g. lengths 3, 3, 3, 30 give an average of about 10, while the most common length is 3).



2 answers


You can use a collections.Counter object to count how often each length occurs, and then filter on the most frequent length, which most_common gives you:

from collections import Counter

x = [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2],[1,2,3],[1,2,3],[1,2,3,4]]
# Count how many sublists have each length
lens = Counter(len(i) for i in x)
# most_common(1) returns [(length, count)] for the most frequent length
target = lens.most_common(1)[0][0]
y = [s for s in x if len(s) == target]
print(y)
# [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]

      



Please note that if there is a tie, only one of the tied lengths is chosen (in Python 3.7+ the one encountered first; in older versions the choice is arbitrary).
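If the outcome of a tie matters, one option is to resolve it explicitly. The following is only a sketch: the function name filter_by_modal_length and its tie_break parameter are made up for illustration, and picking the smallest tied length by default is an arbitrary assumption.

```python
from collections import Counter

def filter_by_modal_length(items, tie_break=min):
    """Keep the items whose length is the most common one.

    tie_break (hypothetical parameter) picks one length when several
    lengths occur equally often; by default the smallest is used.
    """
    counts = Counter(len(item) for item in items)
    if not counts:
        return []  # nothing to filter in an empty list
    top = max(counts.values())
    # All lengths that share the highest count
    tied = [length for length, n in counts.items() if n == top]
    target = tie_break(tied)
    return [item for item in items if len(item) == target]

# Lengths 1 and 2 both occur twice, so the tie-break decides
data = [[1], [1, 2], [3], [4, 5]]
print(filter_by_modal_length(data))                 # [[1], [3]]
print(filter_by_modal_length(data, tie_break=max))  # [[1, 2], [4, 5]]
```

Passing tie_break=max instead keeps the longer of the tied groups; any callable that picks one value from a list of lengths would work.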



The most common value is called the "mode" (statistically speaking), so to get the modal value just use statistics.mode (this requires Python 3.4+):

>>> from statistics import mode
>>> l = [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2],[1,2,3],[1,2,3],[1,2,3,4]]
>>> most_common_length = mode([len(sublist) for sublist in l])
>>> most_common_length
3
>>> [sublist for sublist in l if len(sublist) == most_common_length]
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
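One caveat worth sketching: before Python 3.8, statistics.mode raised StatisticsError when several values were equally common, and from 3.8 on it returns the first one encountered. If you would rather keep every sublist whose length is among the tied modes, statistics.multimode (Python 3.8+) can be used. The input below is made up for illustration:

```python
from statistics import multimode

# Hypothetical input: lengths 1 and 2 both occur twice
data = [[1], [1, 2], [3], [4, 5], [1, 2, 3]]

# multimode returns a list of all most common values
modal_lengths = multimode(len(sublist) for sublist in data)

# Keep every sublist whose length is one of the modes
kept = [sublist for sublist in data if len(sublist) in modal_lengths]

print(modal_lengths)  # [1, 2]
print(kept)           # [[1], [1, 2], [3], [4, 5]]
```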

      



If statistics.mode is too slow (or you are using an older Python), there is also an implementation in SciPy:

>>> from scipy.stats import mode
>>> # With SciPy 1.11+ the result's .mode is a scalar for 1-D input, so drop the [0] there
>>> most_common_length = mode([len(sublist) for sublist in l]).mode[0]

      
