Sorting problems when using a list

I have a .txt file containing a list of IP addresses: 


There's a lot more to it than that :)

Anyway, I imported this into a list using Python, and I'm trying to sort it, but I'm having problems. Does anyone have any idea?

EDIT: OK, since it was vague, this is what I had so far.

f = open("/Users/jch5324/Python/Proxy/resources/data/list-proxy.txt", 'r+')
lines = [x.split() for x in f]
new_file = (sorted(lines, key=lambda x:x[:18]))




3 answers

You are probably sorting them with ASCII string comparison ('.' < '5', etc.), when you would prefer them sorted numerically. Try converting them to tuples of ints and then sorting:

def ipPortToTuple(string):
    """'12.34.5.678:910' -> (12,34,5,678,910)"""
    ip,port = string.strip().split(':')
    # convert the port as well, so it compares numerically like the octets
    return tuple(int(i) for i in ip.split('.')) + (int(port),)

with open('myfile.txt') as f:
    nonemptyLines = (line for line in f if line.strip()!='')
    sorted(nonemptyLines, key=ipPortToTuple)
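
To see the difference concretely, here is a small sketch comparing the default string sort with a numeric key. The sample addresses are made up for illustration, and `ip_port_key` is just a hypothetical helper name, not something from the question.

```python
# Sample data invented for illustration; not taken from the question's file.
addresses = ['10.0.0.5:8080', '9.0.0.1:80', '10.0.0.5:443', '100.1.1.1:3128']

def ip_port_key(entry):
    """Hypothetical helper: '10.0.0.5:8080' -> (10, 0, 0, 5, 8080)."""
    ip, port = entry.strip().split(':')
    return tuple(int(part) for part in ip.split('.')) + (int(port),)

by_string = sorted(addresses)                   # character-by-character comparison
by_number = sorted(addresses, key=ip_port_key)  # octet-by-octet numeric comparison

print(by_string)
print(by_number)
```

With the string sort, '9.0.0.1:80' lands last because '9' compares greater than '1'; with the numeric key it comes first.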


edit: The ValueError you are getting is because your text file is not entirely in #.#.#.#:# format as you assumed. (There may be comments or blank lines, although in this case the error hints that there is a line with more than one ':'.) You can home in on the problem by catching the exception and emitting useful debug data:

def tryParseLines(lines):
    for line in lines:
        try:
            yield ipPortToTuple(line.strip())
        except Exception:
            # report the offending line and carry on with the rest
            if __debug__:
                print('line {} did not match #.#.#.#:# format'.format(repr(line)))

with open('myfile.txt') as f:
    print(sorted(tryParseLines(f)))

In the example above, I was a little sloppy, as it still allows some invalid IPs (e.g. #.#.#.#.# or 257.-1.#.#). Below is a more verbose solution, which lets you do things like compare IP addresses with the < operator, and also makes sorting work naturally:


import functools
import re

@functools.total_ordering  # derives <=, >, >= from __lt__ and __eq__
class Ipv4Port(object):
    regex = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})')

    def __init__(self, ipv4:(int,int,int,int), port:int):
        try:
            assert type(ipv4)==tuple and len(ipv4)==4, 'ipv4 not 4-length tuple'
            assert all(0<=x<256 for x in ipv4), 'ipv4 numbers not in valid range (0<=n<256)'
            assert type(port)==int, 'port must be integer'
        except AssertionError as ex:
            print('Invalid IPv4 input: ipv4={}, port={}'.format(repr(ipv4),repr(port)))
            raise ex

        self.ipv4 = ipv4
        self.port = port

        # tuple used for all comparisons: (a, b, c, d, port)
        self._tuple = ipv4+(port,)

    @classmethod
    def fromString(cls, string:''):
        try:
            a,b,c,d,port = cls.regex.match(string.strip()).groups()
            ip = tuple(int(x) for x in (a,b,c,d))
            return cls(ip, int(port))
        except Exception as ex:
            # append the offending input to the error message, then re-raise
            args = list(ex.args) if ex.args else ['']
            args[0] += "\n...indicating ipv4 string {} doesn't match #.#.#.#:# format\n\n".format(repr(string))
            ex.args = tuple(args)
            raise ex

    def __lt__(self, other):
        return self._tuple < other._tuple
    def __eq__(self, other):
        return self._tuple == other._tuple

    def __repr__(self):
        #return 'Ipv4Port(ipv4={ipv4}, port={port})'.format(**self.__dict__)
        return "Ipv4Port.fromString('{}.{}.{}.{}:{}')".format(*self._tuple)


and then:

def tryParseLines(lines):
    for line in lines:
        line = line.strip()
        if line != '':
            try:
                yield Ipv4Port.fromString(line)
            except AssertionError as ex:
                raise ex
            except Exception as ex:
                if __debug__:
                    print(ex)
                raise ex



>>> lines = ' \n222.1.1.1:234\n'.splitlines()
>>> sorted(tryParseLines(lines))
[Ipv4Port.fromString(''), Ipv4Port.fromString(''), Ipv4Port.fromString('')]


Changing the values to, for example, 264... or ...-35... will result in the corresponding errors.
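
As an alternative to a hand-rolled class, the standard library's `ipaddress` module (Python 3.3+) already does the range checking and ordering. The following is only a sketch under that assumption: `parse_ip_port` is a hypothetical helper name, the sample lines are made up, and invalid lines are collected rather than raised.

```python
import ipaddress

def parse_ip_port(line):
    """Hypothetical helper: '10.0.0.5:8080' -> (IPv4Address('10.0.0.5'), 8080)."""
    ip_text, port_text = line.strip().rsplit(':', 1)
    ip = ipaddress.IPv4Address(ip_text)  # raises AddressValueError (a ValueError) if invalid
    port = int(port_text)
    if not 0 <= port <= 65535:
        raise ValueError('port out of range: {!r}'.format(line))
    return ip, port

# Sample data invented for illustration.
lines = ['10.0.0.5:8080', '9.0.0.1:80', '264.1.1.1:80', '', '10.0.0.5:443']

parsed, rejected = [], []
for line in lines:
    if not line.strip():
        continue                      # skip blank lines
    try:
        parsed.append(parse_ip_port(line))
    except ValueError:
        rejected.append(line)         # keep bad lines for inspection

parsed.sort()                         # IPv4Address objects compare numerically
print(parsed)
print(rejected)
```

Here the line starting with 264 ends up in `rejected`, and the remaining entries sort by address first and port second.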



@Ninjagecko's solution is the best, but here's another way to do it with re:

>>> import re
>>> with open('ips.txt') as f:
...     print(sorted(f, key=lambda line: [int(n) for n in re.split(r'\.|:', line.strip())]))

['\n', '\n', '\n',
'\n', ' \n']




You can preprocess the list so that it can be sorted with the built-in comparison, and then convert it back to the normal format afterwards.

Once every octet is padded to three characters, all the strings will be the same length and can be sorted as plain text. Afterwards, we simply remove all the spaces.

You can google and find other examples of this.

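# ips is the list of dotted-quad strings read from the file
# pad each octet to width 3 so every string has the same length
address = ["%3s.%3s.%3s.%3s" % tuple(ip.split(".")) for ip in ips]
address.sort()
# remove the padding again
address = [a.replace(" ", "") for a in address]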
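
For a concrete run of the padding idea, here is a minimal sketch on made-up addresses (plain IPs without ports, which is an assumption; the question's file also has ports). It prints the padded intermediate strings so you can see why they sort correctly.

```python
# Sample data invented for illustration (plain IPs, no ports).
ips = ['10.0.0.5', '9.0.0.1', '100.1.1.1', '10.0.0.12']

# Right-align every octet in a 3-character field so all strings have equal width.
padded = ['%3s.%3s.%3s.%3s' % tuple(ip.split('.')) for ip in ips]
padded.sort()  # a space sorts before any digit, so shorter numbers come first
print(padded)

# Drop the padding to get the usual notation back.
result = [p.replace(' ', '') for p in padded]
print(result)
```

This works because a space compares lower than any digit, so within each 3-character field the text order matches the numeric order.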


If you have a ton of IP addresses, you will get better processing times using C++. It will be more work, but it will run faster.


