Python str.split() inconsistent?

Question

>>> ".a string".split('.')
['', 'a string']

>>> "a .string".split('.')
['a ', 'string']

>>> "a string.".split('.')
['a string', '']

>>> "a ... string".split('.')
['a ', '', '', ' string']

>>> "a ..string".split('.')
['a ', '', 'string']

>>> 'this  is a test'.split(' ')
['this', '', 'is', 'a', 'test']

>>> 'this  is a test'.split()
['this', 'is', 'a', 'test']

Why does split() behave differently from split(' ') when the only whitespace in the string is spaces?

Why does split('.') split "..." into ['', '']? split() does not consider there to be an empty word between two delimiters ...

The docs are clear about this (see @agf below), but I would like to know why this behavior was chosen.

I looked at the source code ( here ) and thought line 136 should just be less-than: ... i < str_len ..

python split whitespace

ijverig 22 Mar 12 at 5:43

1 answer

agf · Accepted Answer · 2012-03-22T05:48:21+0000

See the str.split docs; this is specifically mentioned:

If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
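The two algorithms the docs describe can be sketched in pure Python to make the difference concrete. This is only an illustrative approximation, not CPython's actual implementation: the function names split_with_sep and split_whitespace are made up for this example, and it assumes that str.isspace() matches the whitespace definition split() uses.

```python
def split_with_sep(text, sep):
    """Hypothetical sketch of text.split(sep) for a non-empty sep."""
    if not sep:
        raise ValueError("empty separator")
    parts = []
    start = 0
    while True:
        idx = text.find(sep, start)
        if idx == -1:
            # No more separators: the rest (possibly '') is the last piece.
            parts.append(text[start:])
            return parts
        # Every separator closes a piece, even an empty one.
        parts.append(text[start:idx])
        start = idx + len(sep)


def split_whitespace(text):
    """Hypothetical sketch of text.split() with no arguments."""
    parts = []
    current = []
    for ch in text:
        if ch.isspace():
            # A run of whitespace ends at most one word; empty words are skipped.
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


print(split_with_sep("a ... string", "."))   # ['a ', '', '', ' string']
print(split_whitespace("this  is a test"))   # ['this', 'is', 'a', 'test']
```

The first function never discards anything between separators, while the second only ever emits non-empty words, which is why an empty input gives [''] in one case and [] in the other.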

Python tries to do what you would expect. Most people, without thinking about it too hard, would probably expect

'1 2 3 4 '.split()

to return

['1', '2', '3', '4']

Think about splitting data where spaces were used instead of tabs to create fixed-width columns: if the values have different widths, each row will contain a different number of spaces.

There is often a trailing space at the end of a line that you can't see, and the default ignores it, so it gives you the answer you would expect.
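As a small illustration of the fixed-width case, here is a sketch with made-up rows (the data and variable names are invented for this example). It compares the default split() with split(' ') on space-padded columns that end in trailing spaces.

```python
# Hypothetical space-padded rows, each with trailing whitespace.
rows = [
    "alice    30   london ",
    "bob      4    rome   ",
]

# Default algorithm: runs of whitespace collapse, the ends are trimmed.
default_split = [row.split() for row in rows]

# Explicit ' ' separator: every single space delimits a (possibly empty) field.
space_split = [row.split(" ") for row in rows]

for fields in default_split:
    print(fields)
# ['alice', '30', 'london']
# ['bob', '4', 'rome']

for fields in space_split:
    # Field counts differ per row because the padding differs.
    print(len(fields), fields)
```

With the default, both rows come out as three fields; with split(' ') you would have to filter out the empty strings yourself.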

When it comes to the algorithm used when specifying a delimiter, think about a line in a CSV file:

1,,3

means there is data in the 1st and 3rd columns and none in the 2nd, so you want

'1,,3'.split(',')

to return

['1', '', '3']

Otherwise, you wouldn't be able to tell which column each string came from.
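A minimal sketch of why the empty string matters, assuming hypothetical column names col1 to col3 (they are not from the question). Keeping the empty field preserves positions; dropping it shifts the data into the wrong column.

```python
columns = ["col1", "col2", "col3"]  # hypothetical header for illustration
line = "1,,3"

# Explicit separator keeps the empty middle field, so positions line up.
fields = line.split(",")
record = dict(zip(columns, fields))
print(record)  # {'col1': '1', 'col2': '', 'col3': '3'}

# If empty fields were dropped, '3' would land in col2 and col3 would vanish.
collapsed = [f for f in fields if f]
shifted = dict(zip(columns, collapsed))
print(shifted)  # {'col1': '1', 'col2': '3'}
```

For real CSV data with quoting or embedded commas, the standard library's csv module is usually a better fit than split(',').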
