Python str.split() inconsistent?
>>> ".a string".split('.')
['', 'a string']
>>> "a .string".split('.')
['a ', 'string']
>>> "a string.".split('.')
['a string', '']
>>> "a ... string".split('.')
['a ', '', '', ' string']
>>> "a ..string".split('.')
['a ', '', 'string']
>>> 'this  is a test'.split(' ')
['this', '', 'is', 'a', 'test']
>>> 'this  is a test'.split()
['this', 'is', 'a', 'test']
Why is split() different from split(' ') when the string it is called on uses spaces as its whitespace?
Why does split('.') turn the "..." in "a ... string" into two empty strings, ['', '']? split() does not consider there to be an empty word between two delimiters...
The docs are clear about this (see @agf below), but I would like to know why this behavior was chosen.
I looked at the source code (here) and thought line 136 should just be less-than: ... i < str_len ...
See the str.split docs; this behavior is specifically mentioned:
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
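The two algorithms described in the docs can be compared side by side. This is only a small sketch; the sample string `padded` is made up for illustration and does not come from the docs.

```python
# Sample string chosen for illustration (not from the docs):
# two leading spaces, two spaces between the words, one trailing space.
padded = "  a  b "

# Explicit separator: every single space is a delimiter,
# so adjacent spaces and the edges produce empty strings.
explicit = padded.split(" ")

# No separator: runs of whitespace count as one delimiter,
# and leading/trailing whitespace produces nothing.
default = padded.split()

print(explicit)  # ['', '', 'a', '', 'b', '']
print(default)   # ['a', 'b']

# The empty-string edge case mentioned in the docs.
print("".split(","))  # ['']
print("".split())     # []
```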
Python tries to do what you expect. Most people, without thinking about it too hard, would probably expect
'1 2 3 4 '.split()
to return
['1', '2', '3', '4']
Think about splitting data where spaces instead of tabs were used to create fixed-width columns: if the values have different widths, each row will contain a different number of spaces.
There is often trailing whitespace at the end of a line that you can't see, and the default ignores that as well, so it gives you the answer you'd expect.
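To make the fixed-width case concrete, here is a minimal sketch. The rows are hypothetical, and the second one deliberately ends with an invisible trailing space.

```python
# Hypothetical space-padded rows; the second row has a trailing space.
rows = [
    "alice    42   london",
    "bob      7    paris ",
]

# split() with no argument does not care how many spaces pad each column.
parsed = [row.split() for row in rows]

# Splitting on a single space leaves an empty string for every extra space.
naive = [row.split(" ") for row in rows]

for fields in parsed:
    print(fields)
# ['alice', '42', 'london']
# ['bob', '7', 'paris']
```

Every row in `parsed` ends up with exactly three fields, while the lists in `naive` have different lengths and are littered with empty strings.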
As for the algorithm used when a delimiter is specified, think about a line in a CSV file:
1,,3
means there is data in the 1st and 3rd columns and none in the 2nd, so you want
'1,,3'.split(',')
to return
['1', '', '3']
otherwise, you wouldn't be able to tell which column each string came from.
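A short sketch of why the empty string matters here. The column names in `header` are hypothetical, chosen only to show the alignment.

```python
# Hypothetical header, for illustration only.
header = "id,name,city".split(",")
row = "1,,3".split(",")

# Because the empty field is kept, each value still lines up with its column.
record = dict(zip(header, row))

print(record)  # {'id': '1', 'name': '', 'city': '3'}
```

If the empty string were dropped, '3' would be paired with the second column instead of the third. For real CSV files with quoted fields, the standard-library csv module is probably the safer choice than split(',').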