Remove empty entries and add them as a space in the previous list item

I am currently working on a project that requires splitting sentences in order to compare two words (a given word that the user has to enter while giving us the second) to each other and to check the accuracy of the user input. I used x.split(" ")

to do this, however this is causing me a problem.

Let's say the given sentence was The quick brown fox

, and the user enters into The quick brown fox

. Instead of returning, ['The','quick ', 'brown', 'fox']

it returns ['The', 'quick', '', 'brown', fox']

. This makes it difficult to check for precision, as I would like it to be word checked.

In other words, I would like to add extra spaces to the word that came before, but the function split

creates separate (empty) elements instead. How can I remove any empty entries and add them to the word that came before?

I would like this to work for lists where there are multiple entries ''

per line, for example ['The', 'quick', '', '', 'brown', fox']

.

Thank!

EDIT - The code I'm using to test this is just some variation x = The quick brown fox".split(' ')

with different spaces.

EDIT 2 - I didn't think about it (thanks to Malonge), but if the sentence starts with a space, I'd really like it to be accounted for. I don't know how easy it would be, since I needed to make this particular instance an exception where a space should be added to the next word, not the one that precedes it. However, I will make an informed choice to ignore this scenario when calculating accuracy due to the complexity of its implementation.

+3


source to share


2 answers


You can use a regex for this, this will match any whitespace that comes after the first space:

>>> import re
>>> s = "The quick  brown fox"
>>> re.findall(r'\S+\s*(?=\s\S|$)', s)
['The', 'quick ', 'brown', 'fox']

      




Debuggex Demo :

\S+\s*(?=\s\S|$)

      

Regular expression visualization




Update:

Some modification of the above expression is required to match leading spaces at the beginning of a line:

>>> s = "The quick  brown fox"
>>> re.findall(r'((?:(?<=^\s)\s*)?\S+\s*(?=\s\S|$))', s)
['The', 'quick ', 'brown', 'fox']
>>> s1 = "  The quick  brown fox"
>>> re.findall(r'((?:(?<=^\s)\s*)?\S+\s*(?=\s\S|$))', s1)
[' The', 'quick ', 'brown', 'fox']

      




Debuggex Demo :

((?:(?<=^\s)\s*)?\S+\s*(?=\s\S|$))

      

Regular expression visualization

+3


source


You can find several ways, but perhaps the simplest one you have demonstrated is to just split without specifying the split parameter, which makes it split by spaces, not just one space:

>>> s = "The quick  brown fox"
>>>
>>> s.split(' ')
['The', 'quick', '', 'brown', 'fox']
>>> s.split()
['The', 'quick', 'brown', 'fox']

      

You can also get there:



>>> words = [w for w in s.split(" ") if w]
>>> words
['The', 'quick', 'brown', 'fox']

      

Or using regex:

>>> import re
>>>
>>> re.split('\s*', s)
['The', 'quick', 'brown', 'fox']

      

+1


source







All Articles