`re.split()` behaves strangely in Python
I'm in a bit of a fix with Python. I would like to take a .txt file with many comments and split it into a list, separating on all punctuation marks, spaces, and `\n`. When I run the following Python code, it breaks my text file at strange points. NOTE: below I split only on periods and newlines to test this, but it still often drops the last letter of words.
import regex as re
with open('G:/My Documents/AHRQUnstructuredComments2.txt','r') as infile:
    nf = infile.read()
    wList = re.split('. | \n, nf)
    print(wList)
3 answers
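You forgot to close the string literal, and you need a `\` before the `.`. An unescaped `.` matches any character, so `'. '` matches the last letter of a word together with the space after it, which is why letters go missing.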
import regex as re
with open('G:/My Documents/AHRQUnstructuredComments2.txt','r') as infile:
    nf = infile.read()
    # Raw string so the backslashes reach the regex engine unchanged
    wList = re.split(r'\. |\n |\s', nf)
    print(wList)
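To see why the unescaped dot eats letters, here is a small comparison on a made-up sample string (hypothetical, since the contents of the original .txt file aren't shown). It uses the standard-library `re` module instead of the third-party `regex` package; both treat these patterns the same way.

```python
import re

# Hypothetical sample text; the real comments file is not shown in the question.
text = "Good care. Nice staff\nWould return"

# Unescaped '.' matches ANY character (except newline), so '. ' consumes
# the last letter of a word together with the space that follows it.
broken = re.split(r". |\n", text)

# Escaped '\.' matches only a literal period.
fixed = re.split(r"\. |\n", text)

print(broken)  # ['Goo', 'care', 'Nic', 'staff', 'Woul', 'return']
print(fixed)   # ['Good care', 'Nice staff', 'Would return']
```

Note that "care" keeps its final letter only because the character before that space happened to be a real period.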
For details, see Split Multiple Delimited Lines? ...
Also, RichieHindle's answer fits your question perfectly:
import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
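The `findall` approach above throws the punctuation away. If "separate all punctuation marks" means keeping each mark as its own list item, one option (my assumption about the intent, not part of RichieHindle's answer) is to add an alternation that matches any single non-word, non-space character:

```python
import re

DATA = "Hey, you - what are you doing here!?"

# \w+ grabs runs of letters/digits/underscores; [^\w\s] grabs each
# punctuation character on its own; spaces and newlines are skipped entirely.
tokens = re.findall(r"\w+|[^\w\s]", DATA)
print(tokens)
# ['Hey', ',', 'you', '-', 'what', 'are', 'you', 'doing', 'here', '!', '?']
```

Unlike `[\w']+`, this splits contractions: "don't" becomes `'don'`, `"'"`, `'t'`.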