Splitting a string in Python after a number
I am very new to Python and mostly new to programming. I am trying to parse certain .txt files into Excel and have had success with several of them that were easily split into lines that I could encode.
However, I now have a bunch of files that contain my information, but without sensible line breaks. My data looks like this:
company1 name _______ 123 company2 name 456 company3 name
789
without any reliable separator between names and numbers: sometimes there are underscores, sometimes only spaces, and sometimes a line break. If I could split the whole thing into lines that each end after a complete number, then the code I already wrote would do the rest. Ideally, I would end up with lines that look like this:
company1 name ______ 123
company2 name 456
company3 name 789
with a line break after each number.
I hope someone can help!
You should probably use a regular expression (regex) for this: it looks for a pattern in the text and lets you replace each match, here by putting a newline after it.
For example:
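import re
line = 'company1 name _______ 123 company2 name 456 company3 name 789'
# Put a newline after each run of digits that is preceded by whitespace,
# dropping any spaces that follow the number
output = re.sub(r'(\s\d+)\s*', r'\1\n', line)
print(output)
which prints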
company1 name _______ 123
company2 name 456
company3 name 789
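The sample in the question also has a line break between a name and its number, which a single substitution on the raw text would leave in place. A minimal sketch of one way to handle that, assuming it is acceptable to collapse all whitespace to single spaces first (the function name split_after_numbers is made up for illustration):

```python
import re

def split_after_numbers(text):
    """Return one record per list item, each ending with a standalone number."""
    # Collapse every run of whitespace (including line breaks) to one space.
    flat = ' '.join(text.split())
    # Break after each run of digits that is preceded by whitespace and
    # followed by whitespace or the end of the text.
    broken = re.sub(r'(?<=\s)(\d+)(?:\s+|$)', r'\1\n', flat)
    return broken.splitlines()

raw = 'company1 name _______ 123 company2 name 456 company3 name\n789'
for record in split_after_numbers(raw):
    print(record)
```

Digits inside a word, such as the 1 in company1, are left alone because the pattern requires whitespace immediately before the number.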
Try using split, then checking whether each element is made up of digits:
# data_string is assumed to already hold the text read from the file
new_string = ''
# split() with no argument splits on any whitespace, including line breaks
data_array = data_string.split()
for portion in data_array:
    # split() always returns strings, so test the content, not the type
    if portion.isdigit():
        new_string = new_string + portion + '\n'
    else:
        new_string = new_string + portion + ' '
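The same token-by-token idea can be sketched as a function that collects finished records in a list instead of building one long string. This is only an illustration: the name records_from_tokens is hypothetical, and it assumes every number is a plain run of digits, since str.split() always yields strings and the check has to be made with str.isdigit().

```python
def records_from_tokens(text):
    """Group whitespace-separated tokens into records that end at a number."""
    records = []
    current = []
    # split() with no argument also splits on line breaks
    for token in text.split():
        current.append(token)
        if token.isdigit():
            # A number closes the current record
            records.append(' '.join(current))
            current = []
    if current:
        # Keep any trailing tokens that were not followed by a number
        records.append(' '.join(current))
    return records

raw = 'company1 name _______ 123 company2 name 456 company3 name\n789'
print('\n'.join(records_from_tokens(raw)))
```

Returning a list keeps each record separate, so existing line-based code can loop over the records directly.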