Splitting a string in Python after a number

I am very new to Python and mostly new to programming. I am trying to parse certain .txt files into Excel and have had success with several of them that were easily split into lines that I could encode.

However, I now have a bunch of files that contain my information, but without sensible line breaks. My data looks like this:

company1 name _______ 123   company2 name 456 company3 name 
789

      

There are no reliable delimiters between names and numbers: sometimes there are underscores, sometimes only spaces, and sometimes a line break. If I could split the whole thing into lines that end after each complete number, then the code I already wrote would do the rest. Ideally, I would end up with lines that look like this:

company1 name ______ 123
company2 name 456
company3 name 789

      

that is, the original text with a line break inserted after each number.

I hope someone can help!

+3




3 answers


You should probably use a regex for this: it looks for a pattern in the text and lets you replace each match, here with the same text followed by a newline.

For example:

import re
line = 'company1 name _______ 123   company2 name 456 company3 name 789'
output = re.sub(r'(\s\d+\s*)', r'\1\n', line)
print(output)

which prints

company1 name _______ 123   
company2 name 456 
company3 name 789
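
The sample in the question also has a line break between a name and its number, and the output above keeps trailing spaces. A minimal sketch that handles both, assuming every number is a whole integer token surrounded by whitespace (the function name `split_after_numbers` is made up for illustration):

```python
import re

def split_after_numbers(text):
    """Return a list of records, each ending in a standalone number."""
    # \s+      whitespace (including a line break) before the number
    # (\d+)\b  the number itself, ending at a word boundary
    # \s*      any whitespace after the number, which is discarded
    normalized = re.sub(r'\s+(\d+)\b\s*', r' \1\n', text)
    return normalized.splitlines()

raw = 'company1 name _______ 123   company2 name 456 company3 name \n789'
for record in split_after_numbers(raw):
    print(record)
```

Because the pattern requires whitespace before the digits, the `1` in `company1` is left alone. Decimal numbers or numbers glued to other characters would need a different pattern.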

      

+3




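Try using split, then checking whether each piece is a number. Note that split always returns strings, so test each piece with str.isdigit() rather than with type():

data_string = 'company1 name _______ 123   company2 name 456 company3 name \n789'
new_string = ''
# split() with no argument splits on any run of whitespace, including line breaks
for portion in data_string.split():
    if portion.isdigit():
        # a number ends the record, so follow it with a line break
        new_string = new_string + portion + '\n'
    else:
        new_string = new_string + portion + ' '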
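
The same split-based idea can be sketched as a small function that returns a list of lines instead of building one string. The name `split_after_numbers_by_token` is hypothetical, and the sketch assumes the numbers are plain non-negative integers separated from the names by whitespace:

```python
def split_after_numbers_by_token(text):
    """Group whitespace-separated tokens into lines that end with a number."""
    lines = []
    current = []
    for token in text.split():  # splits on spaces and line breaks alike
        current.append(token)
        if token.isdigit():
            # an all-digit token closes the current record
            lines.append(' '.join(current))
            current = []
    if current:
        # keep any trailing tokens that were not followed by a number
        lines.append(' '.join(current))
    return lines

sample = 'company1 name _______ 123   company2 name 456 company3 name \n789'
for line in split_after_numbers_by_token(sample):
    print(line)
```

Collecting tokens in a list and joining them once per record avoids the trailing space that string concatenation leaves at the end of each piece.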

      

0




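import re
# a whole number followed by whitespace; the number is captured as group 1
p = re.compile(r'(\b\d+)\s+')
test_str = "company1 name _______ 123   company2 name 456 company3 name 789"
# raw string, so \1 is a backreference to group 1 rather than the character chr(1)
subst = r"\1\n"

result = re.sub(p, subst, test_str)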

      

You can do this using re.sub.
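
If the end goal is rows for a spreadsheet, a variation is to capture each name and number directly instead of inserting line breaks. This is only a sketch of a different technique (`re.findall` rather than `re.sub`), not the code from the answer above. It assumes underscores and whitespace are just filler between a name and its number, and the names `RECORD` and `extract_records` are made up for illustration:

```python
import re

# (\S.*?)   the name: starts at a non-space character, as short as possible
# [\s_]+    filler between name and number (spaces, underscores, line breaks)
# (\d+)     the number
# (?!\S)    the number must be followed by whitespace or the end of the text
RECORD = re.compile(r'(\S.*?)[\s_]+(\d+)(?!\S)', re.DOTALL)

def extract_records(text):
    """Return a list of (name, number) tuples found in text."""
    records = []
    for name, number in RECORD.findall(text):
        # collapse any internal line breaks or repeated spaces in the name
        records.append((' '.join(name.split()), int(number)))
    return records

text = 'company1 name _______ 123   company2 name 456 company3 name \n789'
for name, number in extract_records(text):
    print(name, number)
```

Each tuple can then be written out as one row. Names that themselves contain an underscore directly followed by digits would be split in the wrong place, so the pattern would need adjusting for such data.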

0

