How to split a string and return its delimiters in Python?

I have a line that looks like this:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"

      

I want to split on the letters (A-Z or a-z) and put the associated values into a dictionary of lists. Each number is associated with the letter that follows it, e.g.:

'M' is associated with 47482, 14, 7, etc.

'I' is associated with 4, 7, 1, etc.

'H' is associated with 236792

My final data structure should look like this:

    dict = {
      'M': [47482, 14, 7],
      'I': [4, 7, 1],
      'H': [236792]
    }

      

My attempt:

import re
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
tmp = re.split('[a-zA-Z]', string1)
print(tmp)

      

I cannot get the letters back as delimiters. I need help creating the data structure.



5 answers


You are on the right track, but you need a slightly different regex, and re.findall instead of re.split. Like this:

In [1]: string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"

In [2]: import re, collections

In [3]: p = re.compile("([0-9]+)([A-Za-z])")

In [4]: dct = collections.defaultdict(list)

In [5]: for number, letter in p.findall(string1):
    ...:     dct[letter].append(number)
    ...:      

In [6]: dct
Out[6]: 
defaultdict(list,
            {'D': ['8', '1', '17', '5', '7', '1', '5', '6', '3'],
             'H': ['236792'],
             'I': ['4', '7', '1', '4', '2', '7', '7', '22', '3', '3', '2', '4', '11', '3', '3', '15'],
             'M': ['47482', '14', '7', '26', '25', '20', '11', '17', '7', '14', '35', '30', '15', '16', '4', '15', '37', '24', '5', '27', '35', '10', '5', '24', '175', '13']})

      



This finds every number that is followed by a letter and stores the pairs in a dictionary keyed by the letter. Duplicate numbers are kept, and the numbers are stored as strings; use int(number) if you need integers.
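re.split can also hand back the delimiters if the pattern is wrapped in a capturing group, which is closer to what the question's title asks for. A minimal sketch, assuming a shortened sample string rather than the full input; the name groups is only illustrative:

```python
import re
from collections import defaultdict

string1 = "47482M4I14M7I7M1I26M8D"  # shortened sample, not the full input

# With a capturing group, re.split keeps the matched letters in the result.
# Because the string ends with a letter, the last element is an empty string.
parts = re.split(r"([A-Za-z])", string1)

groups = defaultdict(list)
# Numbers sit at even indexes, letters at odd indexes; zip drops the
# trailing empty string because it has no letter to pair with.
for number, letter in zip(parts[0::2], parts[1::2]):
    groups[letter].append(int(number))

print(dict(groups))
```

For this kind of input re.findall is usually the simpler choice, since it yields the (number, letter) pairs directly and needs no index arithmetic.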



Another solution, without using regex:

import string
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"

result = dict()
tempValue = ''
for char in string1:

    if char not in string.ascii_letters:
        tempValue += char

    else:

        if char not in result:
            result[char] = []

        result[char].append(int(tempValue))
        tempValue = ''

print(result)

      



Result:

{
  'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13],
  'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15],
  'D': [8, 1, 17, 5, 7, 1, 5, 6, 3],
  'H': [236792]
}
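The loop above assumes that every letter is preceded by at least one digit and that the string ends with a letter; otherwise int('') raises an unhelpful error or a trailing number is silently dropped. A rough sketch of the same loop as a function that reports those cases explicitly (the name parse_runs is hypothetical, not from any library):

```python
def parse_runs(text):
    """Group the integers in text by the letter that follows each one."""
    result = {}
    digits = ""
    for char in text:
        if char.isdigit():
            digits += char
        elif char.isalpha():
            if not digits:
                raise ValueError(f"letter {char!r} has no number before it")
            result.setdefault(char, []).append(int(digits))
            digits = ""
        else:
            raise ValueError(f"unexpected character {char!r}")
    if digits:
        # The input ended with digits that no letter claimed.
        raise ValueError(f"trailing number {digits!r} has no letter")
    return result


print(parse_runs("14M7I7M"))
```

Whether bad input should raise or be skipped depends on where the strings come from; raising is just one reasonable choice.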

      



If you don't want to use regex, you can write your own loop.

# string1 is the input string from the question
myDict = {}
num_string = ''

for char in string1:
    if char.isalpha():
        # a letter closes the current number; file it under that letter
        myDict.setdefault(char, []).append(int(num_string))
        num_string = ''
    elif char.isdigit():
        num_string += char

      

Note: don't use the built-in name dict as a variable name, because that hides the built-in type.
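Strictly speaking, dict is a built-in name rather than a reserved keyword, so Python allows the assignment; the problem only shows up later, when the built-in is needed again. A small sketch of what goes wrong, kept inside a function (the name shadowing_demo is made up for this example) so the shadowing stays local:

```python
def shadowing_demo():
    dict = {"M": [47482]}  # the local name now hides the built-in type
    try:
        dict(H=[236792])   # tries to call the dictionary object, not the type
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)
```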



Without using regex:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"


d = {}
str_num = ''
for c in string1:
    if c.isdigit():
        str_num += c
    else:
        if c not in d:
            d[c] = []
        d[c].append(int(str_num))
        str_num = ''

print(d)
# {'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13], 'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15], 'D': [8, 1, 17, 5, 7, 1, 5, 6, 3], 'H': [236792]}

      



Also without regexp:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
result = {}
s = ''
for k in string1:
    if k.isalpha():
        print('found', k, 'value', s)
        # add the finished number to the list for this letter
        result.setdefault(k, []).append(int(s))
        s = ''
    else:
        s += k

      







