How to split a string and return its delimiter in python?

Question

I have a line that looks like this:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"

I want to split on the letters (i.e. A-Z or a-z) and put the associated values into a dictionary of lists. Each number is associated with the letter that follows it, e.g.:

'M' is associated with 47482, 14, 7, etc.

'I' is associated with 4, 7, 1, etc.

'H' is associated with 236792

My final data structure will be like

    dict = {
      'M': [47482, 14, 7],
      'I': [4, 7, 1],
      'H': [236792]
    }

My attempt:

import re
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
tmp = re.split('[a-zA-Z]', string1)
print(tmp)

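This splits on the letters, but the letters themselves are dropped from the result, so I cannot tell which letter each number belongs to. I need help creating the data structure.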

+3

python dictionary

Arijit Jul 26 17 at 12:12

5 answers

Another solution, without using regex:

import string
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"

result = dict()
tempValue = ''
for char in string1:

    if char not in string.ascii_letters:
        tempValue += char

    else:

        if char not in result:
            result[char] = []

        result[char].append(int(tempValue))
        tempValue = ''

print(result)

Result:

{
  'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13],
  'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15],
  'D': [8, 1, 17, 5, 7, 1, 5, 6, 3],
  'H': [236792]
}
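One caveat with the loop above: int(tempValue) raises ValueError when a letter has no digits in front of it, for example at the very start of the string or directly after another letter. A minimal sketch of a guarded variant follows; the helper name group_counts is hypothetical and not part of the original answer, and the short input is just the first few tokens of string1.

```python
def group_counts(text):
    """Group the integer preceding each letter under that letter."""
    result = {}
    digits = ''
    for char in text:
        if char.isdigit():
            digits += char
        elif char.isalpha():
            if not digits:
                raise ValueError(f"letter {char!r} has no preceding number")
            # setdefault creates the list the first time a letter is seen
            result.setdefault(char, []).append(int(digits))
            digits = ''
        else:
            raise ValueError(f"unexpected character {char!r}")
    if digits:
        raise ValueError(f"trailing digits {digits!r} without a letter")
    return result


print(group_counts("47482M4I14M7I7M"))
# {'M': [47482, 14, 7], 'I': [4, 7]}
```

Raising on malformed input is a design choice; silently skipping bad tokens would also work if the data is known to be noisy.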

+1

Antwane Jul 26 17 at 12:23

If you don't want to use regex, you can write your own method.

myDict = {}
num_string = ''

for char in string1:
    if char.isalpha():
        myDict.setdefault(char,[]).append(int(num_string))
        num_string = ''
    elif char.isdigit():
        num_string += char

Note: dict is the name of a built-in type (not a keyword), so don't use it as a variable name.
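To illustrate the note above, here is a small sketch of what goes wrong once the name dict is rebound. The function name shadowing_demo is made up for this example.

```python
def shadowing_demo():
    dict = {'M': [47482]}  # the local name now hides the built-in type
    try:
        dict(a=1)  # tries to call the dictionary instance, not the type
    except TypeError as exc:
        return str(exc)
    return None


print(shadowing_demo())
# 'dict' object is not callable
```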

+1

yinnonsanders Jul 26 17 at 12:25

Without using regex:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"


d = {}
str_num = ''
for c in string1:
    if c.isdigit():
        str_num += c
    else:
        if c not in d:
            d[c] = []
        d[c].append(int(str_num))
        str_num = ''

print(d)
# Output: {'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13], 'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15], 'D': [8, 1, 17, 5, 7, 1, 5, 6, 3], 'H': [236792]}

0

LeopoldVonBuschLight Jul 26 17 at 12:19

Also without regexp:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
result = {}

s = ''
for k in string1:
    if k.isalpha():
        # a letter ends the current number: store it under that letter
        result.setdefault(k, []).append(int(s))
        s = ''
    else:
        s += k

print(result)

0

Joe Jul 26 17 at 12:28

Raniz · Accepted Answer · 2017-07-26T12:20:08+0000

You are on the right track, but you have to use a slightly different regex and use re.findall. Like this:

In [1]: string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"

In [2]: import re, collections

In [3]: p = re.compile("([0-9]+)([A-Za-z])")

In [4]: dct = collections.defaultdict(list)

In [5]: for number, letter in p.findall(string1):
    ...:     dct[letter].append(number)
    ...:      

In [6]: dct
Out[6]: 
defaultdict(list,
            {'D': ['8', '1', '17', '5', '7', '1', '5', '6', '3'],
             'H': ['236792'],
             'I': ['4', '7', '1', '4', '2', '7', '7', '22', '3', '3', '2', '4', '11', '3', '3', '15'],
             'M': ['47482', '14', '7', '26', '25', '20', '11', '17', '7', '14', '35', '30', '15', '16', '4', '15', '37', '24', '5', '27', '35', '10', '5', '24', '175', '13']})

This finds every run of digits that is followed by a letter and stores each pair in a dictionary keyed by the letter. Repeated numbers are kept, and the values are stored as strings; use dct[letter].append(int(number)) if you need integers.
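If you would rather stay with re.split, as in the question, wrapping the pattern in a capturing group makes the delimiters part of the returned list. The sketch below is an alternative to the findall approach, not a replacement for it. The names sample, parts and groups are illustrative only, and the input is shortened to the first few tokens of string1.

```python
import re
from collections import defaultdict

sample = "47482M4I14M7I7M"

# A capturing group makes re.split keep the delimiters in the result.
parts = re.split(r'([A-Za-z])', sample)
print(parts)
# ['47482', 'M', '4', 'I', '14', 'M', '7', 'I', '7', 'M', '']

groups = defaultdict(list)
# Numbers sit at even positions, letters at odd positions; zip stops at
# the shorter sequence, which drops the trailing empty string.
for number, letter in zip(parts[0::2], parts[1::2]):
    groups[letter].append(int(number))

print(dict(groups))
# {'M': [47482, 14, 7], 'I': [4, 7]}
```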
