How to split a string and return its delimiter in Python?
I have a line that looks like this:
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
I want to split on letters (i.e. A-Z or a-z) and put the associated values into a dictionary of lists. Each number is associated with the letter that follows it, e.g.
'M' is associated with 47482, 14, 7, etc.
'I' is associated with 4, 7, 1, etc.
'H' is associated with 236792
My final data structure will look like this:
dict = {
    'M': [47482, 14, 7],
    'I': [4, 7, 1],
    'H': [236792]
}
My attempt:
import re
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
tmp = re.split('[a-zA-Z]', string1)
print(tmp)
This returns only the numbers; the letters used as delimiters are dropped. I need help keeping them and building the data structure.
You are on the right track, but you need a slightly different regex, used with re.findall. Like this:
In [1]: string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
In [2]: import re, collections
In [3]: p = re.compile("([0-9]+)([A-Za-z])")
In [4]: dct = collections.defaultdict(list)
In [5]: for number, letter in p.findall(string1):
   ...:     dct[letter].append(number)
   ...:
In [6]: dct
Out[6]:
defaultdict(list,
{'D': ['8', '1', '17', '5', '7', '1', '5', '6', '3'],
'H': ['236792'],
'I': ['4', '7', '1', '4', '2', '7', '7', '22', '3', '3', '2', '4', '11', '3', '3', '15'],
'M': ['47482', '14', '7', '26', '25', '20', '11', '17', '7', '14', '35', '30', '15', '16', '4', '15', '37', '24', '5', '27', '35', '10', '5', '24', '175', '13']})
This finds every number that is followed by a letter and appends it to the list stored under that letter in the dictionary. Duplicate numbers are kept, and the values remain strings; wrap number in int() if you need integers.
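Since the question asks how to split and still get the delimiters back, here is a minimal sketch of that route. It relies on the documented behaviour that re.split includes the delimiters in its result when the pattern contains a capturing group. The function names split_keep_delimiters and group_by_letter are made up for illustration, and the code assumes every letter is preceded by at least one digit (otherwise int() raises ValueError).

```python
import re
from collections import defaultdict


def split_keep_delimiters(text):
    """Split on single ASCII letters, keeping each letter in the result."""
    # The parentheses form a capturing group, so re.split returns the
    # matched delimiters alongside the pieces between them.
    return re.split(r"([A-Za-z])", text)


def group_by_letter(text):
    """Map each letter to the list of integers that precede it."""
    parts = split_keep_delimiters(text)
    groups = defaultdict(list)
    # parts alternates number, letter, number, letter, ... and ends with a
    # trailing piece after the last letter; zip stops before that piece.
    for number, letter in zip(parts[0::2], parts[1::2]):
        groups[letter].append(int(number))
    return dict(groups)


sample = "47482M4I14M7I7M236792H"  # shortened version of string1
print(split_keep_delimiters(sample))
print(group_by_letter(sample))
```

Compared with findall, this keeps the original re.split call from the question and only changes the pattern, at the cost of having to pair the pieces up afterwards.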
Another solution, without using regex:
import string
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
result = dict()
tempValue = ''
for char in string1:
    if char not in string.ascii_letters:
        # still inside a number: keep collecting its characters
        tempValue += char
    else:
        # a letter ends the current number
        if char not in result:
            result[char] = []
        result[char].append(int(tempValue))
        tempValue = ''
print(result)
Result:
{
'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13],
'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15],
'D': [8, 1, 17, 5, 7, 1, 5, 6, 3],
'H': [236792]
}
If you don't want to use regex, you can write your own method.
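# string1 is the input string from the question
myDict = {}
num_string = ''
for char in string1:
    if char.isalpha():
        # a letter closes the number collected so far
        myDict.setdefault(char, []).append(int(num_string))
        num_string = ''
    elif char.isdigit():
        num_string += char
Note: don't use dict as a variable name. It is not a keyword, but it is the name of the built-in type, and assigning to it shadows the built-in.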
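The loop above assumes well-formed input: if a letter is not preceded by digits, int('') raises a ValueError with an unhelpful message. As a rough sketch only, the same loop can be wrapped in a function that reports such cases explicitly. The name parse_runs and the choice to reject malformed input, rather than skip it, are assumptions made for illustration.

```python
def parse_runs(text):
    """Hypothetical helper: collect the integer before each letter, keyed by letter."""
    result = {}
    digits = ""
    for char in text:
        if char.isdigit():
            # still inside a number
            digits += char
        elif char.isalpha():
            if not digits:
                raise ValueError(f"letter {char!r} has no number before it")
            result.setdefault(char, []).append(int(digits))
            digits = ""
        else:
            raise ValueError(f"unexpected character {char!r}")
    if digits:
        # the input ended in the middle of a number
        raise ValueError(f"trailing digits {digits!r} without a letter")
    return result


print(parse_runs("47482M4I14M7I7M236792H"))
```

Whether malformed input should raise or be skipped silently depends on where the strings come from; raising is simply the more conservative default here.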
Without using regex:
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
d = {}
str_num = ''
for c in string1:
    if c.isdigit():
        str_num += c
    else:
        # any non-digit character ends the current number
        if c not in d:
            d[c] = []
        d[c].append(int(str_num))
        str_num = ''
print(d)
>>> {'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13], 'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15], 'D': [8, 1, 17, 5, 7, 1, 5, 6, 3], 'H': [236792]}
Also without regexp:
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
s = ''
for k in string1:
    if k.isalpha():
        print('found', k, 'value', s)
        # add to dict here
        s = ''
    else:
        s += k