Regular expression for multiple occurrences in Python

Question


I need to parse lines containing multiple language codes, like the one below:

008800002     Bruxelles-Nord$Br ussel Nord$<deu>$Brussel Noord$<nld>

Here 008800002 is the id, Bruxelles-Nord$Br ussel Nord$ is name 1, deu is language 1, $Brussel Noord$ is name 2, and nld is the second language.

So the idea is that a name and a language can appear N times, and I need to collect them all. The language in <> is 3 characters long (fixed), and all names end with a $ sign.

I tried this one but it doesn't give the expected result.

x = re.compile(r'(?P<stop_id>\d{9})\s(?P<authority>[[\x00-\x7F]{3}|\s{3}])\s(?P<stop_name>.*)'
               r'(?P<lang_code>(?:[<]\S{0,4}))', flags=re.UNICODE)

I don't know how to capture the repeated elements. It takes

Bruxelles-Nord$Br ussel Nord$<deu>$Brussel Noord$

as stop_name and <nld> as the language.

+3

python regex

Ishan bhatt 01 oct. '14 at 9:18


2 answers

\b(\d+)\b\s*|(.*?)(?=<)<(.*?)>

Try this and grab the captures. See the demo:

http://regex101.com/r/hS3dT7/4
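As a rough sketch of how the captures from this pattern might be collected in Python: group 1 is only set on the match that holds the id, while groups 2 and 3 are set on each name/language match. The variable names stop_id and names are illustrative assumptions, not part of the answer.

```python
import re

# Pattern from the answer: either a run of digits (group 1),
# or a lazily matched name (group 2) followed by <language> (group 3).
pattern = re.compile(r"\b(\d+)\b\s*|(.*?)(?=<)<(.*?)>")

line = "008800002     Bruxelles-Nord$Br ussel Nord$<deu>$Brussel Noord$<nld>"

stop_id = None  # hypothetical name for the leading id
names = []      # hypothetical list of (name, language) tuples

for m in pattern.finditer(line):
    if m.group(1) is not None:
        # First alternative matched: this is the numeric id.
        stop_id = m.group(1)
    else:
        # Second alternative matched: one name/language pair.
        names.append((m.group(2), m.group(3)))

print(stop_id, names)
```

Note that the names keep their $ delimiters, and that a purely numeric token inside a name would also be picked up by the first alternative, so this only suits input shaped like the sample line.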

+2

vks 01 oct. '14 at 9:30


Amadan · Accepted Answer · 2014-10-01T09:31:47+0000

Do this in two steps. First separate the id from the name/language pairs; then use re.finditer on the name/language section to iterate over the pairs and stuff them into a dict.

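import re

line = u"008800002     Bruxelles-Nord$Br ussel Nord$<deu>$Brussel Noord$<nld>"
# Step 1: split the leading numeric id from the rest of the line.
m = re.search(r"(\d+)\s+(.*)", line, re.UNICODE)
stop_id = m.group(1)  # named stop_id to avoid shadowing the built-in id()
# Step 2: collect every name/language pair, keyed by language code.
names = {}
for pair in re.finditer(r"(.*?)<(.*?)>", m.group(2), re.UNICODE):
    names[pair.group(2)] = pair.group(1)
print(stop_id, names)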
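One thing to watch with the dict: if the same language code appears twice on a line, the later name overwrites the earlier one. A minimal variation, assuming you would rather keep every pair in order and drop the surrounding $ signs, could look like the sketch below. The function name parse_stop_line and the \w{3} language pattern are my own assumptions based on the "3 characters long (fixed)" remark in the question.

```python
import re

# Assumption: language codes are exactly three word characters inside <>.
PAIR_RE = re.compile(r"(.*?)<(\w{3})>")


def parse_stop_line(line):
    """Hypothetical helper: return (stop_id, [(name, language), ...]) or None."""
    head = re.match(r"(\d+)\s+(.*)", line)
    if head is None:
        return None  # line does not start with a numeric id
    stop_id, rest = head.groups()
    # strip("$") removes only the leading/trailing $ signs;
    # a $ inside a name is left untouched.
    pairs = [(m.group(1).strip("$"), m.group(2)) for m in PAIR_RE.finditer(rest)]
    return stop_id, pairs


result = parse_stop_line(
    "008800002     Bruxelles-Nord$Br ussel Nord$<deu>$Brussel Noord$<nld>"
)
print(result)
```

A list of tuples keeps the original order and any repeated language codes; switch back to a dict if you only ever need one name per language.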
