Regular expression processing using software provided by Kimonolabs

Question

I am trying to use the software provided by Kimonolabs to scrape a list of doctors from a website. The problem I am facing is that the line I scraped from the website contains an address and a postcode separated by a <br> tag.

Kimono uses this syntax for regex:

/^()(.*?)()$/

first group => to the left of the required content

second group => is what to extract

third group => to the right of the required content

Specifically, here are the regex expressions I've tried:

/^()(.*?)(\<)$/ 
/^()(.*?)(\n)$/
/^()(.*?)(\r)$/

And this is the site I'm trying to scrape: http://www.jameda.de/

Here's an example of a string I'm trying to parse with a regular expression:

<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>

However, none of the regex patterns I've tried captures any data. I am having trouble understanding regular expressions because the reference material I found is quite complex.

regex web-scraping

Andi Giga 09 Sep 14 at 12:07

1 answer

Brian stephens · Answer 1 · 2014-09-09T13:16:22+0000

It looks like you are trying to match German zipcodes, which are always 5 digits. This will do it:

/(<br\/?>)(\d{5})()/

Structure:

<br\/?> - indicates that the match must be preceded by a <br> tag (with or without a slash)

\d{5} - 5 digits
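To see what each group captures, here is a minimal sketch in Python rather than in Kimono itself. It assumes the regex is applied to the raw HTML of the sample string from the question; how Kimono feeds text to the pattern is not confirmed here. The surrounding / delimiters are dropped because Python's re module does not use them.

```python
import re

# Sample string taken from the question.
html = "<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>"

# The answer's pattern without the /.../ delimiters.
# Group 1: the <br> tag (optionally written <br/>), left of the content.
# Group 2: the five-digit postcode, the part to extract.
# Group 3: empty, nothing is required to the right.
pattern = re.compile(r"(<br\/?>)(\d{5})()")

match = pattern.search(html)
left, postcode, right = match.groups()

print(left)      # <br>
print(postcode)  # 85635
```

The third group stays empty, so the pattern keeps the three-group shape that Kimono expects without constraining what follows the postcode.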

Note: Leave out the anchors ^ and $, which were in the default Kimono regex, because this regex does not try to match the whole text, only the postcode.
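The effect of the anchors can be checked with a short Python sketch. It assumes the pattern runs against the raw HTML string from the question, which may differ from what Kimono actually passes to the regex. The list name attempts is only illustrative.

```python
import re

# Sample string taken from the question.
html = "<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>"

# With ^ and $ the pattern must cover the entire string, so it fails.
anchored = re.compile(r"^(<br\/?>)(\d{5})()$")
# Without anchors the pattern may match anywhere inside the string.
unanchored = re.compile(r"(<br\/?>)(\d{5})()")

print(anchored.search(html))             # None
print(unanchored.search(html).group(2))  # 85635

# The three patterns from the question, minus the / delimiters.
# Each requires "<", "\n" or "\r" to be the last character before the
# end of the string, but this sample ends with "</p>".
attempts = [r"^()(.*?)(\<)$", r"^()(.*?)(\n)$", r"^()(.*?)(\r)$"]
for attempt in attempts:
    print(attempt, re.search(attempt, html))  # None for each pattern
```

Under these assumptions, the anchored version finds nothing because the sample does not consist solely of a <br> tag followed by five digits.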
