Matching rounds

I have a text with the following structure:

Round 1

some multiline text ...

Round 2

some multiline text ...

...

Round N

some multiline text ...

      

I would like to match each round together with its multi-line text.

Neither of these expressions gives the correct result:

(Round\s\d+)((?!Round).*?)

(Round\s\d+)(.*?)

Can anyone help me?

Thanks in advance.

+1




6 answers


The period (.) character by default matches any character except a newline. In many languages, you can use the s modifier to make the dot match all characters, including newlines. It should look something like this:

/(Round\s\d+)(.*?)(Round\s\d+|$)/s

      



(I'm not 100% sure this regex will work; I'm just showing you how to use the s modifier.)

Edit: Tested on regexpal.com and seems to work.
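
To see the effect of the s modifier in isolation, here is a minimal Python sketch. Python spells the modifier re.DOTALL, and the sample text is made up for illustration; it only assumes the "Round N" layout described in the question.

```python
import re

# Made-up sample in the shape described in the question.
text = "Round 1\nline a\nline b\nRound 2\nline c\n"

pattern = r"(Round\s\d+)(.*?)(Round\s\d+|$)"

# Without DOTALL the dot stops at newlines, so the body cannot span
# several lines and the pattern finds nothing in this sample.
no_flag = re.search(pattern, text)

# With DOTALL (the "s" modifier) the dot also matches "\n".
with_flag = re.search(pattern, text, re.DOTALL)

print(no_flag)
print(with_flag.group(1), repr(with_flag.group(2)))
```

Note that the third group consumes the next "Round" header, so this pattern is only suitable for inspecting a single match, not for walking through every round.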

+1




Using a regex directly on multiple lines can be tricky (in terms of readability and maintainability).



I would process text line by line and use a data structure to store what has been seen so far. You can compare this to handling email when you have headers, body, etc.
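
A rough Python sketch of that line-by-line approach, for illustration only. The header pattern, the function name split_rounds and the sample text are assumptions, not something taken from the question's real data.

```python
import re

# Assumed header format: a line holding only "Round" and a number.
HEADER = re.compile(r"^Round\s+(\d+)\s*$")


def split_rounds(text):
    """Return a dict mapping round number -> list of non-empty body lines."""
    rounds = {}
    current = None  # body lines of the round currently being read
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            # A new header starts a new record.
            current = rounds.setdefault(int(m.group(1)), [])
        elif current is not None and line.strip():
            # Any other non-blank line belongs to the current round.
            current.append(line)
    return rounds


sample = "Round 1\n\nA 1-0 B\nC 2-2 D\n\nRound 2\n\nB 0-3 C\n"
print(split_rounds(sample))
```

The regex here only has to recognise a single header line, which keeps it short and easy to change if the header format turns out to be different.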

+1




Is this a C# question?

(Round\s\d+)(.*?)

Use RegexOptions.Singleline

Singleline: Specifies single-line mode. Changes the meaning of the period (.) so that it matches every character (instead of every character except \n).

And you should probably use Matches instead of Match.

+1




It is rarely if ever correct to use a reluctant quantifier as the last thing in a regular expression. In this regex:

/(Round\s+\d+)(.*?)/s

      

... the first thing the (.*?) part does is try to match zero characters. That is a perfectly legitimate match, and since the quantifier is reluctant, it stops right there. For this to work, there must be something after the (.*?), like:

/(Round\s+\d+)(.*?)(Round\s+\d+)/s

      

Now (.*?) cannot stop after zero characters; it has to keep consuming characters until it reaches a position where the next part of the regex, (Round\s+\d+), can match. But you don't want to use that regex, because it consumes part of what should be the next match. Sticking with this format, you can use a lookahead as the end condition:

/(Round\s+\d+)(.*?)(?=Round\s+\d+|$)/s

      

Now it is forced to match the whole record, but the match position stays at the beginning of the next record, which is where the next match attempt will start. (EDIT: Added |$ to the lookahead so that the last record matches too.)
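
The three variants discussed so far can be compared side by side. This is a small Python sketch (re.DOTALL plays the role of the trailing s); the sample text is invented for the comparison.

```python
import re

# Invented sample with three records.
text = "Round 1\naaa\nRound 2\nbbb\nRound 3\nccc\n"

# 1. Trailing reluctant group: it is satisfied with zero characters.
empty = re.findall(r"(Round\s+\d+)(.*?)", text, re.DOTALL)

# 2. Next header used as a consuming terminator: the header of the
#    following record is eaten, so that record is never matched itself.
consuming = re.findall(r"(Round\s+\d+)(.*?)(Round\s+\d+)", text, re.DOTALL)

# 3. Lookahead terminator: the body runs up to the next header (or the
#    end of the text) without consuming it.
lookahead = re.findall(r"(Round\s+\d+)(.*?)(?=Round\s+\d+|$)", text, re.DOTALL)

# Strip surrounding newlines from each body for readability.
records = [(header, body.strip()) for header, body in lookahead]

print(empty)
print(consuming)
print(records)
```

Only the third pattern returns one record per round with a non-empty body.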

EDIT: I wanted to comment on your other expression too:

/(Round\s+\d+)((?!Round).*?)/s

      

Here, instead of using a positive lookahead as an end condition, it looks like you are trying to use a negative lookahead. For this to work, the lookahead has to be applied at each position before the dot is allowed to consume a character. That means the dot must be enclosed in parentheses together with the lookahead, with the quantifier outside them:

/(Round\s+\d+)((?:(?!Round).)*)/s

      

You can't use a reluctant quantifier in this regex either, for the same reason as in the other one.
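
A short Python sketch of the difference between checking the negative lookahead once and checking it before every character. As before, re.DOTALL stands in for the s modifier and the sample text is made up.

```python
import re

# Made-up sample with two records.
text = "Round 1\naaa\nRound 2\nbbb\n"

# Lookahead tested once, followed by a reluctant dot: the body is empty.
once = re.findall(r"(Round\s+\d+)((?!Round).*?)", text, re.DOTALL)

# Lookahead tested before every character, greedy quantifier outside the
# group: the body runs up to, but not into, the next "Round".
tempered = re.findall(r"(Round\s+\d+)((?:(?!Round).)*)", text, re.DOTALL)

print(once)
print(tempered)
```

The inner group is non-capturing, so the whole body ends up in a single capture rather than one capture per character.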

There is probably a better way to do this, but I would need to know more about the data and your requirements before I could suggest anything.

(Note that I used a Perl-like syntax with slash delimiters and a trailing 's' modifier for single-line mode, because regexes tend to confuse the site's syntax highlighting without them.)

0




This will do the trick with RegexOptions.Singleline:

Round\s+\d+(.*?)(?=Round\s\d|$)

      

0




Alan, great regex tips. I lacked practice with lookarounds.

/(Round\s+\d+)(.*?)(?=Round\s+\d+|$)/s does exactly what I need.

/(Round\s+\d+)((?!Round).)*/s also works, but each letter is a separate capture.

Many thanks.

To describe my details more precisely, you can look here for example: http://www.rsssf.com/tablesi/ital09.html

Actually I need to import all information about rounds, matches, results, their dates into my database.

I have another problem to solve: how to correlate the teams already saved in my database with those in the match results. For example, I have an "Inter" team in my db. But a match result may look like

Internazionale 1:1 Juventus or FC Inter 1-1 Juventus

In the future, I would like to run regex queries like "get all match results for Inter" so I don't have to go through all the content.

So my idea was to store the possible names (tags) for each team and then join them with |.

For example, /(Inter|Internazionale|FC Inter)\s+\d+-\d+\s+(\w+)/s

I also have doubts about using (\w+) to match any team. I am afraid I would need to join all the team name tags with | and use that there. For 30 teams with 2-3 tags each, this would be a huge regex.
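
One way the tag idea could be sketched in Python is to build the alternation from a lookup table instead of writing it by hand. Everything named here (ALIASES, CANONICAL, parse_result, the "Juve" tag) is hypothetical, and the accepted score formats are an assumption based on the two examples above.

```python
import re

# Hypothetical alias table: canonical team name -> known spellings (tags).
ALIASES = {
    "Inter": ["Inter", "Internazionale", "FC Inter"],
    "Juventus": ["Juventus", "Juve"],
}

# Reverse lookup: each spelling -> canonical name.
CANONICAL = {alias: team for team, names in ALIASES.items() for alias in names}

# Longest spellings first, so "Internazionale" is tried before "Inter".
names = sorted(CANONICAL, key=len, reverse=True)
TEAM = "(?:" + "|".join(re.escape(name) for name in names) + ")"

# Assumed score formats: "1-1", "1:1" and "1: 1".
RESULT = re.compile(rf"({TEAM})\s+(\d+)\s*[-:]\s*(\d+)\s+({TEAM})")


def parse_result(line):
    """Return (home, home_goals, away_goals, away) or None if no match."""
    m = RESULT.search(line)
    if m is None:
        return None
    home, home_goals, away_goals, away = m.groups()
    return CANONICAL[home], int(home_goals), int(away_goals), CANONICAL[away]


print(parse_result("Internazionale 1: 1 Juventus"))
print(parse_result("FC Inter 1-1 Juventus"))
```

If results are normalised to canonical names like this at import time, a later query such as "all results for Inter" could be a plain equality lookup in the database rather than a regex over the raw text.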

I appreciate your help.

0

