Matching rounds
I have a text with the following structure:
Round 1
some multiline text ...
Round 2
some multiline text ...
...
Round N
some multiline text ...
I would like to match each round together with its multiline text.
None of these expressions gives the correct result:
(Round\s\d+)((?!Round).*?)
(Round\s\d+)(.*?)
Can anyone help me?
Thanks in advance.
By default, the dot (.) matches every character except newlines. In many languages you can use the s modifier to make the dot match all characters, including newlines. It should look something like this:
/(Round\s\d+)(.*?)(Round\s\d+|$)/s
(I'm not 100% sure this regex will work; I'm just showing you how to use the s modifier.)
Edit: Tested on regexpal.com and seems to work.
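A quick way to see what the s modifier does is to run the same pattern with and without it. This is a sketch in Python, where re.S (alias re.DOTALL) plays the role of the trailing /s; the sample text is made up to mirror the structure in the question. Note that Round 2 does not appear in the second result: the third group consumes the "Round 2" header, so the next search starts after it.

```python
import re

# Made-up sample with the same shape as the question's input.
text = "Round 1\nalpha\nbeta\nRound 2\ngamma\nRound 3\ndelta"
pattern = r"(Round\s\d+)(.*?)(Round\s\d+|$)"

# Without DOTALL, "." cannot cross the newline after "Round N", so nothing matches.
without_s = re.findall(pattern, text)

# re.S is Python's equivalent of the trailing /s modifier.
with_s = re.findall(pattern, text, re.S)

print(without_s)
print(with_s)
```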
It is rarely, if ever, correct to use a reluctant quantifier as the last thing in a regular expression. In this regex:
/(Round\s+\d+)(.*?)/s
...the first thing the (.*?) part does is try to match zero characters. That is a perfectly legitimate match, and since the quantifier is reluctant, it stops right there. If you are going to use a reluctant quantifier, something has to come after (.*?), like:
/(Round\s+\d+)(.*?)(Round\s+\d+)/s
Now (.*?) cannot stop at zero characters; it has to keep consuming characters until it reaches a place where the next part of the regex, (Round\s+\d+), can match. But you don't want to use that regex, because it consumes part of what should be the next match. Keeping the same structure, you can use a lookahead as the end condition instead:
/(Round\s+\d+)(.*?)(?=Round\s+\d+|$)/s
Now it is forced to match the whole record, but the match position stays at the beginning of the next record, which is where the next match attempt starts. (EDIT: Added |$ to the lookahead so that the last record matches too.)
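The three variants discussed in this answer can be compared side by side. A Python sketch with made-up sample text, using re.S for the /s modifier:

```python
import re

text = "Round 1\nalpha\nbeta\nRound 2\ngamma\nRound 3\ndelta"

# 1. A reluctant quantifier at the very end is satisfied by zero characters.
trailing_lazy = re.findall(r"(Round\s+\d+)(.*?)", text, re.S)

# 2. Requiring the next header forces (.*?) to expand, but the header is consumed,
#    so the search resumes after "Round 2" and that round is never reported.
consuming = re.findall(r"(Round\s+\d+)(.*?)(Round\s+\d+)", text, re.S)

# 3. A lookahead checks for the next header (or the end) without consuming it,
#    so each search resumes exactly at the next "Round N".
lookahead = re.findall(r"(Round\s+\d+)(.*?)(?=Round\s+\d+|$)", text, re.S)

for header, body in lookahead:
    print(header, body.strip().splitlines())
```

One Python-specific detail: without re.M, $ also matches just before a newline at the very end of the string, so if your text ends with a newline the last body will not include it; \Z matches only at the absolute end.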
EDIT: I wanted to comment on your other expression too:
/(Round\s+\d+)((?!Round).*?)/s
Here, instead of using a positive lookahead as an end condition, it looks like you are trying to use a negative lookahead. For that to work, the lookahead has to be checked at every position, before the dot is allowed to consume the character there. This means the lookahead and the dot must be grouped together in parentheses, with the quantifier outside the group:
/(Round\s+\d+)((?:(?!Round).)*)/s
You also can't use a reluctant quantifier in this regex, for the same reason as before.
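A sketch of this "tempered dot" pattern in Python (re.S again standing in for /s, sample text made up), with the reluctant version alongside to show the zero-length problem coming back:

```python
import re

text = "Round 1\nalpha\nbeta\nRound 2\ngamma\nRound 3\ndelta"

# The negative lookahead is tested before every character the dot consumes,
# so the greedy group stops right before the next "Round".
tempered = re.findall(r"(Round\s+\d+)((?:(?!Round).)*)", text, re.S)

# Making the repetition reluctant lets it stop after zero characters again.
tempered_lazy = re.findall(r"(Round\s+\d+)((?:(?!Round).)*?)", text, re.S)

print(tempered)
print(tempered_lazy)
```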
There are probably better ways to do this, but I would need to know more about your data and requirements before I could suggest anything.
(Note that I used Perl-like syntax with slash delimiters and a trailing 's' modifier for single-line mode, because the site's syntax highlighting tends to get confused by regexes without them.)
Alan, great tips on regular expressions. I lacked practice with lookarounds.
/(Round\s+\d+)(.*?)(?=Round\s+\d+|$)/s does exactly what I need.
/(Round\s+\d+)((?!Round).)*/s also works, but each letter is a separate capture.
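The per-letter captures come from the quantifier being applied to a capturing group. How repeated captures are reported depends on the engine; Python's re, for example, keeps only the last iteration of the group, as this sketch with made-up sample text shows. Putting the quantifier inside the capture, as in /(Round\s+\d+)((?:(?!Round).)*)/s, captures the whole body instead.

```python
import re

text = "Round 1\nalpha\nbeta\nRound 2\ngamma\nRound 3\ndelta"

# "*" repeats the capturing group itself, so each iteration overwrites the
# previous capture and only the last character of each body survives.
per_char = re.findall(r"(Round\s+\d+)((?!Round).)*", text, re.S)

print(per_char)
```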
Many thanks.
To describe my data more precisely, here is an example: http://www.rsssf.com/tablesi/ital09.html
In fact, I need to import all the information about rounds, matches, results, and their dates into my database.
I have another problem to solve: how to match the teams already saved in my database with the names that appear in the results. For example, I have an "Inter" team in my db, but a match result may look like
Internazionale 1:1 Juventus or FC Inter 1-1 Juventus
Later, I would like to run regex queries such as "get all match results for Inter" so that I don't have to look through all the content.
So my idea was to store the possible names (tags) for each team and then join them into an alternation.
For example: /(Inter|Internazionale|FC Inter)\s+\d+-\d+\s+(\w+)/s
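Trying the pattern on the two sample lines shows one more issue: the score separator. This Python sketch assumes the intended pattern has \s+ between the score and the opponent, and adds a hypothetical [-:] variant; the /s flag is dropped because it only changes what the dot matches, and this pattern has no dot.

```python
import re

lines = ["Internazionale 1:1 Juventus", "FC Inter 1-1 Juventus"]

# The pattern from the question, with the assumed "\s+" between score and opponent.
dash_only = re.compile(r"(Inter|Internazionale|FC Inter)\s+\d+-\d+\s+(\w+)")
# Hypothetical variant that accepts either "-" or ":" between the goals.
dash_or_colon = re.compile(r"(Inter|Internazionale|FC Inter)\s+\d+[-:]\d+\s+(\w+)")


def groups_or_none(pattern, line):
    """Return the captured groups of the first match in line, or None."""
    m = pattern.search(line)
    return m.groups() if m else None


dash_results = [groups_or_none(dash_only, line) for line in lines]
either_results = [groups_or_none(dash_or_colon, line) for line in lines]
print(dash_results)
print(either_results)
```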
I also have doubts about using (\w+) to match any team. I am afraid I would need to join all the team name tags with | and use that there instead. For 30 teams with 2-3 tags each, that would be a huge regex.
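A long alternation is less of a problem if it is generated rather than written by hand. The sketch below is one possible approach, not a tested solution for the rsssf pages: TEAM_TAGS, parse_result, and the "Juve" tag are hypothetical, the "<home> <goals>-<goals> <away>" layout with - or : is an assumption based on the two examples above, and each matched spelling is mapped back to the stored team name with a dictionary rather than inside the regex.

```python
import re

# Hypothetical tag table: stored team name -> spellings that may appear in results.
TEAM_TAGS = {
    "Inter": ["Inter", "Internazionale", "FC Inter"],
    "Juventus": ["Juventus", "Juve"],
}

# Reverse lookup so any matched spelling maps back to the stored team name.
TAG_TO_TEAM = {tag: team for team, tags in TEAM_TAGS.items() for tag in tags}

# Longest tags first, so longer spellings are tried before their prefixes;
# re.escape protects names that contain regex metacharacters such as ".".
TEAM = "|".join(re.escape(tag) for tag in sorted(TAG_TO_TEAM, key=len, reverse=True))

# Assumed result layout: "<home> <goals>-<goals> <away>", with "-" or ":" between goals.
RESULT = re.compile(rf"\b({TEAM})\s+(\d+)[-:](\d+)\s+({TEAM})\b")


def parse_result(line):
    """Return (home, home_goals, away_goals, away) with stored team names, or None."""
    m = RESULT.search(line)
    if m is None:
        return None
    home, home_goals, away_goals, away = m.groups()
    return TAG_TO_TEAM[home], int(home_goals), int(away_goals), TAG_TO_TEAM[away]


# Usage: "get all match results for Inter" becomes a filter on the parsed tuples.
lines = ["Internazionale 1:1 Juventus", "FC Inter 1-1 Juventus", "Roma 2-0 Lazio"]
parsed = [r for r in map(parse_result, lines) if r is not None]
inter_results = [r for r in parsed if "Inter" in (r[0], r[3])]
print(inter_results)
```

Once results are stored with canonical team names, a query like "all results for Inter" can also be done in the database itself instead of with another regex.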
I appreciate your help.