Use the same line to test for regular expressions
I am new to regex, but I have learned a thing or two. I ran into a problem that might not be solvable with a regular expression, so I need some advice.
I have the following line:
some text key 12, 32, 311 ,465 and 345. some other text dog 612,
12, 32, 9 and 10. some text key 1, 2.
I'm trying to figure out if it's possible (using a regex only) to extract the numbers 12, 32, 311, 465, 345, 1, 2 as a collection of individual matches.
When I approached this problem, I tried to find a pattern that matches only the relevant results. So I came up with:
- get numbers that are prefixed with "key" and do NOT have the prefix "dog".
But I'm not sure if this is possible. I know that for the first number I can use (?<=key )+[\d]+ and get it as a result, but for the other numbers (i.e. 2..5), can I use the key prefix again?
You can do this in two steps.
(?<=key\\s)\\d+(?:\\s*(?:,|and)\\s*\\d+)*
Capture all numbers. See demo.
https://regex101.com/r/uK9cD8/6
Then split the match, or extract \\d+ out of it. See demo.
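The two steps above could be sketched in Java roughly like this. The class name TwoStepExtract and the helper extract are made up for illustration, and the sample string is the one from the question; treat it as a sketch rather than the answerer's exact code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepExtract {
    // Step 1: a whole run of numbers that directly follows "key" and one whitespace character.
    private static final Pattern LIST =
            Pattern.compile("(?<=key\\s)\\d+(?:\\s*(?:,|and)\\s*\\d+)*");
    // Step 2: each individual number inside one run.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Hypothetical helper: returns every number found in a "key ..." run.
    static List<String> extract(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher lists = LIST.matcher(text);
        while (lists.find()) {
            // Re-scan only the matched run, so numbers after "dog" are never seen.
            Matcher number = NUMBER.matcher(lists.group());
            while (number.find()) {
                numbers.add(number.group());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        String str = "some text key 12, 32, 311 ,465 and 345. some other text dog 612,\n"
                + "12, 32, 9 and 10. some text key 1, 2.";
        System.out.println(extract(str));
    }
}
```

Splitting the matched run on something like [\\s,]+|and would work as well; re-scanning with \\d+ just avoids dealing with empty tokens.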
In Java, you can use a constrained-width lookbehind, which accepts a {n,m} limiting quantifier.
So you can use
(?<=key(?:(?!dog)[^.]){0,100})[0-9]+
Or, if key and dog are whole words, use a \b word boundary:
String pattern = "(?<=\\bkey\\b(?:(?!\\bdog\\b)[^.]){0,100})[0-9]+";
The only problem can arise if the distance between dog or key and the numbers is greater than m. You can increase it to 1000, and I think this will work in most cases.
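To see what the limit m does in practice, here is a small sketch that builds the same pattern with a configurable upper bound. The class LookbehindBound, the helper extract and the short test string are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindBound {
    // Hypothetical helper: same pattern as above, but with the upper bound m passed in.
    static List<String> extract(String text, int m) {
        Pattern p = Pattern.compile("(?<=key(?:(?!dog)[^.]){0," + m + "})[0-9]+");
        List<String> numbers = new ArrayList<>();
        Matcher matcher = p.matcher(text);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String s = "key 1, 2, 3, 4, 5, 6";
        // Small bound: numbers more than 10 characters after "key" are out of reach.
        System.out.println(extract(s, 10));
        // Larger bound: the whole list is within reach.
        System.out.println(extract(s, 100));
    }
}
```

With m = 10, the digits 5 and 6 sit more than ten characters after key, so the lookbehind cannot reach back far enough and they are skipped; with m = 100 all six numbers are found.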
Example IDEONE demo
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "some text key 12, 32, 311 ,465 and 345. some other text dog 612,\n12, 32, 9 and 10. some text key 1, 2.";
String str2 = "some text key 1, 2, 3 ,4 and 5. some other text dog 6, 7, 8, 9 and 10. some text, key 1, 2 dog 3, 4 key 5, 6";
// Digits preceded by "key" within 100 non-dot characters, with no "dog" in between
Pattern ptrn = Pattern.compile("(?<=key(?:(?!dog)[^.]){0,100})[0-9]+");
Matcher m = ptrn.matcher(str);
while (m.find()) {
    System.out.println(m.group(0));
}
System.out.println("-----");
// Reuse the compiled pattern on the second input
m = ptrn.matcher(str2);
while (m.find()) {
    System.out.println(m.group(0));
}
I would not recommend using code you cannot understand and customize, but here is my one-pass solution using the method described in this answer of mine. If you want to understand the construction method, read the other answer.
(?:key(?>\s+and\s+|[\s,]+)|(?!^)\G(?>\s+and\s+|[\s,]+))(\d+)
Compared to the method described in the other post, I dropped the lookahead, as we don't need to check the suffix in this case.
Here the separator is (?>\s+and\s+|[\s,]+). It currently allows "and" with whitespace on either side, or any combination of whitespace and commas. I use the atomic group (?>pattern) to disable backtracking, so the order of the alternation is significant. Change it to (?:pattern) if you want to modify it and you don't know what you are doing.
Sample code:
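import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "some text key 12, 32, 311 ,465 and 345. some other text dog 612,\n12, 32, 9 and 10. some text key 1, 2. key 1, 2 dog 3, 4 key 5, 6. key is dog 23, 45. key 4";
// Either start at "key" + separator, or continue (\G) from the end of the previous match
Pattern p = Pattern.compile("(?:key(?>\\s+and\\s+|[\\s,]+)|(?!^)\\G(?>\\s+and\\s+|[\\s,]+))(\\d+)");
Matcher m = p.matcher(input);
List<String> numbers = new ArrayList<>();
while (m.find()) {
    // Group 1 holds the number itself, without the separator
    numbers.add(m.group(1));
}
System.out.println(numbers);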
You can use a positive lookbehind, which ensures that your sequence is not preceded by any word other than key:
(?<=key)\s(?:\d+[\s,]+)+(?:and )?\d+
Note, here you don't need to use a negative lookahead for dog, because this regex will only match if your sequence is preceded by key.
See demo https://regex101.com/r/gZ4hS4/3
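Since this pattern matches the whole list rather than each number, getting individual matches takes a second pass over the matched text. A rough Java sketch, where the class KeyListExtract and the helper extract are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyListExtract {
    // The pattern from this answer, written as a Java string literal.
    private static final Pattern LIST =
            Pattern.compile("(?<=key)\\s(?:\\d+[\\s,]+)+(?:and )?\\d+");
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Hypothetical helper: collects the individual numbers from every matched list.
    static List<String> extract(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher lists = LIST.matcher(text);
        while (lists.find()) {
            Matcher number = NUMBER.matcher(lists.group());
            while (number.find()) {
                numbers.add(number.group());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        String str = "some text key 12, 32, 311 ,465 and 345. some other text dog 612,\n"
                + "12, 32, 9 and 10. some text key 1, 2.";
        System.out.println(extract(str));
    }
}
```

One thing to keep in mind: as written, the pattern needs at least two numbers after key, so an input like "key 4" produces no match at all.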