How do I write a regex that captures the first non-numeric part of a string that also doesn't include 3 or more spaces?
I am using Ruby 2.4. I want to extract from a string the first run of consecutive non-numeric characters that does not contain three or more consecutive spaces. For example, in this line
str = "123 aa bb   cc 33 dd"
the first such match is " aa bb ". I thought the expression below would help:
str.split(/[[:space:]][[:space:]][[:space:]]+/).first[/\p{L}\D+\p{L}\p{L}/i]
but for a string like "123 456 aaa" it cannot return " aaa", which is what I would like.
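To see where the attempt falls short, it may help to run it on both inputs. This is only a sketch: `attempt` is a hypothetical wrapper name, and the first sample assumes three spaces between "bb" and "cc" (the exact spacing is an assumption).

```ruby
# Hypothetical wrapper around the expression from the question.
def attempt(str)
  str.split(/[[:space:]][[:space:]][[:space:]]+/).first[/\p{L}\D+\p{L}\p{L}/i]
end

# Assumed spacing: three spaces between "bb" and "cc".
p attempt("123 aa bb   cc 33 dd")  # => "aa bb"
p attempt("123 456 aaa")           # => nil
```

The pattern \p{L}\D+\p{L}\p{L} needs at least four characters and must start with a letter, so the three-letter "aaa" cannot match, and the leading space is never included.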
Skip any digits and whitespace at the start of the string and capture the run of non-digits that follows. Then split that run on three or more whitespace characters and take the first item.
def parse_it(s)
  s[/\A(?:[\d[:space:]]*\d)?(\D+)/, 1].split(/[[:space:]]{3,}/).first
end
puts parse_it("123 aa bb   cc 33 dd")
# prints " aa bb"
puts parse_it("123 456 aaa")
# prints " aaa"
See Ruby demo
The first regular expression, \A(?:[\d[:space:]]*\d)?(\D+), matches:
- \A - the beginning of the string
- (?:[\d[:space:]]*\d)? - an optional sequence of:
  - [\d[:space:]]* - zero or more digits or whitespace characters
  - \d - a digit
- (\D+) - group 1, capturing one or more non-digit characters
The second regular expression, [[:space:]]{3,}, matches three or more whitespace characters.
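The two steps can also be written out separately, which makes it easier to guard against input that has no non-digit run at all. This is a sketch rather than the answer's own code: `parse_it_steps` is a hypothetical name, and the three-space gap in the first sample is an assumption.

```ruby
# Step-by-step variant of parse_it (hypothetical name).
def parse_it_steps(s)
  # Step 1: skip leading digits/whitespace (ending in a digit),
  # then capture the following run of non-digits.
  non_digits = s[/\A(?:[\d[:space:]]*\d)?(\D+)/, 1]
  return nil if non_digits.nil? # e.g. the string contains only digits

  # Step 2: cut at the first run of three or more whitespace characters.
  non_digits.split(/[[:space:]]{3,}/).first
end

p parse_it_steps("123 aa bb   cc 33 dd") # => " aa bb"
p parse_it_steps("123 456 aaa")          # => " aaa"
p parse_it_steps("12345")                # => nil
```

Without the guard, a digits-only string would make the one-line version call split on nil and raise NoMethodError.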
r = /
  (?:          # begin non-capture group
    [ ]{,2}    # match 0, 1 or 2 spaces
    [^[ ]\d]+  # match 1+ characters that are neither spaces nor digits
  )+           # end non-capture group and repeat 1+ times
  [ ]{,2}      # match 0, 1 or 2 spaces
/x             # free-spacing regex definition mode
str = "123 aa bb   cc 33 dd"
str[r] #=> " aa bb  "
Note that [ ] can be replaced with a plain space if free-spacing mode is not used:
r = /(?: {,2}[^ \d]+)+ {,2}/
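A quick way to check that the two forms behave the same is to run both on a few inputs. This is only a sketch: the variable names and the sample strings are made up for illustration, and the first sample assumes three spaces between "bb" and "cc".

```ruby
# Free-spacing form.
r_verbose = /
  (?:          # begin non-capture group
    [ ]{,2}    # match 0, 1 or 2 spaces
    [^[ ]\d]+  # match 1+ characters that are neither spaces nor digits
  )+           # end non-capture group and repeat 1+ times
  [ ]{,2}      # match 0, 1 or 2 spaces
/x

# Compact form without free-spacing mode.
r_compact = /(?: {,2}[^ \d]+)+ {,2}/

samples = ["123 aa bb   cc 33 dd", "123 456 aaa", "12345"]
samples.each do |s|
  # Print both results side by side for comparison.
  p [s[r_verbose], s[r_compact]]
end
```

Unlike the split-based approach, this pattern keeps up to two trailing spaces in the match, so calling strip or rstrip on the result may be needed depending on what is wanted.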
This seems to do it:
regex = /(?: {1,2}[[:alpha:]]{2,})+/
"123 aa bb   cc 33 dd"[regex] # => " aa bb"
"123 456 aaa"[regex] # => " aaa"
- (?: ... ) is a non-capturing group.
- {1,2} means "find at least one and no more than two".
- [[:alpha:]] is the POSIX bracket expression for alphabetic characters. It is more comprehensive than [a-z].
You should be able to figure out the rest, which is fully covered in the Regexp and String#[] documentation.
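The difference between [[:alpha:]] and [a-z] shows up as soon as the input contains non-ASCII letters. A small sketch, assuming a UTF-8 string; the `ascii` variant and the sample text are made up for comparison only:

```ruby
posix = /(?: {1,2}[[:alpha:]]{2,})+/ # pattern from this answer
ascii = /(?: {1,2}[a-z]{2,})+/       # hypothetical ASCII-only variant

s = "123 été déjà   33" # made-up sample with accented letters
p s[posix] # => " été déjà"
p s[ascii] # => nil
```

Note that {2,} requires at least two letters per word, so a one-letter word ends the match.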