How do I write a regex that captures the first non-numeric part of a string that also doesn't include 3 or more spaces?

Question

I am using Ruby 2.4. I want to extract from a string the first run of consecutive non-numeric characters that does not contain three or more consecutive spaces. For example, in this line

str = "123 aa bb      cc 33 dd"

the first such run is " aa bb ". I thought the expression below would help me

data.split(/[[:space:]][[:space:]][[:space:]]+/).first[/\p{L}\D+\p{L}\p{L}/i]

but given the string "123 456 aaa", it fails to return " aaa", which is what I would like.
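
To make the failure concrete, here is a minimal sketch that runs the attempted expression on both sample strings. The helper name `attempt` is hypothetical and only wraps the expression quoted above.

```ruby
# Hypothetical helper wrapping the expression from the question.
def attempt(data)
  data.split(/[[:space:]][[:space:]][[:space:]]+/).first[/\p{L}\D+\p{L}\p{L}/i]
end

first  = attempt("123 aa bb      cc 33 dd")
second = attempt("123 456 aaa")

p first   # "aa bb" (the surrounding spaces are lost)
p second  # nil: the pattern needs at least four characters, and "aaa" has only three
```

The pattern \p{L}\D+\p{L}\p{L} requires one letter, at least one non-digit, and two more letters, so any run shorter than four characters cannot match.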

+3

string ruby regex

Dave May 26 '17 at 20:47

4 answers

r = /
    (?:         # begin non-capture group
      [ ]{,2}   # match 0, 1 or 2 spaces
      [^[ ]\d]+ # match 1+ characters that are neither spaces nor digits
    )+          # end non-capture group and perform 1+ times
    [ ]{,2}     # match 0, 1 or 2 spaces
    /x          # free-spacing regex definition mode

str = "123 aa bb      cc 33    dd"

str[r] #=> " aa bb  "

Note that [ ] can be replaced with a plain space if free-spacing regex definition mode is not used:

r = /(?: {,2}[^ \d]+)+ {,2}/
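
As a quick check, the sketch below applies the compact form to the strings discussed in the question. The constant name `RUN` is hypothetical. Note that the match keeps up to two trailing spaces, and that only the literal space character is treated as a separator here (this is an observation about the pattern as written, not a claim about what the asker needs).

```ruby
# Compact form of the pattern above, without free-spacing mode.
RUN = /(?: {,2}[^ \d]+)+ {,2}/

a = "123 aa bb      cc 33    dd"[RUN]
b = "123      456 aaa"[RUN]
c = "123 456 aaa"[RUN]

p a # " aa bb  "
p b # " aaa"
p c # " aaa"
```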

+3

Cary Swoveland May 26 '17 at 21:51

This seems to do it:

regex = /(?: {1,2}[[:alpha:]]{2,})+/
"123 aa bb      cc 33 dd"[regex] # => " aa bb"
"123      456 aaa"[regex] # => " aaa"

(?: ... ) is a non-capturing group.

{1,2} means "match at least one and no more than two".

[[:alpha:]] is the POSIX bracket expression for alphabetic characters. It is more comprehensive than [a-z].

You should be able to figure out the rest, which is fully covered in the Regexp and String#[] documentation.
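
A small sketch of the two points above, assuming UTF-8 strings: how [[:alpha:]] differs from [a-z], and what the {2,} quantifier implies for single-letter words. The variable names are illustrative only.

```ruby
regex = /(?: {1,2}[[:alpha:]]{2,})+/

# [[:alpha:]] is Unicode-aware on UTF-8 strings; [a-z] covers ASCII lowercase only.
accented = "caf\u00e9"
posix_match = accented[/[[:alpha:]]+/]
ascii_match = accented[/[a-z]+/]
p posix_match # "café"
p ascii_match # "caf"

# {2,} requires at least two letters per word, so single-letter words do not match.
single = "123 a b"[regex]
double = "123 ab cd"[regex]
p single # nil
p double # " ab cd"
```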

+1

the Tin Man May 26 '17 at 21:27

Will this work?

str.match(/(?:  ?)?(?:[^ 0-9]+(?:  ?)?)+/)[0]

or apparently

str[/(?:  ?)?(?:[^ 0-9]+(?:  ?)?)+/]

or, using Cary's nice space match,

str[/ {,2}(?:[^ 0-9]+ {,2})+/]
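
One practical difference between the forms above, sketched here with an illustrative variable name `re`: String#[] returns nil when nothing matches, whereas calling [0] on the result of String#match raises, because match itself returns nil.

```ruby
re = / {,2}(?:[^ 0-9]+ {,2})+/

hit  = "123 aa bb      cc 33 dd"[re]
miss = "12345"[re]

p hit                 # " aa bb  "
p miss                # nil
p "12345".match(re)   # nil, so calling [0] on it would raise NoMethodError
```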

-1

NetMage May 26 '17 at 21:05

Wiktor Stribiżew · Accepted Answer · 2017-05-26T21:37:11+0000

Skip all digits and whitespace at the beginning of the string and capture the run of non-digits that follows. Then split that run on 3 or more whitespace characters and take the first item.

def parse_it(s)
    s[/\A(?:[\d[:space:]]*\d)?(\D+)/, 1].split(/[[:space:]]{3,}/).first
end

puts parse_it("123 aa bb      cc 33 dd")
# =>  aa bb
puts parse_it("123      456 aaa")
# =>  aaa

See Ruby demo

The first regular expression, \A(?:[\d[:space:]]*\d)?(\D+), matches:

- \A - start of the string
- (?:[\d[:space:]]*\d)? - an optional sequence:
  - [\d[:space:]]* - 0 or more digits or whitespace characters
  - \d - a digit
- (\D+) - Group 1, capturing 1 or more non-digit characters

The second regular expression, [[:space:]]{3,}, matches 3 or more whitespace characters.
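
The two steps described above can be sketched separately. The variant below is a hypothetical nil-safe version (the name `parse_it_safe` is not from the answer): as written, parse_it calls split on nil when the string contains no non-digit characters.

```ruby
# Hypothetical nil-safe variant of parse_it; the name is illustrative.
def parse_it_safe(s)
  # Step 1: skip leading digits/whitespace and capture the first run of non-digits.
  run = s[/\A(?:[\d[:space:]]*\d)?(\D+)/, 1]
  return nil if run.nil?

  # Step 2: cut the run at the first block of 3+ whitespace characters.
  run.split(/[[:space:]]{3,}/).first
end

# Intermediate result of step 1 for the first sample string.
step1 = "123 aa bb      cc 33 dd"[/\A(?:[\d[:space:]]*\d)?(\D+)/, 1]
p step1                                    # " aa bb      cc "

p parse_it_safe("123 aa bb      cc 33 dd") # " aa bb"
p parse_it_safe("123      456 aaa")        # " aaa"
p parse_it_safe("12345")                   # nil
```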
