How do I write a regex that captures the first non-numeric part of a string that also doesn't include 3 or more spaces?

I am using Ruby 2.4. I want to extract from a string the first run of consecutive non-digit characters that does not contain three or more consecutive spaces. For example, in this line

str = "123 aa bb      cc 33 dd"

      

The first such entry is " aa bb ". I thought the expression below would help:

data.split(/[[:space:]][[:space:]][[:space:]]+/).first[/\p{L}\D+\p{L}\p{L}/i]

      

but for a string such as "123 456 aaa", it cannot return " aaa", which is the result I would like.
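To make the failure concrete, here is a small sketch that runs the attempted expression on both sample strings. The patterns are the ones from the attempt above; the variable names `split_re`, `pick_re`, and `attempt` are introduced here purely for illustration.

```ruby
# Patterns copied from the attempt above; the names are illustrative only.
split_re = /[[:space:]][[:space:]][[:space:]]+/
pick_re  = /\p{L}\D+\p{L}\p{L}/i

# Split on runs of 3+ whitespace characters, then search the first chunk.
attempt = ->(data) { data.split(split_re).first[pick_re] }

first_result  = attempt.call("123 aa bb      cc 33 dd")
second_result = attempt.call("123 456 aaa")

p first_result   # "aa bb" (the leading space is not captured)
p second_result  # nil
```

The second call returns nil because the pattern needs a letter, at least one more non-digit, and then two further letters, so it cannot match a run as short as "aaa".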

+3




4 answers


Skip any digits and whitespace at the beginning of the string and capture the run of non-digits that follows. Then split that run on 3 or more whitespace characters and take the first item.

def parse_it(s)
    s[/\A(?:[\d[:space:]]*\d)?(\D+)/, 1].split(/[[:space:]]{3,}/).first
end

puts parse_it("123 aa bb      cc 33 dd")
# =>  aa bb
puts parse_it("123      456 aaa")
# =>  aaa

      

See Ruby demo



The first regular expression, \A(?:[\d[:space:]]*\d)?(\D+), matches:

  • \A - the start of the string
  • (?:[\d[:space:]]*\d)? - an optional sequence of:
    • [\d[:space:]]* - 0 or more digits or whitespace characters
    • \d - a digit
  • (\D+) - group 1, capturing 1 or more non-digit characters

The second regular expression, [[:space:]]{3,}, matches 3 or more whitespace characters.
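The two steps can also be run separately to inspect the intermediate values. This is a minimal sketch using the two patterns from this answer; the names `head_re`, `split_re`, and `safe_parse` are hypothetical, and the nil-safe variant is an addition that assumes the input may contain no non-digit characters at all.

```ruby
head_re  = /\A(?:[\d[:space:]]*\d)?(\D+)/
split_re = /[[:space:]]{3,}/

str = "123 aa bb      cc 33 dd"

# Step 1: capture the first run of non-digits after the leading digits/spaces.
head = str[head_re, 1]

# Step 2: cut that run wherever 3 or more whitespace characters occur.
parts  = head.split(split_re)   # [" aa bb", "cc "]
result = parts.first            # " aa bb"

# Hypothetical nil-safe variant: String#[] returns nil when nothing matches,
# so the safe-navigation operator (Ruby 2.3+) avoids a NoMethodError.
safe_parse = ->(s) { s[head_re, 1]&.split(split_re)&.first }
no_match = safe_parse.call("123")   # nil

p parts, result, no_match
```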

+1




r = /
    (?:         # begin non-capture group
      [ ]{,2}   # match 0, 1 or 2 spaces
      [^[ ]\d]+ # match 1+ characters that are neither spaces nor digits
    )+          # end non-capture group and perform 1+ times
    [ ]{,2}     # match 0, 1 or 2 spaces
    /x          # free-spacing regex definition mode

str = "123 aa bb      cc 33    dd"

str[r] #=> " aa bb  "

      

Note that [ ] can be replaced with a plain space if free-spacing regex definition mode is not used:



r = /(?: {,2}[^ \d]+)+ {,2}/
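As a quick check, the free-spacing and compact forms can be compared on the sample strings from this page. This is only a sketch; the names `verbose`, `compact`, and `samples` are illustrative, and the final `rstrip` is an optional extra for callers who do not want the trailing spaces.

```ruby
# Free-spacing form: literal spaces must be written as [ ] under the x flag.
verbose = /
  (?:
    [ ]{,2}
    [^[ ]\d]+
  )+
  [ ]{,2}
/x

# Compact form: without the x flag a plain space matches itself.
compact = /(?: {,2}[^ \d]+)+ {,2}/

samples = ["123 aa bb      cc 33    dd", "123      456 aaa"]
results = samples.map { |s| [s[verbose], s[compact]] }
p results   # [[" aa bb  ", " aa bb  "], [" aaa", " aaa"]]

# Optional: drop the trailing spaces that the final {,2} picks up.
trimmed = samples.first[compact].rstrip
p trimmed   # " aa bb"
```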

      

+3




This appears to do it:

regex = /(?: {1,2}[[:alpha:]]{2,})+/
"123 aa bb      cc 33 dd"[regex] # => " aa bb"
"123      456 aaa"[regex] # => " aaa"

      

  • (?: ... ) is a non-capturing group.
  • {1,2} means "match at least one and no more than two".
  • [[:alpha:]] is the POSIX bracket expression for alphabetic characters. It is more comprehensive than [a-z].

You should be able to figure out the rest, which is fully covered in the Regexp and String#[] documentation.
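Two properties of this pattern are worth checking before relying on it: each word must be preceded by one or two spaces, and each word must contain at least two letters. The sketch below probes both with made-up inputs that do not come from the question.

```ruby
regex = /(?: {1,2}[[:alpha:]]{2,})+/

# A single-letter word does not satisfy [[:alpha:]]{2,}, so it is skipped.
single_letter = "123 a bb"[regex]
p single_letter      # " bb"

# A word at the very start has no preceding space, so it is skipped too.
no_leading_space = "aa bb"[regex]
p no_leading_space   # " bb"
```

If single letters or punctuation should count as non-numeric text, a negated class such as [^ \d] (used in other answers here) may be a better fit than [[:alpha:]]{2,}.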

+1




Will this work?

str.match(/(?:  ?)?(?:[^ 0-9]+(?:  ?)?)+/)[0]

      

or, more concisely,

str[/(?:  ?)?(?:[^ 0-9]+(?:  ?)?)+/]

      

or, using Cary's neat way of matching the spaces,

str[/ {,2}(?:[^ 0-9]+ {,2})+/]
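A short sketch comparing the variants on the question's sample string, and showing how they differ when nothing matches. The names `variants`, `matches`, and `digits_only` are illustrative, and the digits-only input is an assumed edge case rather than one taken from the question.

```ruby
variants = [
  /(?:  ?)?(?:[^ 0-9]+(?:  ?)?)+/,   # spaces written as "one space, optional second"
  / {,2}(?:[^ 0-9]+ {,2})+/          # same idea using the {,2} quantifier
]

str = "123 aa bb      cc 33 dd"
matches = variants.map { |re| str[re] }
p matches   # [" aa bb  ", " aa bb  "]

# With no non-digit, non-space characters there is nothing to match.
digits_only = "123"
via_index = digits_only[variants.first]         # nil
via_match = digits_only.match(variants.first)   # nil, so calling [0] on it would raise NoMethodError
p via_index, via_match
```

Because String#[] simply returns nil on failure, the str[...] form is the safer of the two when a match is not guaranteed.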

      

-1

