Regex to find the first one or two headwords in a string

I am looking for a regex to find the first one or two headwords in a string. If the first two words are both capitalized, I want both of them; otherwise I want only the first. A hyphen should be treated as part of the word.

  • for "Madonna has a new album", I'm looking for "Madonna"

  • for "Paul Young has no new album", I'm looking for "Paul Young"

  • for "Emmerson Lake-palmer is not here", I'm looking for "Emmerson Lake-palmer"

I am using

^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1}

which works great on the first two, but for the third example I get "Emmerson Lake" instead of "Emmerson Lake-palmer".

Which regex can I use to find the first one or two headwords in the examples above?

+3




2 answers


You can use

^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?


Basically, use a character class, [-a-zA-Z]*, instead of the dot pattern so that only letters and hyphens are matched.

More details:

  • ^ - start of the string
  • [A-Z] - an uppercase ASCII letter
  • [-a-zA-Z]* - zero or more ASCII letters or hyphens
  • (?:\s+[A-Z][-a-zA-Z]*)? - an optional sequence (matched 1 or 0 times because of the ? quantifier):
    • \s+ - one or more whitespace characters
    • [A-Z] - an uppercase ASCII letter
    • [-a-zA-Z]* - zero or more ASCII letters or hyphens
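As a rough check of the breakdown above, here is a minimal Python sketch that runs both the pattern from the question and the ASCII pattern from this answer against the three sample strings. The names ORIGINAL, HEADWORDS and first_headwords are made up for illustration, and Python's re module is only one possible flavor; the question does not say which one is in use.

```python
import re

# Pattern from the question: the lazy ".*?" stops at the first word
# boundary, and a hyphen creates one, so "Lake-palmer" is cut short.
ORIGINAL = re.compile(r"^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1}")

# ASCII pattern from the answer: the character class allows only
# letters and hyphens, so the hyphenated word is consumed whole.
HEADWORDS = re.compile(r"^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?")


def first_headwords(text, pattern=HEADWORDS):
    """Return the leading one or two capitalized words, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None


samples = [
    "Madonna has a new album",
    "Paul Young has no new album",
    "Emmerson Lake-palmer is not here",
]

for sample in samples:
    print(repr(first_headwords(sample, ORIGINAL)),
          repr(first_headwords(sample, HEADWORDS)))
```

With the hyphen inside the character class, the third sample yields the full "Emmerson Lake-palmer", while the pattern from the question stops at "Emmerson Lake". A string that does not start with an uppercase ASCII letter gives None.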

Unicode equivalent (for regex flavors that support Unicode property classes):

^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?

      

where \p{L} matches any letter and \p{Lu} matches any uppercase letter.

+5




This is probably easier:

^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?

      



Replace + with * if you expect single-letter words.
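To illustrate the difference between the two quantifiers, here is a small Python sketch. The names PLUS, STAR and headwords, and the two test strings, are invented for this example and do not come from the question.

```python
import re

# "+" requires at least one more character after the capital letter,
# so every headword must be at least two characters long.
PLUS = re.compile(r"^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?")

# "*" also accepts a capital letter on its own.
STAR = re.compile(r"^([A-Z][-A-Za-z]*)(\s[A-Z][-A-Za-z]*)?")


def headwords(pattern, text):
    """Return the matched headwords, or None if the pattern does not match."""
    match = pattern.match(text)
    return match.group(0) if match else None


for text in ["Paul Y has left", "A Star is born"]:
    print(repr(headwords(PLUS, text)), repr(headwords(STAR, text)))
```

With +, a single-letter second word is dropped and a single-letter first word prevents any match; with *, both are kept. Note also that this pattern allows exactly one whitespace character between the words (\s rather than \s+).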

+2

