REGEX to find the first one or two headwords in a string
I am looking for a regex to find the first one or two headwords in a string. If the first two words are both capitalized, I want both of them; otherwise I want just the first word. A hyphen should be considered part of a word.
- for "Madonna has a new album", I'm looking for "Madonna"
- for "Paul Young has no new album", I'm looking for "Paul Young"
- for "Emmerson Lake-palmer is not here", I'm looking for "Emmerson Lake-palmer"
I am using
^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1}
which works great on the first two examples, but for the third one I get "Emmerson Lake" instead of "Emmerson Lake-palmer".
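The truncated match can be reproduced with a minimal sketch like the one below. It assumes Python's built-in re module, which treats \b and the lazy .*? the same way as most other flavors for this input; the variable names are just for illustration.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1}")

examples = [
    "Madonna has a new album",
    "Paul Young has no new album",
    "Emmerson Lake-palmer is not here",
]

for text in examples:
    m = original.match(text)
    print(repr(m.group(0)))
# 'Madonna'
# 'Paul Young'
# 'Emmerson Lake'
```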
Which REGEX can I use to find the first one or two headwords in the examples above?
You can use
^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?
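Basically, use the character class [-a-zA-Z]* instead of the dot pattern so that only letters and hyphens are matched. In the original pattern, the lazy .*?\b stops at the first word boundary it reaches, and because a hyphen is not a word character, there is a boundary between "Lake" and "-palmer".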
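Here is a minimal sketch of the suggested ASCII pattern applied to the three examples, assuming Python's re module. The helper name first_headwords is hypothetical and only wraps the match so that a string with no leading capitalized word returns None.

```python
import re

# ASCII version of the suggested pattern: an uppercase letter followed by
# letters/hyphens, optionally followed by whitespace and a second such word.
headwords = re.compile(r"^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?")


def first_headwords(text):
    """Hypothetical helper: return the one or two leading capitalized words, or None."""
    m = headwords.match(text)
    return m.group(0) if m else None


for text in [
    "Madonna has a new album",
    "Paul Young has no new album",
    "Emmerson Lake-palmer is not here",
]:
    print(first_headwords(text))
# Madonna
# Paul Young
# Emmerson Lake-palmer
```

Because \s+ matches any whitespace, the two words may also be separated by a tab or a newline; use a literal space instead of \s+ if that is not wanted.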
More details:
- ^ - start of the string (or of a line, if multiline mode is on)
- [A-Z] - an uppercase ASCII letter
- [-a-zA-Z]* - zero or more ASCII letters or hyphens
- (?:\s+[A-Z][-a-zA-Z]*)? - an optional sequence (matched 1 or 0 times due to the ? quantifier) of:
  - \s+ - one or more whitespace characters
  - [A-Z] - an uppercase ASCII letter
  - [-a-zA-Z]* - zero or more ASCII letters or hyphens
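The breakdown above maps directly onto a commented pattern. This is a sketch assuming Python's re module; writing it with re.VERBOSE is just one way to keep the explanation next to each piece and is not part of the original answer. In verbose mode, whitespace outside character classes is ignored and # starts a comment.

```python
import re

headwords_verbose = re.compile(
    r"""
    ^                    # start of the string
    [A-Z]                # uppercase ASCII letter
    [-a-zA-Z]*           # zero or more ASCII letters or hyphens
    (?:                  # optional non-capturing group:
        \s+              #   one or more whitespace characters
        [A-Z]            #   uppercase ASCII letter
        [-a-zA-Z]*       #   zero or more ASCII letters or hyphens
    )?                   # the group is matched once or not at all
    """,
    re.VERBOSE,
)

print(headwords_verbose.match("Emmerson Lake-palmer is not here").group(0))
# Emmerson Lake-palmer
```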
Unicode equivalent (for regex flavors that support Unicode property classes):
^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?
where \p{L} matches any letter and \p{Lu} matches any uppercase letter.