Extract uppercase words containing digits and underscores
I'm a Perl newbie and need some help with a Perl regex. I want to extract words from a file. A word may contain only uppercase letters, digits, and underscores (the underscore can be at the beginning, at the end, or in the middle of the word). The word separator can be a space or any other character that is not an uppercase letter, a digit, or an underscore.
3 examples:
abcd _PARAM123="dfd"; (I want to extract _PARAM123)
abcd PARAM2_:12; (I want to extract PARAM2_)
abcd PARA_M-1; (I want to extract PARA_M)
Since you didn't answer my last query, I'm going to assume that a token made up only of digits and/or underscores does not count as a word; for example, 12 and 1_2 are not considered words.
In this case, I suggest this regex:
(?=[A-Z0-9_]*[A-Z])\b[A-Z0-9_]+\b
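For illustration, here is a minimal Perl sketch that applies a pattern of this shape to the three sample lines from the question. The lookahead is written as (?=[A-Z0-9_]*[A-Z]), and the helper name extract_words is hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# The suggested pattern, with the lookahead written as (?=[A-Z0-9_]*[A-Z]).
my $word = qr/(?=[A-Z0-9_]*[A-Z])\b[A-Z0-9_]+\b/;

# The three sample lines from the question.
my @lines = (
    'abcd _PARAM123="dfd";',
    'abcd PARAM2_:12;',
    'abcd PARA_M-1;',
);

# Hypothetical helper: returns every match found in $text.
sub extract_words {
    my ($text) = @_;
    return $text =~ /($word)/g;    # in list context, /g returns all captures
}

for my $line (@lines) {
    my @found = extract_words($line);
    print "$line => @found\n";
}
# abcd _PARAM123="dfd"; => _PARAM123
# abcd PARAM2_:12; => PARAM2_
# abcd PARA_M-1; => PARA_M
```

To process a real file, the same call could be made on each line read from a filehandle instead of on the hard-coded list.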
(?=[A-Z0-9_]*[A-Z]) is a positive lookahead and ensures that the match contains at least one uppercase letter. If you do count 1_2 as a word, use (?=[A-Z0-9_]*[A-Z_]) instead.
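\b is a word boundary, that is, a position between a word character (letter, digit, or underscore) and a non-word character. Lowercase letters are word characters too, so there is no boundary between them and the uppercase part; this is what ensures that no lowercase characters are attached to the matched word.
[A-Z0-9_] is a character class and will match any character in the range A-Z (uppercase), 0-9 (digits), or an underscore.
+ means that the previous group or character can occur 1 or more times.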
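The difference between the two lookaheads can be seen in a small Perl sketch. The test string and variable names are made up for this illustration and do not come from the question.

```perl
use strict;
use warnings;

# Strict form: a match must contain at least one uppercase letter.
my $strict  = qr/(?=[A-Z0-9_]*[A-Z])\b[A-Z0-9_]+\b/;
# Relaxed form: an underscore is enough, so 1_2 counts as a word.
my $relaxed = qr/(?=[A-Z0-9_]*[A-Z_])\b[A-Z0-9_]+\b/;

# Hypothetical test string (not from the question).
my $text = '12 1_2 A1 fooBAR X_9';

my @strict_hits  = $text =~ /($strict)/g;
my @relaxed_hits = $text =~ /($relaxed)/g;

print "strict:  @strict_hits\n";     # strict:  A1 X_9
print "relaxed: @relaxed_hits\n";    # relaxed: 1_2 A1 X_9
```

In both cases 12 is rejected by the lookahead, and fooBAR is rejected because there is no word boundary between the lowercase and uppercase letters.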