Extract uppercase words with digits and underscores

I'm a Perl newbie and need some help with Perl regexes. I want to extract words from a file that contain only UPPERCASE characters and/or numeric digits and underscores (at the beginning, at the end, or in the middle of the word). The word separator can be a space or any other character that is not an uppercase letter, a digit, or an underscore.

3 examples:

abcd _PARAM123="dfd"; (I want to extract _PARAM123)
abcd PARAM2_:12; (I want to extract PARAM2_)
abcd PARA_M-1; (I want to extract PARA_M)

      



2 answers


You can use:



my @words = $str =~ /( [A-Z_] [0-9A-Z_]+ )/xg;
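As a rough check, the pattern above can be run against the three sample lines from the question. This is only a sketch: the helper name extract_words is hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the answer.
# In list context with /g, the match returns every captured word.
sub extract_words {
    my ($str) = @_;
    my @words = $str =~ /( [A-Z_] [0-9A-Z_]+ )/xg;
    return @words;
}

# Sample lines taken from the question.
my @lines = (
    'abcd _PARAM123="dfd";',
    'abcd PARAM2_:12;',
    'abcd PARA_M-1;',
);

for my $line (@lines) {
    my @words = extract_words($line);
    print "$line => @words\n";
}
# prints:
# abcd _PARAM123="dfd"; => _PARAM123
# abcd PARAM2_:12; => PARAM2_
# abcd PARA_M-1; => PARA_M
```

Note that this pattern needs at least two characters per word, never starts a word with a digit, and has no boundary check, so the uppercase tail of a mixed-case token such as fooBAR would also be returned.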

      



Since you didn't answer my last query, I'm going to assume that a word is not considered a word if it contains only numbers (and/or underscores); for example, 12 and 1_2 are not considered words.

In this case, I suggest this regex:

(?=[0-9_]*[A-Z])\b[A-Z0-9_]+\b

      




(?=[0-9_]*[A-Z]) is a positive lookahead and ensures that there is at least one uppercase character in the match. If you count 1_2 as a word, use (?=[A-Z0-9_]*[A-Z_]) instead.

\b is a word boundary and is what ensures that there are no lowercase characters attached to the matched word.

[A-Z0-9_] is a character class and will match any character in the range A-Z (uppercase), 0-9 (numbers), and underscore.

+ means that the previous group or character can occur 1 or more times.
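To see how the pieces fit together, here is a minimal sketch that applies the lookahead regex to the question's sample lines plus one extra line of edge cases. The helper name extract_headwords and the extra test line are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every word matched by the lookahead pattern.
# With /g and no capture groups, a list-context match returns all full matches.
sub extract_headwords {
    my ($str) = @_;
    my @words = $str =~ /(?=[0-9_]*[A-Z])\b[A-Z0-9_]+\b/g;
    return @words;
}

my @lines = (
    'abcd _PARAM123="dfd";',   # from the question
    'abcd PARAM2_:12;',        # from the question
    'abcd PARA_M-1;',          # from the question
    '12 1_2 fooBAR X9',        # illustrative edge cases (not from the question)
);

for my $line (@lines) {
    my @words = extract_headwords($line);
    print "$line => @words\n";
}
# prints:
# abcd _PARAM123="dfd"; => _PARAM123
# abcd PARAM2_:12; => PARAM2_
# abcd PARA_M-1; => PARA_M
# 12 1_2 fooBAR X9 => X9
```

In the last line, 12 and 1_2 are rejected by the lookahead because they contain no uppercase letter, and fooBAR is rejected because there is no word boundary between the lowercase and uppercase parts.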
