Extract uppercase words with digits and underscores

Question

I'm a Perl newbie in need of some help with a Perl regex. I want to extract words from a file that contain only UPPERCASE characters and/or numeric digits and underscores (at the beginning, at the end, or in the middle of the word). The word separator can be a space or any other character that is not an uppercase letter, a digit, or an underscore.

3 examples:

abcd _PARAM123="dfd"; (I want to extract _PARAM123)
abcd PARAM2_:12; (I want to extract PARAM2_)
abcd PARA_M-1; (I want to extract PARA_M)

regex perl

user2805732, Oct 5 2013 at 17:52

2 answers

Dry27 · Answer 1 · 2013-10-05T17:56:02+0000

You can use:

my @words = $str =~ /( [A-Z_] [0-9A-Z_]+ )/xg;
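
To see what this pattern does with the three lines from the question, here is a small sketch. The helper name extract_words is made up for illustration and is not part of the answer; the regex itself is the one above.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): returns all matches
# of the answer's pattern when called in list context.
sub extract_words {
    my ($str) = @_;
    # First character: uppercase letter or underscore.
    # Then one or more uppercase letters, digits, or underscores.
    # The x flag lets us space the pattern out; g returns every match.
    return $str =~ /( [A-Z_] [0-9A-Z_]+ )/xg;
}

# The three sample lines from the question.
for my $line ('abcd _PARAM123="dfd";', 'abcd PARAM2_:12;', 'abcd PARA_M-1;') {
    print join(', ', extract_words($line)), "\n";
}
# prints _PARAM123, then PARAM2_, then PARA_M (one per line)
```

Note that this pattern needs at least two characters, never starts a match on a digit, and has no boundary check, so it would also pick BAR out of fooBAR.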

Jerry · Answer 2 · 2013-10-05T20:18:13+0000

Since you didn't answer my last query, I'm going to assume that a word is not considered a word if it contains only numbers (and/or underscores); for example, 12 and 1_2 are not considered words.

In this case, I suggest this regex:

(?=[A-Z0-9_]*[A-Z])\b[A-Z0-9_]+\b

(?=[A-Z0-9_]*[A-Z]) is a positive lookahead and ensures that there is at least one uppercase character in the match. If you count 1_2 as a word, use (?=[A-Z0-9_]*[A-Z_]) instead.
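
The difference between the two lookaheads is easiest to see side by side. This is only a sketch: the sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Strict variant: the word must contain at least one uppercase letter.
my $strict     = qr/(?=[A-Z0-9_]*[A-Z])\b[A-Z0-9_]+\b/;
# Permissive variant: an uppercase letter or an underscore is enough.
my $permissive = qr/(?=[A-Z0-9_]*[A-Z_])\b[A-Z0-9_]+\b/;

my $sample = '12 1_2 A1 _X';   # made-up input, not from the question

# With no capture groups, a g match in list context returns the full matches.
my @strict_hits     = $sample =~ /$strict/g;
my @permissive_hits = $sample =~ /$permissive/g;

print "strict:     @strict_hits\n";       # strict:     A1 _X
print "permissive: @permissive_hits\n";   # permissive: 1_2 A1 _X
```

Neither variant matches 12, since it contains no letter and no underscore.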

\b is a word boundary and is what ensures that there are no lowercase characters attached to the matched word.
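
A quick way to check the effect of \b is to run the character class with and without it. The input string below is invented for this sketch.

```perl
use strict;
use warnings;

my $text = 'fooBAR BAZ_1 QUXx';   # made-up input, not from the question

# With boundaries: an uppercase run glued to lowercase letters is rejected.
my @bounded   = $text =~ /\b[A-Z0-9_]+\b/g;
# Without boundaries: any uppercase run is picked up, even inside a word.
my @unbounded = $text =~ /[A-Z0-9_]+/g;

print "bounded:   @bounded\n";     # bounded:   BAZ_1
print "unbounded: @unbounded\n";   # unbounded: BAR BAZ_1 QUX
```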

[A-Z0-9_] is a character class and will match any character in the range A-Z (uppercase), 0-9 (digits), and underscore.

+ means that the previous group or character can occur 1 or more times.
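
Putting the pieces together, the whole pattern can be written with the x flag so each part carries its own comment, then run against the three lines from the question. The names $word_re and find_words are assumptions made for this sketch.

```perl
use strict;
use warnings;

# The same regex as above, spread out with the x flag.
my $word_re = qr/
    (?= [A-Z0-9_]* [A-Z] )   # lookahead: an uppercase letter must appear in the word
    \b                       # word boundary before the word
    [A-Z0-9_]+               # one or more uppercase letters, digits, or underscores
    \b                       # word boundary after the word
/x;

# Hypothetical helper: returns every match in a string (list context).
sub find_words {
    my ($str) = @_;
    return $str =~ /$word_re/g;
}

for my $line ('abcd _PARAM123="dfd";', 'abcd PARAM2_:12;', 'abcd PARA_M-1;') {
    print join(', ', find_words($line)), "\n";
}
# prints _PARAM123, then PARAM2_, then PARA_M (one per line)
```

The bare numbers 12 and 1 in the second and third lines are skipped because the lookahead finds no uppercase letter in them.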
