Match specific length words anchored without doing magic math

Question

Let's say I wanted to find all 12-letter words in /usr/share/dict/words that start with c and end with er. Off the top of my head, a working pattern might look something like this:

grep -E '^c.{9}er$' /usr/share/dict/words

It finds:

cabinetmaker
calcographer
calligrapher
campanologer
campylometer
...

But the .{9} bothers me. It feels too magical to subtract the total length of all anchor characters from the number defined in the original constraint.

Is there a way to rewrite this regex so that it doesn't require doing this calculation up front, while allowing the literal 12 to be used directly in the pattern?

+3

regex grep

smitelli 08 Aug 14 at 19:40

4 answers

You can alternatively use awk and avoid this calculation:

awk -v len=12 'length($1)==len && $1 ~ /^c.*er$/' file
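To see the two conditions working independently, here is a minimal sketch that runs the same kind of test on a tiny inline word list instead of a dictionary file. The sample words and the variable name matches are made up for illustration.

```shell
# Illustrative word list: only the first two are 12 letters long,
# although all five start with "c" and end with "er".
matches=$(printf '%s\n' cabinetmaker calligrapher caterer character computer |
  awk -v len=12 'length($0) == len && /^c.*er$/')

# Print whatever survived both the length test and the anchor test.
printf '%s\n' "$matches"
```

The length lives in len and the anchors live in the regex, so neither has to know about the other.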

0

anubhava 08 Aug 14 at 19:44

I don't know grep that well, but some more advanced regex implementations provide lookaheads and lookbehinds. If you can find a way to make them available, you could write:

^(?=c).{12}(?<=er)$

Perhaps as a perl one-liner?

perl -ne 'print if /^(?=c).{12}(?<=er)$/' /usr/share/dict/words

0

Julian 08 Aug 14 at 19:48

One approach with GNU sed:

$ sed -nr '/^.{12}$/{/^c.*er$/p}' words

With BSD sed (Mac OS) this would be:

$ sed -nE '/^.{12}$/{/^c.*er$/p;}' words
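As a rough sketch, the nested form can be tried on an inline list. This assumes a sed that accepts -E for extended regexes, which as far as I know covers both BSD sed and reasonably recent GNU sed; the sample words and the variable name matches are illustrative.

```shell
# The outer address keeps only lines of exactly 12 characters;
# the inner address then prints those that start with c and end with er.
matches=$(printf '%s\n' campanologer photographer calculations caterer |
  sed -nE '/^.{12}$/{/^c.*er$/p;}')

printf '%s\n' "$matches"
```

The semicolon before the closing brace is what BSD sed insists on, and GNU sed accepts it too.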

0

Chris seymour 08 Aug '14 at 19:50

hwnd · Accepted Answer · 2014-08-08T19:52:58+0000

You can use the -x option, which selects only matches that exactly match the whole line.

grep -xE '.{12}' | grep '^c.*er$'
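A minimal sketch of the two-stage pipeline on an inline list, with the second pattern anchored at both ends so that it tests the first and last letters rather than any c...er substring. The sample words and the variable name matches are illustrative.

```shell
# Stage 1 (-x) keeps lines that are exactly 12 characters long.
# Stage 2 keeps those that start with c and end with er.
matches=$(printf '%s\n' calcographer photographer calculations character |
  grep -xE '.{12}' | grep '^c.*er$')

printf '%s\n' "$matches"
```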

Or use the -P option, which interprets the pattern as a Perl regular expression, together with a lookahead assertion.

grep -P '^(?=.{12}$)c.*er$'
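Since the lookahead carries the length on its own, the 12 can even come from a shell variable. A small sketch, assuming a GNU grep built with PCRE support (-P is not available everywhere); the variable names len and matches and the sample words are illustrative.

```shell
len=12   # hypothetical variable holding the required word length

# Double quotes let the shell expand $len; \$ keeps the end anchors literal.
# The pattern grep sees is: ^(?=.{12}$)c.*er$
matches=$(printf '%s\n' cabinetmaker photographer calculations computer |
  grep -P "^(?=.{$len}\$)c.*er\$")

printf '%s\n' "$matches"
```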
