Match words of a specific length with anchors, without doing magic math

Let's say I wanted to find all 12-letter words in /usr/share/dict/words that start with c and end with er. Off the top of my head, a working pattern might look something like this:

grep -E '^c.{9}er$' /usr/share/dict/words

      

It finds:

cabinetmaker
calcographer
calligrapher
campanologer
campylometer
...

      

But the .{9} bothers me. It feels too magical to subtract the total length of all the anchor characters from the length given in the original constraint.
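To spell out the subtraction: the 9 is just 12 minus the one character of c and the two characters of er. A minimal sketch that lets the shell do that arithmetic instead (the helper name anchored_pattern is made up for illustration, and it assumes the prefix and suffix contain no regex metacharacters):

```shell
# Hypothetical helper: build the anchored pattern from the total length
# and the two literal ends, so nobody has to subtract by hand.
anchored_pattern() {
    len=$1 prefix=$2 suffix=$3
    # middle = total length minus the lengths of the two literals
    middle=$(( len - ${#prefix} - ${#suffix} ))
    printf '^%s.{%d}%s$\n' "$prefix" "$middle" "$suffix"
}

anchored_pattern 12 c er    # prints ^c.{9}er$
```

The result could then be passed to grep -E, but the calculation is still there, only hidden.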

Is there a way to rewrite this regex so that it doesn't require doing this calculation up front, while allowing the literal 12 to be used directly in the pattern?



4 answers


You can use the -x option, which selects only matches that span the entire line.

grep -xE '.{12}' /usr/share/dict/words | grep '^c.*er$'
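A small sketch of the two-stage filter on a made-up word list (the function name twelve_c_er and the sample words are only for illustration). The second grep is anchored here, because an unanchored c.*er would also accept 12-letter words such as schoolmaster or cheerfulness:

```shell
# Stage 1 keeps lines that are exactly 12 characters long (-x = whole line).
# Stage 2 keeps those that start with c and end with er.
twelve_c_er() {
    grep -xE '.{12}' | grep '^c.*er$'
}

# Sample input standing in for /usr/share/dict/words.
printf '%s\n' cabinetmaker calligrapher carpenter schoolmaster cheerfulness |
    twelve_c_er
```

On this input only cabinetmaker and calligrapher should come through: carpenter is too short, schoolmaster does not start with c, and cheerfulness does not end with er.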

      




Or use the -P option, which interprets the pattern as a Perl-compatible regex, together with a lookahead assertion.

grep -P '^(?=.{12}$)c.*er$'
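The lookahead (?=.{12}$) checks, from the start of the line, that exactly 12 characters follow before the end, without consuming any of them; c.*er$ then matches the same text. A sketch with the length passed in as a parameter (grep_len is a hypothetical helper, and this assumes a grep built with PCRE support, which is typical for GNU grep but not guaranteed elsewhere):

```shell
# Hypothetical helper: $1 is the total length, $2 the rest of the anchored pattern.
# Requires grep -P (PCRE support).
grep_len() {
    grep -P "^(?=.{$1}\$)$2"
}

# Sample input standing in for /usr/share/dict/words.
printf '%s\n' cabinetmaker calligrapher carpenter schoolmaster cheerfulness |
    grep_len 12 'c.*er$'
```

The length and the shape of the word are now stated independently, so changing 12 to another number needs no other adjustment.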

      



Alternatively, you can use awk and avoid this calculation:



awk -v len=12 'length($1)==len && $1 ~ /^c.*er$/' file
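For a quick check without a dictionary file, here is a sketch that reads words from standard input (filter_words is a made-up name). It uses a greedy .* because POSIX awk regexes have no lazy quantifiers, and it tests the whole line rather than the first field:

```shell
# Hypothetical helper: $1 is the required length; one word per line on stdin.
filter_words() {
    awk -v len="$1" 'length($0) == len && /^c.*er$/'
}

# Sample input standing in for /usr/share/dict/words.
printf '%s\n' cabinetmaker calligrapher carpenter schoolmaster cheerfulness |
    filter_words 12
```

The length test and the regex are separate conditions, so the 12 appears literally and nothing has to be subtracted.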

      



I don't know grep that well, but some of the more advanced regex implementations provide lookaheads and lookbehinds. If you can find a way to make them available to you, you can write:

^(?=c).{12}(?<=er)$

Perhaps as a perl one-liner?

perl -ne 'print if /^(?=c).{12}(?<=er)$/' /usr/share/dict/words



One approach with GNU sed:

$ sed -nr '/^.{12}$/{/^c.*er$/p}' words

      

With BSD sed (Mac OS) this would be:

$ sed -nE '/^.{12}$/{/^c.*er$/p;}' words
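Both commands nest one address inside another: the outer /^.{12}$/ selects lines of exactly 12 characters, and the inner /^c.*er$/p prints those that also have the right ends. A sketch with the length as a parameter (sed_len is a made-up name; it assumes a sed that accepts -E, which BSD sed and reasonably recent GNU sed versions do):

```shell
# Hypothetical helper: $1 is the total length; words are read from stdin.
# The \$ escapes keep the shell from expanding the regex end anchors.
sed_len() {
    sed -nE "/^.{$1}\$/{/^c.*er\$/p;}"
}

# Sample input standing in for /usr/share/dict/words.
printf '%s\n' cabinetmaker calligrapher carpenter schoolmaster cheerfulness |
    sed_len 12
```

The semicolon before the closing brace is what BSD sed requires; GNU sed accepts it as well.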

      
