Regex matches zip code without punctuation

I have a file with a bunch of different zip codes:

12345
12345-6789
1234567890
12345:6789
12345-7890
12:1234678

      

I only want to match codes that are formatted as 12345 or 12345-6789, but ignore all other forms.

I have my regex as:

grep -E '\<[0-9]{5}\>[^[:punct:]]|\<[0-9]{5}\>-[0-9]{4}' samplefile

It matches 12345-6789 because the "or" clause (the part after the |) matches that particular one. I'm confused as to why it won't match the first 12345, since my expression should say "match 5 digits, but ignore any punctuation."



2 answers


An expression that matches your desired output:

egrep "^[0-9]{5}([-][0-9]{4})?$" samplefile

      

Breakdown of the expression:

^[0-9]{5} - Find a line starting with 5 digits. ^ means the beginning of a line, and [0-9]{5} means exactly five digits from zero to nine.

([-][0-9]{4})?$ - May end with a dash and four digits, or nothing at all. () groups expressions together, [-] represents a dash character, [0-9]{4} represents exactly four digits from zero to nine, ? indicates that the grouped expression either appears in full or not at all, and $ denotes the end of a line.
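To see why the ^ and $ anchors matter, the sketch below runs the same pattern with and without them against the sample lines from the question. The variable names are made up for this illustration and are not part of the original answer, and [-] is written as a plain - here. The -x variant relies on the standard grep option that requires a whole-line match.

```shell
# Sample lines from the question, held in a variable for illustration.
sample='12345
12345-6789
1234567890
12345:6789
12345-7890
12:1234678'

# Anchored: the whole line must be 5 digits, optionally followed by -NNNN.
anchored=$(printf '%s\n' "$sample" | grep -E '^[0-9]{5}(-[0-9]{4})?$')

# Same idea using -x (match the entire line) instead of explicit anchors.
whole_line=$(printf '%s\n' "$sample" | grep -xE '[0-9]{5}(-[0-9]{4})?')

# Unanchored: any line that merely contains 5 consecutive digits matches.
unanchored=$(printf '%s\n' "$sample" | grep -E '[0-9]{5}(-[0-9]{4})?')

printf 'anchored:\n%s\n\nunanchored:\n%s\n' "$anchored" "$unanchored"
```

With the anchors in place only 12345, 12345-6789 and 12345-7890 are selected. Without them every sample line is selected, because each one contains a run of at least five digits.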



test.dat

12345
12345-6789
1234567890
12345:6789
12345-7890
12:1234678

      

Running the expression on the test data:

mike@test:~$ egrep "^[0-9]{5}([-][0-9]{4})?$" test.dat 
12345
12345-6789
12345-7890

      

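Additional information: grep -E can alternatively be written as egrep. The same goes for grep -F, which is the same as fgrep, and grep -r, which is the same as rgrep (where an rgrep command is available). Recent GNU grep releases warn that egrep and fgrep are obsolescent, so grep -E and grep -F are the safer spellings.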
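As a small illustration of what the -E and -F options change, the following sketch feeds three of the sample lines to both. The pattern variable and the inline input are assumptions made for the example, not something from the original answer.

```shell
# The pattern from the answer, with [-] written as a plain -.
pattern='^[0-9]{5}(-[0-9]{4})?$'

# -E: extended regular expressions, so {5}, ( ) and ? need no backslashes.
extended=$(printf '12345\n12345-6789\n12345:6789\n' | grep -E "$pattern")

# -F: fixed string, so the same text is searched for literally, not as a regex.
# No input line contains that literal text; "|| true" hides grep's exit status 1.
fixed=$(printf '12345\n12345-6789\n12345:6789\n' | grep -F "$pattern" || true)

printf 'grep -E matched:\n%s\n' "$extended"
printf 'grep -F matched: "%s"\n' "$fixed"
```

Here grep -E selects 12345 and 12345-6789, while grep -F selects nothing because it looks for the pattern text itself.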



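It will not match "12345" on a line by itself, but it will match "12345 " (the digits followed by a space). As you wrote it, the first alternative requires one more character after the five digits: [^[:punct:]] means "one character that is not punctuation", not "no punctuation follows". Because \> also requires the word to end there, that extra character cannot be a letter, digit, or underscore either, so in practice it has to be whitespace.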
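The behaviour of the first alternative can be checked by running it on its own. This is a minimal sketch that assumes GNU grep (for the \< and \> word anchors); the variable names and test inputs are made up for the example.

```shell
# First alternative from the question, on its own (GNU grep word anchors assumed).
first='\<[0-9]{5}\>[^[:punct:]]'

# Nothing follows the digits, so [^[:punct:]] has no character to consume.
bare=$(printf '12345\n' | grep -E "$first" || true)

# A trailing space is a non-punctuation character, so this line matches.
spaced=$(printf '12345 \n' | grep -E "$first" || true)

# A colon is punctuation, so this line is rejected.
colon=$(printf '12345:6789\n' | grep -E "$first" || true)

printf 'bare: "%s"\nspaced: "%s"\ncolon: "%s"\n' "$bare" "$spaced" "$colon"
```

Only the line with the trailing space is printed back, which is why the bare 12345 in the sample file is skipped.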



Consider Mike's answer; it's clearer.
