Regex matches zip code without punctuation
I have a file with a bunch of different zip codes:
12345
12345-6789
1234567890
12345:6789
12345-7890
12:1234678
I only want to match codes that are formatted as 12345 or 12345-6789, and ignore all other forms.
I have my regex as:
grep -E '\<[0-9]{5}\>[^[:punct:]]|\<[0-9]{5}\>-[0-9]{4}' samplefile
It matches 12345-6789 because the alternative after the "or" (|) matches that particular one. I'm confused as to why it won't match the plain 12345, since my expression should say "match five digits, but ignore any punctuation."
An expression that matches your desired output:
egrep "^[0-9]{5}([-][0-9]{4})?$" samplefile
Breakdown of the expression:
^[0-9]{5} - Match a line starting with five digits. ^ means the beginning of a line, and [0-9]{5} means exactly five digits from zero to nine.
([-][0-9]{4})?$ - Optionally followed by a hyphen and four digits, then the end of the line. () groups expressions together, [-] represents a literal hyphen character, [0-9]{4} represents exactly four digits from zero to nine, ? indicates that the grouped expression appears either once or not at all, and $ denotes the end of a line.
test.dat
12345
12345-6789
1234567890
12345:6789
12345-7890
12:1234678
Running the expression on the test data:
mike@test:~$ egrep "^[0-9]{5}([-][0-9]{4})?$" test.dat
12345
12345-6789
12345-7890
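As for why the original pattern skips the plain 12345: [^[:punct:]] does not mean "ignore punctuation", it means "match one character that is not punctuation". On a line that contains only 12345 there is no character left after the five digits, so that branch cannot match. The sketch below compares the two patterns side by side; the variable names (sample, original, anchored) are only illustrative, and it assumes a grep that supports the \< and \> word-boundary operators, as GNU grep does.

```shell
# Sample data from the question, one code per line.
sample=$(printf '%s\n' 12345 12345-6789 1234567890 12345:6789 12345-7890 12:1234678)

# Original pattern: the first branch requires one more non-punctuation
# character after the five digits, so a bare "12345" line cannot match it.
original=$(printf '%s\n' "$sample" | grep -E '\<[0-9]{5}\>[^[:punct:]]|\<[0-9]{5}\>-[0-9]{4}')

# Anchored pattern: ^ and $ pin the match to the whole line, and the
# optional group allows the -NNNN suffix without requiring it.
anchored=$(printf '%s\n' "$sample" | grep -E '^[0-9]{5}(-[0-9]{4})?$')

printf 'original:\n%s\n\nanchored:\n%s\n' "$original" "$anchored"
```

The anchored result should contain the plain 12345 line along with the two hyphenated codes, while the original result should not contain it.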
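Additional information: grep -E can alternatively be written as egrep. The same holds for grep -F, which is the same as fgrep, and grep -r, which is the same as rgrep. Note that GNU grep documents egrep and fgrep as deprecated, so grep -E and grep -F are the safer spellings in scripts.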