Sed cannot match 0 or 1 times
I am writing a bash script on CentOS 7. I need to use sed to delete all lines that do not contain .jpg or .jpeg.
Here's my script:
sed -i -e '/\.jp(e)?g/!d' myfile
But it deletes all lines, so it doesn't work as expected.
However, both sed -i -e '/\.jpg/!d' myfile and sed -i -e '/\.jpeg/!d' myfile work well.
The capturing group () and the quantifier ? (which matches the previous token 0 or 1 times) come (at least) with ERE (Extended Regex), not BRE (Basic Regex).
sed uses BRE by default, so these tokens are treated literally.
To enable ERE, use -E (or -r if available) with sed:
sed -E '/\.jp(e)?g/!d' myfile
The capturing group around e is redundant here:
sed -E '/\.jpe?g/!d' myfile
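A quick way to see the difference between the two modes, as a minimal sketch: the sample lines below are made up for illustration, and it assumes a sed that accepts -E (GNU sed does).

```shell
# Hypothetical sample input: two image names, one unrelated line,
# and one line containing the literal text ".jp(e)?g".
sample=$(printf '%s\n' 'a.jpg' 'b.jpeg' 'notes.txt' 'literal.jp(e)?g')

# BRE (default): ( ) and ? are ordinary characters, so only the
# line containing that literal text survives.
bre_out=$(printf '%s\n' "$sample" | sed '/\.jp(e)?g/!d')

# ERE (-E): e? means "optional e", so both image names survive.
ere_out=$(printf '%s\n' "$sample" | sed -E '/\.jpe?g/!d')

printf 'BRE kept:\n%s\n' "$bre_out"
printf 'ERE kept:\n%s\n' "$ere_out"
```

This also explains why the original command deleted every line: none of the lines in the file contained the literal characters (e)?.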
Note that you can use the ERE operators in BRE by escaping them with \, so the following will work (in GNU sed; \? is a GNU extension):
sed '/\.jp\(e\)\?g/!d' myfile
sed '/\.jpe\?g/!d' myfile
Still, this is not as clean as using a single option, i.e. -E. The only case where you would want it is portability.
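If portability is the goal, one option not mentioned above is the BRE interval expression \{0,1\}, which POSIX defines and which means "zero or one of the previous token". A minimal sketch on made-up input:

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'a.jpg' 'b.jpeg' 'c.png')

# \{0,1\} is the POSIX BRE way to say "zero or one of the previous
# token", so neither -E nor the GNU-specific \? is needed.
portable_out=$(printf '%s\n' "$sample" | sed '/\.jpe\{0,1\}g/!d')

printf '%s\n' "$portable_out"
```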
This might work for you (GNU sed):
sed '/\.jp\(e\|\)g/!d' file
This uses alternation where one of the alternatives is empty.
It might be easier to see with both alternatives written out in full:
sed '/\.jpeg\|\.jpg/!d' file
However, as already said, you can use \? instead:
sed '/\.jpe\?g/!d' file
NB: * is zero or more, i.e.
sed '/\.jpe*g/!d' file
will allow .jpeeeeeeeeeeeeeeeeg
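The difference between these patterns can be checked side by side. This is a minimal sketch on made-up file names, assuming GNU sed for the \? and \| forms:

```shell
# Hypothetical sample input, including a name with too many e's.
sample=$(printf '%s\n' 'a.jpg' 'b.jpeg' 'c.jpeeeg' 'd.png')

# e* accepts any number of e's, so c.jpeeeg is kept as well.
star_out=$(printf '%s\n' "$sample" | sed '/\.jpe*g/!d')

# e\? accepts at most one e, so c.jpeeeg is deleted.
opt_out=$(printf '%s\n' "$sample" | sed '/\.jpe\?g/!d')

# Alternation with an empty branch selects the same lines as \?.
alt_out=$(printf '%s\n' "$sample" | sed '/\.jp\(e\|\)g/!d')

printf 'e* kept:\n%s\n' "$star_out"
printf 'e\\? kept:\n%s\n' "$opt_out"
printf 'alternation kept:\n%s\n' "$alt_out"
```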