Regular expression to exclude certain file extensions
Similar questions have been asked, but they all miss one thing I need to do, and I can't figure it out.
I need to find all files that do not have a tif or tiff extension, including files that have no extension at all. I got the first part working with the regex below, but it doesn't match files without an extension.
^(.+)\.(?!tif$|tiff$).+$
This works great, but I need the following to work.
filename.ext MATCH
filename.abc MATCH
filename.tif FAIL
filename MATCH
Thanks :)
This works for me:
^(?:(.+\.)((?!tif$|tiff$)[^.]*)|[^.]+)$
This regex is split into two different parts:
Part 1: (.+\.)((?!tif$|tiff$)[^.]*)
- (.+\.) (first capture group) Matches the filename (potentially containing dots) up to and including the last dot of the line, the one before the extension.
- ((?!tif$|tiff$)[^.]*) (second capture group) Checks that what follows that dot is not exactly "tif" or "tiff", and if so matches the extension.
Part 2: [^.]+
If Part 1 doesn't match, check if you only have a filename that does not contain a period.
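A quick way to check the pattern against the cases from the question is a short script. This is only a sketch: Python's re module is my choice here (the question does not name a regex engine), and the helper name is_wanted and the extra multi-dot samples are made up for illustration.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^(?:(.+\.)((?!tif$|tiff$)[^.]*)|[^.]+)$")


def is_wanted(filename: str) -> bool:
    """Return True if the pattern matches, i.e. the name is not a .tif/.tiff file.

    Hypothetical helper, not part of the original answer.
    """
    return PATTERN.match(filename) is not None


# The first five names come from the question; the last two are extra
# multi-dot cases added for illustration.
samples = [
    "filename.ext",
    "filename.abc",
    "filename.tif",
    "filename.tiff",
    "filename",
    "archive.tar.gz",
    "scan.v2.tif",
]

for name in samples:
    print(f"{name:15} {'MATCH' if is_wanted(name) else 'FAIL'}")
```

Note that the match is case-sensitive as written, so filename.TIF would still be reported as a match. If that matters, passing re.IGNORECASE to re.compile should cover it.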
Instead of writing a negative regex, consider using a simpler positive regex, but take action when something doesn't match. This is often a great approach.
It cannot be used in every situation (for example, if you are using a command-line tool that requires you to specify what to match), but I would do so whenever possible.
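As a rough illustration of that idea, here is a minimal sketch in Python (my assumption, since no language is named in the thread). The positive pattern only describes what a tif/tiff name looks like, and the calling code keeps whatever does not match. The names TIFF_EXTENSION and non_tiff_files are hypothetical.

```python
import re

# Simple positive pattern: the name ends in ".tif" or ".tiff".
TIFF_EXTENSION = re.compile(r"\.tiff?$")


def non_tiff_files(filenames):
    """Keep every name that the positive pattern does NOT match.

    Hypothetical helper used only for this illustration.
    """
    return [name for name in filenames if not TIFF_EXTENSION.search(name)]


names = ["filename.ext", "filename.abc", "filename.tif", "filename"]
print(non_tiff_files(names))  # ['filename.ext', 'filename.abc', 'filename']
```

Files without an extension need no special handling here, because the positive pattern simply never matches them.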
This is what I came up with:
^[^\.\s]+(\.|\s)(?!tiff?)
Explanation:
Start of the line, followed by one or more characters that are neither a period nor whitespace. If you want to capture the filename, put a named group around that part, that is:
^(?<result>[^\.\s]+)
It then looks for a period or a whitespace character, with a negative lookahead for tiff? (tiff? matches both tif and tiff).
This assumes there will always be a period or whitespace after the filename. You can change this to accept the end of the line instead, if that's what you need:
^[^\.\s]+(\.(?!tiff?)|\n)      (Linux line endings)
^[^\.\s]+(\.(?!tiff?)|\r\n)    (Windows line endings)
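To see how these patterns behave, here is a small sketch in Python (again my assumption about the language). Two things differ from the patterns as written above and are my own adjustments: Python spells named groups (?P<result>...) rather than (?<result>...), and \r?\n is used to fold the Linux and Windows variants into a single pattern. The constant names are hypothetical.

```python
import re

# First pattern from the answer, with the filename captured in a named group.
WITH_SEPARATOR = re.compile(r"^(?P<result>[^\.\s]+)(\.|\s)(?!tiff?)")

# End-of-line variant; "\r?\n" accepts both "\n" and "\r\n" line endings.
WITH_NEWLINE = re.compile(r"^[^\.\s]+(\.(?!tiff?)|\r?\n)")

# A name followed by a non-tif extension matches, and the group holds the name.
match = WITH_SEPARATOR.match("filename.ext")
print(match.group("result") if match else None)

# A .tif name is rejected by the negative lookahead.
print(WITH_SEPARATOR.match("filename.tif"))

# A bare name is only matched when it is followed by a line break.
print(WITH_NEWLINE.match("filename\n") is not None)
print(WITH_NEWLINE.match("filename\r\n") is not None)
print(WITH_NEWLINE.match("filename.tif\n") is not None)
```

One thing to be aware of: because the lookahead is not anchored to the end of the name, any extension that merely starts with tif (for example .tiffany) is rejected as well.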