Regular expression to exclude certain file extensions

Similar questions have been asked, but they miss one thing I need, and I can't figure it out.

I need to find all files that do not have a tif or tiff extension, including those that have no extension at all. I got the first part working with the regex below, but it doesn't match files without an extension.

^(.+)\.(?!tif$|tiff$).+$

This works for names that have an extension, but I need all of the following cases to work:

filename.ext MATCH
filename.abc MATCH
filename.tif FAIL
filename     MATCH
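To see the problem concretely, here is a minimal sketch that runs the question's pattern against the four cases, assuming Python's re module as the engine (the question does not name one). The helper name check is made up for illustration. Only the extensionless name comes out wrong, because the pattern requires a literal period.

```python
import re

# The pattern from the question: a name, a period, then any extension
# that is not exactly "tif" or "tiff".
QUESTION_PATTERN = re.compile(r"^(.+)\.(?!tif$|tiff$).+$")

# The four cases from the question, mapped to the desired result.
CASES = {
    "filename.ext": True,
    "filename.abc": True,
    "filename.tif": False,
    "filename": True,
}

def check(pattern):
    """Return the names whose match result differs from the desired one."""
    return [name for name, wanted in CASES.items()
            if bool(pattern.match(name)) != wanted]

print(check(QUESTION_PATTERN))  # ['filename']
```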

Thanks :)

+3


5 answers


If your regex flavor supports lookbehind (older JavaScript/ECMAScript engines do not; lookbehind was added in ES2018), you can use:



^.*(?<!\.tif)(?<!\.tiff)$
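One way to try this pattern is Python's re module, which only accepts fixed-width lookbehind; both assertions here are fixed width, so the pattern compiles as written. The sample names are illustrative.

```python
import re

# The lookbehind pattern from this answer. Python's re requires fixed-width
# lookbehind; (?<!\.tif) and (?<!\.tiff) are 4 and 5 characters wide.
LOOKBEHIND = re.compile(r"^.*(?<!\.tif)(?<!\.tiff)$")

names = ["filename.ext", "filename.abc", "filename.tif", "filename.tiff", "filename"]

# Keep only the names that do not end in .tif or .tiff.
kept = [n for n in names if LOOKBEHIND.match(n)]
print(kept)  # ['filename.ext', 'filename.abc', 'filename']
```

The match is case-sensitive, so a name like SCAN.TIF is kept; compile with re.IGNORECASE if upper-case extensions should be excluded too.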

+1


This works for me:

^(?:(.+)\.((?!tif$|tiff$)[^.]*)|[^.]+)$

The regex consists of two alternatives:



Part 1: (.+)\.((?!tif$|tiff$)[^.]*)

  • (.+)

    (first capture group) Match the filename, which may itself contain periods.
  • \.

    Match the last period in the name (the one before the extension). Because the extension part [^.]* cannot contain a period, this is always the final one.
  • ((?!tif$|tiff$)[^.]*)

    (second capture group) Check that the text after that period is not exactly "tif" or "tiff", then match the extension itself.

Part 2: [^.]+

If Part 1 doesn't match, accept a filename that contains no period at all, i.e. one without an extension.
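The sketch below tries the two alternatives with Python's re module (an assumption; any engine with lookahead should behave the same). It uses the grouping from the Part 1 breakdown, so group 1 is the name without its final period and group 2 is the extension; both are None when Part 2 matched. The sample names are illustrative.

```python
import re

# Part 1: name, final period, extension that is not exactly tif/tiff.
# Part 2: a name with no period at all.
TWO_PART = re.compile(r"^(?:(.+)\.((?!tif$|tiff$)[^.]*)|[^.]+)$")

for name in ["archive.tar.gz", "scan.tif", "scan.tiff", "README"]:
    m = TWO_PART.match(name)
    if m is None:
        print(f"{name}: excluded")
    else:
        # groups() is (None, None) when Part 2 (no extension) matched.
        print(f"{name}: kept, groups={m.groups()}")
```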

0


Instead of writing a negative regex, consider writing a simpler positive regex that matches what you want to exclude, and act on the names that do not match it. This is often the cleaner approach.

It cannot be used in every situation (for example, with a command-line tool that only accepts a pattern describing what should match), but I would use it whenever possible.
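A minimal sketch of this idea in Python: the regex only describes the tif/tiff case, and the code keeps everything that fails to match. The pattern \.tiff?$ and the function name without_tiff are illustrative choices, not taken from the answer.

```python
import re

# Positive pattern: describe only the names to exclude (.tif or .tiff at the end).
TIFF_EXTENSION = re.compile(r"\.tiff?$")

def without_tiff(names):
    """Return the names the positive pattern does NOT match.

    Names with no extension never match, so they are kept automatically.
    """
    return [n for n in names if not TIFF_EXTENSION.search(n)]

print(without_tiff(["filename.ext", "filename.abc", "filename.tif", "filename"]))
```

The question's "files without an extension" case needs no special handling here, because it simply never matches the positive pattern.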

0


If you have a text file with one filename per line:

perl -lne '/\.tiff?$/ || print' file

If you have files in a directory:

ls | perl -lne '/\.tiff?$/ || print'
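The same "print unless it matches" filter can be sketched in Python, assuming the filenames are already in a list (the sample names are hypothetical). It also shows why the pattern should be anchored: an unanchored tiff? rejects any name that merely contains "tif", such as motif.txt, while \.tiff?$ rejects only a real .tif or .tiff extension.

```python
import re

names = ["motif.txt", "tiffany.doc", "scan.tif", "scan.tiff", "notes"]

# Unanchored: rejects every name that contains "tif" anywhere.
unanchored = [n for n in names if not re.search(r"tiff?", n)]

# Anchored to a literal period and the end of the name:
# rejects only a real .tif or .tiff extension.
anchored = [n for n in names if not re.search(r"\.tiff?$", n)]

print(unanchored)  # ['notes']
print(anchored)    # ['motif.txt', 'tiffany.doc', 'notes']
```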

0


This is what I came up with:

^[^\.\s]+(\.|\s)(?!tiff?)

Explanation:

Starting at the beginning of the line, match every character up to the first period or whitespace. If you want to capture the filename, wrap that part in a named group:

^(?<result>[^\.\s]+)

Then it looks for a period or whitespace character that is not followed by "tif" or "tiff". The negative lookahead (?!tiff?) covers both, because f? makes the second f optional.

This assumes there is always a period or whitespace after the filename. If that is not the case, you can match the end of the line instead:

^[^\.\s]+(\.(?!tiff?)|\n)   linux
^[^\.\s]+(\.(?!tiff?)|\r\n) windows
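To try the Linux variant, here is a Python sketch over a hypothetical listing with one name per line. It adds the named group from the explanation (Python spells it (?P<result>...)) and uses re.MULTILINE so that ^ matches at the start of every line. The last line must end with a newline; otherwise an extensionless final name is missed.

```python
import re

# Hypothetical listing, one name per line, ending with a newline.
listing = "report.ext\nnotes.abc\nscan.tif\nREADME\n"

# Linux variant from above with the filename captured in a named group.
# re.MULTILINE makes ^ match at the start of each line, not only the string.
pattern = re.compile(r"^(?P<result>[^\.\s]+)(\.(?!tiff?)|\n)", re.MULTILINE)

names = [m.group("result") for m in pattern.finditer(listing)]
print(names)  # ['report', 'notes', 'README']
```

Because the pattern stops at the first period, a name such as a.b.tif is still kept (captured as a).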

0

