Regular Expressions: Differences Between Browsers

I am increasingly realizing that there must be significant differences in the way browsers interpret regular expressions.
For example, a coworker wrote this regex to confirm that the file being uploaded has a PDF extension:

^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.pdf)$

      

This works in Internet Explorer and Google Chrome, but does NOT work in Firefox. The test always fails, even for a real PDF. So I decided the extra stuff was irrelevant and simplified it to:

^.+\.pdf$

      

and now it works fine in Firefox and also keeps working in IE and Chrome.
Is this a quirk specific to the asp:FileUpload and RegularExpressionValidator controls in ASP.NET, or is it simply because different browsers support regex in different ways? Either way, which differences of the latter kind have you encountered?

+1




6 answers


As far as I know, Firefox does not give you the full path of the uploaded file. In that case, how the regular expression is interpreted is irrelevant. I have yet to see a difference between modern browsers in how they handle regular expressions.



+3




Regarding the actual question: the original regex requires the value to start with a drive letter or UNC device name. It is quite possible that Firefox simply does not include that with the filename. Also note that if you intend to be cross-platform, this regex will fail on any non-Windows system, regardless of the browser, since those systems don't use drive letters or UNC paths. Your simplified regex ("accept anything as long as it ends in .pdf") is about as good as filename checking is going to get.
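To make the point about drive letters and UNC prefixes concrete, here is a minimal sketch that runs both patterns from the question against a few sample values. The sample strings and the `compare` helper are made up for illustration; what a particular browser actually reports for an upload field may differ.

```javascript
// The two patterns from the question, written as JavaScript regex literals.
const originalPattern = /^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.pdf)$/;
const simplifiedPattern = /^.+\.pdf$/;

// Hypothetical sample values (assumptions, not taken from any real browser).
const samples = [
  "C:\\docs\\report.pdf",          // Windows path with a drive letter
  "\\\\server\\share\\report.pdf", // UNC path
  "report.pdf",                    // bare filename, no path at all
  "/home/user/report.pdf",         // non-Windows path
  "C:\\docs\\reportXpdf",          // no real extension
];

// Report how each pattern treats a single value.
function compare(value) {
  return {
    value,
    original: originalPattern.test(value),
    simplified: simplifiedPattern.test(value),
  };
}

for (const sample of samples) {
  console.log(compare(sample));
}
```

The original pattern only accepts the first two samples and the last one, so a bare filename can never pass it, whatever the regex engine. The last sample also shows a side issue: the unescaped dot in `(.pdf)` matches any character, so the original pattern accepts `reportXpdf` while the simplified one rejects it.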

However, Jonathan's comment on the original question cannot be overstated. Never, ever, ever trust a filename as an adequate means of identifying a file's contents. Or a MIME type, for that matter. The client software talking to your web server (which may not even be a browser) can lie to you about anything, and you will never know unless you check. In this case, checking means feeding the received file into some code that understands the PDF format and having that code tell you whether it is a valid PDF or not. Filename validation can help prevent attempts to submit obviously invalid files, but it is not a sufficient test of the files you actually receive.



(I understand that you may be aware of the need for additional verification, but the next person who has a similar situation and finds your question may not be.)
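As a rough illustration of looking at the content rather than the name, the sketch below checks whether a byte array starts with the ASCII marker `%PDF-`, which is how a PDF file header begins. The function name `hasPdfHeader` is an assumption for this example, and a header check is only a cheap first filter: it does not prove the file is a well-formed PDF, which still requires code that actually parses the format.

```javascript
// Bytes of the ASCII string "%PDF-", the start of a PDF file header.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Returns true if the given bytes (Uint8Array or plain array) begin with "%PDF-".
// This is a first-pass filter only, not a validity check.
function hasPdfHeader(bytes) {
  if (bytes.length < PDF_MAGIC.length) {
    return false;
  }
  return PDF_MAGIC.every((expected, index) => bytes[index] === expected);
}

// Usage with hand-written sample bytes (hypothetical data).
const pdfLike = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34]); // "%PDF-1.4"
const zipLike = new Uint8Array([0x50, 0x4b, 0x03, 0x04]);                         // "PK.." (a ZIP header)

console.log(hasPdfHeader(pdfLike));
console.log(hasPdfHeader(zipLike));
```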

+4




If you use the regex in JavaScript without wrapping it in slashes, an error occurs in Firefox.

Try running var regex = /^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.pdf)$/;
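For reference, here is a small sketch of the two ways to write a pattern in JavaScript, using the simplified pattern from the question. The variable names are made up for this example. A pattern typed with no delimiters at all is not valid JavaScript, and a pattern passed as a string needs its backslashes doubled.

```javascript
// A regex literal: the slashes delimit the pattern, and \. is a literal dot.
const fromLiteral = /^.+\.pdf$/;

// The same pattern built from a string: each backslash must be doubled,
// because the string parser consumes one level of escaping first.
const fromString = new RegExp("^.+\\.pdf$");

// Pitfall: with a single backslash, "\." in a string collapses to ".",
// so the dot before "pdf" ends up matching any character.
const tooLoose = new RegExp("^.+\.pdf$");

console.log(fromLiteral.source === fromString.source); // compares the pattern text
console.log(fromLiteral.test("reportXpdf"));
console.log(tooLoose.test("reportXpdf"));
```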

+1




As Dave pointed out, Firefox doesn't give a path, just a filename. Also, as he noted, the regex does not account for differences between operating systems. I think the best you can do is check whether the filename ends with PDF. That still does not guarantee that the file is a valid PDF, only that its filename ends in PDF. Depending on your needs, you can verify that it is actually a PDF by checking the content.
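A minimal sketch of an "ends with PDF" check that does not care about paths or letter case might look like this. The helper name `hasPdfExtension` is an assumption for this example. Note that the simplified pattern from the question, `^.+\.pdf$`, is case-sensitive and would reject `REPORT.PDF` unless the `i` flag is added.

```javascript
// Returns true if the name ends in ".pdf" (any letter case) and has at
// least one character before the extension. Works on a bare filename or
// a full path, since only the end of the string is inspected.
function hasPdfExtension(filename) {
  const name = String(filename).trim().toLowerCase();
  return name.length > ".pdf".length && name.endsWith(".pdf");
}

console.log(hasPdfExtension("report.pdf"));
console.log(hasPdfExtension("REPORT.PDF"));
console.log(hasPdfExtension("report.pdf.exe"));
```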

+1




I haven't noticed any difference between browsers regarding pattern syntax. However, I have noticed a difference between C# and JavaScript, as the C# implementation allows back references while the JavaScript implementation does not.

0




I believe JavaScript REs are defined by the ECMA standard, and I doubt there are many differences between JS interpreters. I have not run into any in my own programs, nor seen any mentioned in an article.

Your post is actually a little confusing, since you are throwing ASP stuff in there. I don't understand how you conclude that this is a browser issue when you talk about server-side technology or generated code. In fact, we don't even know whether you are talking about JS in the browser validating the upload field (you can't do that anymore, at least not in an easy way, with FF3) or about server-side validation (neither FF, nor Opera, nor Safari sends the full path of the uploaded file. I'm surprised to learn that Chrome behaves like IE here ...).

0








