Regex for matching and validating Internet media types?

I want to validate Internet media types that are passed in through my API.

Can anyone help me write a regex for this?

Examples of types are below (see http://en.wikipedia.org/wiki/Internet_media_type):

application/atom+xml
application/EDI-X12
application/xml-dtd
application/zip
application/vnd.openxmlformats-officedocument.presentationml.presentation
video/quicktime

      

The input must comply with the standard format:

type / media type name [+suffix]

      

Thanks.

+3




3 answers


It's very simple:

\w+/[-+.\w]+

Demo: http://regex101.com/r/oH5bS7/1
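
As a rough illustration, the pattern above could be used from Python like this. This is only a sketch: the helper name is_media_type is made up here, re.fullmatch is used on the assumption that the whole input must be a media type, and re.ASCII is added so that \w stays limited to [A-Za-z0-9_].

```python
import re

# Pattern from the answer above; re.ASCII keeps \w to [A-Za-z0-9_].
SIMPLE_MEDIA_TYPE = re.compile(r"\w+/[-+.\w]+", re.ASCII)


def is_media_type(value: str) -> bool:
    """Return True if the whole string looks like type/subtype (hypothetical helper)."""
    return SIMPLE_MEDIA_TYPE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["application/atom+xml", "video/quicktime", "not a type"]:
        print(candidate, is_media_type(candidate))
```

Note that this pattern accepts any number of + signs in the subtype and rejects parameters such as "; charset=utf-8".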



And if you want to allow at most one +:

\w+/[-.\w]+(?:\+[-.\w]+)?
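
A possible way to use the stricter pattern from Python, sketched with capture groups added so that the optional suffix can be read back. The groups and the function name split_media_type are additions for illustration, not part of the pattern above.

```python
import re
from typing import Optional, Tuple

# Same shape as the pattern above, with capture groups added (an assumption
# made for this sketch) for the type, the subtype and the optional +suffix.
ONE_SUFFIX = re.compile(r"(\w+)/([-.\w]+)(?:\+([-.\w]+))?", re.ASCII)


def split_media_type(value: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (type, subtype, suffix) or None if the input does not match."""
    match = ONE_SUFFIX.fullmatch(value)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    print(split_media_type("application/atom+xml"))
    print(split_media_type("video/quicktime"))
    print(split_media_type("application/a+b+c"))
```

An input with two + signs, such as application/a+b+c, is rejected because the suffix group may match only once.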

+2




I recently had to check media types a little more strictly than the existing answers do. Here's what I came up with, based on the intersection of the grammars from RFC 2045 Section 5.1 and RFC 7231 Section 3.1.1.1 (which prohibits {} in tokens, and prohibits whitespace except between parameters). For a C-like language with (?:) non-capturing groups:

ows = "[ \t]*";
token = "[0-9A-Za-z!#$%&'*+.^_`|~-]+";
quotedString = "\"(?:[^\"\\\\]|\\\\.)*\"";
type = "(application|audio|font|example|image|message|model|multipart|text|video|x-(?:" + token + "))";
parameter = ";" + ows + token + "=" + "(?:" + token + "|" + quotedString + ")";
mediaType = type + "/" + "(" + token + ")((?:" + ows + parameter + ")*)";

      

It ends up being pretty monstrous:

"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)((?:[ \t]*;[ \t]*[0-9A-Za-z!#$%&'*+.^_`|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+|\"(?:[^\"\\\\]|\\\\.)*\"))*)"

      



which captures the type, subtype and parameters, or just

"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)"

      

if you want to omit the parameters. Note that these could be made simpler (and less restrictive) by allowing any token for the type (as RFC 7231 does) rather than restricting it to "application", "audio", etc.
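
For comparison, the same construction could look roughly like this in Python. This is a sketch rather than a drop-in port: raw strings are used so the backslashes in the quoted-string rule do not need doubling, and the names OWS, TOKEN, MEDIA_TYPE and parse_media_type are chosen here for illustration.

```python
import re
from typing import Optional, Tuple

# Building blocks mirroring the grammar pieces described above.
OWS = r"[ \t]*"
TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"
# A quoted string: any character except quote or backslash, or a backslash escape.
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'
TYPE = (
    "(application|audio|font|example|image|message|model|multipart|text|video"
    "|x-(?:" + TOKEN + "))"
)
PARAMETER = ";" + OWS + TOKEN + "=(?:" + TOKEN + "|" + QUOTED_STRING + ")"
MEDIA_TYPE = re.compile(TYPE + "/(" + TOKEN + ")((?:" + OWS + PARAMETER + ")*)")


def parse_media_type(value: str) -> Optional[Tuple[str, str, str]]:
    """Return (type, subtype, raw parameter text) or None (hypothetical helper)."""
    match = MEDIA_TYPE.fullmatch(value)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    print(parse_media_type('text/html; charset="utf-8"'))
    print(parse_media_type("application/atom+xml"))
    print(parse_media_type("sound/wav"))
```

The third group holds the unparsed parameter text, so splitting it into name/value pairs would still need a second pass.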

In practice, you may want to restrict the input further to IANA-registered media types, to the types listed in mailcap, or to specific types appropriate for your application, depending on the intended use.
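
One way such a restriction might look, as a sketch only: the ALLOWED set below is a made-up example list, not a real registry, and is_allowed is a hypothetical helper. The comparison is done in lower case because media type and subtype names are case-insensitive.

```python
# Hypothetical allowlist for illustration; a real application would load the
# types it actually supports (for example from the IANA registry).
ALLOWED = {
    "application/atom+xml",
    "application/zip",
    "video/quicktime",
}


def is_allowed(value: str) -> bool:
    """Check the type/subtype part of the input against the allowlist."""
    # Drop any parameters after the first ';' and normalize case.
    essence = value.split(";", 1)[0].strip().lower()
    return essence in ALLOWED


if __name__ == "__main__":
    print(is_allowed("Application/ZIP"))
    print(is_allowed("application/atom+xml; charset=utf-8"))
    print(is_allowed("text/html"))
```

This check does not validate syntax by itself, so it would normally run after one of the regex checks.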

+2




More general regex with parameter support:

(?P<main>\w+|\*)/(?P<sub>\w+|\*)(\s*;\s*(?P<param>\w+)\s*=\s*(?P<val>\S+))?

      

Demo: http://regex101.com/r/lQ3rX4/2
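
The (?P<name>...) named-group syntax is what Python's re module uses, so a pattern of this shape can be tried directly there. The sketch below assumes a single = between the parameter name and its value; the function name parse is made up for illustration.

```python
import re
from typing import Dict, Optional

# Named-group pattern of the shape shown above, assuming one '=' between
# the parameter name and its value.
GENERAL = re.compile(
    r"(?P<main>\w+|\*)/(?P<sub>\w+|\*)"
    r"(\s*;\s*(?P<param>\w+)\s*=\s*(?P<val>\S+))?"
)


def parse(value: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named groups as a dict, or None if the input does not match."""
    match = GENERAL.fullmatch(value)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse("text/html; charset=utf-8"))
    print(parse("*/*"))
    print(parse("application/atom+xml"))
```

Because the subtype is matched with \w+, inputs such as application/atom+xml or application/xml-dtd are rejected by this pattern; it seems better suited to wildcard-style values like those in an Accept header.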

+1

