Regex for matching and checking Internet media types?
I want to validate Internet media types received as input by my API.
Can someone help me write a regex for this?
Example types (see http://en.wikipedia.org/wiki/Internet_media_type):
application/atom+xml
application/EDI-X12
application/xml-dtd
application/zip
application/vnd.openxmlformats-officedocument.presentationml.presentation
video/quicktime
The input must comply with the standard form:
top-level type name / subtype name [+suffix]
Thanks!
It's very simple:
\w+/[-+.\w]+
Demo: http://regex101.com/r/oH5bS7/1
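The pattern is not anchored, so whether it rejects junk depends on how you apply it: a plain search would happily accept "foo application/zip bar". A minimal Python sketch (Python and the helper name is_media_type are assumptions for illustration, not part of this answer) that requires the whole string to match:

```python
import re

# The simple pattern from this answer. re.ASCII restricts \w to
# [A-Za-z0-9_]; without it, Python's \w also matches non-ASCII letters.
SIMPLE = re.compile(r"\w+/[-+.\w]+", re.ASCII)

def is_media_type(value: str) -> bool:
    # fullmatch requires the entire string to match; re.search would
    # accept any string that merely contains something type-like.
    return SIMPLE.fullmatch(value) is not None

examples = [
    "application/atom+xml",
    "application/EDI-X12",
    "application/xml-dtd",
    "application/zip",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "video/quicktime",
]

for ex in examples:
    print(ex, is_media_type(ex))

print(is_media_type("foo application/zip bar"))  # False
```

In engines without a full-match call, wrapping the pattern in ^...$ has the same effect.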
And if you want to allow at most one +:
\w+/[-.\w]+(?:\+[-.\w]+)?
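To see the difference, a small Python comparison of the two patterns, again assuming a whole-string match via re.fullmatch. Besides rejecting a second +, the stricter pattern also rejects a leading or trailing +, because the suffix group needs characters on both sides:

```python
import re

# First pattern: '+' may appear anywhere in the subtype, any number of times.
ANY_PLUS = re.compile(r"\w+/[-+.\w]+", re.ASCII)
# Second pattern: at most one '+', with non-empty text before and after it.
ONE_PLUS = re.compile(r"\w+/[-.\w]+(?:\+[-.\w]+)?", re.ASCII)

for value in ["application/atom+xml", "application/a+b+c",
              "application/+xml", "application/xml+"]:
    # Prints the value, then whether each pattern accepts it.
    print(value,
          ANY_PLUS.fullmatch(value) is not None,
          ONE_PLUS.fullmatch(value) is not None)
```

Only application/atom+xml is accepted by both; the other three pass the first pattern but fail the second.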
I recently had to check media types a little more strictly than the existing answers do. Here's what I came up with, based on the intersection of the grammars in RFC 2045 Section 5.1 and RFC 7231 Section 3.1.1.1 (the latter prohibits { and } in tokens, and whitespace anywhere except between parameters). Written for a C-like language whose regex engine supports (?:) non-capturing groups:
ows = "[ \t]*";
token = "[0-9A-Za-z!#$%&'*+.^_`|~-]+";
quotedString = "\"(?:[^\"\\\\]|\\.)*\"";
type = "(application|audio|font|example|image|message|model|multipart|text|video|x-(?:" + token + "))";
parameter = ";" + ows + token + "=" + "(?:" + token + "|" + quotedString + ")";
mediaType = type + "/" + "(" + token + ")((?:" + ows + parameter + ")*)";
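For reference, a rough Python translation of these building blocks. It assumes Python's re module, uses raw strings instead of C-style escaping, and applies the result with re.fullmatch so that the whole input must match:

```python
import re

# Building blocks from this answer, as Python raw strings.
OWS = r"[ \t]*"                              # optional whitespace
TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"       # run of RFC 7231 tchar
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'         # "..." with backslash escapes
TYPE = ("(application|audio|font|example|image|message|model|"
        "multipart|text|video|x-(?:" + TOKEN + "))")
PARAMETER = ";" + OWS + TOKEN + "=" + "(?:" + TOKEN + "|" + QUOTED_STRING + ")"
MEDIA_TYPE = re.compile(
    TYPE + "/" + "(" + TOKEN + ")((?:" + OWS + PARAMETER + ")*)")

m = MEDIA_TYPE.fullmatch('text/plain; charset="utf-8"')
print(m.groups())  # ('text', 'plain', '; charset="utf-8"')
```

The type alternation is case-sensitive as written, even though RFC 7231 treats type and subtype names as case-insensitive; compiling with re.IGNORECASE would also accept, say, "Text/Plain".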
Written out as a single string literal, the full mediaType pattern ends up pretty monstrous:
"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)((?:[ \t]*;[ \t]*[0-9A-Za-z!#$%&'*+.^_`|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+|\"(?:[^\"\\\\]|\\.)*\"))*)"
which captures the type, subtype and parameters, or, if you don't need parameters, just
"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)"
which captures only the type and subtype. Note that both could be made simpler (and less restrictive) by allowing any token for the type (as RFC 7231 does) rather than restricting it to "application", "audio", etc.
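A sketch of that less restrictive variant in Python, with any token accepted as the type. The helper parse_media_type is hypothetical: after the whole string has been validated, it splits the captured parameter string into (name, raw value) pairs with a second, per-parameter regex. Quoted values are returned with their quotes and backslash escapes intact; unquoting is left out:

```python
import re

OWS = r"[ \t]*"                            # optional whitespace
TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"     # run of RFC 7231 tchar
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'       # "..." with backslash escapes
PARAMETER = ";" + OWS + TOKEN + "=(?:" + TOKEN + "|" + QUOTED_STRING + ")"

# Less restrictive variant: any token is accepted as the top-level type.
LOOSE_MEDIA_TYPE = re.compile(
    "(" + TOKEN + ")/(" + TOKEN + ")((?:" + OWS + PARAMETER + ")*)")

# Same shape as PARAMETER, but capturing the name and the raw value.
PARAM = re.compile(
    ";" + OWS + "(" + TOKEN + ")=(" + TOKEN + "|" + QUOTED_STRING + ")")

def parse_media_type(value):
    """Return (type, subtype, [(name, raw_value), ...]), or None if invalid."""
    m = LOOSE_MEDIA_TYPE.fullmatch(value)
    if m is None:
        return None
    type_, subtype, params = m.groups()
    # params was already validated by the full match, so findall only
    # has to split it into individual parameters.
    return type_, subtype, PARAM.findall(params)

print(parse_media_type('text/plain; charset="utf-8"; format=flowed'))
print(parse_media_type("chemical/x-pdb"))  # ('chemical', 'x-pdb', [])
print(parse_media_type("text/plain;"))     # None
```

Note that "chemical/x-pdb" is accepted here but rejected by the stricter pattern, which only allows the listed top-level types.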
In practice, you may want to restrict the input further, e.g. to IANA-registered media types, to the types listed in mailcap, or to the specific types your application actually expects, depending on its intended use.
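A minimal Python sketch of such a restriction, assuming a hypothetical ALLOWED set that you would fill from the IANA registry or from your own requirements. The syntax is checked first; then the type/subtype pair is compared in lower case, since type and subtype names are case-insensitive:

```python
import re

OWS = r"[ \t]*"
TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'
PARAMETER = ";" + OWS + TOKEN + "=(?:" + TOKEN + "|" + QUOTED_STRING + ")"
MEDIA_TYPE = re.compile(
    "(" + TOKEN + ")/(" + TOKEN + ")((?:" + OWS + PARAMETER + ")*)")

# Hypothetical allow-list: fill it with the types your API actually accepts.
ALLOWED = {
    "application/atom+xml",
    "application/zip",
    "video/quicktime",
}

def is_allowed(value):
    """Syntax check first, then compare type/subtype against ALLOWED."""
    m = MEDIA_TYPE.fullmatch(value)
    if m is None:
        return False
    # Type and subtype are case-insensitive, so normalise before comparing.
    return f"{m.group(1)}/{m.group(2)}".lower() in ALLOWED

print(is_allowed('application/atom+xml; charset="utf-8"'))  # True
print(is_allowed("Video/QuickTime"))                         # True
print(is_allowed("application/pdf"))                         # False
```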