Should URLs be stored in encoded or decoded form?

My question is a little weird, but let me explain:

  • Assuming a valid URI does not allow Unicode (per RFC 2396), all Unicode characters in the URI must be escaped using percent-encoding.

  • A valid URL must be a valid URI, so we have to use http://example.com/%E4%BD%A0%E5%A5%BD instead of http://example.com/你好 when making a request or putting the URL in an href (even though most browsers can handle the latter form).

  • In addition, we accept URLs from users that are already encoded (since browsers encode the URL when it is copied from the address bar).

  • So we made a decision (probably a mistake) to save them as http://example.com/%E4%BD%A0%E5%A5%BD, not http://example.com/你好; after all, that is the original input and the correct URL.
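For reference, the two forms in the list above convert into each other with the built-in encodeURI and decodeURI functions. A minimal sketch, using only the example URL from the question (the variable names are just for illustration):

```javascript
// The two forms of the example URL from the question.
const decodedForm = "http://example.com/你好";
const encodedForm = "http://example.com/%E4%BD%A0%E5%A5%BD";

// encodeURI percent-encodes the non-ASCII characters as UTF-8 octets
// and leaves URL delimiters such as ":" and "/" untouched.
console.log(encodeURI(decodedForm) === encodedForm); // true

// decodeURI reverses that step for this URL.
console.log(decodeURI(encodedForm) === decodedForm); // true
```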

The problem comes up when I try to display these URLs: since they are submitted by users, I need to run an XSS filter on them. Some implementations, such as xss-filters, apply encodeURI as part of the filter, so every % gets double-encoded (for example, %E4 becomes %25E4), which breaks the URL in the process.
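The double-encoding is easy to reproduce with encodeURI alone. This is only a sketch of that one step, not the actual xss-filters code; the variable names are made up for illustration:

```javascript
// URL as stored: already percent-encoded.
const storedUrl = "http://example.com/%E4%BD%A0%E5%A5%BD";

// Running encodeURI over it again escapes each "%" as "%25".
const doubleEncoded = encodeURI(storedUrl);
console.log(doubleEncoded);
// http://example.com/%25E4%25BD%25A0%25E5%25A5%25BD

// One decodeURI pass only gets back to the stored form, not to 你好.
console.log(decodeURI(doubleEncoded) === storedUrl); // true
```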

So, are we supposed to store URLs in decoded form (even though they are not valid)? It makes no sense to me to run decodeURI on output...



2 answers


First, RFC 2396 has been obsoleted by RFC 3986. Second, yes, you should store your URIs in decoded form if the storage engine allows it.

Update: From Section 2.4 of RFC 3986:

    "Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts."
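One way to read that in practice: keep the components in decoded form and percent-encode each one only at the moment the URI is assembled. A rough sketch, assuming the host and the path segments are stored separately; buildUrl is a hypothetical helper, not something from the question or the RFC:

```javascript
// Hypothetical helper: assembles an http URL from a host and
// decoded path segments, encoding each segment exactly once.
function buildUrl(host, segments) {
  const path = segments.map((segment) => encodeURIComponent(segment)).join("/");
  return "http://" + host + "/" + path;
}

console.log(buildUrl("example.com", ["你好"]));
// http://example.com/%E4%BD%A0%E5%A5%BD

// A "/" inside a segment is data, so it is escaped; the "/" between
// segments is a delimiter added during assembly and stays literal.
console.log(buildUrl("example.com", ["a/b", "c"]));
// http://example.com/a%2Fb/c
```

Encoding per component is what lets a reserved character such as "/" be treated as data in one place and as a delimiter in another, which a single pass over the finished string cannot distinguish.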

Update 2: Also, a Unicode character string representing a URI is essentially an IRI. See RFC 3987.



Please note that https://url.spec.whatwg.org/#urls is what defines a URL. It replaces the RFCs you mentioned.

I.e., your premise is wrong, especially this part:



    "A valid URL must be a valid URI, so we have to use http://example.com/%E4%BD%A0%E5%A5%BD instead of http://example.com/你好 when making a request or putting the URL in an href (even though most browsers can handle the latter form)."

Why do you say that? http://example.com/你好 is a perfectly valid URL.
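This can be checked with the URL class, which implements the WHATWG parser. A small sketch, assuming a runtime where URL is a global (current browsers and recent Node.js versions); the variable names are illustrative only:

```javascript
// Both spellings of the example URL are accepted by the parser.
const fromUnicode = new URL("http://example.com/你好");
const fromEncoded = new URL("http://example.com/%E4%BD%A0%E5%A5%BD");

// Serialization percent-encodes the non-ASCII path characters and
// leaves existing escapes alone, so both inputs give the same href.
console.log(fromUnicode.href); // http://example.com/%E4%BD%A0%E5%A5%BD
console.log(fromUnicode.href === fromEncoded.href); // true

// The readable form can be recovered for display.
console.log(decodeURI(fromUnicode.pathname)); // /你好
```

Because parsing an already-encoded URL does not encode it a second time, normalizing through URL avoids the double-encoding issue from the question for this input.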







