Semicolon in url as query string separator

I keep hearing that the W3C recommends using ";" instead of "&" as the query string separator.

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.

Can someone explain why ";" is recommended instead of "&"?

Also, I tried using ";" instead of "&" (for example: .com?str1=val1;str2=val2). When reading Request.QueryString["str1"], I get "val1;str2=val2". So if ";" is recommended, how do we read query strings?



2 answers


As the linked document says, ";" is recommended over "&" because

using the "&" character to separate form fields interacts with its use in SGML attribute values to delimit character entity references.

For example, say your URL needs to be ...?q1=v1&q2=v2. There is nothing wrong with that "&" by itself. But if you want to put this URL in an HTML attribute, as in <a href="...?q1=v1&q2=v2">, it breaks, because inside an HTML attribute "&" represents the start of a character entity. You need to escape the "&" as "&amp;", giving <a href="...?q1=v1&amp;q2=v2">, and it would be easier if you didn't have to.
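Here is a small Python illustration of that escaping. Python's html module stands in for whatever templating the page actually uses, and the host part of the URL is left out as in the example above:

```python
import html

url = "?q1=v1&q2=v2"  # host part omitted, as in the example above

# Escape the URL for use inside a double-quoted attribute: "&" becomes "&amp;".
attr_value = html.escape(url, quote=True)
anchor = f'<a href="{attr_value}">link</a>'
print(anchor)  # <a href="?q1=v1&amp;q2=v2">link</a>

# Decoding the attribute value gives back the URL the server actually receives.
print(html.unescape(attr_value) == url)  # True
```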

";" is not overloaded like that at all; you can put it in an HTML attribute without worrying about it. So it would be much easier if servers recognized ";" as a query parameter separator.

However, judging by your experiment, ASP.NET doesn't recognize ";" as a separator. How can you make it do so? I'm not sure you can.
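If the framework won't split on ";", one workaround is to parse the raw query string yourself. Below is a minimal sketch in Python, not ASP.NET code. The name parse_query is a hypothetical helper. The function treats both ";" and "&" as separators. It percent-decodes keys and values as UTF-8 only after splitting, so an encoded %3B or %26 inside a value is preserved:

```python
from __future__ import annotations

import re
from urllib.parse import unquote_plus


def parse_query(raw: str) -> dict[str, list[str]]:
    """Hypothetical helper: split a raw query string on ';' or '&' and decode it."""
    result: dict[str, list[str]] = {}
    for pair in re.split(r"[;&]", raw.lstrip("?")):
        if not pair:
            continue  # skip empty segments, e.g. from "a=1;;b=2"
        key, _, value = pair.partition("=")
        # Decode only after splitting, so an encoded %3B or %26 stays inside its value.
        result.setdefault(unquote_plus(key), []).append(unquote_plus(value))
    return result


print(parse_query("str1=val1;str2=val2"))
# {'str1': ['val1'], 'str2': ['val2']}
```

Accepting both separators keeps older "&"-style links working while allowing ";" in new ones.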



In short, HTML is a big mess (due to its leniency), and using semicolons simplifies this a LOT.

As for using semicolons as a separator, I don't know whether .NET allows this to be configured, or whether developers need to write their own methods to handle the QueryString. .NET gives us access to the raw QueryString, and we can work with it from there. That is what I did. I wrote my own methods. They weren't overly complicated, but they took a lot of testing and debugging. Part of that was a Microsoft bug: .NET doesn't even comply with web standards when dealing with surrogate pairs. I have verified that my implementation works with the full Unicode character set, including characters outside the Basic Multilingual Plane (for example, some Chinese and Japanese characters).

Before adding my own findings, I also want to confirm and expand on what Rawling, Jeevan, and BeniBela all pointed out in Rawling's answer and their comments on it. It's wrong not to escape ampersands in HTML. It usually works anyway, but only because parsers are so lenient. I'll also explain why such incorrect encoding can lead to bugs, which most developers probably fall victim to.

You can't depend on this leniency toward incorrectly encoded ampersands in QueryStrings, and sometimes it leads to nasty bugs. Say, for example, a QueryString passes along an arbitrary ASCII string (or user input) and it is not encoded correctly. Then an "amp;" that follows a "&" gets decoded, and the unexpected consequence is that the "amp;" is essentially "swallowed". (By swallowed, I mean it gets eaten, or goes missing.) A practical case is when the user is prompted for input that goes into a database and the user enters HTML (for example, here on StackOverflow). Since that input is not encoded correctly, nasty bugs appear.
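The "swallowed amp;" problem can be reproduced in a few lines of Python. This is a sketch under two assumptions: html.unescape only approximates how a browser decodes an attribute value, and the user input "Tom &amp; Jerry" is made up:

```python
import html
from urllib.parse import quote_plus, unquote_plus

user_input = "Tom &amp; Jerry"  # made-up input: a user typing HTML into a form field

# Wrong: the value is neither URL-encoded nor HTML-encoded before going into href="...".
bad_href = "?text=" + user_input
seen_by_browser = html.unescape(bad_href)  # roughly what the browser decodes
print(seen_by_browser)  # ?text=Tom & Jerry  ("amp;" has been swallowed)

# Right: URL-encode the value, then HTML-encode the whole URL.
good_href = html.escape("?text=" + quote_plus(user_input))
decoded_url = html.unescape(good_href)
recovered = unquote_plus(decoded_url.split("=", 1)[1])
print(recovered)  # Tom &amp; Jerry
```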

The real advantage of the ';' delimiter is simplicity. Correctly encoding ampersand-delimited QueryStrings takes two steps, which complicates URL strings in an HTML page (and in XML too). First, the keys and values have to be URL-encoded and then concatenated. Then the entire QueryString or URL has to be HTML-encoded (or, for XML, encoded with the very similar XML encoding). Also, don't forget that HTML encoding and URL encoding are different processes, and it is important that they are different. The developer must be careful to keep the two apart. Because they look similar, it's not uncommon to see beginners mix them up.
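Here is a short Python sketch of the two steps. It also shows that URL encoding and HTML encoding give different results for the same text. quote_plus and html.escape stand in for whatever URL- and HTML-encoding functions your platform provides, and href_value is a hypothetical helper name:

```python
from __future__ import annotations

import html
from urllib.parse import quote_plus

value = "me & you"

# The two encodings are different and must not be mixed up:
print(quote_plus(value))   # me+%26+you   (URL encoding, for the query string)
print(html.escape(value))  # me &amp; you (HTML encoding, for the page source)


def href_value(params: dict[str, str]) -> str:
    """Hypothetical helper: build an '&'-separated query string for href="..."."""
    # Step 1: URL-encode each key and value, then concatenate them with "&".
    query = "&".join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in params.items())
    # Step 2: HTML-encode the whole query string before writing it into the HTML.
    return html.escape("?" + query)


print(href_value({"a": "me & you", "b": "you & me"}))
# ?a=me+%26+you&amp;b=you+%26+me
```

Doing step 2 before step 1 would put "&amp;" into the values themselves, which is one way the two encodings get mixed up.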

A good example of a potentially problematic URL is passing two name/value pairs in a QueryString:

  • a = 'me & you' and
  • b = 'you & me'.


Here, with '&' as the separator, "?a=me+%26+you&b=you+%26+me" is the correct query string. However, it also has to be HTML-encoded, as "?a=me+%26+you&amp;b=you+%26+me", before being written into the HTML source. It is important to get this right. Most developers ignore this two-step process of first URL-encoding the keys and values and then HTML-encoding the full URL in the HTML source. That's not surprising, since I had to sit down, seriously think about this process, and carefully check my findings. Imagine a name/value pair like "year=año". It gets much more complex when we need Chinese or Japanese characters that are represented by surrogate pairs!

For the same key-value pairs a and b, the process is much easier with ';' as the separator. In fact, the ampersand separator makes the process more than twice as difficult as the semicolon separator! Here is the same information using ';' as the separator: "?a=me+%26+you;b=you+%26+me". Notice that the only difference is that there is no literal '&' in the string. But using the ';' separator means no second step of HTML-encoding the URL or QueryString is required. Now imagine I am writing HTML, want it to be correct, and need to write HTML to explain all of this! All this HTML encoding of '&' adds a lot of complication, and for many developers quite a lot of confusion too.
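The following minimal Python sketch checks the claim that the second step disappears with ';'. The name semicolon_query is a hypothetical helper. quote_plus percent-encodes &, <, >, " and ', and ';' has no special meaning in HTML, so HTML-encoding the finished query string leaves it unchanged. The server still has to accept ';' as a separator, which is exactly what ASP.NET doesn't do:

```python
from __future__ import annotations

import html
from urllib.parse import quote_plus


def semicolon_query(params: dict[str, str]) -> str:
    """Hypothetical helper: URL-encode keys and values and join them with ';'."""
    return "?" + ";".join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in params.items())


q = semicolon_query({"a": "me & you", "b": "you & me", "year": "año"})
print(q)  # ?a=me+%26+you;b=you+%26+me;year=a%C3%B1o

# No &, <, >, " or ' survives URL encoding, so the HTML-encoding step is a no-op.
print(html.escape(q, quote=True) == q)  # True
```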

Novice developers often simply don't HTML-encode the QueryString or URL, which is CORRECT when ';' is the separator. But it leaves room for error when the ampersand separator is not encoded correctly. So "someText=blah&blah" would need to be encoded correctly.

Also, in .NET we can write XML documentation for our methods. Just today I wrote a short explanation that used the example "a=me+%26+you&b=you+%26+me", and in my XML I had to manually enter all those &amp; character entities. XML documentation is strict about this, so you need to encode the ampersands correctly. The leniency of HTML, by contrast, adds ambiguity.

Perhaps that example wasn't too bad. But all the confusion and difficulty come from using a separator character that has to be HTML-encoded, so '&' is the culprit. The semicolon removes all this complication.

One final consideration: given how much more complicated the '&' separator makes this process, it comes as no surprise to me that Microsoft's handling of surrogate pairs in QueryStrings still doesn't follow the official specs. If you write your own methods, you MUST be aware of Microsoft's misuse of percent-encoded surrogate pairs. The official specifications prohibit encoding the individual surrogates of a pair in UTF-8. So anyone writing their own methods that handle the full range of Unicode characters should beware of this.
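To make the surrogate-pair issue concrete, here is a Python sketch. It contrasts the correct UTF-8 percent-encoding of a character outside the Basic Multilingual Plane with encoding its two UTF-16 surrogates separately. The second form is an assumption about what such a misuse can look like; the answer doesn't say exactly what .NET produces. A strict UTF-8 decoder rejects it:

```python
from typing import Optional
from urllib.parse import quote, unquote_to_bytes

ch = "\U00020000"  # U+20000, the first CJK Extension B ideograph (outside the BMP)

# Correct: percent-encode the character's single 4-byte UTF-8 sequence.
correct = quote(ch)
print(correct)  # %F0%A0%80%80

# Incorrect: split it into UTF-16 surrogates and UTF-8-encode each surrogate.
offset = ord(ch) - 0x10000
high, low = 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)
wrong = quote((chr(high) + chr(low)).encode("utf-8", "surrogatepass"))
print(wrong)  # %ED%A1%80%ED%B0%80


def strict_decode(percent_encoded: str) -> Optional[str]:
    """Percent-decode, then decode as strict UTF-8; None if the bytes are invalid."""
    try:
        return unquote_to_bytes(percent_encoded).decode("utf-8")
    except UnicodeDecodeError:
        return None


print(strict_decode(correct) == ch)  # True
print(strict_decode(wrong))          # None: encoded surrogates are not valid UTF-8
```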







