Preferring HTTPS IRIs on the Semantic Web

TL;DR: Why shouldn't we prefer https: IRIs when defining new vocabularies for the Semantic Web?

The Semantic Web is built around the use of IRIs to identify various components, be they resources such as a web page or abstract concepts such as ownership. Every source I have consulted recommends using http: IRIs, for example:

This surprises me a little. The world seems to be moving away from HTTP to HTTPS, but I don't know of any vocabulary whose IRIs use https:, and none of the documents above even discuss the issue. I can find discussion of why ftp: or urn: are less appropriate, but nothing about https:.

While IRIs on the Semantic Web are primarily identifiers and not locators, there is agreement that an IRI is a good place to look for more information about an entity, with various authorities recommending 303 redirects to RDF Schema or OWL documents, or to other descriptive documents with additional information. If the IRI uses http:, at least the initial request and the redirect are made over plain HTTP. Even though the contents of the schema are by no means confidential, this still has the following problems:

  • It is susceptible to man-in-the-middle attacks. A malicious party can inject intentionally inconsistent schema information that influences decisions made by applications, potentially causing a DoS or otherwise disrupting the user experience.

  • ISPs can MITM the connection themselves to add advertisements to content. They shouldn't actually do this for non-HTML content (well, they shouldn't be doing this at all, but that's a different matter), but that depends on the ISP caring enough to get it right. This can happen over HTTPS too, as Superfish has demonstrated, but it is much more complicated.

  • The request can be tracked by ISPs. The fact that a user is running an application that dereferences a specific schema is in itself valuable customer information that can be sold to advertisers, something the US Senate recently voted to make legal. People are becoming more and more privacy-conscious and want to minimize this. Of course, the ISP still knows which domain you visited, as the SNI field is not encrypted, but we can still minimize data leakage.
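
To make the exposure concrete, here is a minimal sketch of which hops in a dereferencing chain travel without TLS. The IRI, the document URL, and the helper name `plaintext_hops` are hypothetical, chosen only for illustration; the chain assumes a server that answers the http: IRI with a 303 redirect to an https: document.

```python
from urllib.parse import urlsplit


def plaintext_hops(chain):
    """Return the URLs in a redirect chain that are requested without TLS."""
    return [url for url in chain if urlsplit(url).scheme == "http"]


# Hypothetical chain: the term IRI, then the document a 303 redirect points to.
chain = [
    "http://vocab.example/ns/ownership",       # term IRI (assumed)
    "https://vocab.example/ns/ownership.ttl",  # 303 target (assumed)
]

exposed = plaintext_hops(chain)
print(exposed)
```

Even when the server redirects straight to HTTPS, the first request and the redirect response itself are visible to, and modifiable by, anyone on the path, so an attacker can simply replace the redirect target.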

If the client supports it, HSTS can ensure that subsequent requests go directly over HTTPS, but it does nothing about the initial request, which is still made over HTTP. Attempts to put similar functionality into DNS have so far gone nowhere, in part, I suspect, because of the slow adoption of DNSSEC. I am not aware of any other technical measures that could alleviate the problems described above.
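
The HSTS limitation can be sketched as a toy client-side cache. This is a simplified illustration, not a complete RFC 6797 implementation: the class `HstsCache`, its method names, and the example URLs are hypothetical, and it ignores `includeSubDomains`, explicit ports, and header-name case.

```python
import time
from urllib.parse import urlsplit, urlunsplit


class HstsCache:
    """Toy HSTS store: remembers hosts that sent a policy over HTTPS."""

    def __init__(self):
        self._expiry = {}  # host -> timestamp at which the policy expires

    def note_response(self, url, headers, now=None):
        """Record a policy; headers received over plain HTTP are ignored."""
        now = time.time() if now is None else now
        parts = urlsplit(url)
        value = headers.get("Strict-Transport-Security")
        if parts.scheme != "https" or value is None:
            return
        for directive in value.split(";"):
            name, _, arg = directive.strip().partition("=")
            if name.lower() == "max-age":
                self._expiry[parts.hostname] = now + int(arg.strip('"'))

    def rewrite(self, url, now=None):
        """Upgrade http: to https: only for hosts with an unexpired policy."""
        now = time.time() if now is None else now
        parts = urlsplit(url)
        if parts.scheme == "http" and self._expiry.get(parts.hostname, 0) > now:
            parts = parts._replace(scheme="https")
        return urlunsplit(parts)


cache = HstsCache()
iri = "http://vocab.example/ns/ownership"  # hypothetical term IRI
first = cache.rewrite(iri, now=0)          # no policy known yet
cache.note_response(
    "https://vocab.example/ns/ownership.ttl",
    {"Strict-Transport-Security": "max-age=31536000"},
    now=0,
)
later = cache.rewrite(iri, now=1)          # policy now applies
print(first, later)
```

The first lookup leaves the IRI unchanged because the client has no policy for the host yet, which is exactly the window in which the plaintext request is exposed.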

All these considerations suggest to me that https: is a better choice than http: when defining a new vocabulary. Obviously the situation is different if you have a pre-existing vocabulary that already uses http:, but that is not the case that interests me here.

However, I'm sure I'm not the first person to think about this, so I can only assume that everyone is still using and recommending http: for some reason. If so, what are the disadvantages of https:? And can anyone point me to a good discussion of this? As far as I can see, the W3C isn't changing anything, which surprises me.
