Thursday, July 13, 2006

Problems Identifying Information 

The W3C's Technical Architecture Group has published a document titled Architecture of the World Wide Web, Volume One which states that, "By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as 'resources'. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as 'information resources.'"

Consider the following URI:
When put into a browser it returns the following "information resource":

So what is the problem here? The problem is that, in spite of what is claimed in the Architecture document, the message and the information conveyed by this resource are ambiguous. They depend on the document on which the resource is displayed; they are context dependent. This is because the message it is intended to communicate is this: "To show readers that one has taken some care to create an interoperable Web page, a "W3C valid" badge may be displayed (here, the "valid XHTML 1.0" badge) on any page that validates." - W3C help page. So the information it conveys changes with each different page that it is displayed on. And the URI does nothing to identify which page it refers to.

The next problem with identifying information with this URI is that the information conveyed by the message depends on the knowledge of the recipient. To show this, consider what is conveyed by an instance of this badge.
<- Here, the first time you see it.

<- How about this second one?

<- a third one.

<- a fourth one.

The point is, once you see the first instance, you know that the author claims it is valid XHTML. That is a significant piece of information. The next time it shows, however, it doesn't contain any new information, or very little. You already know the author claims it is valid XHTML. By the third and fourth time, it is virtually useless. So which information does it identify?


Where to begin …

First, the act of embedding the image into a web page certainly does not constitute any claim. “The W3C clearly needs to hire a real designer – just look at these terrible badges that they produce: …” Did I just claim that the web page is valid XHTML?

Second, even if it was a claim, then the meaning is not in the URI alone, but arises from the *act of embedding* (in RDF speak, from making a statement ":page123 :embeds :w3c_badge"). Of course, the meaning of any statement depends on the subject. This doesn't make the property-value pair ambiguous.

Third, the URI identifies *just an image*. When we talk about the meaning or interpretation of a URI, then we talk about the part of the meaning or interpretation which is machine-accessible. An image is completely opaque to a machine.

John, I'm not convinced at all.

By Anonymous Richard Cyganiak, at 5:35 PM  

Your first objection to my argument is that the URL does not make a claim. And you show how someone may just mention it, in your example, by commenting on the quality of the design of the image that it returns. But any identifier, word, phrase, sentence, etc. can be mentioned, and in that context, it doesn't have the effect that it would in an ordinary context. For example, I could say, "The URL to this post of yours has 70 characters in it." In that context, it is a character string and doesn't identify your post. Does that mean that in ordinary use it does not identify your post? Of course not. Nor does your mention of the poor design of the badge that is returned when you access the xhtml-valid URL change its referent in ordinary use. Any URL can be mentioned. That doesn't render it useless.

Your third objection is that the URI identifies "*just an image*" (your asterisks), not a claim. Why then is it named as it is? The URI minted by the W3C is named "valid-xhtml10" for a reason. If it were just an image, they might as well have named it "image-123". They named it "valid-xhtml10" because they intended for it to be used to make a specific claim, namely that the page on which it appears uses XHTML that validates. Secondly, a machine can use (interpret, understand) the URI. It can be programmed conditionally based on finding that URI embedded in a page. For example, it may try to parse the page as XML rather than use some lower-level screen scraping technique. The image is not the meaning of the URI, neither to a human nor to a machine. That URL's proper interpretation is spelled out quite clearly by the creators of the URI in their help document I quoted, "To show readers that one has taken some care to create an interoperable Web page, a "W3C valid" badge may be displayed (here, the "valid XHTML 1.0" badge) on any page that validates." The image is for easy recognition by a human being; it is the URI that matters.
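To make the "programmed conditionally" point concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the badge URI below is a hypothetical placeholder (the real one is not reproduced here), and the names BadgeFinder and choose_strategy, as well as the two strategy labels, are invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical placeholder; stands in for the badge URI discussed in the post.
BADGE_URI = "http://example.org/icons/valid-xhtml10"


class BadgeFinder(HTMLParser):
    """Records whether any <img> element's src equals the badge URI."""

    def __init__(self, badge_uri):
        super().__init__()
        self.badge_uri = badge_uri
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img" and dict(attrs).get("src") == self.badge_uri:
            self.found = True


def choose_strategy(page_source, badge_uri=BADGE_URI):
    """Pick a parsing strategy based on whether the badge URI is embedded."""
    finder = BadgeFinder(badge_uri)
    finder.feed(page_source)
    # The machine never looks at the image itself, only at the URI string.
    return "xml-parser" if finder.found else "tag-soup-scraper"
```

Note that the decision depends only on the URI string appearing in the page; the pixels of the image play no part in it.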

I think your second objection is the most interesting. Recall that it is your statement, "A URI should be context-free and should identify the same thing whether it's in an RDF file or a database or an email message." that I am arguing with to begin with. Now in your comment you say, "Second, even if it was a claim, then the meaning is not in the URI alone, but arises from the *act of embedding* (in RDF speak, from making a statement ':page123 :embeds :w3c_badge'). Of course, the meaning of any statement depends on the subject. This doesn't make the property-value pair ambiguous." In other words, first you say "A URI should be context-free..." and then you say, "...the meaning is not in the URI alone...", and these two statements seem contradictory. It is exactly my point that the meaning of the URI depends on the subject with which it is used, and so it is not context-free. You cannot interpret the referent of the URI without knowing the context of its use. You cannot determine which author claims which page is valid unless you know the context in which that URI was embedded. Without that context, it doesn't refer to or identify anything.
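The disagreement can be laid out with the ':page123 :embeds :w3c_badge' statement from the comment, modelled here as plain Python tuples. This is a rough sketch, not RDF tooling: ':page456' is a hypothetical second page added for illustration, and claimed_valid_pages is an invented helper name.

```python
BADGE = ":w3c_badge"

# Each statement is a (subject, predicate, object) triple.
# ":page456" is a hypothetical second page, not mentioned in the discussion.
statements = [
    (":page123", ":embeds", BADGE),
    (":page456", ":embeds", BADGE),
]


def claimed_valid_pages(triples, badge=BADGE):
    """Return the subjects of every statement that embeds the badge."""
    return [s for (s, p, o) in triples if p == ":embeds" and o == badge]


# The object term is the same in every statement...
distinct_objects = {o for (_, _, o) in statements}

# ...while the page being described is recoverable only from the subject.
pages = claimed_valid_pages(statements)
```

Under this toy model both readings are visible: the object term never changes from one statement to the next, yet which page is claimed to be valid can only be read off the subject of each statement.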

By Blogger John Black, at 4:43 PM  

