Thursday, July 13, 2006

Problems Identifying Information 

The W3C's Technical Architecture Group has published a document titled Architecture of the World Wide Web, Volume One which states that, "By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as 'resources'. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as 'information resources.'"

Consider the following URI:
When put into a browser it returns the following "information resource":

So what is the problem here? The problem is that, in spite of what is claimed in the Architecture document, the message and the information conveyed by this resource are ambiguous. They depend on the document on which the resource is displayed. They are context dependent. This is because the message it is intended to communicate is this, "To show readers that one has taken some care to create an interoperable Web page, a "W3C valid" badge may be displayed (here, the "valid XHTML 1.0" badge) on any page that validates." - W3C help page. So the information it conveys changes with each different page that it is displayed on. And the URI does nothing to identify which page it refers to.

The next problem with identifying information with this URI is that the information conveyed by the message depends on the knowledge of the recipient. To show this, consider what is conveyed by an instance of this badge.
<- Here, the first time you see it.

<- How about this second one?

<- a third one.

<- a fourth one.

The point is, once you see the first instance, you know that the author claims it is valid XHTML. That is a significant piece of information. The next time it shows, however, it doesn't contain any new information, or very little. You already know the author claims it is valid XHTML. By the third and fourth time, it is virtually useless. So which information does it identify?


Where to begin …

First, the act of embedding the image into a web page certainly does not constitute any claim. “The W3C clearly needs to hire a real designer – just look at these terrible badges that they produce: …” Did I just claim that the web page is valid XHTML?

Second, even if it was a claim, then the meaning is not in the URI alone, but arises from the *act of embedding* (in RDF speak, from making a statement ":page123 :embeds :w3c_badge"). Of course, the meaning of any statement depends on the subject. This doesn't make the property-value pair ambiguous.

Third, the URI identifies *just an image*. When we talk about the meaning or interpretation of a URI, then we talk about the part of the meaning or interpretation which is machine-accessible. An image is completely opaque to a machine.

John, I'm not convinced at all.

By Anonymous Richard Cyganiak, at 5:35 PM  

Your first objection to my argument is that the URL does not make a claim. And you show how someone may just mention it, in your example, by commenting on the quality of the design of the image that it returns. But any identifier, word, phrase, sentence, etc. can be mentioned, and in that context, it doesn't have the effect that it would in an ordinary context. For example, I could say, "The URL to this post of yours has 70 characters in it." In that context, it is a character string and doesn't identify your post. Does that mean in ordinary use it does not identify your post? Of course not. Nor does your mention of the poor design of the badge that is returned when you access the xhtml-valid URL change its referent in ordinary use. Any URL can be mentioned. That doesn't render it useless.

Your third objection is that the URI identifies "*just an image*" (your asterisks), not a claim. Why, then, is it named the way it is? The URI minted by the W3C is named "valid-xhtml10" for a reason. If it were just an image, they might as well have named it "image-123". They named it "valid-xhtml10" because they intended for it to be used to make a specific claim, namely that the page on which it appears uses XHTML that validates. Secondly, a machine can use (interpret, understand) the URI. It can be programmed conditionally based on finding that URI embedded in a page. For example, it may try to parse the page as XML rather than use some lower-level screen-scraping technique. The image is not the meaning of the URI, neither to a human nor to a machine. That URL's proper interpretation is spelled out quite clearly by the creators of the URI in their help document I quoted, "To show readers that one has taken some care to create an interoperable Web page, a "W3C valid" badge may be displayed (here, the "valid XHTML 1.0" badge) on any page that validates." The image is for easy recognition by a human being; it is the URI that matters.

I think your second objection is the most interesting. Recall that it is your statement, "A URI should be context-free and should identify the same thing whether it's in an RDF file or a database or an email message." that I am arguing with to begin with. Now in your comment you say, "Second, even if it was a claim, then the meaning is not in the URI alone, but arises from the *act of embedding* (in RDF speak, from making a statement ':page123 :embeds :w3c_badge'). Of course, the meaning of any statement depends on the subject. This doesn't make the property-value pair ambiguous." In other words, first you say "A URI should be context-free..." and then you say, "...the meaning is not in the URI alone...", and these two statements seem contradictory. It is exactly my point that the meaning of the URI depends on the subject with which it is used, and so it is not context-free. You cannot interpret the referent of the URI without knowing the context of its use. You cannot determine which author claims which page is valid unless you know the context in which that URI was embedded. Without that context, it doesn't refer to or identify anything.

By Blogger John Black, at 4:43 PM  


Prior Art


"We are in the habit, I take it, of positing a single idea or form in the case of the various multiplicities to which we give the same name. Do you not understand?" "I do." "In the present case, then, let us take any multiplicity you please; for example, there are many couches and tables." "Of course." "But these utensils imply, I suppose, only two ideas or forms, one of a couch and one of a table." "Yes." "And are we not also in the habit of saying that the craftsman who produces either of them fixes his eyes on the idea or form, and so makes in the one case the couches and in the other the tables that we use, and similarly of other things? For surely no craftsman makes the idea itself. How could he?" "By no means."
Plato, Republic X, page 596a

David Hume

"This convention is not of the nature of a promise: For even promises themselves, as we shall see afterwards, arise from human conventions. It is only a general sense of common interest; which sense all the members of the society express to one another, and which induces them to regulate their conduct by certain rules. I observe, that it will be for my interest to leave another in the possession of his goods, provided he will act in the same manner with regard to me. He is sensible of a like interest in the regulation of his conduct. When this common sense of interest is mutually expressed, and is known to both, it produces a suitable resolution and behaviour. And this may properly enough be called a convention or agreement betwixt us, though without the interposition of a promise; since the actions of each of us have a reference to those of the other, and are performed upon the supposition, that something is to be performed on the other part. Two men, who pull the oars of a boat, do it by an agreement or convention, though they have never given promises to each other. Nor is the rule concerning the stability of possession the less derived from human conventions, that it arises gradually, and acquires force by a slow progression, and by our repeated experience of the inconveniences of transgressing it. On the contrary, this experience assures us still more, that the sense of interest has become common to all our fellows, and gives us a confidence of the future regularity of their conduct: And it is only on the expectation of this, that our moderation and abstinence are founded. In like manner are languages gradually established by human conventions without any promise. ..." - A Treatise of Human Nature, Chapter 74 by David Hume

John Locke

"...Semeiotike, or the doctrine of signs; the most usual whereof being words, it is aptly enough termed also Logike, logic: the business whereof is to consider the nature of signs, the mind makes use of for the understanding of things, or conveying its knowledge to others. For, since the things the mind contemplates are none of them, besides itself, present to the understanding, it is necessary that something else, as a sign or representation of the thing it considers, should be present to it: and these are ideas. And because the scene of ideas that makes one man's thoughts cannot be laid open to the immediate view of another, nor laid up anywhere but in the memory, a no very sure repository: therefore to communicate our thoughts to one another, as well as record them for our own use, signs of our ideas are also necessary: those which men have found most convenient, and therefore generally make use of, are articulate sounds. The consideration, then, of ideas and words as the great instruments of knowledge, makes no despicable part of their contemplation who would take a view of human knowledge in the whole extent of it. And perhaps if they were distinctly weighed, and duly considered, they would afford us another sort of logic and critic, than what we have been hitherto acquainted with." - AN ESSAY CONCERNING HUMAN UNDERSTANDING by John Locke 1690