Sunday, April 22, 2007

The Semantic Web: Global Agreements? Or Symbolic Theft and the Ubiquitous Copying of URIs

There is still controversy, but most agree that big, globally accepted ontologies are not likely in the near future, if ever. Critics of the Semantic Web (upper case) say such ontologies are required and that this will be the downfall of the technology, since the required global agreements will never happen in the real, factious world. Semantic Web proponents reply that this criticism rests on a misunderstanding of the way the technology is designed to work. They claim the semantic web (lower case) is designed for the organic growth of a beautiful fractal tangle of interlocking local ontologies. But proponents usually finish with a mumbled bit about agreement, "there must be some," if only to avoid a cacophony of chaotic islands of solipsism. So what gives? What sort of agreement, if any, is going to be necessary, accepting that it will not be the one big, global, uber-ontology?

In Symbol Grounding and the Symbolic Theft Hypothesis the authors, Angelo Cangelosi, Alberto Greco and Stevan Harnad, write, "There are two opposite ways of acquiring categories. First, we can use “sensorimotor toil”, in which new categories are acquired through real-time, feedback-corrected, trial and error experience. Secondly, we can use “symbolic theft”, in which new categories are acquired through language, based on hearsay from propositions (e.g., through boolean combinations of symbols describing them). In competition, symbolic theft always outperforms sensorimotor toil. It is more efficient than toil because only one propositional description of a new category is enough to learn it. In contrast, repeated experience is required to learn a category by sensorimotor toil. Due to this significant advantage, it has been hypothesized that symbolic theft is the basis of the adaptive advantage of language (Harnad, 1996). However, some basic categories must still be learned by toil to avoid an infinite regress in the symbol grounding problem."

I believe such symbolic theft (which is a victimless 'crime') is a critical part of the semantic web. The reason is that most of the semantics in the semantic web comes from natural languages like English. Most URIs have recognizable English words embedded in them, and that gives them the same shared semantics as the English words themselves. In other words, most URIs ride on the back of the common knowledge of natural language words for their meaning. There is no need to come to agreement about such a URI, not because common knowledge is unnecessary, but because it already exists for the embedded natural language words that compose the URI. In short, there is no need for a globally agreed upon ontology because the semantic web is grounded in natural language, which is already a widely accepted, if not a globally agreed upon, uber-something, with ontological elements. So I propose that ubiquitous copying, or symbolic theft, of word meaning is the actual route to common knowledge, rather than global agreement.
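To make the point about embedded words concrete, here is a minimal sketch in Python that pulls the natural language words out of a URI's last segment. The function name embedded_words and the example.org URIs are hypothetical illustrations, not part of any semantic web standard; the Dublin Core creator URI is a real, widely reused term.

```python
import re
from urllib.parse import urlsplit

def embedded_words(uri):
    """Return the lowercase words embedded in a URI's local name.

    The local name is taken to be the fragment if there is one,
    otherwise the last path segment (an assumption that fits most
    vocabulary URIs, but not all of them).
    """
    parts = urlsplit(uri)
    local = parts.fragment or parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Split on camelCase boundaries, underscores, hyphens, and digits.
    tokens = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", local)
    return [t.lower() for t in tokens]

# A real Dublin Core term and two hypothetical example.org terms.
for uri in (
    "http://purl.org/dc/elements/1.1/creator",
    "http://example.org/vocab#postalCode",
    "http://example.org/terms/date_of_birth",
):
    print(uri, "->", embedded_words(uri))
```

A reader who has never seen these vocabularies can still guess what each term means from the extracted words, which is the sense in which the URI borrows its meaning from English.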

But there is another way that symbolic theft can serve the semantic web: the copying and reuse of URIs. The more a URI is copied and reused, the more common knowledge of it accumulates in the communities that use it. If this is true, then the semantic web depends on programmers, database designers, and authors getting into the habit of selecting their URIs for things from a common naming source, the web. And they will do such stealing when it becomes easier and cheaper to copy than to invent new terms. When all the creators of data copy the same name for an element up front, there will be no need to translate, merge, integrate, or otherwise massage data to work together. As in natural languages, these copied terms will form the very essence of communication. As the semantic web really takes off, programmers should and will be able to steal variable and data element names, relational database table and column names, XML schema element names, object attribute names, and more from the web. This capability will be available in Eclipse and in other editors, where developers actually create the names of data elements.
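The claim that copied names remove the need for translation can be sketched with plain Python sets of (subject, property, value) triples. Everything here is illustrative: the two datasets and the example.org URIs are made up, and FOAF is used only as an assumed example of a vocabulary that both authors might have copied their property names from.

```python
# Assumed shared vocabulary: both authors copied their property URIs from FOAF.
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people/alice"  # hypothetical subject URI

# Two independently authored, made-up datasets about the same person.
dataset_a = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "mbox", "mailto:alice@example.org"),
}
dataset_b = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "homepage", "http://example.org/~alice"),
}

def merge(*datasets):
    """Combine triple sets by set union; no mapping table is involved."""
    merged = set()
    for triples in datasets:
        merged |= triples
    return merged

def describe(triples, subject):
    """Collect every property and its values recorded for one subject URI."""
    props = {}
    for s, p, o in triples:
        if s == subject:
            props.setdefault(p, set()).add(o)
    return props

merged = merge(dataset_a, dataset_b)
alice = describe(merged, ALICE)
for prop in sorted(alice):
    print(prop, sorted(alice[prop]))
```

Because both datasets use the same URI for the name property, the duplicate statement collapses on its own and the merged description simply gains the extra properties. Had one author invented a private term for the name instead, the union would keep both statements side by side and someone would have to write the mapping by hand.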

But what about the Symbol Grounding Problem? It is real. It is what makes computers spit out errors so preposterous that even most three-year-olds could never make them. In persons, it is what causes superstitions, prejudice, and the madness of crowds. In societies, it leads to nationalism, ethnic cleansing, and genocide. Ungrounded symbols run amok, in humans as well as machines. For humans, the main antidote is science and the rational investigation of experience, i.e., sensorimotor toil. Is there a solution to the symbol grounding problem for computing machinery that can be used in the semantic web? That is one of the continuing topics of this blog.


1 Comment:

Hi,
in a related paper [1] we have analyzed whether Wikipedia works as a consensus mechanism for URIs - and, surprisingly, our sample shows clearly that Wikipedia URIs are authoritative, i.e., almost always reflect the same meaning.

Martin
http://www.heppnetz.de

[1] Harvesting Wiki Consensus - Using Wikipedia Entries as Ontology Elements

PDF and Citation details are at

http://www.heppnetz.de/publications.htm#37

or

http://www.heppnetz.de/files/SemWiki2006-Harvesting%20Wiki%20Consensus-LNCS-final.pdf

By Martin Hepp, at 12:05 AM  

Prior Art

Socrates

“We are in the habit, I take it, of positing a single idea or form in the case of the various multiplicities to which we give the same name. Do you not understand?” “I do.” “In the present case, then, let us take any multiplicity you please; for example, there are many couches and tables.” “Of course.” “But these utensils imply, I suppose, only two ideas or forms, one of a couch and one of a table.” “Yes.” “And are we not also in the habit of saying that the craftsman who produces either of them fixes his eyes on the idea or form, and so makes in the one case the couches and in the other the tables that we use, and similarly of other things? For surely no craftsman makes the idea itself. How could he?” “By no means.”
Plato, Republic X, page 596a


David Hume

"This convention is not of the nature of a promise: For even promises themselves, as we shall see afterwards, arise from human conventions. It is only a general sense of common interest; which sense all the members of the society express to one another, and which induces them to regulate their conduct by certain rules. I observe, that it will be for my interest to leave another in the possession of his goods, provided he will act in the same manner with regard to me. He is sensible of a like interest in the regulation of his conduct. When this common sense of interest is mutually expressed, and is known to both, it produces a suitable resolution and behaviour. And this may properly enough be called a convention or agreement betwixt us, though without the interposition of a promise; since the actions of each of us have a reference to those of the other, and are performed upon the supposition, that something is to be performed on the other part. Two men, who pull the oars of a boat, do it by an agreement or convention, though they have never given promises to each other. Nor is the rule concerning the stability of possession the less derived from human conventions, that it arises gradually, and acquires force by a slow progression, and by our repeated experience of the inconveniences of transgressing it. On the contrary, this experience assures us still more, that the sense of interest has become common to all our fellows, and gives us a confidence of the future regularity of their conduct: And it is only on the expectation of this, that our moderation and abstinence are founded. In like manner are languages gradually established by human conventions without any promise. ..." - A Treatise of Human Nature, Chapter 74, 1739–40 by David Hume


John Locke

"...Semeiotike, or the doctrine of signs; the most usual whereof being words, it is aptly enough termed also Logike, logic: the business whereof is to consider the nature of signs, the mind makes use of for the understanding of things, or conveying its knowledge to others. For, since the things the mind contemplates are none of them, besides itself, present to the understanding, it is necessary that something else, as a sign or representation of the thing it considers, should be present to it: and these are ideas. And because the scene of ideas that makes one man's thoughts cannot be laid open to the immediate view of another, nor laid up anywhere but in the memory, a no very sure repository: therefore to communicate our thoughts to one another, as well as record them for our own use, signs of our ideas are also necessary: those which men have found most convenient, and therefore generally make use of, are articulate sounds. The consideration, then, of ideas and words as the great instruments of knowledge, makes no despicable part of their contemplation who would take a view of human knowledge in the whole extent of it. And perhaps if they were distinctly weighed, and duly considered, they would afford us another sort of logic and critic, than what we have been hitherto acquainted with." - AN ESSAY CONCERNING HUMAN UNDERSTANDING by John Locke 1690