Sunday, November 27, 2005

The Semantic Web as a Webized Database 

The comparison between database technology and semantic web technology is both common and enlightening.

Tim Berners-Lee touched on the topic in 1998, in his Webizing Existing Systems. In my opinion, Tim's thought experiment on webizing databases suffers from an unwarranted assumption that renders it suspect. His example of creating namespaces to qualify the table and column names assumes two shared, global ontologies, namely "http://weather.org/current" and "http://places.org/usa". But this completely sidesteps a crucial problem: almost all databases are designed without regard for any global or shared schema. There is no market of standard database schemas or shared schema templates. Most databases are worlds unto themselves and are tightly coupled to the applications, developers, and users that make use of them. It is far more likely that, when webizing an existing database, the namespace chosen would have to be based on the application context in which it was created. This is in fact what happens in modern enterprises, and it gives rise to the huge industry surrounding the problem of EAI (Enterprise Application Integration). More likely, the namespaces required when webizing actual existing databases would be something like "http://www.accuweather.com" and "http://www.mapquest.com/usa". This is due to the likelihood of semantic collisions between any two database schemas about the same domain created for two different applications.
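To make the point concrete, here is a minimal sketch of what qualifying a column name with a namespace amounts to. The `qualify` helper and the two application namespaces ("http://app-a.example/weather" and "http://app-b.example/forecast") are hypothetical names invented for illustration; only "http://weather.org/current" comes from Tim's example. The sketch assumes a column identifier is simply the namespace joined to the column name.

```python
def qualify(namespace: str, column: str) -> str:
    """Turn a local column name into a namespace-qualified identifier."""
    return f"{namespace}#{column}"


# Shared namespace from the webizing thought experiment.
SHARED = "http://weather.org/current"

# Hypothetical application-local namespaces (assumptions for this sketch).
APP_A = "http://app-a.example/weather"
APP_B = "http://app-b.example/forecast"

# If both applications had adopted the shared ontology, their "temp"
# columns would receive the same identifier and could be merged directly.
shared_a = qualify(SHARED, "temp")
shared_b = qualify(SHARED, "temp")

# Webized after the fact, each column is qualified by its own application
# context, so the identifiers differ even though the local names match.
local_a = qualify(APP_A, "temp")
local_b = qualify(APP_B, "temp")

print(shared_a == shared_b)  # True
print(local_a == local_b)    # False
```

The identifiers differ in the second case, which is the safe outcome: nothing in the two schemas says whether "temp" means the same thing in both applications, so someone still has to supply that mapping by hand.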

There are other differences. John Mylopoulos made an interesting presentation on this topic titled "Data Semantics Revisited: Databases and the Semantic Web" at SEBD'05, June 19, 2005. As John points out, modern databases are built so that the data semantics are factored out: "That is, the database system processes at run-time tables with no regard for their meaning." He goes on to say that the effectiveness of such systems relies "...on a stable environment of users and application programs to know the semantics!". Again, this works in cases where the database is used in a single application designed as a world unto itself. But "Factoring out the semantics of data won't work in ever-changing, distributed, open environments, such as the web. In such settings, access to the data can not be restricted to a small set of users and applications programs. This means that new users will have no clue what the data means; and the application programs that process the data may not have been designed specifically for these data." The solution is what the semantic web tries to do: "...bundle together data and its semantics!" Unfortunately, John's discussion appears to skip a step here. When describing current database semantics, he implicitly acknowledges the importance of users and applications acting as the semantic agents of the database. But when he brings in semantic web technologies, he seems to arrive at the conclusion that these technologies can replace what users and applications were doing in the case of database semantics. He goes on to say semantic web technologies create a new environment where "Machine-processable web data has come to mean '...having semantic metadata and ontologies for web content to enable information access, integration, interoperation and consistency...' - Katia Sycara ODBASE'03". But I do not see how metadata and ontologies become global and shared where the former database schemas were not.
As the webizing exercise shows, it is likely that ontologies, when forced to step up to the global stage, will either need to be created by standards groups in advance of use, or else be qualified with namespaces that are based on local, application-oriented contexts. Just as with databases, the tendency for data design to focus on local, application-oriented contexts is due to the ease with which data models can be designed under centralized control.

Recently Adam Bosworth made related comments in "Learning from the Web: What Does This Mean for Databases?" Adam draws more distinctions between databases and the web. Among them is his lesson 4 about the web, which he contrasts with database design: "4. The wisdom of crowds works amazingly well. Successful systems on the Web are bottom-up. They don't mandate much in a top-down way. Instead, they control themselves through tipping points." RDF and other semantic web technologies are supposed to work this way. But as far as I know, the semantic web standards don't contain any explanation of how you reach a tipping point for a particular ontology. Adam's corresponding question for databases is: "4. Do databases let schemas evolve for a set of items using a bottom-up consensus/tipping point? Obviously not. They typically are extremely rigid about the schema. This is regarded as a feature. Do databases let users 'tag' data? Not at all easily..."

On the other hand, as one commenter, David vun Kannon, says, "The well documented loss of the Mars Observer (use of English vs metric units at two NASA labs exchanging data) testifies that there are places where we can't wait for tipping points and the wisdom of the masses, we have to agree from the beginning about what our terms mean." I can see his point. And come to think of it, I don't think I would want to fly in an airplane if the database schema Boeing used to design and construct jets was based on 'the wisdom of crowds' operating through 'tipping points'. For the purpose of building jets, I think I would still prefer a database schema that specified the meaning of its values in a rigid manner and was created and maintained by a centralized authority.
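As a small illustration of what agreeing up front about the meaning of values can look like, here is a sketch in which every number carries its unit, so two parties exchanging data cannot silently mix feet and meters. The `Measurement` class, its method name, and the unit labels are assumptions made for this example and are not taken from any system mentioned above.

```python
from dataclasses import dataclass

# Conversion factors to meters. The set of supported units is an
# illustrative choice; one international foot is exactly 0.3048 m.
TO_METERS = {"m": 1.0, "ft": 0.3048}


@dataclass(frozen=True)
class Measurement:
    """A length that always travels together with its unit."""
    value: float
    unit: str

    def in_meters(self) -> float:
        # Refuse to guess: an unknown unit is an error, not a default.
        if self.unit not in TO_METERS:
            raise ValueError(f"unknown unit: {self.unit!r}")
        return self.value * TO_METERS[self.unit]


# Two teams report the same kind of quantity in different units.
from_team_a = Measurement(10.0, "ft")
from_team_b = Measurement(2.0, "m")

# Both values are converted explicitly before being combined.
total_m = from_team_a.in_meters() + from_team_b.in_meters()
print(round(total_m, 3))
```

The rigid part is the fixed table of units: it has to be decided in advance by someone with the authority to decide it, which is exactly the kind of centralized agreement the commenter is arguing for.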

The comparison of semantic web and database technologies brings up this question in my mind: in the absence of the close coupling of designers, developers, users, and applications that is found in successful database implementations, what do the semantic web technologies offer in the way of establishing a shared view of the correspondence between the data and the real world?

2 Comments:

*gasp*

Mr. Black, such a nice piece of writing ... and such a fine question, "In the absence of the close coupling [...] what do the semantic web technologies offer in the way of ..."

And no comments?

This is regrettable. Talk about "absence of coupling"!

Alas ...

By Ben, at 9:21 PM  

Part of the solution to your question comes from understanding that Language is a Virus, or rather a biological category, that it reproduces itself, that structures are formed through the interaction of consumers and producers.

You need URIs, as the semantic web presents them, to ground your vocabulary, to create a chain of responsibility, and to help find out the official meaning of the words used. Well, you could use natural language processing, but that is certainly not going to make the work of making it machine-processable any easier. Otherwise the semantic web of distributed data would already be here.

Another thing to understand is how URLs enable us to get hyperdata. It's the linking of data that is completely novel.


There is nothing like trying and playing with this to get an understanding of it btw.

By bblfish, at 7:14 AM  


Prior Art

Socrates

“We are in the habit, I take it, of positing a single idea or form in the case of the various multiplicities to which we give the same name. Do you not understand?” “I do.” “In the present case, then, let us take any multiplicity you please; for example, there are many couches and tables.” “Of course.” “But these utensils imply, I suppose, only two ideas or forms, one of a couch and one of a table.” “Yes.” “And are we not also in the habit of saying that the craftsman who produces either of them fixes his eyes on the idea or form, and so makes in the one case the couches and in the other the tables that we use, and similarly of other things? For surely no craftsman makes the idea itself. How could he?” “By no means.”
Plato, Republic X, page 596a


David Hume

"This convention is not of the nature of a promise: For even promises themselves, as we shall see afterwards, arise from human conventions. It is only a general sense of common interest; which sense all the members of the society express to one another, and which induces them to regulate their conduct by certain rules. I observe, that it will be for my interest to leave another in the possession of his goods, provided he will act in the same manner with regard to me. He is sensible of a like interest in the regulation of his conduct. When this common sense of interest is mutually expressed, and is known to both, it produces a suitable resolution and behaviour. And this may properly enough be called a convention or agreement betwixt us, though without the interposition of a promise; since the actions of each of us have a reference to those of the other, and are performed upon the supposition, that something is to be performed on the other part. Two men, who pull the oars of a boat, do it by an agreement or convention, though they have never given promises to each other. Nor is the rule concerning the stability of possession the less derived from human conventions, that it arises gradually, and acquires force by a slow progression, and by our repeated experience of the inconveniences of transgressing it. On the contrary, this experience assures us still more, that the sense of interest has become common to all our fellows, and gives us a confidence of the future regularity of their conduct: And it is only on the expectation of this, that our moderation and abstinence are founded. In like manner are languages gradually established by human conventions without any promise. ..." - A Treatise of Human Nature, Chapter 74, 1739–40 by David Hume


John Locke

"...Semeiotike, or the doctrine of signs; the most usual whereof being words, it is aptly enough termed also Logike, logic: the business whereof is to consider the nature of signs, the mind makes use of for the understanding of things, or conveying its knowledge to others. For, since the things the mind contemplates are none of them, besides itself, present to the understanding, it is necessary that something else, as a sign or representation of the thing it considers, should be present to it: and these are ideas. And because the scene of ideas that makes one man's thoughts cannot be laid open to the immediate view of another, nor laid up anywhere but in the memory, a no very sure repository: therefore to communicate our thoughts to one another, as well as record them for our own use, signs of our ideas are also necessary: those which men have found most convenient, and therefore generally make use of, are articulate sounds. The consideration, then, of ideas and words as the great instruments of knowledge, makes no despicable part of their contemplation who would take a view of human knowledge in the whole extent of it. And perhaps if they were distinctly weighed, and duly considered, they would afford us another sort of logic and critic, than what we have been hitherto acquainted with." - AN ESSAY CONCERNING HUMAN UNDERSTANDING by John Locke 1690