Thursday, December 02, 2004

Curing the Web's Identity Crisis 

Abstract

This paper describes the crisis of identity facing the World Wide Web and, in particular, the RDF community. It shows how that crisis is rooted in a lack of clarity about the nature of "resources" and how concepts developed during the XML Topic Maps effort can provide a solution that works not only for Topic Maps, but also for RDF and semantic web technologies in general.

1. Introduction

In an important recent article on XML.com entitled "Identity
Crisis" [Clark 2002], Kendall Clark addresses the issue
of "identity" as it pertains to the World Wide Web. Clark quotes the
description of the Web by the W3C's Technical Architecture Group (TAG)
in "Architecture of the World Wide Web" [Jacobs 2002] as a "universe of
resources", where "resource" is to be understood according to the
definition given in [RFC 2396] as being "anything that has identity".
Clark points out that the concept of "identity" itself is nowhere
defined and, moreover, is severely problematic.

Clark's article is part of a long-standing and ongoing discussion
in the Web community. As Sandro Hawke points out: "This is an old issue,
and people are tired of it, but the issue continues to complicate the
lives of RDF users". Tim Berners-Lee, after finding himself in a
minority in the W3C TAG, has found it important enough to justify a
position paper of his own, entitled "What do HTTP URIs Identify?"
[Berners-Lee 2003]. Other important contributions have
been David Booth's "Four Uses of a URL" [Booth 2003] and Sandro Hawke's
"Disambiguating RDF Identifiers" [Hawke 2002], among many others.

The heart of the matter is the question "What do URIs identify?"
Today there is no consistent answer to this question, as Hawke
notes:
To date, RDF has not been clear about whether a URI like
"http://www.w3.org/Consortium" identifies the W3C or a web page about
the W3C. Throughout RDF, strings like
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type" are used with no
consistent explanation of how they relate to the web.

Clark broadens the discussion to cover the whole issue of "What is
a resource?" His example is different, but his point is the same:
URIs may well identify one resource each, but which one? Or,
rather, if this is the case, why do developers tend to confuse or
conflate resources? A URI like http://clark.dallas.tx.us/kendall
cannot, if we take [Jacobs 2002] seriously, identify the
resource we might call "Kendall Clark's home page" and the resource we
might call "the natural person Kendall Clark". And yet there are
perpetual conversations in the development community about, say, which
resource one's home page identifies, about overloading the URI of one's
home page to identify both oneself and one's home page, and so
on.
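
The conflation Clark describes can be made concrete with a small
sketch. It is illustrative only: the property names "format" and "name"
and the aggregate() helper are assumptions made for this example, not
part of RDF or of any existing library. Two sources use the same URI as
a subject, one meaning the home page and the other meaning the person,
and a naive aggregator that trusts the URI merges them into one record.

```python
from collections import defaultdict

URI = "http://clark.dallas.tx.us/kendall"

# Hypothetical statements as (subject, property, value) triples.
# Source A uses the URI to mean the home page (a document).
source_a = [(URI, "format", "text/html")]
# Source B uses the very same URI to mean the natural person.
source_b = [(URI, "name", "Kendall Clark")]


def aggregate(*sources):
    """Merge triples by subject URI, as a naive aggregator might."""
    merged = defaultdict(dict)
    for triples in sources:
        for subject, prop, value in triples:
            merged[subject][prop] = value
    return dict(merged)


merged = aggregate(source_a, source_b)

# The result describes a single "thing" that is at once an HTML
# document and something named Kendall Clark.
print(merged[URI])
```

Nothing in the merged record indicates that the two statements were
about different resources, which is why the ambiguity cannot be
repaired after aggregation and has to be resolved at the point where
the identifier is assigned.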

Why is this important? Because without clarity on this issue, it
is impossible to solve the challenge of the Semantic Web, and it is
impossible to implement scalable Web Services. It is impossible to
achieve the goals of "global knowledge federation" and impossible even
to begin to enable the aggregation of information and knowledge by human
and software agents on a scale large enough to control infoglut.

Ontologies and taxonomies will not be reusable unless they are
based on a reliable and unambiguous identification mechanism for the
things about which they speak. The same applies to classifications,
thesauri, registries, catalogues, and directories. Applications
(including agents) that capture, collate or aggregate information and
knowledge will not scale beyond a closely controlled environment unless
the identification problem is solved. And technologies like RDF and
Topic Maps that use URIs heavily to establish identity will simply not
work (and certainly not interoperate) unless they can rely on
unambiguous identifiers.

A solution to the "identity crisis of the Web" is clearly
essential. The purpose of this paper is to offer an explanation of the
root causes of the problem and to show how concepts originally developed
as part of XML Topic Maps (XTM) [Pepper 2001] offer a
solution that can be applied to the semantic web in general.

Bibliography

Berners-Lee 2001

Berners-Lee, Tim, James Hendler and Ora Lassila: "The Semantic
Web", Scientific American, May 2001,
http://www.sciam.com/2001/0501issue/0501berners-lee.html

Berners-Lee 2003

Berners-Lee, Tim: What do HTTP URIs Identify?, February 15
2003, http://www.w3.org/DesignIssues/HTTP-URI

Booth 2003

Booth, David: Four Uses of a URL: Name, Concept, Web Location
and Document Instance, January 28 2003,
http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm

Clark 2002

Clark, Kendall Grant: Identity Crisis, XML.com, September
11, 2002, http://www.xml.com/pub/a/2002/09/11/deviant.html

Garshol 2003a

Garshol, Lars Marius and Graham Moore (eds): The Standard
Application Model for Topic Maps, March 9 2003,
http://www.isotopicmaps.org/sam/sam-model/

Garshol 2003b

Garshol, Lars Marius: Living with topic maps and RDF,
Proceedings of XML Europe 2003,
http://www.ontopia.net/topicmaps/materials/tmrdf.html

Hawke 2002

Hawke, Sandro: Disambiguating RDF Identifiers, January 4
2003, http://www.w3.org/2002/12/rdf-identifiers/

ISO 13250

ISO/IEC 13250:2002 Topic Maps, International Organization for
Standardization,
http://www.y12.doe.gov/sgml/sc34/document/0322_files/iso13250-2nd-ed-v2.pdf

Jacobs 2002

Jacobs, Ian (ed.): Architecture of the World Wide Web, W3C
Working Draft 15 November 2002,
http://www.w3.org/TR/2002/WD-webarch-20021115/

Manola 2003

Manola, Frank and Eric Miller: RDF Primer, W3C Working Draft
23 January 2003, http://www.w3.org/TR/rdf-primer/

Pepper 2001

Pepper, Steve and Graham Moore (eds): XML Topic Maps (XTM) 1.0
Specification, TopicMaps.Org, March 2003,
http://www.topicmaps.org/xtm/1.0/

Pepper 2003

Pepper, Steve: Published Subjects: Introduction and Basic
Requirements, OASIS Published Subjects Draft Recommendation,
http://www.ontopia.net/tmp/pubsubj-gentle-intro.htm

RFC 2396

Uniform Resource Identifiers (URI): Generic Syntax, IETF,
August 1998, http://www.ietf.org/rfc/rfc2396.txt

