Saturday, September 11, 2004
Friday, September 10, 2004
Free software is mainly developed on mailing lists. Mailing lists have many advantages over other forms of communication, but they have two weaknesses: It's difficult to follow discussions in a sensible way, and mailing list archives (when they exist) have a tendency to disappear over time.
Several mailing list archives exist, but these are all hidden under a web interface. Reading mail that way is not convenient. Reading mail as if it were news is convenient.
This is what Gmane offers. Mailing lists are funneled into news groups. This isn't a new idea; several mail-to-news gateways exist. What's new with Gmane is that no messages are ever expired from the server, and the gateway is bidirectional. You can post to some of these mailing lists without being subscribed to them yourself."
This is just a preview. NG4J V0.1 will be released 09/15/04.
The features of NG4J V0.1 are:
- graph-centric methods for manipulating sets of Named Graphs
- quad-centric methods for manipulating sets of Named Graphs
- viewing graphsets as Jena models and Jena graphs
- provenance-enabled Jena statements
- integrated parser for the basic TriX syntax
- integrated serializer for the basic TriX syntax
- in-memory graphset storage
- directory reader for importing existing RDF files.
- directory writer for serializing graph sets as classic RDF files."
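To make the graph-centric versus quad-centric distinction in the list above concrete, here is a minimal sketch in Python. It is not the NG4J API (NG4J is a Java library built on Jena); the `GraphSet` class and its method names are hypothetical, made up only to show the two ways of looking at the same data.

```python
from collections import defaultdict


class GraphSet:
    """Toy in-memory set of named graphs (hypothetical, not the NG4J API)."""

    def __init__(self):
        # graph name -> set of (subject, predicate, object) triples
        self._graphs = defaultdict(set)

    def add_quad(self, graph, s, p, o):
        """Store one statement together with the name of its graph."""
        self._graphs[graph].add((s, p, o))

    def graph(self, name):
        """Graph-centric view: all triples of one named graph."""
        return set(self._graphs.get(name, ()))

    def find_quads(self, graph=None, s=None, p=None, o=None):
        """Quad-centric view: match a pattern, where None is a wildcard."""
        for g, triples in self._graphs.items():
            if graph is not None and g != graph:
                continue
            for ts, tp, to in triples:
                if s in (None, ts) and p in (None, tp) and o in (None, to):
                    yield (g, ts, tp, to)

    def union(self):
        """All triples merged into one plain graph, dropping provenance."""
        return set().union(*self._graphs.values())
```

The quad-centric view keeps the graph name attached to every match, which is roughly what a provenance-enabled statement would need; the merged view is what a plain RDF model would see.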
Thursday, September 09, 2004
What is OLIF?
OLIF, the Open Lexicon Interchange Format, is a user-friendly vehicle for exchanging terminological and lexical data.
Why choose OLIF?
Designed for users of language technology, OLIF is an open, XML-compliant standard that can streamline the exchange of terminological and lexical data. With its flexible design and representative array of terminological and linguistic features, OLIF can help the user address language data management needs ranging from basic terminology exchange to managing lexicons for natural language processing (NLP) systems, such as machine translation.
How was OLIF designed? Who supports OLIF?
OLIF was designed by members of the OLIF Consortium, an organization of major NLP technology suppliers, corporate users of NLP, and research institutions. Headed by SAP, the OLIF Consortium designed and released the official version 2 of OLIF and provides support for its implementation by users worldwide.
Automatic identification of related words and automatic detection of word senses are two long-standing goals of researchers in natural language processing. Word class information and word sense identification may enhance the performance of information retrieval systems. Large online corpora and increased computational capabilities make new techniques based on corpus linguistics feasible. Corpus-based analysis is especially needed for corpora from specialized fields for which no electronic dictionaries or thesauri exist. The methods described here use a combination of mutual information and word context to establish word similarities. Then, unsupervised classification is done using clustering in the word space, identifying word classes without pretagging. We also describe an extension of the method to handle the difficult problems of disambiguation and of determining part-of-speech and semantic information for low-frequency words. The method is powerful enough to produce high-quality results on a small corpus of 200,000 words from abstracts in a field of molecular biology."
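For readers unfamiliar with the mutual information step mentioned in the abstract, a rough sketch follows. It is not the authors' method: it only estimates pointwise mutual information for word pairs that co-occur within a small window, and the function name, the `window` parameter, and the probability estimates are my own assumptions. The context-vector and clustering stages are left out.

```python
import math
from collections import Counter


def pmi_table(tokens, window=2):
    """Pointwise mutual information for word pairs co-occurring within `window` tokens."""
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        # Pair each word with the next `window` words; order is ignored.
        for c in tokens[i + 1 : i + 1 + window]:
            pair_counts[tuple(sorted((w, c)))] += 1

    n_words = len(tokens)
    n_pairs = sum(pair_counts.values())
    pmi = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_pairs           # estimated joint probability
        p_a = word_counts[a] / n_words  # estimated marginal probabilities
        p_b = word_counts[b] / n_words
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi
```

Pairs that co-occur more often than their individual frequencies predict get a positive score, which is the raw signal a similarity measure could then build on.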
"The web site of the book, "Link Analysis: An Information Science Approach" by Mike Thelwall, to be published by Academic Press in 2005. The web site will be finished by the time the book is in print. Here are some of the web site contents: Updated URLs, Instructions for link analysis and related techniques (mostly in Parts IV and V), Links to relevant resources and additional information, An expanded glossary"
The culture of lay indexing has been created by the aggregation strategy employed by Web search engines such as Google. Meaning is constructed in this culture by harvesting semantic content from Web pages and using hyperlinks as a plebiscite for the most important Web pages. The characteristic tension of the culture of lay indexing is between genuine information and spam. Google's success requires maintaining the secrecy of its parsing algorithm despite the efforts of Web authors to gain advantage over the Googlebot. Legacy methods of asserting meaning such as the META keywords tag and Dublin Core are inappropriate in the lawless meaning space of the open Web. A writing guide is urged as a necessary aid for Web authors who must balance enhancing expression versus the use of technologies that limit the aggregation of their work."
But especially important, consider this quote, "Google's continued success depends on its ability to collect unaffected Web content, which means that it must avoid the single individual's assertion of meaning. This strategy implies that any metadata scheme for the Web that promotes the meaning assertion of a single Web author (i.e., My Web page means this) will be avoided by aggregators. The strategy of aggregation, the enlistment of Web authors as lay indexers, and the temptation of bad faith points to the importance of maintaining the ignorance of lay indexers."
This makes Google the worst enemy of the Semantic Web. It implies that Google's success depends on ignoring the metadata added by web authors to their own pages.
Wednesday, September 08, 2004
Budgeted at over £2.6 million, the Enterprise project is the UK government's major initiative to promote the use of knowledge-based systems in enterprise modelling, aiming to support organisations effectively in the Management of Change. The project focused on management innovation and the strategic use of IT to help manage change. It supports the use of enterprise modelling methods which capture various aspects of how a business works and how it is organised. The aim of enterprise modelling is to obtain an enterprise-wide view of an organisation which can then be used as a basis for taking decisions. During the Enterprise project, the Enterprise Toolset was developed. The Toolset uses executable process models to help users to perform their tasks. It is implemented using an agent-based architecture to integrate off-the-shelf tools in a plug-and-play style. The approach of the Enterprise project addresses the key problems of communication, process consistency, impacts of change, IT systems, and responsiveness."
SenseClusters is a complete Word Sense Discrimination system that takes users from preprocessing of raw text to actual discrimination that involves selection of most discriminating features, context representations, clustering, followed by extensive analysis and performance evaluation."
Ontology alignment is a foundational problem area for semantic interoperability. We discuss the complexity faced by automated alignment solutions and describe an ontology-based approach for describing and evaluating alignments."
"Purpose of this Website
Ontology alignment is the automated resolution of semantic correspondences between the representational elements of heterogeneous systems. Ontology alignment (including ontology/schema matching/mapping) is a critical technical challenge for the dynamic semantic integration of information resources as well as for ontology-mediated cognitive agent learning.
The Ontology Alignment Source supports the ontology alignment research community by providing a forum that promotes the sharing and comparison of research data. This site provides access to tools, test data, and metrics for ontology alignment algorithm development and evaluation. We also publish ontology alignment experiment data contributed by members of the research community. This will allow for comparison of various ontology alignment approaches, aided by visualization tools."
Tuesday, September 07, 2004
Re: [VM,ALL] Revised VM Task Force description from Jeremy Carroll on 2004-06-24 (firstname.lastname@example.org from June 2004)
[VM,ALL] Revised scope statement from Thomas Baker on 2004-06-13 (email@example.com from June 2004)
"Abstract. Entailment, as defined by RDF's model-theoretical semantics, is a basic requirement for processing RDF, and represents the kind of "semantic interoperability" that RDF-based systems have been anticipated to have to realize the vision of the "Semantic Web". In this paper we give some results in our investigation of a practical implementation of the entailment rules, based on the graph-walking query mechanism of the Wilbur RDF toolkit."
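As a reminder of what "implementing the entailment rules" can mean in practice, here is a small sketch of two RDFS rules, rdfs9 (type inheritance through subclasses) and rdfs11 (transitivity of subClassOf). Note that this uses naive forward chaining to a fixpoint, which is a different technique from the graph-walking query approach the paper describes; the function name and the prefixed-string representation of terms are assumptions for illustration only.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply rdfs9 and rdfs11 repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                # rdfs11: (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: (s subClassOf o), (s2 type s) => (s2 type o)
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= inferred:
            return inferred  # fixpoint reached
        inferred |= new
```

Materializing everything up front like this is easy to read but grows the graph; answering entailment at query time avoids storing the inferred triples at the cost of more work per query.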
"cwm will operate in a number of modes with respect to looking
stuff up as a function of the URIs loaded into the knowledge base.
where flags are a combination of
p - look up predicates
s - look up subjects
o - look up objects
t - look up the object only where the predicate is rdf:type
An interesting mode is cwm --closure=pt.
Let's call this "ontological closure".
It is a reasonable thing to do, as it adds to the KB the
information which is assumed shared by the writer and reader of
a document. If people do this a lot, then it useful to write
ontological closure is of a manageable size. This is the case with any
real RDF files I've tried. It is what you might expect - people define
ontologies using ontologies but only to a limited level.
Contrast with --closure=spo which pulls in the whole contiguous
semantic web starting at the given document. This is not practical.
This is interesting, as it highlights a difference between p and s and
not only in the spec but in the topology of the web."
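To see why the flags matter so much, here is a toy sketch of the selection step only: given some triples and a flag string, which URIs would be looked up? This is not cwm's code; the function name is hypothetical, and it assumes every term is a URI (a real loader would skip literals and blank nodes, and would repeat the step on each newly loaded document).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uris_to_look_up(triples, flags):
    """Return the URIs selected for lookup by the closure flags p, s, o, t."""
    wanted = set()
    for s, p, o in triples:
        if "s" in flags:
            wanted.add(s)
        if "p" in flags:
            wanted.add(p)
        # "t" selects the object only when the predicate is rdf:type.
        if "o" in flags or ("t" in flags and p == RDF_TYPE):
            wanted.add(o)
    return wanted
```

With flags `pt` only the vocabulary (predicates and classes) is selected, while `spo` selects every term in every statement, which is how the lookup spreads across the whole connected web of documents.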
Wordnets are valuable resources both as lexical repositories and as sources of ontological distinctions. This document presents a framework and workplan for porting wordnets to Semantic Web languages, like RDFS and OWL. Some phases are distinguished, and preliminary resources are referenced."