Thursday, September 09, 2004

Experiments in Automatic Word Class and Word Sense Identification for Information Retrieval 

"Abstract

Automatic identification of related words and automatic detection of word senses are two long-standing goals of researchers in natural language processing. Word class information and word sense identification may enhance the performance of information retrieval systems. Large online corpora and increased computational capabilities make new techniques based on corpus linguistics feasible. Corpus-based analysis is especially needed for corpora from specialized fields for which no electronic dictionaries or thesauri exist. The methods described here use a combination of mutual information and word context to establish word similarities. Then, unsupervised classification is done using clustering in the word space, identifying word classes without pretagging. We also describe an extension of the method to handle the difficult problems of disambiguation and of determining part-of-speech and semantic information for low-frequency words. The method is powerful enough to produce high-quality results on a small corpus of 200,000 words from abstracts in a field of molecular biology."
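To make the pipeline in the abstract a little more concrete, here is a minimal sketch of the general idea: score word/context pairs with mutual information, compare words by their context vectors, then group similar words without any tagging. This is not the paper's actual algorithm. The function names (`pmi_vectors`, `cosine`, `cluster`), the `window` and `threshold` parameters, the use of positive pointwise mutual information, the single-link threshold clustering, and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def pmi_vectors(sentences, window=1):
    """Map each word to a sparse vector of positive PMI scores over its context words."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1

    total = sum(pair_counts.values())
    # The window is symmetric, so one marginal count serves for both word and context.
    marginal = Counter()
    for (word, _), n in pair_counts.items():
        marginal[word] += n

    vectors = defaultdict(dict)
    for (word, ctx), n in pair_counts.items():
        score = math.log(n * total / (marginal[word] * marginal[ctx]))
        if score > 0:  # keep only positive associations
            vectors[word][ctx] = score
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.5):
    """Single-link grouping: words end up together if a chain of similar pairs links them."""
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for w in words:
        groups[find(w)].append(w)
    return sorted(groups.values())


# Toy corpus (made up for this sketch, not taken from the paper).
sentences = [
    ["the", "protein", "binds", "dna"],
    ["the", "enzyme", "binds", "dna"],
    ["the", "protein", "cleaves", "rna"],
    ["the", "enzyme", "cleaves", "rna"],
]
vectors = pmi_vectors(sentences)
clusters = cluster(vectors)
print(clusters)
```

In this toy corpus "protein" and "enzyme" occur in exactly the same contexts, so they land in the same class. A real corpus would need a larger window, frequency cutoffs, and a more careful clustering method than this threshold trick.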
