Viewing the Dictionary as a Classification System
DOI:
https://doi.org/10.7152/acro.v1i1.12467
Abstract
Information retrieval is one of the earliest applications of computers. Starting with the speculative work of Vannevar Bush on Memex [Bush 45], to the development of Key Word in Context (KWIC) indexing by H.P. Luhn [Luhn 60] and Boolean retrieval by John Horty [Horty 62], to the statistical techniques for automatic indexing and document retrieval done in the 1960s and continuing to the present [Salton and McGill 83], information retrieval has continued to develop and progress. However, there is a growing consensus that current-generation statistical techniques have gone about as far as they can go, and that further improvement requires the use of natural language processing and knowledge representation. We believe that the best place to start is by focusing on the lexicon, and to index documents not by words, but by word senses.

Why use word senses? Conventional approaches advocate either indexing by the words themselves, or manual indexing using a controlled vocabulary. Manual indexing offers some of the advantages of word senses, in that the terms are not ambiguous, but it suffers from problems of consistency. In addition, as text data bases continue to grow, it will only be possible to index a fraction of them by hand. In advocating word senses as indices we are not suggesting that they are the ultimate answer. There is much more to the meaning of a document than the senses of the words it contains; we are just saying that senses are a good start.

Any approach to providing a semantic analysis must deal with the problem of word meaning. Existing retrieval systems try to go beyond single words by using a thesaurus, but this has the problem that words are not synonymous in all contexts. The word 'term' may be synonymous with 'word' (as in a vocabulary term), 'sentence' (as in a prison term), or 'condition' (as in 'terms of agreement'). If we expand the query with words from a thesaurus, we must be careful to use the right senses of those words. We have to know not only the sense of the word in the query (in this example, the sense of the word 'term'), but also the sense of the word that is being used to augment it (e.g., the appropriate sense of the word 'sentence'). The thesaurus we use should be one in which the senses of words are explicitly indicated [Chodorow et al. 88].

We contend that the best place to obtain word senses is a machine-readable dictionary. Although it is possible that another list of senses might be manually constructed, this strategy might cause some senses to be overlooked, and the task will entail a great degree of effort.
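The point about expanding a query only with correctly sensed synonyms can be illustrated with a minimal sketch. The thesaurus contents, the sense labels, and the function name `expand_query` are all hypothetical and chosen only to mirror the 'term' example; they are not taken from any dictionary or from the system described in the paper.

```python
# Hypothetical sense-tagged thesaurus (an assumption for illustration):
# each (word, sense) key maps to synonyms that carry their own sense label.
THESAURUS = {
    ("term", "vocabulary"): [("word", "lexical-unit")],
    ("term", "prison"): [("sentence", "punishment")],
    ("term", "agreement"): [("condition", "stipulation")],
}


def expand_query(sense_tagged_query):
    """Return the query terms plus synonyms that match each term's sense."""
    expanded = list(sense_tagged_query)
    for word_sense in sense_tagged_query:
        # Look up synonyms for this exact (word, sense) pair only, so
        # synonyms of the word's other senses are never added.
        for synonym in THESAURUS.get(word_sense, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(expand_query([("term", "prison")]))
```

Because both the query term and each synonym carry a sense label, a query about a prison term picks up 'sentence' in its punishment sense and does not pick up 'word' or 'condition'.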
Published
1990-10-06
Issue
Section
Papers
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).