Use of Subject Field Codes from a Machine-Readable Dictionary for Automatic Classification of Documents
DOI:
https://doi.org/10.7152/acro.v3i1.12598
Abstract
We are currently developing a system whose goal is to emulate a human classifier who peruses a large set of documents and sorts them into richly defined classes based solely on the subject content of the documents. To accomplish this task, our system tags each word in a document with the appropriate Subject Field Code (SFC) from a machine-readable dictionary. The within-document SFCs are then summed and normalized, and each document is represented as a vector of the SFCs occurring in that document. These vectors are clustered using Ward's agglomerative clustering algorithm (Ward, 1963) to form classes in a document database. For retrieval, queries are likewise represented as SFC vectors and then matched to the prototype SFC vector of each cluster in the database. Clusters whose prototype SFC vectors exhibit a predetermined criterion of similarity to the query SFC vector are passed on to other system components for more computationally expensive representation and matching.
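The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy lexicon `TOY_SFC_LEXICON` and its codes (`EC`, `MD`, `LW`), the one-code-per-word lookup (the real system chooses the appropriate SFC per word), the use of the cluster centroid as the prototype vector, and cosine similarity with a threshold of 0.8 as the similarity criterion. The clustering function is a naive implementation of Ward's minimum-variance merge criterion, written for readability rather than speed.

```python
from collections import Counter
import math

# Hypothetical toy lexicon: word -> Subject Field Code. A real system would
# look these up in a machine-readable dictionary and disambiguate senses.
TOY_SFC_LEXICON = {
    "bank": "EC", "loan": "EC", "interest": "EC", "market": "EC",
    "virus": "MD", "vaccine": "MD", "patient": "MD", "doctor": "MD",
    "court": "LW", "judge": "LW", "trial": "LW",
}
SFC_ORDER = sorted(set(TOY_SFC_LEXICON.values()))  # fixed vector dimensions


def sfc_vector(text):
    """Tag words with SFCs, sum the codes, and normalize to proportions."""
    counts = Counter(
        TOY_SFC_LEXICON[w] for w in text.lower().split() if w in TOY_SFC_LEXICON
    )
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in SFC_ORDER]


def ward_cluster(vectors, k):
    """Naive Ward clustering: repeatedly merge the pair of clusters whose
    merge gives the smallest increase in within-cluster sum of squares."""
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]  # (members, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return clusters


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def matching_clusters(query, clusters, threshold=0.8):
    """Return member lists of clusters whose prototype (assumed here to be
    the centroid) meets the assumed similarity threshold for the query."""
    q = sfc_vector(query)
    return [members for members, proto in clusters if cosine(q, proto) >= threshold]


docs = [
    "the bank raised interest on the loan",
    "the market and the bank",
    "the doctor gave the patient a vaccine",
    "the virus and the vaccine",
]
clusters = ward_cluster([sfc_vector(d) for d in docs], k=2)
candidates = matching_clusters("loan interest", clusters)
```

In this sketch, `candidates` holds the document indices of the clusters that would be handed to the later, more expensive matching stage. Comparing the query against one prototype per cluster, instead of against every document, is what makes this first pass cheap.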
Published
1992-10-25
Issue
Section
Articles
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).