Representing Aboutness: Automatically Indexing 19th-Century Encyclopedia Britannica Entries

Sam Grabus; Jane Greenberg; Peter Logan; Jane Boone

doi:10.7152/nasko.v7i1.15635

Authors

Sam Grabus, Drexel University
Jane Greenberg, Drexel University
Peter Logan, Temple University
Jane Boone, Drexel University

DOI:

https://doi.org/10.7152/nasko.v7i1.15635

Abstract

Representing aboutness is a challenge for humanities documents, given the linguistic indeterminacy of the text. The challenge is even greater when applying automatic indexing to historical documents in a multidisciplinary collection, such as an encyclopedia. The research presented in this paper explores this challenge through a comparative automatic indexing study that examines topic relevance. The setting is the NEH-funded 19th-Century Knowledge Project, in which researchers at the Digital Scholarship Center, Temple University, and the Metadata Research Center, Drexel University, are investigating the best way to index entries across four historical editions of the Encyclopedia Britannica (3rd, 7th, 9th, and 11th editions). Individual encyclopedia entries were processed using the Helping Interdisciplinary Vocabulary Engineering (HIVE) system, a linked-data, automatic indexing terminology application that uses controlled vocabularies. A comparative topic relevance evaluation was performed for three keyword extraction algorithms: RAKE, Maui, and Kea++. Results show that RAKE performed best, with an average precision of 67%, compared with 28% for both Maui and Kea++. Additionally, the highest-ranked HIVE results with both RAKE and Kea++ demonstrated relevance across all sample entries, while Maui’s highest-ranked results returned zero relevant terms. This paper reports on background information, research objectives and methods, results, and future research prospects, including further optimization of RAKE’s algorithm parameters to accommodate encyclopedia entries of different lengths and evaluation of the indexing impact of correcting the historical Long S.
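As an illustration of the precision figures reported in the abstract, the following minimal sketch shows one common way to compute per-entry precision and average it across sample entries. It assumes that precision means the share of suggested index terms judged relevant; the function names and the toy data are hypothetical and are not taken from the study or from HIVE.

```python
from statistics import mean


def precision(suggested, relevant):
    """Fraction of suggested terms judged relevant (0.0 if nothing was suggested)."""
    if not suggested:
        return 0.0
    hits = sum(1 for term in suggested if term in relevant)
    return hits / len(suggested)


def mean_precision(results):
    """Average the per-entry precision over (suggested, relevant) pairs."""
    return mean(precision(suggested, relevant) for suggested, relevant in results)


# Hypothetical toy data: each pair holds the terms an algorithm suggested for
# one entry and the subset a human evaluator judged relevant. Not study data.
toy_results = [
    (["Botany", "Plants", "Gardens", "Law"], {"Botany", "Plants"}),
    (["Astronomy", "Stars"], {"Astronomy", "Stars"}),
]

print(mean_precision(toy_results))
```

Averaging per entry, rather than pooling all terms, gives each encyclopedia entry equal weight regardless of how many terms were suggested for it; whether the study averaged this way is an assumption here.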

Published

2019-09-23

Issue

Vol 7 (2019): Proceedings from North American Symposium on Knowledge Organization

Section

Papers

License

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.

Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
