Implications of the Recursive Representation Problem for Automatic Concept Identification in On-line Governmental Information

Miles Efron; Gary Marchionini; Julinang Zhiang

doi:10.7152/acro.v14i1.14111

Authors

Miles Efron University of North Carolina
Gary Marchionini University of North Carolina
Julinang Zhiang University of North Carolina

Abstract

This paper describes ongoing research into the application of unsupervised learning techniques for improving access to governmental information on the Web. Under the auspices of the GovStat Project (http://www.ils.unc.edu/govstat), our goal is to identify a small number of semantically valid and mutually exclusive "concepts" that adequately span the intellectual domain of a web site. While this is a classic instance of the clustering problem [14], the task is complicated by the dual-representational nature of term-document relationships. Since documents are defined in term-space and vice versa, we may approach this as either a document- or a term-clustering problem. The current study explores the implications of pursuing both term- and document-centered representations. Based on initial work, we argue for a document clustering-based approach. Describing completed research, we suggest that term clustering yields semantically valid categories, but that these categories are not suitably broad. To improve the coverage of the clustering, we describe a process based on document clustering.
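
The dual representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method or data: the toy corpus, the raw-count weighting, and the helper names (`term_document_matrix`, `transpose`, `cosine`) are assumptions chosen for illustration only. The point is that a single matrix yields document vectors (rows, defined in term-space) and term vectors (columns, defined in document-space), so either set could be handed to a clustering procedure.

```python
from math import sqrt

# Toy corpus (hypothetical; not drawn from the GovStat data).
docs = {
    "d1": "unemployment rate labor statistics",
    "d2": "labor force unemployment survey",
    "d3": "energy prices oil statistics",
}

def term_document_matrix(docs):
    """Return (doc_ids, terms, matrix), where matrix[i][j] is the raw
    count of term j in document i (an assumed, simplistic weighting)."""
    doc_ids = sorted(docs)
    terms = sorted({t for text in docs.values() for t in text.split()})
    matrix = [[docs[d].split().count(t) for t in terms] for d in doc_ids]
    return doc_ids, terms, matrix

def transpose(matrix):
    """Swap rows and columns: document vectors become term vectors."""
    return [list(col) for col in zip(*matrix)]

def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

doc_ids, terms, doc_vectors = term_document_matrix(docs)
term_vectors = transpose(doc_vectors)

# Document-centered view: compare two documents in term-space.
doc_sim = cosine(doc_vectors[0], doc_vectors[1])

# Term-centered view: compare two terms in document-space.
term_sim = cosine(term_vectors[terms.index("labor")],
                  term_vectors[terms.index("unemployment")])
```

Either `doc_vectors` or `term_vectors` could serve as input to a clustering algorithm; which view produces more useful concepts is the question the paper examines.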

Published

2003-10-01

Issue

14th ASIS SIG/CR Classification Research Workshop

Section

Articles

License

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.

Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
