Genre-Based In-Document Content Type Classification
DOI:
https://doi.org/10.7152/acro.v14i1.14110
Abstract
This paper presents an in-document content classification approach that combines genre analysis with shallow natural language processing techniques to classify content at the document-segment level. Given a document in a particular genre, we can classify the content of each segment (e.g., a paragraph) based on the recognized content types and typical linguistic features of the genre. The informal evaluative document is chosen as the test genre, and online consumer reviews are used as the test data set. The classification results support our hypothesis that the content type of a segment in a document of a particular genre can be predicted from its linguistic features. This approach may be used as a component in faceted search, multi-document summarization, and many other information processing applications.
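As a rough illustration of segment-level classification from shallow linguistic features, the sketch below labels each paragraph of a review by counting cue words. The abstract does not name the paper's content types, features, or classifier, so the labels (`evaluation`, `specification`, `experience`), the cue lexicons, and the function names are all hypothetical, and the count-based rule stands in for whatever method the authors actually used.

```python
import re

# Hypothetical cue lexicons: the abstract does not list the paper's features
# or content types, so these are placeholders for illustration only.
CUES = {
    "evaluation": {"great", "terrible", "love", "hate", "recommend", "disappointing"},
    "specification": {"inch", "gb", "battery", "weight", "resolution", "price"},
    "experience": {"i", "my", "we", "bought", "used", "returned"},
}


def shallow_features(segment):
    """Count the alphabetic tokens in a segment that match each cue lexicon."""
    tokens = re.findall(r"[a-z]+", segment.lower())
    return {label: sum(tok in words for tok in tokens) for label, words in CUES.items()}


def classify_segment(segment):
    """Return the label with the highest cue count, or "other" if nothing matches."""
    counts = shallow_features(segment)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "other"


def classify_document(text):
    """Split a document into paragraphs on blank lines and label each one."""
    segments = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(classify_segment(s), s) for s in segments]
```

Calling `classify_document` on a review whose paragraphs are separated by blank lines returns one `(label, paragraph)` pair per segment. A learned classifier over richer features would be the natural replacement for the counting rule.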
Published
2003-10-01
Section
Articles
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).