Automatic Categorization of Statute Documents
Abstract
Automatic classification offers publishers of large document collections the possibility of improved production efficiencies in print and online environments. In this paper we explore the possibility of automating the classification of statutory legal materials by applying machine learning software designed for automatic text categorization. Our investigations focus on a specific methodology. Our plan was to train classifiers from a pre-classified dataset of statute documents and their associated index references. We observed that each index feature, such as 'insurance' or 'corporations', has an appended set of document locators. These locators make up the local collection for that index feature. The total of all documents in the dataset, whether assigned an index feature or not, makes up the global collection. The fundamental idea was to develop an algorithm based on text features whose frequency in the local collection is high but whose frequency in the global collection is moderate to low. The system is provided with a set of descriptors taken from the text of statute documents, from which it generates, by algorithm, a lexicon. The lexicon is evaluated by domain experts, who assess its relationship to the semantic content of the index feature being modeled. Once a satisfactory lexicon has been created, machine learning software is used to generate classification rules from the lexicon. The rules in turn generate classifications for documents in a test collection.
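The local versus global frequency idea can be illustrated with a minimal sketch. The abstract does not say how frequency is measured or where the cut-offs lie, so everything specific here is an assumption made for illustration: frequency is taken as document frequency (the share of documents containing a term), the thresholds `min_local` and `max_global` are arbitrary, and the function names and toy statute texts are hypothetical.

```python
from collections import Counter


def doc_frequency(docs):
    """Return, for each term, the fraction of documents that contain it."""
    counts = Counter()
    for text in docs:
        # Count each term once per document (whitespace tokenization is an assumption).
        counts.update(set(text.lower().split()))
    return {term: c / len(docs) for term, c in counts.items()}


def candidate_lexicon(global_docs, local_ids, min_local=0.6, max_global=0.5):
    """Pick terms that are frequent locally but only moderately frequent globally.

    global_docs: dict mapping document locator -> text (the global collection).
    local_ids:   locators attached to one index feature (the local collection).
    The thresholds are illustrative assumptions, not values from the paper.
    """
    local_df = doc_frequency([global_docs[i] for i in local_ids])
    global_df = doc_frequency(list(global_docs.values()))
    return sorted(
        term
        for term, freq in local_df.items()
        if freq >= min_local and global_df[term] <= max_global
    )


# Toy global collection; the texts are invented for illustration only.
docs = {
    "s1": "the insurer shall issue the policy",
    "s2": "the policy premium is payable to the insurer",
    "s3": "the corporation shall file the annual report",
    "s4": "the director of the corporation shall vote",
}

# Local collection for the index feature 'insurance'.
lexicon = candidate_lexicon(docs, ["s1", "s2"])
print(lexicon)
```

With these toy inputs the candidate lexicon for 'insurance' contains "insurer" and "policy", while "the" is rejected because it appears in every document of the global collection. In the workflow the abstract describes, such a candidate list would then go to domain experts for review before any classification rules are generated from it.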