Combining Machine Learning and Hierarchical Indexing Structures for Text Categorization

Miguel E. Ruiz, Padmini Srinivasan


This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm; the networks learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
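
The divide-and-conquer idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each category node carries a hypothetical term-weight prototype, and a document descends into a subtree only if it scores at or above a threshold (0.2 here, an arbitrary choice) against that subtree's root. Cosine similarity to a Rocchio-style prototype stands in for the trained backpropagation networks the paper uses. The MeSH-style category names and all prototype vectors are invented for the example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    label: str
    # ASSUMPTION: a hypothetical prototype vector stands in for a trained
    # per-node classifier (the paper uses backpropagation neural networks).
    prototype: dict[str, float]
    children: list[Node] = field(default_factory=list)


def classify(doc: dict[str, float], node: Node, threshold: float = 0.2) -> list[str]:
    """Divide and conquer: descend only into subtrees whose root accepts the doc."""
    assigned: list[str] = []
    for child in node.children:
        if cosine(doc, child.prototype) >= threshold:
            assigned.append(child.label)
            # Only the accepted subtree is explored further; rejected
            # subtrees are skipped entirely.
            assigned.extend(classify(doc, child, threshold))
    return assigned


# Illustrative MeSH-style tree with invented prototype vectors.
root = Node("ROOT", {}, [
    Node("Diseases", {"tumor": 1.0, "cancer": 1.0, "virus": 1.0, "infection": 1.0}, [
        Node("Neoplasms", {"tumor": 1.0, "cancer": 1.0}),
        Node("Virus Diseases", {"virus": 1.0, "infection": 1.0}),
    ]),
    Node("Chemicals and Drugs", {"drug": 1.0, "compound": 1.0}),
])

print(classify({"tumor": 1.0, "cancer": 1.0}, root))  # → ['Diseases', 'Neoplasms']
```

One plausible reading of why the hierarchy helps: once a subtree is rejected, none of its descendants are evaluated, so each local classifier only has to discriminate among categories that share a parent rather than among the whole vocabulary at once.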
