Boolean Query Reformulation with the Query Tree Classifier

Nam-Ho Kim, James C. French, Donald E. Brown


One of the difficulties in using current Boolean-based information retrieval systems is that it is hard for a user, especially a novice, to formulate an effective Boolean query. Query reformulation can be even more difficult and complex than formulation, since the user may have difficulty incorporating the new information gained from the previous search into the next query. In this research, query reformulation is viewed as a classification problem (i.e., classifying documents as either relevant or nonrelevant), and a new reformulation algorithm is proposed which builds a tree-structured classifier (named the query tree) at each reformulation from a set of feedback documents retrieved from the previous search. The query tree can be easily transformed into a Boolean query. The query tree and two of the most important current query reformulation algorithms were compared on benchmark test sets (CACM, CISI, and Medlars). The query tree showed significant improvements over the current algorithms in most experiments. We attribute this improved performance to the ability of the query tree algorithm to select good search terms and to represent the relationships among search terms in a tree structure.
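To illustrate how a tree-structured classifier might map onto a Boolean query, the following is a minimal sketch and not the authors' algorithm. It assumes a binary tree whose internal nodes test for the presence of a single search term and whose leaves are labeled relevant or nonrelevant; the names `Node`, `Leaf`, and `tree_to_boolean_query` are hypothetical. Each root-to-leaf path ending in a relevant leaf becomes a conjunction, and the conjunctions are joined with OR.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    # Hypothetical leaf: True if documents reaching it are classified relevant.
    relevant: bool


@dataclass
class Node:
    # Hypothetical internal node: tests whether a document contains `term`.
    term: str
    present: Union["Node", Leaf]  # branch taken when the term occurs
    absent: Union["Node", Leaf]   # branch taken when the term does not occur


def tree_to_boolean_query(tree: Union[Node, Leaf]) -> str:
    """Return an OR of AND-clauses, one clause per path to a relevant leaf."""
    clauses: List[str] = []

    def walk(node: Union[Node, Leaf], path: List[str]) -> None:
        if isinstance(node, Leaf):
            # Only paths ending in a relevant leaf contribute a clause.
            if node.relevant and path:
                clauses.append(" AND ".join(path))
            return
        walk(node.present, path + [node.term])
        walk(node.absent, path + [f"NOT {node.term}"])

    walk(tree, [])
    return " OR ".join(f"({clause})" for clause in clauses)


# Toy tree with made-up terms, for illustration only.
example_tree = Node(
    term="retrieval",
    present=Node(term="boolean", present=Leaf(True), absent=Leaf(False)),
    absent=Node(term="query", present=Leaf(True), absent=Leaf(False)),
)
example_query = tree_to_boolean_query(example_tree)
print(example_query)
# prints: (retrieval AND boolean) OR (NOT retrieval AND query)
```

Under these assumptions the transformation is a single depth-first traversal, which is consistent with the claim that the tree converts easily into a Boolean query; how the tree itself is built from feedback documents is not specified here.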
