A Semantic Similarity Approach for Linking Tweet Messages to Library of Congress Subject Headings using Linked Resources: A Pilot Study


  • Kwan Yi, College of Education, Eastern Kentucky University




The objective of this study is to propose, implement, and test a framework for assigning relevant Library of Congress (LC) subject headings to tweet messages. The assignment of LC headings is treated as an automatic classification task that identifies relevant LC subject headings for given tweets. The classification is conducted in two stages. In the first stage, tweets are clustered so that similar tweets are grouped together. In the second stage, the degree of similarity between a cluster of tweets and each LC subject heading is measured with a widely used similarity metric, the Jaccard Coefficient (JC). In this pilot study, five tweet clusters and nine LC subject headings were carefully chosen and used. The pilot study demonstrates a positive result for the proposed approach of identifying subject headings for tweets. In three of the five clusters, JC assigned the highest degree of similarity to the most relevant heading. In the other two clusters, JC did not rank the most relevant heading within the top three. In the next steps of this study, a more sophisticated clustering method will be explored and applied, and all possible LC subject headings will be used to identify LC subjects for tweets.
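The second stage can be illustrated with a minimal sketch of Jaccard-based ranking. The abstract does not say how tweets or headings are represented, so this sketch assumes both are reduced to sets of lowercase word tokens. The function names (`tokenize`, `jaccard`, `rank_headings`), the sample tweets, and the heading term lists are all hypothetical stand-ins, not the study's data or implementation.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard Coefficient |A & B| / |A | B|; returns 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_headings(cluster_tweets, heading_terms):
    """Rank headings by JC against the pooled tokens of one tweet cluster.

    cluster_tweets: list of tweet strings belonging to one cluster.
    heading_terms:  dict mapping a heading label to descriptive text
                    (assumed here; the study draws on linked resources).
    """
    cluster_tokens = set()
    for tweet in cluster_tweets:
        cluster_tokens |= tokenize(tweet)
    scores = [
        (label, jaccard(cluster_tokens, tokenize(terms)))
        for label, terms in heading_terms.items()
    ]
    # Highest similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Illustrative data only (not from the study).
cluster = [
    "Flu shots available at the clinic today",
    "Get your flu vaccine before winter",
]
headings = {
    "Influenza vaccines": "influenza flu vaccine shots immunization",
    "Basketball": "basketball game team court",
}
ranked = rank_headings(cluster, headings)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

Pooling all tweets in a cluster into one token set is only one possible design; scoring each tweet separately and averaging would be an equally plausible reading of the abstract.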