Cosine Similarity Indexing of Word Embeddings Using Knowledge Organization Systems

Authors

  • John Kausch, Western University

DOI:

https://doi.org/10.7152/nasko.v7i1.95650

Abstract

This paper proposes a new technique for cosine similarity indexing in the era of large language models (LLMs). It investigates how knowledge organization systems (KOSs) can be used to index the latent spaces that LLMs produce. A latent space is a multidimensional feature space that a model uses to encode the context of data items. In the case of an LLM, a typical latent space is a word embedding, which gives every word a “position” in a multidimensional feature space whose features are opaque and not human-readable. This work asks: can indexing such latent spaces with KOSs help make LLMs more explainable? It builds on previous work on latent semantic indexing for information retrieval models to see whether similar techniques can bridge KOSs and LLMs. It also investigates how this method can be applied to improve the performance of multilingual information retrieval. A cross-lingual ontology (called Horapollo) is used to index two latent spaces containing Wikipedia articles written in English and Arabic. The distances between equivalent articles in the two spaces are then measured, raising questions about the use of KOSs for multilingual and transdisciplinary information retrieval tasks in the era of semantic search.
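
As a rough illustration of the general idea, and not the paper's actual implementation, the sketch below describes each article by its cosine similarity to a set of concept vectors and then compares the resulting profiles across two spaces. Everything in it is an assumption made for illustration: the function names, the toy vectors, the choice of two shared concepts, and the use of cosine distance to compare profiles. It also assumes all vectors are non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def kos_profile(item_vec, concept_vecs):
    """Index an item by its cosine similarity to each concept vector.

    The result has one human-readable coordinate per concept,
    regardless of how many opaque dimensions the embedding has.
    """
    return [cosine_similarity(item_vec, c) for c in concept_vecs]


# Hypothetical toy data: the same two concepts embedded in two
# different latent spaces (3 dimensions and 4 dimensions).
concepts_en = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
concepts_ar = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

# Hypothetical embeddings of one article in each language.
article_en = [3.0, 4.0, 0.0]
article_ar = [0.0, 3.0, 4.0, 0.0]

# The raw vectors cannot be compared directly (different spaces),
# but their concept profiles share the same coordinates.
profile_en = kos_profile(article_en, concepts_en)
profile_ar = kos_profile(article_ar, concepts_ar)

# Cosine distance between the two profiles (0 means same direction).
distance = 1.0 - cosine_similarity(profile_en, profile_ar)
print(profile_en, profile_ar, distance)
```

In this toy case the two profiles coincide, so the distance is approximately zero. With real embeddings the profiles would differ, and the size of that difference is the kind of quantity the abstract describes measuring between equivalent articles.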

Published

2025-09-05