Cosine Similarity Indexing of Word Embeddings Using Knowledge Organization Systems

John Kausch

doi:10.7152/nasko.v7i1.95650

Authors

John Kausch Western University

DOI:

https://doi.org/10.7152/nasko.v7i1.95650

Abstract

This paper proposes a new technique for cosine similarity indexing in the era of large language models (LLMs). It investigates how knowledge organization systems (KOSs) can be used to index the latent spaces that LLMs produce. A latent space is a multidimensional feature space that a model uses to encode the context of data items. In the case of an LLM, a typical latent space is a word embedding, which gives every word a “position” in a multidimensional feature space whose features are opaque and not human-readable. This work asks: can indexing such latent spaces with KOSs help make LLMs more explainable? It builds on previous work in latent semantic indexing for information retrieval models to see whether similar techniques can bridge KOSs and LLMs. It also investigates how this method can be applied to improve the performance of multilingual information retrieval. A cross-lingual ontology (called Horapollo) is used to index two latent spaces containing Wikipedia articles written in English and Arabic. The distances between equivalent articles in the two spaces are then measured, raising questions about the use of KOSs for multilingual and transdisciplinary information retrieval tasks in the era of semantic search.
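
The abstract does not spell out the indexing procedure, so the following is only a minimal sketch of one way the idea could work, not the paper's implementation. It assumes that each KOS concept has a vector in each latent space, that an item is "indexed" by its cosine similarity to every concept, and that equivalent items are compared through those concept-labelled indexes. All names (`cosine_similarity`, `kos_index`, `index_distance`), the concept labels, and the toy vectors are hypothetical and do not come from the paper or from Horapollo.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def kos_index(item_vector, concept_vectors):
    """Re-express an opaque embedding as similarities to labelled KOS concepts."""
    return {
        label: cosine_similarity(item_vector, vector)
        for label, vector in concept_vectors.items()
    }


def index_distance(index_a, index_b):
    """Cosine distance between two indexes over the concept labels they share."""
    shared = sorted(set(index_a) & set(index_b))
    a = [index_a[label] for label in shared]
    b = [index_b[label] for label in shared]
    return 1.0 - cosine_similarity(a, b)


# Toy data (assumed): the two latent spaces need not share a dimensionality,
# because comparison happens through the shared concept labels.
english_concepts = {"animal": [1.0, 0.0, 0.0], "place": [0.0, 1.0, 0.0]}
arabic_concepts = {"animal": [0.0, 0.0, 1.0, 0.0], "place": [0.0, 1.0, 0.0, 0.0]}

english_article = [0.9, 0.1, 0.0]
arabic_article = [0.0, 0.1, 0.9, 0.0]

en_index = kos_index(english_article, english_concepts)
ar_index = kos_index(arabic_article, arabic_concepts)

print(en_index)
print(ar_index)
print(index_distance(en_index, ar_index))
```

Under these assumptions, each index entry is attached to a named concept rather than to an opaque feature, which is the sense in which such an index might aid explainability. Two embeddings that cannot be compared directly (here, three versus four dimensions) also become comparable once both are expressed over the same concept labels.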

Published

2025-09-05

Issue

Vol. 10 (2025): Proceedings from North American Symposium on Knowledge Organization

Section

Articles

License

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.

Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
