Documents as Intelligent Agents

Authors: Strauß, Oliver; Kett, Holger Joachim
Type: conference paper
Issued: 2023-01-01
Accessioned: 2024-01-19
Available: 2024-01-19
License: CC BY-NC-ND 4.0
Handle: https://publica.fraunhofer.de/handle/publica/459065
DOI: https://doi.org/10.24406/publica-2469
DOI: 10.5220/0012239200003584
Scopus EID: 2-s2.0-85179586014
Language: English
Keywords: Agent-Based Retrieval; Dataset Research; Semantic Search

Abstract: Finding good representations for documents in the context of semantic search is a relevant problem with applications in domains such as medicine, research, and data search. In this paper, we propose to represent each document in a search index by several different contextual embeddings. We define and evaluate eight strategies for combining embeddings of the document title, document passages, and relevant user queries by means of linear combinations, averaging, and clustering. In addition, we apply an agent-based approach to search in which each data item is modeled as an agent that tries to optimize its metadata and presentation over time by incorporating information received through users' interactions with the search system. We validate the document representation strategies and the agent-based approach on a medical information retrieval dataset and find that a linear combination of the title embedding, the mean passage embedding, and the mean over the clustered embeddings of relevant queries offers the best trade-off between search performance and index size. We further find that incorporating embeddings of relevant user queries can significantly improve the performance of representation strategies based on semantic embeddings. The agent-based system performs slightly better than the other representation strategies but comes with a larger index size.
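The best-performing strategy named in the abstract (title embedding + mean passage embedding + mean over clustered query embeddings) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the equal default weights, the choice of k-means with the first k vectors as initial centroids, the number of clusters k, and the reading of "mean over the clustered embeddings" as "mean of the cluster centroids" are all assumptions made for illustration. Embeddings are plain Python lists so the example runs without third-party packages.

```python
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_centroids(vectors: Sequence[Vector], k: int, iters: int = 10) -> List[Vector]:
    """Plain k-means (Lloyd's algorithm); assumption: first k vectors seed the centroids."""
    k = min(k, len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            # assign each query embedding to its nearest centroid
            nearest = min(range(k), key=lambda i: sq_dist(v, centroids[i]))
            clusters[nearest].append(v)
        # recompute centroids; keep the old one if a cluster ends up empty
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def document_vector(title: Vector, passages: Sequence[Vector], queries: Sequence[Vector],
                    w_title: float = 1.0, w_pass: float = 1.0, w_query: float = 1.0,
                    k: int = 2) -> Vector:
    """Hypothetical linear combination of title, mean passage and mean query-cluster vectors."""
    parts = [(w_title, title), (w_pass, mean(passages))]
    if queries:
        # mean of the cluster centroids, so each cluster counts equally
        parts.append((w_query, mean(kmeans_centroids(queries, k))))
    return [sum(w * vec[i] for w, vec in parts) for i in range(len(title))]


if __name__ == "__main__":
    title = [1.0, 0.0]
    passages = [[0.0, 1.0], [0.0, 3.0]]
    queries = [[1.0, 1.0], [3.0, 3.0], [1.0, 1.0]]
    print(document_vector(title, passages, queries, k=2))  # [3.0, 4.0]
```

In this toy example the query clusters have centroids [1, 1] and [3, 3], so their mean is [2, 2], whereas a plain mean over all three queries would be weighted toward the repeated query. Averaging centroids rather than raw queries is one plausible way clustering can keep frequent near-duplicate queries from dominating the document vector; whether this matches the paper's exact procedure is not stated in the abstract.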