Stable Search Radar

CC BY 4.0
Kohlhammer, Jörn; Lücke-Tieke, Hendrik; Stromberg, Jonas
2023-12-20
2022
https://doi.org/10.24406/publica-323
https://publica.fraunhofer.de/handle/publica/425491

High-dimensional data sets are no longer an exception for data scientists or even ordinary users. Nowadays nearly everybody comes into contact with high-dimensional data, sometimes without realizing it. Consider your favourite web search engine: entering a query yields thousands of (somehow) ranked results such as documents, tables, pictures or more complex data items, and each of these results can be interpreted as a high-dimensional data point. This makes it all the more important to help users gain a quick and robust understanding of the result landscape. Normally, some sort of ranking is therefore applied, for example by estimating the relevance of each result to the search query, and the results are displayed in this order, starting with the presumably most relevant one [25] [57]. To support users further, it is possible to show which results are similar to each other and which differ. This enables users to cluster the result space and efficiently build their own understanding of it. They can then either search in the direction of one cluster to dive deeper into that particular result set, or diversify their search by looking at a few results in each cluster. For that, two different metrics are employed: (1) one metric to measure the relevance of each single result in relation to the query and (2) another metric to measure the similarity between the found results [51]. Regardless of whether users want to examine one particular cluster or all clusters in general, the search is iterative. Starting with the initial search query, users select the result data point they are most interested in and adapt the search query accordingly.
This way, the result space changes and they get new and more detailed information about the result set with respect to the newly selected data point. Following the saying "A picture tells more than ten thousand words", presenting data graphically is preferred over presenting it as plain text [35]. Viewers can then more easily recognise the structure of the data space and its properties and infer further information from them. To support users in understanding and making sense of the high-dimensional result space, we need to visualize it in an intuitive, simple and human-understandable manner [9]. Beyond that, the users' goal is to explore the result set, so they must be able to adapt the visualization to their needs. In addition, the visualization allows a more intuitive interaction with the data than a simple text representation does. The iterative search thus becomes much easier because of the lower cognitive overhead. This master's thesis presents a new interactive radial visualization for high-dimensional data sets. The visualization intuitively displays two different user-defined metrics on the data set: inter-data-point similarity along the angle and relevance to the reference point along the radius. To this end, several dimensionality reduction techniques are evaluated, a framework for this evaluation is provided, and an interactive visualization tool for users to explore the resulting data set is created. This work is organized as follows: Chapter 2 presents background information and related work. Chapter 3 presents the concept of the novel visualization and explains the Gap and different approaches to avoid it. Chapter 4 covers the implementation of the experiments, which will later prove the existence of the Gap, as well as the implementation of the visualization and its backend.
Chapter 5 evaluates the existence of the Gap by analysing the experiment results and discusses the comprehensibility of the proposed visualization. Chapter 6 concludes this work and Chapter 7 outlines future work.

en
Lead Topic: Digitized Work
Research Line: Computer graphics (CG)
Research Line: Human computer interaction (HCI)
Research Line: Machine Learning (ML)
Multidimensional data visualization
Circular similarity metrics
Interactive visualization
Search result visualization
Similarity functions
Similarity measures
Similarity search
Graphical interactive user interfaces
Object centered navigation
master thesis
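
The radial layout described in the abstract (similarity along the angle, relevance along the radius) can be illustrated with a minimal sketch. It is not the thesis implementation. The function name `radar_position`, the parameter `max_radius`, the normalisation of both inputs to the unit interval, and the choice to place more relevant results closer to the centre are all assumptions made for illustration. The similarity coordinate is assumed to come from some dimensionality reduction step that is not shown here.

```python
import math


def radar_position(similarity_coord, relevance, max_radius=1.0):
    """Map one search result to Cartesian (x, y) on a radial plot.

    Assumptions (hypothetical, not taken from the thesis):
    - similarity_coord is a 1-D circular coordinate in [0, 1), produced
      by some dimensionality reduction so that similar results get
      nearby values;
    - relevance is a score in [0, 1], where 1 is most relevant;
    - more relevant results are drawn closer to the centre.
    """
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("relevance must lie in [0, 1]")

    # Similarity determines the angle; wrap the coordinate onto the circle.
    angle = 2.0 * math.pi * (similarity_coord % 1.0)

    # Relevance determines the radius; high relevance means small radius.
    radius = max_radius * (1.0 - relevance)

    return (radius * math.cos(angle), radius * math.sin(angle))
```

Under these assumptions, a fully relevant result sits at the centre of the plot, and two results with similar `similarity_coord` values appear in the same angular direction regardless of their relevance.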