Publication

Efficient computation of comprehensive statistical information of large OWL datasets: A scalable approach

2023 , Mohamed, H. , Fathalla, S. , Lehmann, Jens , Jabeen, H.

Computing dataset statistics is crucial for exploring their structure; however, it becomes challenging for large-scale datasets. These statistics have several key benefits, including link target identification, vocabulary reuse, quality analysis, big data analytics, and coverage analysis. In this paper, we present the first attempt at developing a distributed approach (OWLStats) for collecting comprehensive statistics over large-scale OWL datasets. OWLStats is a distributed in-memory approach for computing 50 statistical criteria for OWL datasets utilizing Apache Spark. We have successfully integrated OWLStats into the SANSA framework. Experimental results show that OWLStats is linearly scalable in terms of both node and data scalability.

Publication

Improving Inductive Link Prediction Using Hyper-Relational Facts (Extended Abstract)

2022-07-01 , Ali, Mehdi , Berrendorf, Max , Galkin, Mikhail , Thost, Veronika , Ma, Tengfei , Tresp, Volker , Lehmann, Jens

For many years, link prediction on knowledge graphs has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing effort has been put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs, whereas their richer counterparts, hyper-relational KGs (e.g., Wikidata), have not yet been properly studied. In this work, we classify different inductive settings and study the benefits of employing hyper-relational KGs on a wide range of semi- and fully inductive link prediction tasks powered by recent advancements in graph neural networks. Our experiments on a novel set of benchmarks show that qualifiers over typed edges can lead to absolute performance gains of 6% (for the Hits@10 metric) compared to triple-only baselines. Our code is available at https://github.com/mali-git/hyper_relational_ilp.
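For readers unfamiliar with the Hits@10 metric mentioned in the abstract above, the following is a minimal sketch of how Hits@k is commonly computed from the ranks of the true entities. The function name `hits_at_k` and the rank lists are hypothetical illustrations; they are not taken from the paper or its code repository, and the numbers do not reproduce its results.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Return the fraction of queries whose true entity is ranked in the top k.

    Ranks are 1-based: rank 1 means the true entity was scored highest.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for ten test queries,
# once for a triple-only model and once for a model that also uses qualifiers.
baseline_ranks = [1, 3, 12, 40, 7, 15, 2, 11, 9, 30]
qualifier_ranks = [1, 2, 8, 40, 5, 15, 2, 11, 9, 30]

baseline_hits = hits_at_k(baseline_ranks, k=10)
qualifier_hits = hits_at_k(qualifier_ranks, k=10)

# An "absolute gain" is the plain difference between the two scores,
# as opposed to a relative (percentage-of-baseline) improvement.
absolute_gain = qualifier_hits - baseline_hits
print(f"Hits@10 baseline={baseline_hits:.2f} qualifiers={qualifier_hits:.2f}")
```

The distinction matters when reading reported improvements: an absolute gain is measured in percentage points of the metric itself.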

Publication

Survey on English Entity Linking on Wikidata: Datasets and approaches

2022-01-27 , Möller, Cedric , Lehmann, Jens , Usbeck, Ricardo

Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, as is evident from the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, is not realized. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph, missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example, by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information, which Wikidata lacks.

Publication

Bringing Light Into the Dark: A large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework

2022 , Ali, Mehdi , Berrendorf, Max , Hoyt, Charles Tapley , Vermue, Laurent , Galkin, Mikhail , Sharifzadeh, Sahand , Fischer, Asja , Tresp, Volker , Lehmann, Jens

The heterogeneity in recently published knowledge graph embedding models' implementations, training, and evaluation has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 models in the PyKEEN software package. In this paper, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, as well as provide insight as to why this might be the case. We then performed a large-scale benchmarking on four datasets with several thousands of experiments and 24,804 GPU hours of computation time. We present insights gained as to best practices, best configurations for each model, and where improvements could be made over previously published best configurations. Our results highlight that the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations is crucial for a model's performance and is not only determined by its architecture. We provide evidence that several architectures can obtain results competitive to the state of the art when configured carefully. We have made all code, experimental configurations, results, and analyses available at https://github.com/pykeen/pykeen and https://github.com/pykeen/benchmarking.

Publication

Climate Bot: A Machine Reading Comprehension System for Climate Change Question Answering

2022-07-01 , Rony, Md Rashad Al Hasan , Zuo, Ying , Kovriguina, Liubov , Teucher, Roman , Lehmann, Jens

Climate change has a severe impact on the overall ecosystem of the whole world, including humankind. This demo paper presents Climate Bot - a machine reading comprehension system for question answering over documents about climate change. The proposed Climate Bot provides an interface for users to ask questions in natural language and get answers from reliable data sources. The purpose of the climate bot is to spread awareness about climate change and help individuals and communities to learn about the impact and challenges of climate change. Additionally, we open-sourced an annotated climate change dataset CCMRC to promote further research on the topic. This paper describes the dataset collection, annotation, system design, and evaluation.

Publication

RoMe: A Robust Metric for Evaluating Natural Language Generation

2022-05 , Rony, Md Rashad Al Hasan , Kovriguina, Liubov , Chaudhuri, Debanjan , Usbeck, Ricardo , Lehmann, Jens

Evaluating Natural Language Generation (NLG) systems is a challenging task. Firstly, the metric should ensure that the generated hypothesis reflects the reference’s semantics. Secondly, it should consider the grammatical quality of the generated sentence. Thirdly, it should be robust enough to handle various surface forms of the generated sentence. Thus, an effective evaluation metric has to be multifaceted. In this paper, we propose an automatic evaluation metric incorporating several core aspects of natural language understanding (language competence, syntactic and semantic variation). Our proposed metric, RoMe, is trained on language features such as semantic similarity combined with tree edit distance and grammatical acceptability, using a self-supervised neural network to assess the overall quality of the generated sentence. Moreover, we perform an extensive robustness analysis of the state-of-the-art methods and RoMe. Empirical results suggest that RoMe has a stronger correlation with human judgment than state-of-the-art metrics in evaluating system-generated sentences across several NLG tasks.

Publication

Spatial concept learning and inference on geospatial polygon data

2022 , Westphal, Patrick , Grubenmann, T. , Collarana Vargas, Diego , Bin, S. , Bühmann, L. , Lehmann, Jens

Geospatial knowledge has always been an essential driver for many societal aspects. This concerns in particular urban planning and urban growth management. To gain insights from geospatial data and guide decisions, authoritative and open data sources are usually used, combined with user or citizen sensing data. However, we see great potential for improving geospatial analytics by combining geospatial data with rich terminological knowledge, e.g., as provided by the Linked Open Data Cloud. Having semantically explicit, integrated geospatial and terminological knowledge, expressed by means of established vocabularies and ontologies, cross-domain spatial analytics can be performed. One analytics technique working on terminological knowledge is inductive concept learning, an approach that learns classifiers expressed as logical concept descriptions. In this paper, we extend inductive concept learning to infer and make use of the spatial context of entities in spatio-terminological data. We propose a formalism for extracting and making spatial relations explicit such that they can be exploited to learn spatial concept descriptions, enabling ‘spatially aware’ concept learning. We further provide an implementation of this formalism and demonstrate its capabilities in different evaluation scenarios.

Publication

DialoKG: Knowledge-Structure Aware Task-Oriented Dialogue Generation

2022-07 , Rony, Md Rashad Al Hasan , Usbeck, Ricardo , Lehmann, Jens

Task-oriented dialogue generation is challenging since the underlying knowledge is often dynamic and effectively incorporating knowledge into the learning process is hard. It is particularly challenging to generate both human-like and informative responses in this setting. Recent research primarily focused on various knowledge distillation methods where the underlying relationship between the facts in a knowledge base is not effectively captured. In this paper, we go one step further and demonstrate how the structural information of a knowledge graph can improve the system’s inference capabilities. Specifically, we propose DialoKG, a novel task-oriented dialogue system that effectively incorporates knowledge into a language model. Our proposed system views relational knowledge as a knowledge graph and introduces (1) a structure-aware knowledge embedding technique, and (2) a knowledge graph-weighted attention masking strategy to facilitate the system selecting relevant information during the dialogue generation. An empirical evaluation demonstrates the effectiveness of DialoKG over state-of-the-art methods on several standard benchmark datasets.

Publication

Time-aware Entity Alignment using Temporal Relational Attention

2022-04-25 , Xu, Chengjin , Su, Fenglong , Xiong, Bo , Lehmann, Jens

Knowledge graph (KG) alignment aims to match entities in different KGs, which is important for knowledge fusion and integration. Temporal KGs (TKGs) extend traditional KGs by associating static triples with specific timestamps (e.g., temporal scopes or time points). Moreover, open-world KGs (OKGs) are dynamic with new emerging entities and timestamps. While entity alignment (EA) between KGs has drawn increasing attention from the research community, EA between TKGs and OKGs remains unexplored. In this work, we propose a novel Temporal Relational Entity Alignment method (TREA) which is able to learn alignment-oriented TKG embeddings and represent new emerging entities. We first map entities, relations and timestamps into an embedding space, and the initial feature of each entity is represented by fusing the embeddings of its connected relations and timestamps as well as its neighboring entities. A graph neural network (GNN) is employed to capture intra-graph information and a temporal relational attention mechanism is utilized to integrate relation and time features of links between nodes. Finally, a margin-based full multi-class log-loss is used for efficient training and a sequential time regularizer is used to model unobserved timestamps. We use three well-established TKG datasets as references for evaluating temporal and non-temporal EA methods. Experimental results show that our method outperforms the state-of-the-art EA methods.

Publication

A Simulated Annealing Meta-heuristic for Concept Learning in Description Logics

2022 , Westphal, Patrick , Vahdati, Sahar , Lehmann, Jens

Ontologies - providing an explicit schema for underlying data - often serve as background knowledge for machine learning approaches. Similar to ILP methods, concept learning utilizes such ontologies to learn concept expressions from examples in a supervised manner. This learning process is usually cast as a search process through the space of ontologically valid concept expressions, guided by heuristics. Such heuristics usually try to balance explorative and exploitative behaviors of the learning algorithms. While exploration ensures a good coverage of the search space, exploitation focuses on those parts of the search space likely to contain accurate concept expressions. However, at their extreme ends, both paradigms are impractical: A totally random explorative approach will only find good solutions by chance, whereas a greedy but myopic, exploitative attempt might easily get trapped in local optima. To combine the advantages of both paradigms, different meta-heuristics have been proposed. In this paper, we examine the Simulated Annealing meta-heuristic and how it can be used to balance the exploration-exploitation trade-off in concept learning. In different experimental settings, we analyse how and where existing concept learning algorithms can benefit from the Simulated Annealing meta-heuristic.
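The exploration-exploitation trade-off described in the abstract above can be illustrated with a generic Simulated Annealing loop. This is only a minimal sketch on a toy integer search problem, not the concept learning algorithm from the paper: the function names, the geometric cooling schedule, and the toy `score` and `neighbor` functions are all assumptions made for illustration.

```python
import math
import random


def simulated_annealing(initial, neighbor, score,
                        t_start=1.0, t_end=1e-3, cooling=0.95, rng=None):
    """Maximize `score` by a random walk over candidates from `neighbor`.

    At high temperature, worse candidates are accepted often (exploration);
    as the temperature drops, the search becomes nearly greedy (exploitation).
    """
    rng = rng or random.Random(0)
    current = best = initial
    current_score = best_score = score(initial)
    t = t_start
    while t > t_end:
        candidate = neighbor(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept a worse candidate with
        # probability exp(delta / t), which shrinks as t decreases.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        t *= cooling  # geometric cooling schedule (an assumed choice)
    return best, best_score


# Toy stand-ins: in concept learning, a candidate would be a concept
# expression and the neighbor function a refinement step; here a candidate
# is an integer and the score peaks at x == 3.
def toy_score(x):
    return -(x - 3) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice([-1, 1])


best, best_score = simulated_annealing(10, toy_neighbor, toy_score)
print(best, best_score)
```

Keeping track of the best candidate seen so far, separately from the current one, means that late acceptance of worse candidates never loses a good solution already found.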