  • Publication
    Multi-modal Emotion Categorization in Oral History Interviews
    ( 2023-07)
    Viswanath, Anargh; Plöger, Paul Gerhard; Houben, Sebastian; Hassan, Teena
    This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments conducted on the novel HdG dataset provided less than satisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and lack of representative training data to perform transfer learning [28]. Hence, the objectives of this thesis were to achieve better results by performing a multi-modal fusion and resolving the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using state-of-the-art (SOTA) techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology utilizing the MulT along with a multi-label classification approach. This approach produced a considerable improvement in the overall emotion recognition by obtaining an average AUC of 0.74 and balanced accuracy of 0.70 on the HdG dataset, which is comparable to SOTA results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset as well as introduce a novel multi-annotator learning approach to understand each annotator’s relative strengths and weaknesses for emotion perception. Our evaluation results highlight the potential benefits of the novel multi-annotator learning approach in improving overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations.
We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.
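The multi-label formulation used above differs from ordinary single-label classification in that each emotion is an independent binary decision, so several labels can fire for one utterance. A minimal sketch of that decision rule follows; the emotion classes and the 0.5 threshold are illustrative assumptions, and the MulT fusion model itself is not reproduced here:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_predict(logits, threshold=0.5):
    """Multi-label decision: each class gets its own sigmoid and
    threshold, so any subset of emotions can be predicted at once
    (unlike softmax, which forces exactly one winner)."""
    return [int(sigmoid(z) >= threshold) for z in logits]

# Hypothetical per-class logits for (anger, joy, sadness, surprise)
print(multilabel_predict([2.0, -1.5, 0.3, -3.0]))  # -> [1, 0, 1, 0]
```

Training such a head typically uses a per-class binary cross-entropy loss, which is also where class-imbalance corrections (e.g. per-class weights) can be applied.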
  • Publication
    Continual Learning in Object Detection
    ( 2023-03)
    Tran Tien, Huy; Plöger, Paul; Houben, Sebastian
    Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data would produce a severe "forgetting" effect, in which the performance on past tasks deteriorates through each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project aims to investigate promising continual learning strategies for object detection without storing and accessing past training images and labels. We show that by utilizing the pseudo-background trick to deal with missing labels, and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, an integration of a small latent replay buffer can result in a positive backward transfer, indicating the enhancement of past knowledge when new knowledge is learned.
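The distillation ingredient mentioned above can be sketched as follows: the frozen old detector (teacher) produces soft class distributions, and the new detector (student) is penalized for drifting away from them, which substitutes for the missing old data. The toy logits and temperature value are illustrative; a real detector would apply this per anchor/region, combined with the pseudo-background handling of unlabeled old-class objects:

```python
import math

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student against the frozen teacher's
    temperature-softened distribution; keeps responses on old classes
    stable when no past images or labels are available."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# The loss is smallest when the student matches the teacher exactly.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
drifted = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

This term would be added to the ordinary detection loss on the new classes, trading off stability against plasticity.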
  • Publication
    Retrieval Augmented Generative Task-oriented Dialogue Systems
    ( 2022-07-22)
    Rony, Md Rashad Al Hasan
    In this thesis, we study task-oriented dialogue systems that rely on efficient knowledge inference from background knowledge sources and dialogue history to satisfy a user goal. While the conventional modular architectures and the new age end-to-end architectures are the more established design choices for task-oriented dialogue systems, in this thesis we propose a relatively simple retrieve-and-generate strategy for task-oriented dialogue systems. Our proposed approach enjoys the best of both architectures while addressing their respective limitations. We experiment with different retrieval techniques using sparse representations and dense embeddings. Considering the diversity of task-oriented dialogue datasets, we experiment with SMD, Camrest, and MultiWOZ-2.1. Furthermore, in view of the entity-rich nature of task-oriented dialogue systems, we question the typical process of introducing auxiliary objectives for better capturing entity awareness, with a simple alternative: adding a syntax embedding layer on top of the standard token embedding and position embedding layers, thereby explicitly adding syntactic knowledge into the model parameters. We propose to use a syntax-infused transformer, a model that explicitly leverages syntactic information by augmenting readily available entity-level metadata, e.g. part-of-speech tags. Despite its simplicity, the syntax-infused transformer is effective. On standard evaluation benchmarks for task-oriented dialogue systems, our proposed syntax-infused model exceeds our base model by an average of 13 Entity-F1 points and 2.8 BLEU points across the three datasets. At the same time, experimental results further confirm that our proposed model outperforms existing state-of-the-art models on the Entity-F1 metric. The empirical analysis further confirms the efficacy of our approach.
Overall, our work proposes a relatively more interpretable, easily reproducible, and lightweight model in terms of trainable parameters, while achieving comparable performance with state-of-the-art models. Additionally, we conduct a robust error analysis of the generated responses together with the evaluation metrics and propose a handful of future research directions.
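The syntax embedding layer described above simply sums a third lookup table (indexed by part-of-speech tag) with the usual token and position embeddings, so no auxiliary training objective is needed. A toy sketch, with made-up vocabularies, a 4-dimensional embedding size, and random initialization standing in for learned weights:

```python
import random

random.seed(0)

def make_table(keys, dim):
    """A toy embedding table: each key maps to a small random vector."""
    return {k: [random.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}

DIM = 4
tok_emb = make_table(["book", "a", "table", "for", "two"], DIM)
pos_emb = make_table(range(16), DIM)                              # absolute positions
syn_emb = make_table(["VERB", "DET", "NOUN", "ADP", "NUM"], DIM)  # POS tags

def embed(tokens, pos_tags):
    """Token + position + syntax embeddings are summed element-wise,
    so syntactic cues enter the transformer input directly."""
    return [
        [t + p + s for t, p, s in zip(tok_emb[w], pos_emb[i], syn_emb[tag])]
        for i, (w, tag) in enumerate(zip(tokens, pos_tags))
    ]

vecs = embed(["book", "a", "table"], ["VERB", "DET", "NOUN"])
```

Here the POS tag disambiguates "book" (verb) from "book" (noun), which is exactly the kind of entity-awareness cue the abstract argues for.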
  • Publication
    Feature Attribution for Automatic Medical Coding
    ( 2022-03-31)
    Sauvant, Marie Pauline
    Assigning medical codes to clinical reports is an important task in the healthcare domain, primarily for documentation, billing, and research purposes. Multipage reports and a high-dimensional label space cause the manual mapping process to be time-consuming and error-prone. While deep neural networks have recently achieved promising performance in automatic medical coding, the opaque decision-making process of these models hampers their adoption in real-world scenarios. This points to the need for explanations revealing the reasons behind model predictions. In this master’s thesis, we address a specific type of explanation, namely feature attributions for text classification models. These explanations indicate which words in the input are relevant for the model to produce a particular output. Several existing models for automatic medical coding provide such feature attributions. However, they all lack an evaluation of faithfulness, i.e. of how accurately the attributions reflect the model’s reasoning process. In high-stakes decision scenarios such as medical coding, the faithfulness of attributions is crucial to account for the legal, financial, and ethical relevance of the code assignment. The contribution of this thesis is threefold: First, we analyze recent literature on feature attribution to examine the theoretical adequacy of the most commonly used methods with respect to our use case. Second, we experimentally investigate the applicability of a popular gradient-based attribution method, called Integrated Gradients, to two medical coding models. Third, we discuss the suitability of a recent faithfulness metric, called infidelity, from a theoretical perspective, and scrutinize its applicability to the medical coding task in experiments. Our attribution results indicate that Integrated Gradients cannot be readily applied to the current state-of-the-art medical coding model, implying the need for further research in this area.
Our discussion of infidelity suggests that this metric represents a reasonable notion of faithfulness. Our experiments appear to confirm its suitability for the medical coding task. Based on these results, we believe that this metric has the potential to become a standard tool for evaluating the faithfulness of feature attributions.
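Integrated Gradients itself is model-agnostic: it averages the gradient along a straight path from a baseline input to the actual input and scales by the input difference. A minimal sketch on a toy differentiable function (not a medical coding model; the quadratic and its gradient are chosen only so the completeness axiom is easy to verify):

```python
def integrated_gradients(grad_f, x, baseline, steps=200):
    """Riemann-sum approximation of Integrated Gradients: average the
    gradient along the straight path from baseline to input, then scale
    each coordinate by (x_i - baseline_i)."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(1, steps + 1):
        a = k / steps
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        avg_grad = [ag + gi / steps for ag, gi in zip(avg_grad, g)]
    return [(xi - b) * ag for xi, b, ag in zip(x, baseline, avg_grad)]

# Toy model: f(x) = x0^2 + 2*x1^2, with analytic gradient (2*x0, 4*x1)
f = lambda x: x[0] ** 2 + 2 * x[1] ** 2
grad_f = lambda x: [2 * x[0], 4 * x[1]]

attr = integrated_gradients(grad_f, [1.0, 1.0], [0.0, 0.0])
# Completeness axiom: the attributions sum to f(x) - f(baseline) = 3
```

For a text classifier, `x` would be the input embeddings and the baseline typically an all-zero or padding embedding, which is where the applicability problems discussed in the thesis arise.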
  • Publication
    Evaluation of Drift Detection Techniques for Automated Machine Learning Pipelines
    ( 2022-01)
    Abdelwahab, Hammam
    Machine learning-based solutions are frequently adopted in applications that involve big data in operations. The performance of a model deployed into operations is subject to degradation due to unanticipated changes in the flow of input data. Hence, monitoring data drift becomes essential to maintain the model’s desired performance. Based on the conducted literature review on drift detection, statistical hypothesis testing makes it possible to investigate whether incoming data is drifting from the training data. Because Maximum Mean Discrepancy (MMD) and Kolmogorov-Smirnov (KS) have been shown in the literature to be reliable distance measures between multivariate distributions, both were selected from several existing techniques for experimentation. For the scope of this work, an image classification use case was studied using the Stream-51 dataset. Across different drift experiments, both MMD and KS showed high Area Under Curve values; however, KS ran faster than MMD and produced fewer false positives. Furthermore, the results showed that using a pre-trained ResNet-18 for feature extraction maintained the high performance of the experimented drift detectors. The results also showed that the performance of the drift detectors highly depends on the sample sizes of the reference (training) data and the test data that flow into the pipeline’s monitor. Finally, the results showed that if the test data is a mixture of drifting and non-drifting data, the performance of the drift detectors does not depend on how the drifting data are scattered among the non-drifting ones, but rather on their amount in the test set.
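The KS test compares the empirical cumulative distributions of a reference sample and a test sample; its statistic is the largest vertical gap between the two CDFs. A pure-Python sketch of that statistic on synthetic one-dimensional values (in the thesis the compared values would be ResNet-18 features, typically tested per dimension):

```python
def ks_statistic(ref, test):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of the reference and test samples."""
    ref, test = sorted(ref), sorted(test)
    i = j = d = 0.0
    i, j = 0, 0
    while i < len(ref) and j < len(test):
        x = min(ref[i], test[j])
        while i < len(ref) and ref[i] == x:
            i += 1
        while j < len(test) and test[j] == x:
            j += 1
        d = max(d, abs(i / len(ref) - j / len(test)))
    return d

reference = [0.1 * k for k in range(100)]           # training-time values
drifted   = [0.1 * k + 15.0 for k in range(100)]    # shifted distribution
same      = [0.1 * k for k in range(100)]

print(ks_statistic(reference, drifted))  # -> 1.0 (fully separated: strong drift)
print(ks_statistic(reference, same))     # -> 0.0 (identical: no drift)
```

A monitor would convert the statistic into a drift alarm via the KS p-value or a calibrated threshold on `d`.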
  • Publication
    Design and evaluation of a subsystem (microservice) to create a deployment client for AI pipelines based on docker containers using gRPC
    ( 2021-06)
    Naeem, Sajid
    Before Docker container technology, manual deployment of an application was complex, resource-intensive, and time-consuming. With the help of Kubernetes, it is possible to automatically deploy and manage AI pipelines on a standard cluster. This thesis aims to provide the link between the AI4EU experiment platform catalog and the execution environment to make the system more scalable. We design a solution and implement a Kubernetes client, which takes the AI pipeline topology as input from the catalog and constructs the deployment and service for all the nodes of the AI pipeline in the execution environment. The Kubernetes client also generates a container specification based on the pipeline topology, which the orchestrator uses to execute the pipeline. Different AI pipelines are deployed in separate namespaces with the help of a generic deployment script supporting both standard Kubernetes clusters and minikube. The Kubernetes client was tested on simple, advanced, and hybrid AI pipelines; it has also been integrated with the production environment of the AI4EU experiment platform and has received feedback from the platform’s AI community.
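Constructing a Deployment for each pipeline node essentially means generating a Kubernetes `apps/v1` manifest per node. A sketch of that generation step, building the manifest as a plain dict ready for YAML/JSON serialization; the node name, image, namespace, and the gRPC port value are illustrative assumptions, not the thesis's actual defaults:

```python
def node_to_deployment(node_name, image, namespace, grpc_port=8061):
    """Build a minimal Kubernetes Deployment manifest (as a dict) for
    one node of an AI pipeline; serialize with a YAML/JSON library or
    submit via a Kubernetes API client."""
    labels = {"app": node_name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": node_name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},  # must match the pod template labels
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": node_name,
                        "image": image,
                        "ports": [{"containerPort": grpc_port}],  # node's gRPC endpoint
                    }]
                },
            },
        },
    }

spec = node_to_deployment("classifier", "example/classifier:1.0", "pipeline-1")
```

A matching Service manifest per node would expose the same port so the orchestrator can reach each node by a stable name inside the namespace.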
  • Publication
    Evaluation and re-usable implementation of DL-based approaches for Entity Recognition
    Named-entity recognition (NER) aims to identify and label instances of predefined entities in a chunk of text. Even though conceptually simple, it is a challenging task that requires some amount of context and a good understanding of what constitutes an entity. Being a precursor to other natural language applications such as question answering and text summarization, it is essential to have high-quality NER systems. For a long time, they have relied on domain-specific knowledge or resources such as gazetteers to perform well. In the past decade, deep learning (DL) techniques have been applied to NER. They do not resort to any external resources and have achieved state-of-the-art results. This thesis investigates several aspects of DL-based approaches for NER. Recent improvements mainly come from utilizing unsupervised language model pretraining to produce representations that depend on a word’s contextual use. Intuitively, more informative embeddings lead to better generalization, that is, detecting mentions that do not appear in the training data for NER models. Several word representations are evaluated for German, using the two biggest available datasets, CoNLL-03 and GermEval. The results show that recent contextualized representations improve the entity extraction performance on both datasets, due to being more robust against entity type ambiguities (e.g. is "Washington" a person or a location?) and lengthy entities (e.g. publication titles). Such embeddings are useful for multilingual NER too. Two approaches for generating multilingual embeddings are pitted against each other, in order to find out which is the most useful for extracting entities in a mix of German, English, and Dutch data. Another investigated aspect is improving the performance on "low-data" domains through transfer learning, using finetuning. This is motivated by the fact that neural models tend to underperform due to a lack of sufficient data.
Finetuning a pre-trained model using contextualized embeddings significantly improves the performance on a relatively small annotated German dataset from Europarl. The final step of this work is providing a re-usable implementation of a DL-based NER model within a framework for building NLP pipelines such as DKPro Core. Challenges of integrating an external Python-based model in a Java-based framework are investigated.
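Whatever embeddings are used, sequence-labelling NER models emit one BIO tag per token, and the final step is collapsing those tags into entity spans. A sketch of that decoding step, using the "Washington" ambiguity from the abstract as the example (tag inventory assumed to be standard BIO with PER/LOC types):

```python
def bio_to_spans(tokens, tags):
    """Collapse a BIO tag sequence into (entity_text, type) spans.
    B-X starts a new entity of type X; I-X continues it; anything
    else (including a type mismatch) closes the open entity."""
    spans, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(tok)
        else:
            if current:
                spans.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        spans.append((" ".join(current), etype))
    return spans

result = bio_to_spans(
    ["George", "Washington", "visited", "Washington"],
    ["B-PER", "I-PER", "O", "B-LOC"],
)
# -> [('George Washington', 'PER'), ('Washington', 'LOC')]
```

Note that the same surface form "Washington" receives two different types purely from context, which is exactly what contextualized embeddings make easier to predict.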
  • Publication
    Design and Evaluation of a Generic Orchestrator that executes an AI pipeline
    The IT industry has seen significant adoption of microservice architecture in recent years. Microservices interact with each other using different protocols. SOAP was integrated with an ever-growing set of protocols under the WS-I profiles to promote interoperability. This large set of protocols quickly made SOAP unpopular, as it required considerable effort on the user side to achieve interoperability. JSON replaced XML as the serialization technology, and REST replaced SOAP as the communication protocol between web services. REST provided more flexibility and was more efficient and faster than SOAP. However, it uses third-party tools such as Swagger to auto-generate code for API calls in various languages, lacks support for streaming, and needs semantic versioning whenever the API contract changes. This led to limited interoperability. As a variant of the RPC architecture, Google created gRPC as a new communication protocol that solved most of the SOAP and REST issues. gRPC uses protocol buffers, usually called protobuf, for the serialization of data. This made it possible to define clean interfaces between services and brought built-in code generation for various programming languages. Automatic code generation makes it possible to use stubs and skeletons to call the services implemented on the server side. In this thesis work, we use an open-source framework called Acumos, designed to make it easy to build, share, and deploy AI apps. Acumos has a design studio where users can compose an AI pipeline, and it provides different orchestrators for different programming languages. However, there is no functionality to execute a generic pipeline that is implemented in multiple programming languages. Using gRPC for communication, Docker as the containerization tool, and Kubernetes as the deployment environment, we propose to design a generic orchestrator capable of running any generic pipeline composed according to the AI4EU container specification.
This design of a generic orchestrator, capable of executing pipelines it has never seen before, goes beyond the current state of the art.
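The core of such an orchestrator is language-agnostic precisely because every node exposes the same protobuf-defined call interface: the orchestrator only chains stub calls in topology order. A sketch of that loop, with plain callables standing in for generated gRPC stubs (the node names and the example pipeline are illustrative):

```python
def run_pipeline(nodes, initial_input):
    """Generic orchestration loop: feed each node's output to the next.
    In the real system each `node` would be a gRPC stub method generated
    from the pipeline's protobuf interface; plain callables stand in here."""
    data = initial_input
    for name, node in nodes:
        data = node(data)  # corresponds to stub.Method(request) over gRPC
    return data

# Stand-in three-node pipeline: normalize -> model -> postprocess
pipeline = [
    ("normalize",   lambda x: [v / 10 for v in x]),
    ("model",       lambda x: sum(x)),
    ("postprocess", lambda x: round(x, 2)),
]
result = run_pipeline(pipeline, [1, 2, 3])
```

Because the loop never inspects what language a node is written in, any container that speaks the agreed protobuf interface can participate, which is the point of the generic design.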
  • Publication
    Extracting and Utilizing templates for Question Answering over Knowledge Graph
    ( 2021)
    Kumar, Uttam
    Knowledge Graphs have been used in many Artificial Intelligence applications worldwide. In this thesis, we first show how to map structured and semi-structured data from heterogeneous sources to a Resource Description Framework (RDF)-based Knowledge Graph (KG) using a unique generic process. Having built an understanding of the ontologies, we move our focus to one of the applications of Knowledge Graphs: answering real-world natural language questions. As the main contribution of this thesis, we establish a template-based approach to Question Answering over the Wikidata Knowledge Graph. We extract question templates from some common QA benchmark datasets and then build a classification model to fetch the resulting template for a user question. This method aids the QA system with a more faithful semantic interpretation of the user question into a formal query. The resulting template helps the relation linking and query building modules of our system, thereby reducing the propagation of errors in NLP pipelines that can occur in a non-template-based traditional QA system. To the best of our knowledge, we are the first to use this template-based approach on the LCQuAD2 [1] and CFQ [2] datasets, which focus on more complex questions and compositional generalization, respectively.
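The template-based idea can be sketched as pairing a question pattern with a query skeleton whose slots are filled from the matched question. The patterns, relation names, and SPARQL-like skeletons below are hypothetical illustrations; in the thesis a trained classifier selects the template rather than regex matching, and slot fillers are linked to actual Wikidata entities:

```python
import re

# Hypothetical templates: a surface pattern paired with a query skeleton.
TEMPLATES = [
    (re.compile(r"who is the (?P<rel>\w+) of (?P<ent>[\w ]+)\?", re.I),
     "SELECT ?x WHERE {{ <{ent}> <{rel}> ?x }}"),
    (re.compile(r"where was (?P<ent>[\w ]+) born\?", re.I),
     "SELECT ?x WHERE {{ <{ent}> <birthPlace> ?x }}"),
]

def question_to_query(question):
    """Pick the first matching template and fill its slots into the
    query skeleton; returns None when no template applies."""
    for pattern, skeleton in TEMPLATES:
        m = pattern.fullmatch(question.strip())
        if m:
            return skeleton.format(**m.groupdict())
    return None

query = question_to_query("Where was Ada Lovelace born?")
# -> SELECT ?x WHERE { <Ada Lovelace> <birthPlace> ?x }
```

Because the query structure comes from the template rather than being generated token by token, errors in entity and relation linking are less likely to corrupt the overall query shape.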