  • Publication
    From Open Set Recognition Towards Robust Multi-class Classification
    The challenges and risks of deploying deep neural networks (DNNs) in the open world are often overlooked and can result in severe outcomes. With our proposed informer approach, we leverage autoencoder-based outlier detectors and their sensitivity to epistemic uncertainty by ensembling multiple detectors, each learning a different one-vs-rest setting. Our results clearly show informer’s superiority over DNN ensembles, kernel-based DNNs, and traditional multi-layer perceptrons (MLPs) in terms of robustness to outliers and dataset shift, while maintaining a competitive classification performance. Finally, we show that informer can estimate the overall uncertainty within a prediction and, in contrast to any of the other baselines, break the uncertainty estimate down into aleatoric and epistemic uncertainty. This is an essential feature in many use cases, as the underlying reasons for the uncertainty are fundamentally different and can require different actions.
  • Publication
    Decoupling Autoencoders for Robust One-vs-Rest Classification
    One-vs-Rest (OVR) classification aims to distinguish a single class of interest from other classes. Novelty detection and robustness to dataset shift become crucial in OVR when the scope of the rest class extends from the classes observed during training to unseen and possibly unrelated classes. In this work, we propose a novel architecture, the Decoupling Autoencoder (DAE), to tackle the common lack of robustness to out-of-distribution samples, which is prevalent in classifiers such as multi-layer perceptrons (MLPs) and ensemble architectures. Experiments on plain classification, outlier detection, and dataset shift tasks show that DAE achieves robust performance across these tasks compared to the baselines, which tend to fail completely when exposed to dataset shift. While DAE and the baselines yield rather uncalibrated predictions on the outlier detection and dataset shift tasks, we found that DAE calibration is more stable across all tasks. Therefore, calibration measures applied to the classification task could also improve the calibration of the outlier detection and dataset shift scenarios for DAE.
  • Publication
    Utilizing Representation Learning for Robust Text Classification Under Datasetshift
    Within One-vs-Rest (OVR) classification, a classifier differentiates a single class of interest (COI) from the rest, i.e., any other class. By extending the scope of the rest class to corruptions (dataset shift), aspects of outlier detection gain relevance. In this work, we show that adversarially trained autoencoders (ATA), representative of autoencoder-based outlier detection methods, yield tremendous robustness improvements over traditional neural network methods such as multi-layer perceptrons (MLPs) and common ensemble methods, while maintaining a competitive classification performance. In contrast, our results also reveal that deep learning methods optimized solely for classification tend to fail completely when exposed to dataset shift.
  • Publication
    Automatic Indexing of Financial Documents via Information Extraction
    (2021)
    Bell, Thiago
    Gebauer, Michael
    Ulusay, Bilge
    Uedelhoven, Daniel
    Dilmaghani, Tim
    Loitz, Rüdiger
    The problem of extracting information from large volumes of unstructured documents is pervasive in the domain of financial business. Enterprises and investors need automatic methods that can extract information from these documents, particularly for indexing and efficiently retrieving information. To this end, we present a scalable end-to-end document processing system for indexing and information retrieval from large volumes of financial documents. While we show our system works for the use case of financial document processing, the entire system itself is agnostic of the document type and machine learning model type. Thus, it can be applied to any large-scale document processing task involving domain-specific extractors.
  • Publication
    Toxicity Detection in Online Comments with Limited Data: A Comparative Analysis
    We present a comparative study on toxicity detection, focusing on the problem of identifying toxicity types of low prevalence and possibly even unobserved at training time. For this purpose, we train our models on a dataset that contains only a weak type of toxicity, and test whether they are able to generalize to more severe toxicity types. We find that representation learning and ensembling exceed the classification performance of simple classifiers on toxicity detection, while also providing significantly better generalization and robustness. All models benefit from a larger training set size, which even extends to the toxicity types unseen during training.
  • Publication
    Supervised autoencoder variants for end to end anomaly detection
    Despite the success of deep learning in various domains such as natural language processing, speech recognition, and computer vision, learning from a limited number of samples and generalizing to unseen data still pose challenges. Notably, in the tasks of outlier detection and imbalanced dataset classification, the label of interest is either scarce or its distribution is skewed, which aggravates generalization problems. In this work, we pursue the direction of multi-task learning, specifically the idea of using supervised autoencoders (SAE), which allows us to combine unsupervised and supervised objectives in an end-to-end fashion. We extend this approach by introducing an adversarial supervised objective to enrich the representations that are learned for the classification task. We conduct thorough experiments on a broad range of tasks, including outlier detection, novelty detection, and imbalanced classification, and study the efficacy of our method against standard baselines using autoencoders. Our work empirically shows that the SAE methods outperform one-class autoencoders, adversarially trained autoencoders, and multi-layer perceptrons in terms of AUPR score. Additionally, our analysis of the obtained representations suggests that the adversarial reconstruction loss functions force the encodings to separate into class-specific clusters, which was not observed for non-adversarial reconstruction loss functions.
  • Publication
    From Imbalanced Classification to Supervised Outlier Detection Problems: Adversarially Trained Auto Encoders
    Imbalanced datasets pose severe challenges in training well-performing classifiers. This problem is also prevalent in the domain of outlier detection, since outliers occur infrequently and are generally treated as minorities. One simple yet powerful approach is to use autoencoders that are trained on majority samples and then to classify samples based on the reconstruction loss. However, this approach fails to classify samples whenever the reconstruction errors of minorities overlap with those of majorities. To overcome this limitation, we propose an adversarial loss function that maximizes the loss of minorities while minimizing the loss for majorities. This way, we obtain a well-separated reconstruction error distribution that facilitates classification. We show that this approach is robust in a wide variety of settings, such as imbalanced data classification or outlier and novelty detection.
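
    The idea in this abstract, minimizing reconstruction error for majorities while maximizing it for minorities and then classifying by reconstruction error, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact loss, so the signed mean-squared-error form below, the function names, the toy reconstructions, and the threshold are all assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np


def reconstruction_error(x, x_hat):
    """Per-sample mean squared error between inputs and reconstructions."""
    return np.mean((x - x_hat) ** 2, axis=1)


def adversarial_loss(x, x_hat, is_minority):
    """Assumed signed objective (not taken from the paper).

    Majority errors enter with a positive sign and minority errors with a
    negative sign, so minimizing this value lowers majority error while
    raising minority error.
    """
    err = reconstruction_error(x, x_hat)
    sign = np.where(is_minority, -1.0, 1.0)
    return float(np.mean(sign * err))


def classify(x, x_hat, threshold):
    """Flag a sample as minority when its error exceeds the threshold."""
    return reconstruction_error(x, x_hat) > threshold


# Toy inputs and hypothetical autoencoder outputs: the two majority samples
# are reconstructed exactly, the minority sample is reconstructed poorly.
x = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
x_hat = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
is_minority = np.array([False, False, True])

errors = reconstruction_error(x, x_hat)
loss = adversarial_loss(x, x_hat, is_minority)
labels = classify(x, x_hat, threshold=1.0)
```

    In this toy case the minority sample has a much larger reconstruction error than the majority samples, so a single threshold separates the two groups; when the error distributions overlap, no such threshold exists, which is the failure mode the abstract describes.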