  • Publication
    Grundlagen des Maschinellen Lernens
    Defining what constitutes human intelligence and intelligent behaviour, and thus also artificial intelligence, is extraordinarily difficult and has occupied philosophers and psychologists for millennia. It is generally accepted, however, that the ability to learn is a central characteristic of intelligence. Accordingly, the research field of machine learning (ML) is a central part of artificial intelligence and underlies many of the recent successes of AI systems.
  • Publication
    Visual Analytics in the Aviation and Maritime Domains
    (2020)
    Andrienko, Gennady; Andrienko, Natalia; Cordero Garcia, Jose Manuel; Scarlatti, David; Vouros, George A.; Herranz, Ricardo; Marcos, Rodrigo
    Visual analytics is a research discipline that is based on acknowledging the power and the necessity of the human vision, understanding, and reasoning in data analysis and problem solving. It develops a methodology of analysis that facilitates human activities by means of interactive visual representations of information. By examples from the domains of aviation and maritime transportation, we demonstrate the essence of the visual analytics methods and their utility for investigating properties of available data and analysing data for understanding real-world phenomena and deriving valuable knowledge. We describe four case studies in which distinct kinds of knowledge have been derived from trajectories of vessels and airplanes and related spatial and temporal data by human analytical reasoning empowered by interactive visual interfaces combined with computational operations.
  • Publication
    Data science in healthcare. Benefits, challenges and opportunities
    (2019)
    Abedjan, Z.; Boujemaa, N.; Campbell, S.; Casla, P.; Chatterjea, S.; Consoli, S.; Costa-Soria, C.; Czech, P.; Despenic, M.; Garattini, C.; Hamelinck, D.; Heinrich, A.; Kraaij, W.; Kustra, J.; Lojo, A.; Sanchez, M.M.; Mayer, M.A.; Melideo, M.; Menasalvas, E.; Aarestrup, F.M.; Artigot, E.N.; Petkovic, M.; Recupero, D.R.; Gonzalez, A.R.; Kerremans, G.R.; Roller, R.; Romao, M.; Sasaki, F.; Spek, W.; Stojanovic, N.; Thoms, J.; Vasiljevs, A.; Verachtert, W.; Wuyts, R.
    The advent of digital medical data has brought an exponential increase in information available for each patient, allowing for novel knowledge generation methods to emerge. Tapping into this data brings clinical research and clinical practice closer together, as data generated in ordinary clinical practice can be used towards rapid-learning healthcare systems, continuously improving and personalizing healthcare. In this context, the recent use of Data Science technologies for healthcare is providing mutual benefits to both patients and medical professionals, improving prevention and treatment for several kinds of diseases. However, the adoption and usage of Data Science solutions for healthcare still require social capacity, knowledge and higher acceptance. The goal of this chapter is to provide an overview of needs, opportunities, recommendations and challenges of using (Big) Data Science technologies in the healthcare sector. This contribution is based on a recent whitepaper (http://www.bdva.eu/sites/default/files/Big%20Data%20Technologies%20in%20Healthcare.pdf) provided by the Big Data Value Association (BDVA) (http://www.bdva.eu/), the private counterpart to the EC to implement the BDV PPP (Big Data Value PPP) programme, which focuses on the challenges and impact that (Big) Data Science may have on the entire healthcare chain.
  • Publication
    Improving Word Embeddings Using Kernel PCA
    Word-based embedding approaches such as Word2Vec capture the meaning of words and relations between them particularly well when trained with large text collections; however, they fail to do so with small datasets. Extensions such as fastText reduce the amount of data needed slightly; however, the joint task of learning meaningful morphological, syntactic, and semantic representations still requires a lot of data. In this paper, we introduce a new approach to warm-start embedding models with morphological information, in order to reduce training time and enhance their performance. We use word embeddings generated using both word2vec and fastText models and enrich them with morphological information of words, derived from kernel principal component analysis (KPCA) of word similarity matrices. This can be seen as explicitly feeding the network morphological similarities and letting it learn semantic and syntactic similarities. Evaluating our models on word similarity and analogy tasks in English and German, we find that they not only achieve higher accuracies than the original skip-gram and fastText models but also require significantly less training data and time. Another benefit of our approach is that it is capable of generating a high-quality representation of infrequent words as, for example, found in very recent news articles with rapidly changing vocabularies. Lastly, we evaluate the different models on a downstream sentence classification task in which a CNN model is initialized with our embeddings and find promising results.
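    A minimal sketch of the core idea, KPCA features derived from a word similarity matrix; everything here (the string-overlap similarity, kernel choice, and all parameter values) is an illustrative assumption, not the paper's implementation:

    ```python
    import numpy as np
    from difflib import SequenceMatcher

    def morphological_similarity(w1, w2):
        # String-overlap similarity as a simple stand-in for a morphological measure.
        return SequenceMatcher(None, w1, w2).ratio()

    def kpca_features(words, n_components=2, gamma=1.0):
        n = len(words)
        # Pairwise morphological similarity matrix S
        S = np.array([[morphological_similarity(a, b) for b in words] for a in words])
        # RBF kernel on the dissimilarities, then centre the kernel matrix
        K = np.exp(-gamma * (1.0 - S) ** 2)
        one_n = np.full((n, n), 1.0 / n)
        Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
        # Eigendecomposition; the top components give KPCA features per word
        vals, vecs = np.linalg.eigh(Kc)
        idx = np.argsort(vals)[::-1][:n_components]
        return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 1e-12))

    words = ["run", "running", "runner", "table"]
    feats = kpca_features(words)
    # Morphologically related words end up closer in the KPCA feature space;
    # such features could then be concatenated with, or used to warm-start,
    # word2vec/fastText vectors.
    ```

    The point of the sketch is only to show how morphological structure can be turned into a dense per-word feature vector before any corpus-based training happens.
    
    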
  • Publication
    A review of machine learning for the optimization of production processes
    Due to the advances in the digitalization process of the manufacturing industry and the resulting available data, there is tremendous progress and large interest in integrating machine learning and optimization methods on the shop floor in order to improve production processes. Additionally, a shortage of resources leads to increasing acceptance of new approaches, such as machine learning to save energy, time, and resources, and avoid waste. After describing possible occurring data types in the manufacturing world, this study covers the majority of relevant literature from 2008 to 2018 dealing with machine learning and optimization approaches for product quality or process improvement in the manufacturing industry. The review shows that there is hardly any correlation between the used data, the amount of data, the machine learning algorithms, the used optimizers, and the respective problem from the production. The detailed correlations between these criteria and the recent progress made in this area as well as the issues that are still unsolved are discussed in this paper.
  • Publication
    Noise Reduction in Distant Supervision for Relation Extraction Using Probabilistic Soft Logic
    The performance of modern relation extraction systems is to a great degree dependent on the size and quality of the underlying training corpus and in particular on the labels. Since generating these labels by human annotators is expensive, Distant Supervision has been proposed to automatically align entities in a knowledge base with a text corpus to generate annotations. However, this approach suffers from introducing noise, which negatively affects the performance of relation extraction systems. To tackle this problem, we propose a probabilistic graphical model which simultaneously incorporates different sources of knowledge, such as domain experts' knowledge about the context and linguistic knowledge about the sentence structure, in a principled way. The model is defined using the declarative language provided by Probabilistic Soft Logic. Experimental results show that the proposed approach, compared to the original distantly supervised set, not only improves the quality of such generated training data sets, but also the performance of the final relation extraction model.
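    The distant-supervision labelling step the abstract builds on can be illustrated with a toy sketch; the knowledge base, sentences, and function name are made up here, and the second sentence shows exactly the kind of noisy match the paper targets:

    ```python
    # Toy knowledge base: relation triples keyed by entity pair.
    kb = {("Berlin", "Germany"): "capital_of",
          ("Paris", "France"): "capital_of"}

    sentences = [
        "Berlin is the capital of Germany.",
        "Paris hosted the 2024 Olympics in France.",  # noisy match: no capital_of relation expressed
        "Germany borders France.",
    ]

    def distant_labels(sentences, kb):
        # Label every sentence that mentions both entities of a KB pair
        # with that pair's relation -- the source of distant-supervision noise.
        labels = []
        for sent in sentences:
            for (e1, e2), rel in kb.items():
                if e1 in sent and e2 in sent:
                    labels.append((sent, e1, e2, rel))
        return labels

    for sent, e1, e2, rel in distant_labels(sentences, kb):
        print(rel, e1, e2)
    ```

    The second sentence gets labelled `capital_of` even though it expresses no such relation; a denoising model like the one proposed would aim to filter such labels out.
    
    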
  • Publication
    Big Data in Gesundheitswesen und Medizin
    In medicine and healthcare, ever larger volumes of increasingly diverse data are available and are being generated at growing speed. This general trend is referred to as Big Data. Analysing Big Data with machine learning methods leads to innovative solutions that can generate new medical insights and increase quality and efficiency in the healthcare system. Prototypical examples exist in the analysis of clinical texts, clinical decision support, the analysis of data from public data sources or wearables, and the development of personal assistants. These potentials, however, also bring new challenges regarding data protection and the transparency and interpretability of results for medical experts.
  • Publication
    Robust End-User-Driven Social Media Monitoring for Law Enforcement and Emergency Monitoring
    Nowadays social media mining is broadly used in the security sector to support law enforcement and to increase response time in emergency situations. One approach to go beyond the manual inspection is to use text mining technologies to extract latent topics, analyze their geospatial distribution and to identify the sentiment from posts. Although widely used, this approach has proven to be technically difficult for end-users: the language used on social media platforms rapidly changes and the domain varies according to the use case. This paper presents a monitoring architecture that analyses streams from social media, combines different machine learning approaches and can be easily adapted and enriched by user knowledge without the need for complex tuning. The framework is modeled based on the requirements of two H2020-projects in the area of community policing and emergency response.
  • Publication
    E2mC: Improving Emergency Management Service Practice through Social Media and Crowdsourcing Analysis in Near Real Time
    (2017)
    Havas, C.; Resch, B.; Francalanci, C.; Pernici, B.; Scalia, G.; Fernandez-Marquez, J.L.; Achte, T. Van; Zeug, G.; Mondardini, M.R.R.; Grandoni, D.; Kalas, M.; Lorini, V.
    In the first hours of a disaster, up-to-date information about the area of interest is crucial for effective disaster management. However, due to the delay induced by collecting and analysing satellite imagery, disaster management systems like the Copernicus Emergency Management Service (EMS) are currently not able to provide information products until up to 48-72 h after a disaster event has occurred. While satellite imagery is still a valuable source for disaster management, information products can be improved by complementing them with user-generated data like social media posts or crowdsourced data. The advantage of these new kinds of data is that they are continuously produced in a timely fashion, because users actively participate throughout an event and share related information. The research project Evolution of Emergency Copernicus services (E2mC) aims to integrate these novel data into a new EMS service component called Witness, which is presented in this paper. In this way, the timeliness and accuracy of geospatial information products provided to civil protection authorities can be improved by leveraging user-generated data. This paper sketches the developed system architecture, describes applicable scenarios and presents several preliminary case studies, providing evidence that the scientific and operational goals have been achieved.