
Integration of UIMA text mining components into an event-based asynchronous microservice architecture

Hodapp, Sven; Madan, Sumit; Fluck, Juliane; Zimmermann, Marc


Eckart de Castilho, R.; Interop Company Inc.; European Language Resources Association -ELRA-, Paris:
LREC 2016, Workshop "Cross-Platform Text Mining and Natural Language Processing Interoperability". Proceedings : 23 May 2016, Portorož, Slovenia
Portorož/Slovenia, 2016
Language Resources and Evaluation Conference (LREC) <10, 2016, Portorož/Slovenia>
Workshop "Cross-Platform Text Mining and Natural Language Processing Interoperability" <2016, Portorož/Slovenia>
European Commission EC
H2020; 654021; OpenMinTeD
Open Mining INfrastructure for TExt and Data
Conference paper, Electronic publication
Fraunhofer SCAI

Distributed compute resources are necessary for compute-intensive information extraction tasks that process large collections of heterogeneous documents (e.g. patents). To use such resources optimally, complex workflows and document sets must be broken down into smaller, independent units. The UIMA framework facilitates the implementation of modular workflows, which provide an ideal structure for parallel processing. Although UIMA AS already includes parallel processing functionality, we tested two other approaches to distributed computing. First, we integrated UIMA workflows into the grid middleware UNICORE, which enables high-performance distributed computing with control structures such as loops and branching. While good distribution management and performance are key requirements, portability, flexibility, interoperability, and ease of use are also desirable. Therefore, as an alternative, we deployed UIMA applications in a microservice architecture that supports all of these aspects. We show that UIMA applications are well suited to running in a microservice architecture with an event-based asynchronous communication method. These applications communicate via a message broker using the standardized STOMP messaging protocol. Within this architecture, new applications can be integrated easily, porting is straightforward, and interoperability, including with non-UIMA components, is ensured. Notably, a first test shows an increase in processing performance compared with the UNICORE-based HPC solution.
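The abstract states that the services exchange events over STOMP through a message broker. As a rough illustration of what travels over the wire, the following Python sketch encodes and decodes a STOMP 1.2 SEND frame carrying a document-processing event. The helper names (`encode_frame`, `decode_frame`), the queue name `/queue/ner.requests`, and the JSON payload layout are hypothetical and not taken from the paper; a real deployment would normally use a broker client library rather than hand-built frames, and this sketch only handles LF line endings.

```python
"""Minimal sketch of STOMP 1.2 frame encoding/decoding for an event-based
text-mining pipeline. Destination names and payload layout are hypothetical."""
import json

# STOMP 1.2 header escaping (applies to all frames except CONNECT/CONNECTED).
_ESCAPES = {"\\": "\\\\", "\r": "\\r", "\n": "\\n", ":": "\\c"}
_UNESCAPES = {"\\\\": "\\", "\\r": "\r", "\\n": "\n", "\\c": ":"}


def escape_header(value: str) -> str:
    """Escape backslash, CR, LF and colon in a header key or value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_header(value: str) -> str:
    """Reverse escape_header; unknown escape sequences are passed through."""
    out, i = [], 0
    while i < len(value):
        pair = value[i:i + 2]
        if pair in _UNESCAPES:
            out.append(_UNESCAPES[pair])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def encode_frame(command: str, headers: dict[str, str], body: str = "") -> bytes:
    """Build COMMAND, header lines, a blank line, the body and a NUL terminator."""
    payload = body.encode("utf-8")
    lines = [command]
    for key, val in headers.items():
        lines.append(f"{escape_header(key)}:{escape_header(val)}")
    # content-length lets the receiver read bodies that may contain NUL bytes.
    lines.append(f"content-length:{len(payload)}")
    head = ("\n".join(lines) + "\n\n").encode("utf-8")
    return head + payload + b"\x00"


def decode_frame(data: bytes) -> tuple[str, dict[str, str], str]:
    """Split a single frame into (command, headers, body)."""
    head, sep, rest = data.partition(b"\n\n")
    if not sep:
        raise ValueError("missing blank line after headers")
    lines = head.decode("utf-8").split("\n")
    command, headers = lines[0], {}
    for line in lines[1:]:
        key, _, val = line.partition(":")
        # STOMP 1.2: if a header is repeated, the first occurrence wins.
        headers.setdefault(unescape_header(key), unescape_header(val))
    if "content-length" in headers:
        body = rest[: int(headers["content-length"])]
    else:
        body = rest.split(b"\x00", 1)[0]
    return command, headers, body.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical event: a document is ready for the next text-mining service.
    event = {"doc_id": "doc-0001", "text": "Aspirin inhibits COX-1."}
    frame = encode_frame(
        "SEND",
        {"destination": "/queue/ner.requests", "content-type": "application/json"},
        json.dumps(event),
    )
    command, headers, body = decode_frame(frame)
    print(command, headers["destination"], json.loads(body)["doc_id"])
```

Under this assumption, each service would subscribe to its own input destination and SEND its results to the next one, so services stay decoupled and only need to agree on destination names and payload format. This is also what would let non-UIMA components join the pipeline, since any STOMP-capable client can read and write such frames.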