Here you will find scientific publications from the Fraunhofer Institutes.

Layout-Aware Semi-automatic Information Extraction for Pharmaceutical Documents

Harmata, S.; Hofer-Schmitz, K.; Nguyen, P.-H.; Quix, C.; Bakiu, B.


Silveira, M. da:
Data Integration in the Life Sciences. 12th International Conference, DILS 2017 : Luxembourg, November 14-15, 2017, Proceedings
Cham: Springer International Publishing, 2017 (Lecture Notes in Computer Science 10649)
ISBN: 978-3-319-69750-5 (Print)
ISBN: 978-3-319-69751-2 (Online)
ISBN: 3-319-69750-1
International Conference on Data Integration in the Life Sciences (DILS) <12, 2017, Luxembourg>
Fraunhofer FIT

Pharmaceutical companies and regulatory authorities are also affected by the ongoing digitalization and are transforming their paper-based, document-oriented communication into a structured, digital information exchange. The documents exchanged so far contain a huge amount of information that needs to be converted into a structured format to enable more efficient communication in the future. In such a setting, the information extracted from documents must be very accurate, as it is used in a legal, regulatory process and also for identifying unknown adverse effects of medicinal products that might threaten patients’ health. In this paper, we present LASIE, our layout-aware semi-automatic information extraction system, which combines techniques from rule-based information extraction, flexible data management, and semantic information management in a user-centered design. We applied the system in a case study with an industrial partner and achieved very satisfactory results.