Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Towards information extraction from ISR reports for decision support using a two-stage learning-based approach

 
Authors: Mühlenberg, Dirk; Kuwertz, Achim; Schenkel, P.; Müller, Wilmuth

Postprint urn:nbn:de:0011-n-5521250 (817 KByte PDF)
MD5 Fingerprint: 16596cf36c3909538917747e77c49e55
Copyright Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.
Created on: 19 July 2019


Suresh, Raja (Ed.) ; Society of Photo-Optical Instrumentation Engineers -SPIE-, Bellingham/Wash.:
Open Architecture/Open Business Model Net-Centric Systems and Defense Transformation : 16–18 April 2019 Baltimore, Maryland, United States
Bellingham, WA: SPIE, 2019 (Proceedings of SPIE 11015)
ISBN: 978-1-5106-2695-9
ISBN: 978-1-5106-2696-6
Paper 110150P, 12 pp.
Conference "Defense and Commercial Sensing" (DCS) <2019, Baltimore/Md.>
English
Conference paper, electronic publication
Fraunhofer IOSB
text understanding; deep learning; transfer learning; AMR; data sparsity

Abstract
The main challenge of computational linguistics is to represent the meaning of text in a computer model. For more than 30 years, statistics-based methods with manually created features have been used in a divide-and-conquer approach to mark interesting features in free text. Around 2010, deep learning concepts found their way into the text-understanding research community. Deep learning is very attractive and easy to apply, but it needs massive pools of annotated, high-quality data from every target domain, which are generally not available, especially for the military domain. When changing the application domain, one needs additional or new data to adapt the language models to the new domain. To overcome this persistent "data problem", we chose a novel two-step approach: first using formal representations of the meaning, and then applying a rule-based mapping to the target domain. As an intermediate language representation, we used Abstract Meaning Representation (AMR) and trained a general base model. This base model was then trained further with additional data from the intended domains (transfer learning). We evaluated the quality of the parser stepwise, measuring parser performance against the amount of training data. This approach answered the question of how much data is needed to reach the required quality when changing the application domain. Mapping the meaning representation to the target domain model gave us more control over domain specifics that are not generally representable by a machine learning approach with self-learned feature vectors.
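The second stage described in the abstract, a rule-based mapping from the meaning representation to a domain model, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the example sentence, the triple encoding of the AMR graph, the `RULES` table, the `Observation` record type, and its slot names are all assumptions made for illustration only.

```python
# Hypothetical AMR graph for "The patrol observed three vehicles near the bridge",
# written as (source, role, target) triples. All names are illustrative assumptions.
AMR_TRIPLES = [
    ("o", ":instance", "observe-01"),
    ("o", ":ARG0", "p"),
    ("o", ":ARG1", "v"),
    ("o", ":location", "b"),
    ("p", ":instance", "patrol"),
    ("v", ":instance", "vehicle"),
    ("v", ":quant", "3"),
    ("b", ":instance", "bridge"),
]

# Hand-written rules (assumed): AMR predicate -> (domain record type, role -> slot name).
RULES = {
    "observe-01": (
        "Observation",
        {":ARG0": "observer", ":ARG1": "observed", ":location": "place"},
    ),
}


def concept_of(node, triples):
    """Return the concept label of a variable, or the value itself for constants."""
    for src, role, tgt in triples:
        if src == node and role == ":instance":
            return tgt
    return node


def map_to_domain(triples, rules):
    """Build one domain record for every node whose concept has a mapping rule."""
    records = []
    for src, role, tgt in triples:
        if role != ":instance" or tgt not in rules:
            continue
        record_type, slot_map = rules[tgt]
        record = {"type": record_type}
        # Copy each outgoing edge that the rule knows about into a named slot.
        for s, r, t in triples:
            if s == src and r in slot_map:
                record[slot_map[r]] = concept_of(t, triples)
        records.append(record)
    return records


records = map_to_domain(AMR_TRIPLES, RULES)
print(records)
```

Because the rules are explicit data rather than learned weights, domain-specific behavior can be inspected and changed by editing the table, which is the kind of control over domain specifics the abstract attributes to the mapping step.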

URL: http://publica.fraunhofer.de/dokumente/N-552125.html