Towards information extraction from ISR reports for decision support using a two-stage learning-based approach
The main challenge of computational linguistics is to represent the meaning of text in a computer model. Statistics-based methods with manually created features have been used for more than 30 years, following a divide-and-conquer approach, to mark interesting features in free text. Around 2010, deep learning concepts found their way into the text-understanding research community. Deep learning is attractive and easy to apply, but it needs large pools of annotated, high-quality data from every target domain, which are generally not available, especially in the military domain. When the application domain changes, additional or new data are needed to adapt the language models to the new domain. To overcome this persistent "data problem", we chose a novel two-step approach: first producing a formal representation of the meaning, then applying a rule-based mapping to the target domain. As the intermediate language representation, we used Abstract Meaning Representation (AMR) and trained a general base model. This base model was then further trained with additional data from the intended domains (transfer learning). We evaluated the quality of the parser stepwise, measuring parser performance against the amount of training data. This approach answered the question of how much data is needed to reach the required quality when changing the application domain. Mapping the meaning representation to the target domain model gave us more control over domain specifics that are not generally representable by a machine learning approach with self-learned feature vectors.
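To make the second stage concrete, the following is a minimal sketch of what a rule-based mapping from an AMR-style graph to a target domain record could look like. It is an illustration under stated assumptions, not the mapping used in this work: the triple encoding, the example sentence, the role labels, the `RULES` table, the `Movement` record type, and the function name `map_to_domain` are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (source variable, role, target)

# Hypothetical AMR-style graph for "The convoy moved to the bridge."
# The role labels are illustrative and not taken from a real parser output.
AMR_TRIPLES: List[Triple] = [
    ("m", ":instance", "move-01"),
    ("m", ":ARG1", "c"),
    ("m", ":ARG2", "b"),
    ("c", ":instance", "convoy"),
    ("b", ":instance", "bridge"),
]

# Hypothetical hand-written rules: an AMR concept maps to a domain record
# type, and selected AMR roles map to slots of that record.
RULES: Dict[str, dict] = {
    "move-01": {
        "type": "Movement",
        "roles": {":ARG1": "unit", ":ARG2": "destination"},
    },
}


def map_to_domain(triples: List[Triple], rules: Dict[str, dict]) -> List[dict]:
    """Apply the rule table to an AMR-style graph and return domain records."""
    # Resolve each variable (e.g. "c") to its concept (e.g. "convoy").
    concepts = {src: tgt for src, role, tgt in triples if role == ":instance"}

    records = []
    for var, concept in concepts.items():
        rule = rules.get(concept)
        if rule is None:
            continue  # no rule for this concept, so it yields no record
        record = {"type": rule["type"]}
        for src, role, tgt in triples:
            if src == var and role in rule["roles"]:
                # Fill the slot with the target's concept, or the raw
                # target if it is a constant rather than a variable.
                record[rule["roles"][role]] = concepts.get(tgt, tgt)
        records.append(record)
    return records


print(map_to_domain(AMR_TRIPLES, RULES))
```

Because the rules are explicit data rather than learned weights, a domain expert can inspect or change them without retraining the parser, which is the kind of control over domain specifics that the two-step design aims for.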