Semantic Extraction of Key Figures and their Properties from Tax Legal Texts Using Neural Models

License: CC BY 4.0
Authors: Daniel Steinigen, Marcin Namysl, Markus Hepperle, Jan Krekeler, Susanne Landgraf
Date: 2023-08-22
Year: 2023
URL: https://publica.fraunhofer.de/handle/publica/448529
DOI: https://doi.org/10.24406/publica-1790

Abstract: Applying information extraction to legislative texts is a challenging task that requires a specification to distinguish the relevant parts from the less relevant parts of the text. Moreover, there is still a lack of appropriate language- and domain-specific data in the field of information extraction. This work investigates the extraction and modeling of key figures from legal texts. We introduce a universally applicable annotation scheme together with a semantic model for key figures and their logically connected properties in legal texts. Moreover, we release KeyFiTax, a dataset with key figures based on paragraphs of German tax acts manually annotated by tax experts together with a knowledge graph populated from these paragraphs based on our semantic model. Using our dataset, we also evaluate and compare state-of-the-art entity extraction models in terms of long entity spans and low-resource data. Furthermore, we present a transformer-based approach for relation extraction using entity markers to obtain a logical formulation of the key figures. Finally, we introduce task triggers for training a combined resource-efficient entity and relation extraction model. We make our dataset together with the semantic model and the knowledge graph, as well as the implementation of the entity and relation extraction approaches investigated in this work public.

Language: en
Keywords: information extraction
Type: conference paper