2010
Conference Paper
Title
A Tajik extension of the multilingual information extraction system ZENON
Abstract
New deployments of the German Federal Armed Forces make it necessary to analyze large quantities of intelligence reports and other documents written in different languages. We set up the research project ZENON, which uses a multilingual information extraction approach for (partial) content analysis of texts written in different languages. Currently, ZENON can (partially) process English documents as well as Dari documents that are restricted in structure and vocabulary. In this paper, we present functionality for information extraction from Tajik texts. The new Tajik module further extends our research system. We show how named entities, verbs, and compound verb phrases are extracted from Tajik documents and represented. We also show how a simple word-to-word translation is integrated into the system. After a short introduction, we describe the ZENON multilingual information extraction project. The main part of the paper describes our approach to information extraction from Tajik texts in detail.