Fine-grained named entities for Corona news

License: CC BY 4.0
Authors: Efeoglu, Sefika; Paschke, Adrian
Date: 2023-07-20
Year: 2023
Handle: https://publica.fraunhofer.de/handle/publica/445837
DOI: https://doi.org/10.24406/publica-1663
Type: conference paper
Language: en
Keywords: corona news; named entity recognition; fine-grained entities; contextual embedding

Abstract

Information resources such as newspapers have produced unstructured text data in various languages about the corona outbreak since December 2019. Analyzing these texts is time-consuming unless they are first represented in a structured format. An information extraction pipeline built on two essential tasks, named entity tagging and relation extraction, can be applied to these texts to accomplish this goal. This study proposes a data annotation pipeline that generates training data from corona news articles, covering both generic and domain-specific entities. Named entity recognition models are trained on this annotated corpus and then evaluated on test sentences manually annotated by domain experts. The code base and demonstration are available at https://github.com/sefeoglu/coronanews-ner.git.