Biological and chemical entity recognition from text using conditional random fields (CRF)

Poster at the German Conference on Bioinformatics (GCB '05), Hamburg, October 5th-7th, 2005
Revillion, T.; Friedrich, C.M.; Fluck, J.; Hofmann, M.

Full text: urn:nbn:de:0011-n-1116865 (27 KByte PDF)
MD5 Fingerprint: fb4c797d61b119e65d7f08741066fc03
Created on: 1 December 2009

2005, 1 slide
German Conference on Bioinformatics (GCB) <2005, Hamburg>
Poster, electronic publication
Fraunhofer SCAI
information extraction; biological and chemical entity recognition; Conditional Random Field (CRF); machine learning

Most of the information in the biomedical domain is present as unstructured text. Among other uses, this wealth of information can help to interpret the results of expression experiments or to derive pathways of biological or chemical interactions. Text mining is one way to obtain this information. The first step towards extracting information from text efficiently is to accurately assign meaningful tags from a well-defined ontology to entities. For biological entity recognition (BER) tasks, problems arise because there is no unified nomenclature for protein and gene names that is used by all scientists. Further problems lie in the ambiguity of names and in the occurrence of multiword terms. Here, we present our work on applying machine learning (ML) techniques with a rich set of features to biological and chemical entity recognition (CER) from scientific text. Our process follows the conventions and data sets provided for the shared task of the 'International Joint Workshop on Natural Language Processing in Biomedicine and its Applications 2004' (JNLPBA) [Kim et al. 2004]. The presented work uses the GENIA corpus 3.02 [Kim et al. 2003], which contains 2,000 MEDLINE abstracts with 400,000 words and nearly 100,000 hand-coded annotations for biological terms.
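
The abstract does not list the features or the tag encoding that were used, so the following is only a minimal sketch of how token-level features for a CRF tagger and multiword entities are commonly represented. It assumes IOB-style tags (B-, I-, O) as distributed with the JNLPBA shared task data. The function names `token_features` and `iob_to_spans`, the particular features, and the example sentence with its tags are hypothetical illustrations, not the authors' implementation.

```python
import re


def token_features(tokens, i):
    """Return simple orthographic and context features for tokens[i].

    Assumption: these features are illustrative only; the poster's
    actual feature set is not given in the abstract.
    """
    tok = tokens[i]
    # Word shape: uppercase -> "A", lowercase -> "a", digit -> "0".
    shape = re.sub(r"[A-Z]", "A", tok)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        "shape": shape,
        # Neighbouring tokens give the tagger local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def iob_to_spans(tokens, tags):
    """Group IOB tags into (entity text, label) pairs.

    A multiword term is a B-<label> tag followed by I-<label> tags.
    """
    spans = []
    start, label = None, None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues and label is not None:
            spans.append((" ".join(tokens[start:i]), label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans


# Hypothetical example sentence and tags, for illustration only.
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
tags = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein"]

features = [token_features(tokens, i) for i in range(len(tokens))]
entities = iob_to_spans(tokens, tags)
print(entities)
```

In a setup like this, the per-token feature dictionaries would be the input to a CRF toolkit, and the predicted tag sequence would be decoded back into entity mentions with a function such as `iob_to_spans`.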