
Publica
Fusing Multi-label Classification and Semantic Tagging
| Trabold, Daniel (Ed.): Conference "Lernen, Wissen, Daten, Analysen", LWDA 2020. Proceedings. Online resource: Online, September 9-11, 2020. Available online, 2020 (CEUR Workshop Proceedings 2738), http://ceur-ws.org/Vol-2738/, pp. 88-99 |
| Conference "Lernen, Wissen, Daten, Analysen" (LWDA) <2020, Online> |
| Bundesministerium für Bildung und Forschung BMBF (Germany) 01IS18038B Kompetenzzentrum Maschinelles Lernen Rhein-Ruhr |
| English |
| Conference paper, electronic publication |
| Fraunhofer IAIS |
| multi-label classification; semantic tagging; prediction-based embedding spaces; patents |
Abstract
Companies have an increasing demand for enriching documents with metadata. In an applied setting, we present a three-part workflow for the combination of multi-label classification and semantic tagging using a collection of key-phrases. The workflow is illustrated on the basis of patent abstracts with the CPC scheme. The key-phrases are drawn from a training set collection of documents without manual interaction. The union of CPC labels and key-phrases provides a label set on which a multi-label classifier model is generated by supervised training. We show learning curves for both key-phrases and classification categories, and a semantic graph generated from cosine similarities. We conclude that, given sufficient training data, the number of label categories is highly scalable.
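The two technical steps named in the abstract, forming one label set from the union of CPC labels and key-phrases and deriving a semantic graph from cosine similarities, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sample CPC codes and key-phrases, and the similarity threshold are all hypothetical. The paper's keywords point to prediction-based embedding spaces as the source of the vectors; as a stand-in assumption, the sketch can be fed each label's column of the multi-hot target matrix instead.

```python
import math


def build_label_index(cpc_labels, key_phrases):
    """Map every label in the union of CPC codes and key-phrases to a column.

    Both arguments hold one list of labels per document.
    """
    vocab = sorted(set().union(*cpc_labels, *key_phrases))
    return {label: col for col, label in enumerate(vocab)}


def multi_hot(doc_cpc, doc_phrases, index):
    """Encode one document's combined labels as a 0/1 target row."""
    row = [0] * len(index)
    for label in set(doc_cpc) | set(doc_phrases):
        row[index[label]] = 1
    return row


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_edges(vectors, threshold):
    """Return (label_a, label_b, similarity) for pairs at or above threshold.

    `vectors` maps each label to its vector. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    labels = sorted(vectors)
    edges = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                edges.append((a, b, sim))
    return edges
```

The rows produced by `multi_hot` form the target matrix for any supervised multi-label classifier; because CPC codes and key-phrases share one column space, a single model predicts both kinds of label.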