Fusing Multi-label Classification and Semantic Tagging
Companies have an increasing demand for enriching documents with metadata. In an applied setting, we present a three-part workflow for the combination of multi-label classification and semantic tagging using a collection of key-phrases. The workflow is illustrated on the basis of patent abstracts with the CPC scheme. The key-phrases are drawn from a training set collection of documents without manual interaction. The union of CPC labels and key-phrases provides a label set on which a multi-label classifier model is generated by supervised training. We show learning curves for both key-phrases and classification categories, and a semantic graph generated from cosine similarities. We conclude that, given sufficient training data, the number of label categories is highly scalable.