Name Disambiguation

Rasheed, Muhammad Irtaza Bin

2020

Master Thesis

Abstract

Name ambiguity is a challenging and critical problem in many applications, such as scientific literature management and trend analysis. It arises mainly from different name abbreviations, identical names, and name misspellings in publications and bibliographies. An author may have multiple names, and multiple authors may share the same name. As a result, a search for a particular name may return documents that belong to other people with that name, or miss documents in which the author wrote their name in a different style. Such name ambiguity degrades the performance of document retrieval, web search, and database integration, and may result in improper classification of authors. Many clustering-based algorithms have been proposed, but the problem still remains largely unsolved for both the research and industry communities, especially given the fast growth of available information. The aim of this thesis is to implement a universal name disambiguation approach that considers almost any existing property to identify authors. After the author of a paper is identified, the normalized form of the name as written on the paper is used to refine the author model and to give an overview of the different forms in which the author's name is written. To this end, the thesis first examines research on Human-Computer Interaction, with a specific focus on (Visual) Trend Analysis, and then surveys different name disambiguation techniques. Finally, it builds a concept and implements a generalized method for disambiguating author names and affiliations while evaluating different properties.
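The abstract does not describe the thesis's actual method. The following is only a minimal Python sketch of a common, simple heuristic (reducing each name string to "surname plus first initial"), included to illustrate both sides of the ambiguity described above. The functions `name_key` and `group_by_key` and the example names are hypothetical and are not taken from the thesis.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(raw: str) -> str:
    """Reduce a name string to 'surname first-initial' (hypothetical heuristic)."""
    # Strip accents so that e.g. "Müller" and "Muller" compare equal.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).strip()
    if not text:
        return ""
    if "," in text:
        # "Surname, Given names" ordering.
        surname, given = (part.strip() for part in text.split(",", 1))
    else:
        # "Given names Surname" ordering: the last token is assumed to be the surname.
        tokens = text.split()
        surname, given = tokens[-1], " ".join(tokens[:-1])
    # Keep only the first letter of the given names (drops dots such as "J.").
    initial = re.sub(r"[^A-Za-z]", "", given)[:1]
    return f"{surname.lower()} {initial.lower()}".strip()


def group_by_key(names):
    """Group raw name strings that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    variants = ["Smith, John", "J. Smith", "John Smith", "Jane Smith"]
    print(group_by_key(variants))
```

With these inputs, all four strings land in the single group `"smith j"`. The different writing forms of John Smith are correctly merged, but Jane Smith is merged with them too. A name key alone therefore cannot separate distinct authors who share a name, which is why approaches like the one this thesis aims for draw on additional properties such as affiliation.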

Thesis Note

Darmstadt, TU, Master Thesis, 2020

Author(s)

Rasheed, Muhammad Irtaza Bin

Advisor(s)

Kuijper, Arjan

Fraunhofer-Institut für Graphische Datenverarbeitung IGD

Burkhardt, Dirk

Hochschule Darmstadt

Publishing Place

Darmstadt
