Fraunhofer-Gesellschaft
2022
Journal Article
Title

Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling

Abstract
This paper demonstrates a method for transforming textual information scraped from companies’ websites and linking it to the scientific body of knowledge. The method illustrates how Natural Language Processing (NLP) can connect established economic classification systems with the novel, agile constructs that new data sources enable. We tested the method on the European classification of economic activities (NACE) at the sectoral and company levels, and established a connection with Microsoft Academic Graph hierarchical topic modeling based on companies’ website content. Central to the operationalization of the method are a web scraping process, NLP, and a data transformation/linkage procedure. The method comprises three main steps: data source identification, raw data retrieval, and data preparation and transformation. These steps are applied to two distinct data sources.
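The data preparation and linkage steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOPICS` vocabulary is a hypothetical two-level stand-in for the Microsoft Academic Graph topic hierarchy, simple keyword counting stands in for the NLP step, and the function names and sample NACE codes are assumptions made for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Data preparation: reduce raw scraped HTML to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical vocabulary (assumption): parent field -> child topic -> keywords.
TOPICS = {
    "Computer science": {
        "Machine learning": {"learning", "neural", "prediction"},
        "Computer vision": {"image", "camera", "vision"},
    },
    "Materials science": {
        "Composite material": {"composite", "fiber", "laminate"},
    },
}


def score_topics(text):
    """Transformation: count keyword hits per (parent, child) topic."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {}
    for parent, children in TOPICS.items():
        for child, keywords in children.items():
            hits = sum(counts[k] for k in keywords)
            if hits:
                scores[(parent, child)] = hits
    return scores


def aggregate_by_sector(companies):
    """Linkage: sum company-level topic scores per NACE code.

    `companies` is an iterable of (nace_code, html) pairs.
    """
    by_sector = {}
    for nace_code, html in companies:
        sector = by_sector.setdefault(nace_code, Counter())
        sector.update(score_topics(html_to_text(html)))
    return by_sector
```

Calling `aggregate_by_sector` on a list of `(nace_code, html)` pairs gives one topic profile per sector, while `score_topics(html_to_text(html))` gives the company-level view. A realistic pipeline would replace the keyword sets with a trained topic tagger and fetch the pages with a proper crawler.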
Author(s)
Hajikhani, Arash
VTT Technical Research Centre of Finland
Pukelis, Lukas
Public Policy and Management Institute, Lithuania
Suominen, Arho
VTT Technical Research Centre of Finland  
Ashouri, Sajad
VTT Technical Research Centre of Finland  
Schubert, Torben
Fraunhofer-Institut für System- und Innovationsforschung ISI  
Notten, Ad
Maastricht University School of Business and Economics
Cunningham, Scott W.
University of Strathclyde, Department of Government
Journal
MethodsX  
Open Access
DOI
10.1016/j.mex.2022.101650
Language
English
Fraunhofer-Institut für System- und Innovationsforschung ISI  
Keyword(s)
  • Natural language processing
  • Economic classification scheme
  • Knowledge transformation
  • Web scraping
