2022
Journal Article
Title

Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling

Abstract
This paper demonstrates a method for transforming textual information scraped from companies’ websites and linking it to the scientific body of knowledge. The method illustrates how Natural Language Processing (NLP) can create links between established economic classification systems and the novel, agile constructs that new data sources enable. We tested the method on the European classification of economic activities (known as NACE) at the sectoral and company levels, and established a connection with Microsoft Academic Graph hierarchical topic modeling based on companies’ website content. Central to the operationalization of the method are a web scraping process, NLP, and a data transformation/linkage procedure. The method comprises three main steps: data source identification, raw data retrieval, and data preparation and transformation. These steps are applied to two distinct data sources.
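The linkage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the topic labels and descriptions in `TOPICS`, the sample page, and the helper names are all hypothetical, and a plain bag-of-words cosine similarity stands in for whatever NLP model the paper uses. No real Microsoft Academic Graph data or live scraping is involved.

```python
import math
import re
from collections import Counter

# Hypothetical topic labels with short keyword descriptions. These are
# illustrative stand-ins, not actual Microsoft Academic Graph topics.
TOPICS = {
    "machine learning": "machine learning neural network model training data",
    "renewable energy": "solar wind energy power renewable grid",
    "logistics": "shipping freight warehouse supply chain transport",
}


def strip_html(html):
    """Crudely remove HTML tags, leaving the visible text."""
    return re.sub(r"<[^>]+>", " ", html)


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def link_to_topics(html, topics=TOPICS):
    """Rank topic labels by similarity to the text of a scraped page."""
    doc = Counter(tokenize(strip_html(html)))
    scores = {
        name: cosine(doc, Counter(tokenize(desc)))
        for name, desc in topics.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# A hypothetical company page, already retrieved as raw HTML.
page = (
    "<html><body><h1>SunGrid</h1>"
    "<p>We build solar and wind power systems for the renewable energy grid.</p>"
    "</body></html>"
)
ranked = link_to_topics(page)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setup the page shares vocabulary only with the "renewable energy" description, so that label ranks first and the other two score zero. A realistic pipeline would replace the keyword descriptions with a proper topic model and aggregate company-level scores up to sector codes.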
Author(s)
Hajikhani, Arash
VTT Technical Research Centre of Finland
Pukelis, Lukas
Public Policy and Management Institute, Lithuania
Suominen, Arho
VTT Technical Research Centre of Finland
Ashouri, Sajad
VTT Technical Research Centre of Finland
Schubert, Torben
Fraunhofer-Institut für System- und Innovationsforschung ISI
Notten, Ad
Maastricht University School of Business and Economics
Cunningham, Scott W.
University of Strathclyde, Department of Government
Journal
MethodsX
DOI
10.1016/j.mex.2022.101650
Language
English
Fraunhofer-Institut für System- und Innovationsforschung ISI
Tags
  • Natural language proc...
  • Economic classificati...
  • Knowledge transformat...
  • Web scraping
