Fraunhofer-Gesellschaft

Publica

Scientific publications from the Fraunhofer Institutes can be found here.

A lightweight approach for extracting product records from the web

 
Authors: Horch, Andrea; Kett, Holger; Weisbecker, Anette

Monfort, Valérie (Ed.) ; Institute for Systems and Technologies of Information, Control and Communication -INSTICC-, Setubal:
11th International Conference on Web Information Systems and Technologies 2015. Proceedings : Lisbon, Portugal, 20 - 22 May, 2015
SciTePress, 2015
ISBN: 978-989-758-106-9
pp. 420-430
International Conference on Web Information Systems and Technologies (WEBIST) <11, 2015, Lisbon>
European Commission EC
FP7-SME; 315637; SME E-COMPASS
E-COMmerce Proficient Analytics in Security and Sales for SMEs
English
Conference paper
Fraunhofer IAO

Abstract
Gathering product records from the Web matters to both shoppers and online retailers, who use them to compare products and prices. Consumers want to find the best price for a product, whereas online retailers want to compare their offers with those of their competitors in order to remain competitive. Because of the huge number and variety of product offers on the Web, an automated approach for collecting product data is needed. In this paper we propose a lightweight approach to automatically identify and extract product records from arbitrary e-shop websites. For this purpose we have adopted and extended an existing technique called Tag Path Clustering, which clusters similar HTML tag paths, and developed a novel filtering mechanism specifically for extracting product records from websites.
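
The idea of grouping elements by HTML tag path can be illustrated with a minimal sketch. It is not the authors' implementation: it groups elements whose root-to-element tag paths are exactly identical, whereas Tag Path Clustering as named in the abstract clusters similar paths, and the paper's filtering mechanism is not reproduced here. The names `TagPathCollector`, `repeated_tag_paths`, and the `min_count` threshold are assumptions made for illustration only.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Counts how often each root-to-element tag path occurs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open tags, outermost first
        self.counts = defaultdict(int)   # tag path -> number of occurrences

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        self.counts[path] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def repeated_tag_paths(html, min_count=3):
    """Return tag paths occurring at least min_count times (assumed threshold)."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return {p: n for p, n in collector.counts.items() if n >= min_count}


page = ("<html><body><ul>"
        "<li><span>A</span></li><li><span>B</span></li><li><span>C</span></li>"
        "</ul><p>x</p></body></html>")
candidates = repeated_tag_paths(page)
```

Paths that repeat many times on a page, such as the list items above, are candidates for record-like regions; deciding which of them actually hold product records would require a further filtering step.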

URL: http://publica.fraunhofer.de/dokumente/N-375161.html