2015
Conference Paper
Title
A lightweight approach for extracting product records from the web
Abstract
Gathering product records from the Web is important to both shoppers and online retailers for comparing products and prices. Consumers want to find the best price for a product, whereas online retailers want to compare their offers with those of their competitors in order to remain competitive. Given the huge number and variety of product offers on the Web, an automated approach for collecting product data is needed. In this paper, we propose a lightweight approach to automatically identify and extract product records from arbitrary e-shop websites. For this purpose, we have adopted and extended an existing technique called Tag Path Clustering, which clusters similar HTML tag paths, and developed a novel filtering mechanism specifically for extracting product records from websites.
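To make the idea of grouping HTML tag paths more concrete, the following is a minimal sketch and not the method of the paper. It assumes a much simpler notion of similarity than the abstract implies: two elements belong to the same group only if their tag paths are identical, and a path counts as a record candidate if it occurs at least `min_count` times. The names `TagPathCollector`, `repeated_tag_paths`, and `min_count` are hypothetical, and the filtering mechanism mentioned in the abstract is not reproduced here.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Counts how often each root-to-element tag path occurs (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.stack = []                # currently open tags, outermost first
        self.paths = defaultdict(int)  # tag path -> number of occurrences

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        self.paths[path] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate sloppy markup: ignore stray end tags, otherwise
        # pop open tags until the matching one has been removed.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def repeated_tag_paths(html, min_count=2):
    """Return tag paths that occur at least min_count times (assumed threshold)."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return {p: n for p, n in collector.paths.items() if n >= min_count}


if __name__ == "__main__":
    page = (
        "<html><body><ul>"
        "<li><span>Product A</span></li>"
        "<li><span>Product B</span></li>"
        "<li><span>Product C</span></li>"
        "</ul><p>footer</p></body></html>"
    )
    for path, count in repeated_tag_paths(page).items():
        print(count, path)
```

On a real shop page, repeated paths also arise from navigation menus and footers, which is presumably why a dedicated filtering step is needed after grouping.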