• English
  • Deutsch
  • Log In
    Password Login
    Research Outputs
    Fundings & Projects
    Researchers
    Institutes
    Statistics
Repository logo
Fraunhofer-Gesellschaft
  1. Home
  2. Fraunhofer-Gesellschaft
  3. Konferenzschrift
  4. Ad-hoc queries over document collections - A case study
 
  • Details
  • Full
Options
2010
Conference Paper
Title

Ad-hoc queries over document collections - A case study

Abstract
We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on 100.000's of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" or our system GOOLAP.info, are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records which could not be joined into a record answering the query. Our focus is the identification of join-reordering heuristics maximizing the size of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join-operations: The multi-way CJ-operator joins records from multiple relationships extracted from a single document. The two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size, record density and lower execution costs.
Author(s)
Löser, A.
Lutter, S.
Düssel, P.
Markl, V.
Mainwork
Enabling Real-Time Business Intelligence  
Conference
International Workshop Enabling Real-Time Business Intelligence (BIRTE) 2009  
International Conference on Very Large Databases (VLDB) 2009  
DOI
10.1007/978-3-642-14559-9_4
Language
English
FIRST
  • Cookie settings
  • Imprint
  • Privacy policy
  • Api
  • Contact
© 2024