Automatic Gaussian Process Model Retrieval for Big Data
Gaussian Process Models (GPMs) are widely regarded as a prominent tool for capturing the inherent characteristics of data. These bayesian machine learning models allow for data analysis tasks such as regression and classification. Usually a process of automatic GPM retrieval is needed to find an optimal model for a given dataset, despite prevailing default instantiations and existing prior knowledge in some scenarios, which both shortcut the way to an optimal GPM. Since non-approximative Gaussian Processes only allow for processing small datasets with low statistical versatility, we propose a new approach that allows to efficiently and automatically retrieve GPMs for large-scale data. The resulting model is composed of independent statistical representations for non-overlapping segments of the given data. Our performance evaluation of the new approach demonstrates the quality of resulting models, which clearly outperform default GPM instantiations, while maintaining reasonable model training time.