Fraunhofer-Gesellschaft
2023
Paper (Preprint, Research Paper, Review Paper, White Paper, etc.)
Title

On minimizing the training set fill distance in machine learning regression

Title Supplement
Published on arXiv
Abstract
For regression tasks, one often leverages large datasets for training predictive machine learning models. However, using large datasets may not be feasible due to computational limitations or high data labelling costs. Therefore, suitably selecting small training sets from large pools of unlabelled data points is essential to maximize model performance while maintaining efficiency. In this work, we study Farthest Point Sampling (FPS), a data selection approach that aims to minimize the fill distance of the selected set. We derive an upper bound for the maximum expected prediction error, conditional on the location of the unlabelled data points, that depends linearly on the training set fill distance. For empirical validation, we perform experiments using two regression models on three datasets. We empirically show that selecting a training set by aiming to minimize the fill distance, thereby minimizing our derived bound, significantly reduces the maximum prediction error of various regression models, outperforming alternative sampling approaches by a large margin. Furthermore, we show that selecting training sets with FPS can also increase model stability for the specific case of Gaussian kernel regression approaches.
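The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Euclidean distance, measures the fill distance over the finite pool of unlabelled points (the largest distance from any pool point to its nearest selected point), and uses the common greedy form of FPS. The function names `farthest_point_sampling` and `fill_distance`, the `first` parameter, and the random demo pool are hypothetical choices made for illustration.

```python
import numpy as np


def farthest_point_sampling(points, k, first=0):
    """Greedy FPS sketch: repeatedly add the pool point farthest from the selected set.

    `first` is the index of the (arbitrarily chosen) starting point.
    Returns the list of selected indices, in selection order.
    """
    points = np.asarray(points, dtype=float)
    selected = [first]
    # Distance from every pool point to its nearest selected point so far.
    min_dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # farthest point from the current selection
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


def fill_distance(points, selected):
    """Largest distance from any pool point to its nearest selected point."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[selected][None, :, :]
    dists = np.linalg.norm(diffs, axis=2)  # shape: (n_pool, n_selected)
    return float(dists.min(axis=1).max())


# Demo on a synthetic pool (an assumption, not one of the paper's datasets).
rng = np.random.default_rng(0)
pool = rng.random((500, 2))
fps_idx = farthest_point_sampling(pool, k=20)
rand_idx = rng.choice(len(pool), size=20, replace=False)
print("fill distance, FPS selection:   ", fill_distance(pool, fps_idx))
print("fill distance, random selection:", fill_distance(pool, rand_idx))
```

Comparing the two printed values for the same pool and the same training set size shows how the selection strategy affects the fill distance. The labels of the selected points would then be acquired and used to train the regression model.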
Author(s)
Climaco, Paolo
Universität Bonn, Institut für Numerische Simulation
Garcke, Jochen  
Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI  
DOI
10.48550/arXiv.2307.10988
Language
English
Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI  
Fraunhofer Group
Fraunhofer-Verbund IUK-Technologie  
Keyword(s)
  • Fill distance

  • Farthest Point Sampling

  • Regression
