Using public domain metrics to estimate software development effort
In this paper we investigate the accuracy of cost estimates produced by different modeling techniques. The techniques applied in this study are ordinary least squares regression (OLS), analogy-based estimation, stepwise ANOVA, regression trees (CART), and robust regression. This is the first test of robust regression on a large-scale industrial data set. We compare the accuracy of the estimates for one organization when using its own data with the accuracy when using carefully matched data from the rest of the ISBSG data set. The analysis reveals that, when the rest of the ISBSG data set is used to estimate for this organization, the most accurate techniques are robust regression and OLS. When the organization's own data are used as the basis for the estimates, most of the techniques perform equally well (the exceptions being two variants of the techniques). In contrast to previous studies, the accuracy when using the company's own data for estimation is significantly higher than when using the rest of the ISBSG data set. The only non-significant difference occurred with a CART variation. These results imply that when a company collects its own data, the choice of technique matters little for achieving reasonable accuracy. This will of course be influenced by the variability in the company's own data set. If a company contributes to the ISBSG data set and relies on it for estimation, these results recommend a robust technique, such as robust regression. Because the data set used is relatively large, these results are expected to be somewhat generalizable. However, some of these results differ from those of previous studies.