2000
Conference Paper
Title
A comparative Study of Cost Modelling Techniques using Public Domain multi-organisational and company-specific Data
Abstract
This research examines the use of the International Software Benchmarking Standards Group (ISBSG) repository, a large database of completed software projects from different organizations, for estimating the effort required for new software projects. The accuracy of estimates based on this repository is compared with the accuracy of estimates obtained from a one-company data set from a company called Megatec. The study investigates two questions: (1) What are the differences in accuracy between a traditional technique such as ordinary least-squares (OLS) regression and Analogy-based estimation? (2) Is there a difference between estimates derived from multi-company data and estimates derived from company-specific data? Regarding the first question, our results show that OLS regression performs as well as Analogy-based estimation when based on one-company data, and significantly better when based on multi-organizational data. This result is in contrast to previous studies that showed promising results when applying Analogy to software engineering data. On the other hand, the result confirms the outcomes of investigating Analogy on another large multi-organizational database (called Laturi) from the business applications domain. Addressing the second question, we found two results. When applying Analogy, significantly more accurate models could be built from company-specific data than from multi-organizational data. These results suggest that Analogy-based procedures appear less robust when using data external to the organization for which the model is built. When applying OLS regression, no significant advantage was found in using local, company-specific data as opposed to multi-organizational data. Again, this result is consistent with a previously performed comprehensive comparison on the Laturi database as well as on the ESA database.
We plan to further investigate the reasons for consistencies and inconsistencies in the current and previous results to derive generalizable conclusions.