Here you will find scientific publications from the Fraunhofer Institutes.

N-way Diff: Set-based Comparison of Software Variants

Duszynski, Slawomir; Tenev, Vasil L.; Becker, Martin


Anslow, C. ; Institute of Electrical and Electronics Engineers -IEEE-:
8th IEEE Working Conference on Software Visualization, VISSOFT 2020. Proceedings : 28-29 September 2020, Adelaide, SA, Australia
Piscataway, NJ: IEEE, 2020
ISBN: 978-1-72819-914-6
ISBN: 978-1-72819-915-3
Working Conference on Software Visualization (VISSOFT) <8, 2020, Online>
Fraunhofer IESE
software comparison; software reuse; similarity; set model; set visualization; software variability; product lines

Software is frequently developed in many similar copies, called forks or cloned software variants. During this development, pairwise comparison is routinely used for finding differences between the cloned copies, assessing their similarity, and merging the content. However, analyzing the similarity of a large group of variants using pairwise comparison is a relatively difficult task, as the number of compared pairs grows quadratically with the number of variants. Furthermore, the result of such a group of pairwise comparisons is difficult to visualize. In this paper, we discuss the problem of N-way comparison of cloned software variants. We represent the N-way comparison result as a model of N intersecting sets. By aggregating the sets along the system decomposition hierarchy, we construct the sets at every level of the system structure (files, folders, and whole systems). We define a generalized approach for set model construction, and instantiate it for an N-way diff on the textual code representation. We propose set-based visualizations for the N-way comparison, which scale to more than ten component variants and MLOC-sized components. We evaluate the approach by applying it to several groups of industrial software system variants and by performing a controlled experiment with a comparison of 5 software forks. In the experiment, the group using set-based comparison solved the tasks in 58% less time and with 92% fewer incorrect answers than the group using pairwise comparison. Finally, we propose a generalization of the approach beyond software, to enable set-based comparison and similarity visualization for hierarchically structured models and data, for example genomes.
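
The set model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes, as a simplification, that each file variant is an unordered set of distinct text lines, whereas an actual N-way diff would also take line order and duplicate lines into account. The names `set_model`, `aggregate`, and `pair_count` are hypothetical and chosen for illustration only.

```python
from collections import Counter

def set_model(variants):
    """Build a set model for one file.

    `variants` maps a variant name to its list of text lines.
    Returns a Counter that maps each subset of variant names to the
    number of distinct lines occurring in exactly those variants.
    Assumption: lines are compared as unordered, de-duplicated sets.
    """
    membership = {}
    for name, lines in variants.items():
        for line in set(lines):
            membership.setdefault(line, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())

def aggregate(models):
    """Sum file-level set models into a folder- or system-level model."""
    total = Counter()
    for model in models:
        total.update(model)
    return total

def pair_count(n):
    """Number of pairwise comparisons needed for n variants."""
    return n * (n - 1) // 2

# Two files, each present in three variants (toy data, not from the paper).
file_a = {"v1": ["a", "b", "c"], "v2": ["a", "b", "d"], "v3": ["a", "e"]}
file_b = {"v1": ["x"], "v2": ["x"], "v3": ["y"]}

model_a = set_model(file_a)
system_model = aggregate([model_a, set_model(file_b)])

for subset, size in sorted(system_model.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(subset), size)
```

Each key of the resulting model is one region of an N-set Venn-style diagram, for example the lines shared by `v1` and `v2` but absent from `v3`. Summing the file-level counts gives the corresponding regions at folder or system level. For the five forks used in the experiment, `pair_count(5)` gives 10 pairwise comparisons, while the set model remains a single structure.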