Fraunhofer-Gesellschaft
2025
Journal Article
Title

Bytewise approximate matching: Evaluating common scenarios for executable files

Abstract
This research explores the application of bytewise approximate matching algorithms to executable files, evaluating the effectiveness of ssdeep, sdhash, TLSH, and MRSHv2 across various scenarios in which approximate matching seems to be a natural tool to employ. Previous work has already underlined that approximate matching is often used for tasks for which the algorithms have not been thoroughly and systematically evaluated. Pagani et al. (2018), in particular, highlighted the shortcomings of previous research and tried to improve current knowledge about the applicability of approximate matching in the context of executable files by evaluating typical use cases. We extend their work by taking a closer look at further common scenarios that are not covered in their article. Specifically, we examine use cases such as different versions of the same software and comparisons between on-disk and in-memory representations of the same program, for both malicious and benign software. Our findings reveal that the considered algorithms' performance was generally unsatisfactory across all evaluated scenarios. Notably, they struggle with size-related and localized modifications introduced during the loading stage. Furthermore, executables with no functional similarity may be falsely matched due to shared byte-level similarity caused by embedded resources or inherent to certain programming languages or runtime environments. Consequently, these algorithms should be used cautiously and regarded as assisting tools rather than reliable methods for indicating similarity between executable files: both false positives and false negatives can occur, and users should be aware of them. Moreover, while some of the unfavorable results stem from design decisions, we observed unexpected behavior in some experiments that we could trace back to issues in the reference implementations of the algorithms. After we fixed the implementations, these effects disappeared from our results. It remains an open question whether, and to what extent, previous experiments and evaluations were affected by these issues.
Author(s)
Jakobs, Carlo
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Mahr, Axel
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Lambertz, Martin  
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Rybalka, Mariia
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Plohmann, Daniel  
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Journal
Forensic Science International Digital Investigation  
Open Access
DOI
10.1016/j.fsidi.2025.301927
Language
English
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Keyword(s)
  • Bytewise approximate matching
  • Executable files
  • Fuzzy hashing
  • MRSHv2
  • Sdhash
  • Similarity hashing
  • Ssdeep
  • TLSH