2016
Master Thesis
Title
Version specific recognition of libraries in binaries for identifying vulnerabilities caused by these using the example of ARM-32
Abstract
Third-party libraries used in applications pose an invisible risk for the customers using these applications. When security vulnerabilities are detected in a library, the library manufacturer should react and provide patches, and the customer using the application has to install them. These responsibilities are often neglected, leaving applications vulnerable. Library identification is performed on applications whose source code is not available. This is often the case when scanning software repositories such as the Google Play Store for Android, which holds over 2 million applications. The applications found in the Play Store frequently contain ARM-32 native code. One of the best-known examples of a severe vulnerability in a third-party library is the Heartbleed vulnerability in OpenSSL versions 1.0.1 through 1.0.1f. By performing library detection on many applications, an automatic report can be generated whenever a library containing a known vulnerability is found. Library detection can also serve various other use cases, including the detection of copyright violations and reverse engineering.
In this thesis, methods have been developed to identify libraries in ARM-32 binaries. The number of ARM CPUs has grown steadily since smartphones became popular. A Bloomberg market analysis states that 99% of the world's smartphones and tablets are shipped with an ARM CPU. ARM has a similar impact on the growing Internet of Things (IoT) market, which makes library detection on this specific platform important. The methods presented in this thesis are based on signatures derived from static code analysis. They are designed to identify a specific version of a library, so that a library containing a vulnerability can be distinguished from a newer, patched version of the same library.
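The automatic report mentioned above only works if identification is version-specific: a detected library version must be checked against the version range a vulnerability affects. The following is a minimal sketch, not the thesis's implementation. It assumes a hypothetical detector that emits (app, library, version) tuples, handles only OpenSSL-style versions with at most one trailing letter (such as "1.0.1f"), and uses Heartbleed (CVE-2014-0160, OpenSSL 1.0.1 through 1.0.1f) as the only table entry. The names `VulnRange`, `KNOWN_VULNS`, `parse_openssl_version` and `report` are illustrative.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple


class VulnRange(NamedTuple):
    library: str
    cve: str
    first_affected: str  # inclusive lower bound
    last_affected: str   # inclusive upper bound


# Hypothetical vulnerability table; Heartbleed affects OpenSSL 1.0.1 through 1.0.1f.
KNOWN_VULNS = [VulnRange("openssl", "CVE-2014-0160", "1.0.1", "1.0.1f")]


def parse_openssl_version(version: str) -> Tuple[int, int, int, int]:
    """Turn an OpenSSL-style version such as '1.0.1f' into a comparable tuple.

    The optional single trailing letter maps to 1..26 ('a'..'z') and a missing
    letter maps to 0, so '1.0.1' < '1.0.1a' < ... < '1.0.1f' < '1.0.1g'.
    Multi-letter suffixes (e.g. '0.9.8za') are not handled by this sketch.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, letter = match.groups()
    letter_rank = ord(letter) - ord("a") + 1 if letter else 0
    return (int(major), int(minor), int(patch), letter_rank)


def report(detections: Iterable[Tuple[str, str, str]]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (app, library, version, cve) for each detection in a known-vulnerable range.

    `detections` holds hypothetical (app, library, version) results of a
    version-specific library identifier.
    """
    for app, library, version in detections:
        for vuln in KNOWN_VULNS:
            if vuln.library != library:
                continue  # only parse versions of libraries the table knows about
            low = parse_openssl_version(vuln.first_affected)
            high = parse_openssl_version(vuln.last_affected)
            if low <= parse_openssl_version(version) <= high:
                yield (app, library, version, vuln.cve)
```

For example, detections of OpenSSL 1.0.1e and 1.0.1g would yield a report only for 1.0.1e, which is exactly the distinction between a vulnerable version and its patched successor that the abstract calls for.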
The proposed identification system also identifies specific library functions in ARM-32 binaries as well as the callers of library functions. The signature comparison system is based on set theory. The way signatures are constructed allows libraries to be identified even when they are only partially included in the binary. Partial inclusion is caused by compiler optimizations or by dead code elimination in the linker. A core problem for signature-based schemes at the native code level is the presence of different compilers. Different compilers such as GCC and Clang often produce completely different machine instructions for the same high-level source code, so signatures generated from such binaries may differ depending on the compiler used. When a library signature is compared against a binary containing the same library built with another compiler, the comparison often fails. This thesis provides solutions for generating library signatures that remain similar to signatures generated from the same libraries compiled with different compilers. The thesis investigates different features for signature generation. Some features are mnemonic-based and provide very accurate results when the compiler is known beforehand. Other features are designed to be independent of the compiler used. The investigated and developed features have been evaluated in different environments with different challenges. Identification correctness has been tested on carefully constructed examples as well as on real-world applications. The last part of the evaluation involves handpicked examples whose analysis results have been verified manually by reverse engineering. This thesis presents a generically usable, signature-based system for library identification.
The features used are suitable for library identification in binaries compiled for the ARM-32 CPU architecture. General problems such as compiler independence and the partial inclusion of libraries in programs are addressed specifically.
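To make the idea of set-based signature comparison with partial inclusion more concrete, here is a minimal sketch under stated assumptions; it is not the feature set or matching rule developed in the thesis. It assumes each function is already disassembled into a list of ARM mnemonics, uses a hypothetical mnemonic feature (the set of adjacent mnemonic pairs), and treats a library version as the set of its function signatures. Matching is a plain set intersection, so a binary that contains only some library functions still matches those functions. The names `function_signature`, `signature_set` and `match_library` are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

# A function signature is a set of features; here, hypothetically, adjacent mnemonic pairs.
FunctionSig = FrozenSet[Tuple[str, str]]


def function_signature(mnemonics: List[str]) -> FunctionSig:
    """Hypothetical mnemonic feature: the set of adjacent mnemonic pairs (bigrams) of one function."""
    return frozenset(zip(mnemonics, mnemonics[1:]))


def signature_set(functions: Dict[str, List[str]]) -> Dict[FunctionSig, str]:
    """Map each function signature to its function name (identical signatures collapse)."""
    return {function_signature(body): name for name, body in functions.items()}


def match_library(
    binary_funcs: Dict[str, List[str]],
    library_versions: Dict[str, Dict[str, List[str]]],
) -> List[Tuple[str, List[str], float]]:
    """Rank library versions by how many of their function signatures occur in the binary.

    Only the set intersection is counted, so a binary that contains just a subset of
    the library (partial inclusion) still matches the functions it does contain.
    Returns (version, matched function names, fraction of library functions matched).
    """
    binary = set(signature_set(binary_funcs))
    results = []
    for version, lib_funcs in library_versions.items():
        lib = signature_set(lib_funcs)
        matched = sorted(lib[sig] for sig in lib.keys() & binary)
        coverage = len(matched) / len(lib) if lib else 0.0
        results.append((version, matched, coverage))
    # Best candidate first: most matched functions, then highest coverage.
    results.sort(key=lambda r: (len(r[1]), r[2]), reverse=True)
    return results
```

As a usage example, take two hypothetical versions of a library that differ only in a patched `parse` function, and a binary that contains only the patched `parse` (the rest was removed by the linker). The patched version ranks first with one matched function and a coverage of 0.5, while the unpatched version matches nothing. Because the features here are raw mnemonics, the same library built by a different compiler would likely fail to match, which is the compiler-dependence problem the abstract describes.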
Thesis Note
Darmstadt, TU, Master Thesis, 2016
Publishing Place
Darmstadt