A multi-codec audio dataset for codec analysis and tampering detection
In this paper, a multi-codec tampered audio dataset is presented. The doctored speech content contains no audible artifacts and no changes in semantic meaning, but it does contain tampered regions that can be detected through traces of lossy compression, e.g. using framing grid analysis. Possible applications, the included content and annotations, and the steps required to generate the dataset are described. The dataset is available online and is intended for evaluation purposes by anyone interested. Moreover, the creation of derived, modified datasets is encouraged and is supported by the choice of a suitable Creative Commons license.