Automated extraction of behavioural profiles from document usage
Both human analysts and particularly automated tool suites are capable of deriving sensitive information and conclusions from collections of data items that individually cannot be considered critical or sensitive. This activity of analysing and correlating material that is not immediately related is, in fact, highly desirable in many application areas and cannot be controlled precisely in advance. The decision whether a program or an analyst is performing searches and correlations beyond the scope of his authorisation or current mission can frequently be determined only ex post based on a heuristic analysis of documents accessed. In this paper we describe a mechanism for the instrumentation of operating systems to obtain information on the documents and resources accessed by arbitrary processes. Such a mechanism could be an important component of the infrastructure of an operational risk management system, generating an audit trail for compliance and forensic investigation, and acting as a sensor generating data for analysis. Addressing the latter application, the paper also outlines an approach for extracting textual information and metadata from accessed documents, regardless of the application program and workflow mechanisms used, without unduly impeding either workflows or operator performance. This information can then be subjected to an heuristic analysis based on natural language processing to extract the semantic context of each document or segment. Clustering this content and extracting the conceptual patterns that a user has accessed can then allow abnormal behaviour to be identified. This can then be refined further to determine heuristically whether the authorised remit of the user has been breached and whether an investigation is warranted. We argue that the risk of misbehaviour can be reduced while at the same time increasing productivity. This is made possible by enhancing the degree of freedom for individual users to act in the interest of their mission objectives and at the same time providing automated mechanisms for analysing user behaviour.