A distributed stream-processing infrastructure for computational models
Decision support systems (DSS) that rely on time-sensitive information place high demands on the integration of computational models. Scientific models are commonly developed and tested with offline data from files and databases, but in a real-time DSS, models have to deal with low-latency data streams, transmission faults and other imperfections. In practice, models need to process data from multiple streams in various formats, and they require mechanisms to deal with delayed, missing and out-of-order data. It is desirable to handle data adaptation, fault tolerance and other bookkeeping in a robust framework, and to allow domain experts to implement computational models in a mathematical language such as R, MATLAB or Fortran. We present a platform that allows modellers to deploy R scripts and execute them in a distributed environment with online data. The platform is written in Java; it dynamically sets up R sessions on distributed computers, manages their execution and handles the input and output of the models. An adapter strategy makes it possible to change data sources and formats without affecting the implementation of the computational model. In addition, fail-over mechanisms are implemented to guarantee processing in the event of a hardware or software fault. In summary, the platform enables domain experts to implement concise computational models in a mathematical programming language (R), to test them offline in their accustomed environment, and then to run them online without modification in a fault-tolerant, distributed system. New models can therefore be added easily, and their results are immediately usable by a real-time DSS.
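The adapter idea mentioned in the abstract can be illustrated with a minimal Java sketch. It is not the platform's actual API: the names AdapterSketch, Observation, SourceAdapter, CsvAdapter and KeyValueAdapter, the two message formats, and the use of a simple mean as a stand-in for an R model are all assumptions made for this example. The sketch only shows how source-specific parsing and reordering can be kept outside the model, so that the model code stays unchanged when the data source changes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AdapterSketch {

    /** Common record handed to the model; the fields are hypothetical. */
    public static final class Observation {
        public final long timestamp;
        public final double value;

        public Observation(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Hypothetical adapter contract: one raw message in, one observation out. */
    public interface SourceAdapter {
        Observation parse(String raw);
    }

    /** Parses messages of the assumed form "timestamp,value". */
    public static final class CsvAdapter implements SourceAdapter {
        @Override
        public Observation parse(String raw) {
            String[] parts = raw.split(",");
            return new Observation(Long.parseLong(parts[0].trim()),
                                   Double.parseDouble(parts[1].trim()));
        }
    }

    /** Parses messages of the assumed form "t=timestamp;v=value". */
    public static final class KeyValueAdapter implements SourceAdapter {
        @Override
        public Observation parse(String raw) {
            long t = 0L;
            double v = Double.NaN;
            for (String field : raw.split(";")) {
                String[] kv = field.split("=");
                String key = kv[0].trim();
                if (key.equals("t")) {
                    t = Long.parseLong(kv[1].trim());
                } else if (key.equals("v")) {
                    v = Double.parseDouble(kv[1].trim());
                }
            }
            return new Observation(t, v);
        }
    }

    /** Converts raw messages and restores time order before the model sees them. */
    public static List<Observation> adapt(SourceAdapter adapter, List<String> rawMessages) {
        List<Observation> out = new ArrayList<>();
        for (String raw : rawMessages) {
            out.add(adapter.parse(raw));
        }
        out.sort(Comparator.comparingLong((Observation o) -> o.timestamp));
        return out;
    }

    /** Stand-in for a computational model: the mean of the values in a window. */
    public static double model(List<Observation> window) {
        return window.stream().mapToDouble(o -> o.value).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // The same model call is used for both sources; only the adapter differs.
        List<Observation> fromCsv = adapt(new CsvAdapter(), List.of("2,4.0", "1,2.0"));
        List<Observation> fromKv = adapt(new KeyValueAdapter(), List.of("t=2;v=4.0", "t=1;v=2.0"));
        System.out.println(model(fromCsv));
        System.out.println(model(fromKv));
    }
}
```

In this sketch, supporting a new source or format means adding another SourceAdapter implementation; the model method is not touched.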