  • Publication
    Simplex Volume Maximization (SiVM): A matrix factorization algorithm with non-negative constrains and low computing demands for the interpretation of full spectral X-ray fluorescence imaging data
    ( 2017)
    Alfeld, M.
    ;
    ; ; ;
    Snickt, G. van der
    ;
    Noble, P.
    ;
    Janssens, K.
    ;
    Wellenreuther, G.
    ;
    Falkenberg, G.
    Technological progress allows for an ever-faster acquisition of hyperspectral data, challenging users to keep up with interpreting the recorded data. Matrix factorization, the representation of data sets by bases (or loads) and coefficient (or score) images, has long been used to support the interpretation of complex data sets. In this publication we propose Simplex Volume Maximization (SiVM) for the analysis of X-ray fluorescence (XRF) imaging data sets. SiVM selects archetypical data points that represent the data set and thus provides easily understandable bases, preserves the non-negative character of XRF data sets and has low demands concerning computing resources. We apply SiVM to an XRF data set of Hans Memling's Portrait of a man from the Lespinette family from the collection of the Mauritshuis (The Hague, NL) and discuss capabilities and shortcomings of SiVM.
  • Publication
    Feeding the world with big data: Uncovering spectral characteristics and dynamics of stressed plants
    ( 2016) ; ; ;
    Mahlein, A.-K.
    ;
    Steiner, U.
    ;
    Oerke, E.-C.
    ;
    Römer, C.
    ;
    Plümer, Lutz
    Modern communication, sensing, and actuator technologies as well as methods from signal processing, pattern recognition, and data mining are increasingly applied in agriculture, ultimately helping to meet the challenge of "How to feed a hungry world?" Developments such as increased mobility, wireless networks, new environmental sensors, robots, and the computational cloud put the vision of a sustainable agriculture for anybody, anytime, and anywhere within reach. Unfortunately, data-driven agriculture also presents unique computational problems in scale and interpretability: (1) data is often gathered at massive scale, and (2) researchers and experts with complementary skills have to cooperate in order to develop models and tools for data-intensive discovery that yield easy-to-interpret insights for users who are not necessarily trained computer scientists.
  • Publication
    Non-negative matrix factorization for the near real-time interpretation of absorption effects in elemental distribution images acquired by X-ray fluorescence imaging
    ( 2016)
    Alfeld, M.
    ;
    ; ; ;
    Wellenreuther, G.
    ;
    Barriobero-Vila, P.
    ;
    Requena, G.
    ;
    Boesenberg, U.
    ;
    Falkenberg, G.
    Elemental distribution images acquired by imaging X-ray fluorescence analysis can contain high degrees of redundancy and weakly discernible correlations. In this article near real-time non-negative matrix factorization (NMF) is described for the analysis of a number of data sets acquired from samples of a bi-modal alpha + beta Ti-6Al-6V-2Sn alloy. NMF was used for the first time to reveal absorption artefacts in the elemental distribution images of the samples, where two phases of the alloy, namely alpha and beta, were in superposition. The findings and interpretation of the NMF results were confirmed by Monte Carlo simulation of the layered alloy system. Furthermore, it is shown how the simultaneous factorization of several stacks of elemental distribution images provides uniform basis vectors and consequently simplifies the interpretation of the representation.
  • Publication
    Plant phenotyping using probabilistic topic models: Uncovering the hyperspectral language of plants
    ( 2016) ;
    Mahlein, A.-K.
    ;
    ;
    Steiner, U.
    ;
    Oerke, E.-C.
    ;
    Modern phenotyping and plant disease detection methods, based on optical sensors and information technology, provide promising approaches to plant research and precision farming. In particular, hyperspectral imaging has been found to reveal physiological and structural characteristics in plants and to allow for tracking physiological dynamics due to environmental effects. In this work, we present an approach to plant phenotyping that integrates non-invasive sensors, computer vision, as well as data mining techniques and allows for monitoring how plants respond to stress. To uncover latent hyperspectral characteristics of diseased plants reliably and in an easy-to-understand way, we "wordify" the hyperspectral images, i.e., we turn the images into a corpus of text documents. Then, we apply probabilistic topic models, a well-established natural language processing technique that identifies content and topics of documents. Based on recent regularized topic models, we demonstrate that one can automatically track the development of three foliar diseases of barley. We also present a visualization of the topics that provides plant scientists with an intuitive tool for hyperspectral imaging. In short, our analysis and visualization of characteristic topics found during symptom development and disease progress reveal the hyperspectral language of plant diseases.
  • Publication
    Metro maps of plant disease dynamics - automated mining of differences using hyperspectral images
    ( 2015) ;
    Mahlein, A.-K.
    ;
    ;
    Steiner, U.
    ;
    Oerke, E.-C.
    ;
    Understanding the response dynamics of plants to biotic stress is essential to improve management practices and breeding strategies of crops and thus to proceed towards a more sustainable agriculture in the coming decades. In this context, hyperspectral imaging offers a particularly promising approach since it provides non-destructive measurements of plants correlated with internal structure and biochemical compounds. In this paper, we present a cascade of data mining techniques for fast and reliable data-driven sketching of complex hyperspectral dynamics in plant science and plant phenotyping. To achieve this, we build on top of a recent linear-time matrix factorization technique, called Simplex Volume Maximization, in order to automatically discover archetypal hyperspectral signatures that are characteristic of particular diseases. The methods were applied to a data set of barley leaves (Hordeum vulgare) diseased with the foliar plant pathogens Pyrenophora teres, Puccinia hordei and Blumeria graminis hordei. Towards more intuitive visualizations of plant disease dynamics, we use the archetypal signatures to create structured summaries that are inspired by metro maps, i.e. schematic diagrams of public transport networks. Metro maps of plant disease dynamics produced on several real-world data sets conform to plant physiological knowledge and explicitly illustrate the interaction between diseases and plants. Most importantly, they provide an abstract and interpretable view of plant disease progression.
  • Publication
    Non-negative factor analysis supporting the interpretation of elemental distribution images acquired by XRF
    ( 2014)
    Alfeld, M.
    ;
    ; ; ;
    Wellenreuther, G.
    ;
    Falkenberg, G.
    Stacks of elemental distribution images acquired by XRF can be difficult to interpret if they contain high degrees of redundancy and components differing in their quantitative but not qualitative elemental composition. Factor analysis, mainly in the form of Principal Component Analysis (PCA), has been used to reduce the level of redundancy and highlight correlations. PCA, however, does not yield physically meaningful representations, as they often contain negative values. This limitation can be overcome by employing factor analysis that is restricted to non-negativity. In this paper we present the first application of the Python Matrix Factorization Module (pymf) to XRF data. This is done in a case study on the painting Saul and David from the studio of Rembrandt van Rijn. We show how the discrimination between two different Co-containing compounds with minimum user intervention and a priori knowledge is supported by Non-Negative Matrix Factorization (NMF).
  • Publication
    Early drought stress detection in cereals: Simplex volume maximization for hyperspectral image analysis
    ( 2012)
    Römer, Christoph
    ;
    ;
    Ballvora, Agim
    ;
    Pinto, Francisco
    ;
    Rossini, Micol
    ;
    Cinzia, Panigada
    ;
    Behmann, Jan
    ;
    Léon, Jens
    ;
    ; ; ;
    Rascher, Uwe
    ;
    Plümer, Lutz
    Early water stress recognition is of great relevance in precision plant breeding and production. Hyperspectral imaging sensors can be a valuable tool for early stress detection with high spatio-temporal resolution. They gather large, high-dimensional data cubes, posing a significant challenge to data analysis. Classical supervised learning algorithms often fail in applied plant sciences due to their need for labelled datasets, which are difficult to obtain. Therefore, new approaches for unsupervised learning of relevant patterns are needed. We apply for the first time a recent matrix factorisation technique, simplex volume maximisation (SiVM), to hyperspectral data. It is an unsupervised classification approach, optimised for fast computation of massive datasets. It allows calculation of how similar each spectrum is to observed typical spectra. This provides the means to express how likely it is that one plant is suffering from stress. The method was tested for drought stress, applied to potted barley plants in a controlled rain-out shelter experiment and to agricultural corn plots subjected to a two-factorial field setup altering water and nutrient availability. Both experiments were conducted on the canopy level. SiVM was significantly better than using a combination of established vegetation indices. In the corn plots, SiVM clearly separated the different treatments, even though the effects on leaf and canopy traits were subtle.
  • Publication
    Descriptive matrix factorization for sustainability: Adopting the principle of opposites
    Climate change, the global energy footprint, and strategies for sustainable development have become topics of considerable political and public interest. The public debate is informed by an exponentially growing amount of data and there are diverse partisan interests when it comes to interpretation. We therefore believe that data analysis methods are called for that provide results which are intuitively understandable even to non-experts. Moreover, such methods should be efficient so that non-expert users can perform their own analysis at low expense in order to understand the effects of different parameters and influential factors. In this paper, we discuss a new technique for factorizing data matrices that meets both these requirements. The basic idea is to represent a set of data by means of convex combinations of extreme data points. This often accommodates human cognition. In contrast to established factorization methods, the approach presented in this paper can also determine over-complete bases. At the same time, convex combinations allow for highly efficient matrix factorization. Based on techniques adopted from the field of distance geometry, we derive a linear-time algorithm to determine suitable basis vectors for factorization. By means of the example of several environmental and developmental data sets we discuss the performance and characteristics of the proposed approach and validate that significant efficiency gains are obtainable without performance decreases compared to existing convexity-constrained approaches.
  • Publication
    Pre-symptomatic prediction of plant drought stress using Dirichlet-aggregation regression on hyperspectral images
    ( 2012) ; ; ; ; ;
    Roemer, C.
    ;
    Ballvora, A.
    ;
    Rascher, U.
    ;
    Leon, J.
    ;
    Plümer, L.
    Pre-symptomatic drought stress prediction is of great relevance in precision plant protection, ultimately helping to meet the challenge of "How to feed a hungry world?". Unfortunately, it also presents unique computational problems in scale and interpretability: it is a temporal, large-scale prediction task, e.g., when monitoring plants over time using hyperspectral imaging, and features are 'things' with a 'biological' meaning and interpretation and not just mathematical abstractions computable for any data. In this paper we propose Dirichlet-aggregation regression (DAR) to meet the challenge. DAR represents all data by means of convex combinations of only a few extreme ones, computable in linear time and easy to interpret. Then, it puts a Gaussian process prior on the Dirichlet distributions induced on the simplex spanned by the extremes. The prior can be a function of any observed meta feature such as time, location, type of fertilization, and plant species. We evaluated DAR on two hyperspectral image series of plants over time with about 2 (resp. 5.8) billion matrix entries. The results demonstrate that DAR can be learned efficiently and predicts stress well before it becomes visible to the human eye.
  • Publication
    Simplex distributions for embedding data matrices over time
    ( 2012) ; ;
    Römer, C.
    ;
    ;
    Ballvora, A.
    ;
    Rascher, U.
    ;
    Leon, J.
    ;
    ;
    Plümer, L.
    Early stress recognition is of great relevance in precision plant protection. Pre-symptomatic water stress detection is of particular interest, ultimately helping to meet the challenge of "How to feed a hungry world?". Due to climate change, this is of considerable political and public interest. Due to its large-scale and temporal nature, e.g., when monitoring plants using hyperspectral imaging, and the demand for physical meaning of the results, it presents unique computational problems in scale and interpretability. However, big data matrices over time also arise in several other real-life applications such as stock market monitoring, where a business sector is characterized by the ups and downs of each of its companies per year, or topic monitoring of document collections. Therefore, we consider the general problem of embedding data matrices into Euclidean space over time without making any assumption on the generating distribution of each matrix. To do so, we represent all data samples by means of convex combinations of only a few extreme ones, computable in linear time. On the simplex spanned by the extremes, there are then natural candidates for distributions inducing distances between, and in turn embeddings of, the data matrices. We evaluate our method across several domains, including synthetic, text, and financial data as well as a large-scale dataset on water stress detection in plants with more than 3 billion matrix entries. The results demonstrate that the embeddings are meaningful and fast to compute. The stress detection results were validated by a domain expert and conform to existing plant physiological knowledge.
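
Several abstracts in this listing rest on the same idea: pick a few extreme data points and express every sample as a convex combination of them. The following is a minimal sketch of that idea, not the authors' implementation. The greedy farthest-point selection is a simplified stand-in for Simplex Volume Maximization, and the function names (`pick_extremes`, `project_to_simplex`, `convex_weights`) and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def pick_extremes(X, k):
    """Greedily pick k rows of X that lie far apart (simplified stand-in for SiVM)."""
    # Start from the row farthest from the data mean.
    idx = [int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))]
    dist_sum = np.zeros(len(X))
    for _ in range(k - 1):
        # Accumulate each row's distance to the most recently chosen row.
        dist_sum += np.linalg.norm(X - X[idx[-1]], axis=1)
        dist_sum[idx] = -np.inf  # never pick the same row twice
        idx.append(int(np.argmax(dist_sum)))
    return idx

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def convex_weights(x, B, steps=500):
    """Weights w on the simplex minimising ||x - w @ B||^2 (projected gradient descent)."""
    w = np.full(len(B), 1.0 / len(B))
    # Step size 1/L, where L is the largest eigenvalue of B @ B.T.
    lr = 1.0 / (np.linalg.norm(B @ B.T, 2) + 1e-12)
    for _ in range(steps):
        w = project_to_simplex(w - lr * (w @ B - x) @ B.T)
    return w

# Toy data (assumed): three triangle corners plus three interior points.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [0.2, 0.2], [0.3, 0.3], [0.5, 0.25]])
idx = pick_extremes(X, 3)      # indices of the selected extreme rows
B = X[idx]                     # basis of extreme points
w = convex_weights(X[3], B)    # non-negative weights that sum to one
```

Because the weights are non-negative and sum to one, each sample can be read as a mixture of actually observed data points, which is the interpretability argument the abstracts make.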