Abstract
The general aim of the thesis was to contribute to the improvement of data analytical techniques within the chemometric field. Regardless of the multivariate structure of the data, it is still common in some fields to perform univariate data analysis using only simple statistics such as the sample mean and variance. Recent instrumental developments in chemometrics often result in high-order data, for which univariate tools do not suffice and multivariate data analysis is required. Moreover, many multivariate models assume normality of the residuals (which in many cases is far from reality) and are not resistant to outliers (which are known to be more the rule than the exception for empirical data). For this reason, robust methods are a valuable tool for both semi-automated outlier detection and model building.
The approach adopted in this thesis can be split into two main parts: 1. applying a multivariate and multiway data analytical framework in fields where less sophisticated data analysis methods are currently used, and 2. developing new, more robust alternatives to already existing multivariate tools.
The first part of the study was realised by applying two- and three-way chemometric methods, such as PCA and PARAFAC models, to analyse spatial and depth profiles of sea water samples, defined by three data modes: depth, variables and geographical location. Emphasis was also put on predicting fluorescence values, a natural measure of biological activity, by applying and comparing the Partial Least Squares (PLS) regression technique with its multiway alternative, N-PLS. Results of the analysis indicated superiority of the three-way framework, potentially constituting a novel assessment of the sea water measurements. Particularly in the case of regression models there is a clear preference towards the more complex model, which delivers more reliable predictions than a classical two-way PLS. Therefore, using multiway data analysis tools is recommended in order to extract the full information from multiway data structures.
The second part of the thesis targeted qualitative properties of the analysed data. The broad theoretical background of robust procedures was given as a very useful supplement to the classical methods, and a new tool based on robust PCA, aimed at identifying Rayleigh and Raman scatter in excitation-emission (EEM) data, was developed. The results show clearly that robust methods can significantly contribute to the improvement of existing analytical techniques commonly used in chemometrics, for example by providing excellent outlier detection tools. It is therefore advised to apply robust and classical procedures simultaneously, at least to determine whether contamination is present in the data. For this to become a standard procedure, further work is required, aimed at implementing reliable robust algorithms in standard statistical programs.
Original language  English 

Place of Publication  Kgs. Lyngby 

Publisher  Technical University of Denmark 
Number of pages  174 
Publication status  Published - 2012 
Series  IMM-PHD-2012 

Number  284 
ISSN  0909-3192 
Projects
 1 Finished

Analysis and Modelling of Chain Data
Kotwa, E. K., Brockhoff, P. B., Kulahci, M., Rinnan, Å. & Westad, F. O.
01/10/2007 → 05/09/2014
Project: PhD